C# & Perl


Oct01: C# & Perl

Talbott is an instructor teaching Java and Perl for Sun Microsystems, and Java and other languages for Boston University's Corporate Education Center. He can be contacted at [email protected].


Although C#, much like Java, evolved from C/C++, it also borrows from languages such as Delphi, Pascal, Visual Basic, and Smalltalk. Interestingly, C# also has facilities familiar to Perl programmers, including join and split, foreach loops, regular expressions, and the like. In this article, I'll focus on common Perl scripts that you can implement in C#. In the process, I will also discuss the difference between instance and static methods in C#.

Most of the examples in this article relate to C#'s use of the .NET Framework. C# was designed to depend heavily on the .NET Framework, a universal type library similar to the standard Perl modules or the Java class libraries. The main difference is that the .NET classes are accessible to every language on the .NET platform (VB, C++, JScript, Perl, and so on). Consequently, it is not exactly C# that is similar to Perl, but rather the .NET Framework, which C# relies on, that provides many seemingly common facilities.

Join

For instance, join is a built-in Perl function that joins the elements of an array with a delimiter. Listing One (a) builds a tab-delimited string of fruit using join. The key section of code is the call that joins the fruit array (@fruit) with a tab ("\t") between each pair of elements: join("\t", @fruit).

In C#, the join function is actually a static method of the string class: string.Join("\t", fruit). Because C# is a strongly typed language, you must declare the array as type string[] and the result as type string; see Listing One (b).

Split

Split can be viewed as the opposite of join. Split accepts a single string and returns an array of the substrings that were separated by a delimiter. Split is also a built-in Perl function, but in C# it is a string instance method. In Listing Two (a), I use a tab-delimited string called $beverages to create an array of drinks. The key section of code is the call that splits the beverage string ($beverages) using a tab ("\t") as the delimiter: split("\t", $beverages).

In C#, Split is an instance method of every string object. Since the method belongs to the object, I call it directly on the item that I want to split: beverages.Split(separator). The separator parameter is a character array, so you can declare the array earlier or create it on the fly. The latter looks like this: beverages.Split(new char[] {'\t'}). Because the parameter is declared with the params keyword, you can also pass the character by itself, as in beverages.Split('\t').
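As a rough sketch of the two calling styles just described, the fragment below splits the same string once with a separator array declared ahead of time and once with an array created on the fly, then joins the pieces back together. It assumes C# 9 or later so that top-level statements compile without a class wrapper. The names declared, onTheFly, and rejoined are illustrative and do not come from the listings.

```csharp
using System;

string beverages = "orange juice\troot beer\tcoffee\twater";

// Style 1: declare the separator array ahead of time.
char[] separator = { '\t' };
string[] declared = beverages.Split(separator);

// Style 2: create the separator array on the fly.
string[] onTheFly = beverages.Split(new char[] { '\t' });

// Both calls produce the same four elements.
Console.WriteLine(declared.Length);   // 4
Console.WriteLine(onTheFly[1]);       // root beer

// Joining the pieces with the same delimiter restores the original string.
string rejoined = string.Join("\t", declared);
Console.WriteLine(rejoined == beverages);   // True
```

The last line illustrates why Split can be viewed as the opposite of Join: as long as the delimiter never occurs inside an element, a split followed by a join returns the original text.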

I use the foreach construct to print each item in the array. Listing Two (b) is the C# equivalent of the Perl code.

One difference between Perl and C# is that while Perl has built-in functions, C# does not. C# uses the .NET Framework's definition of a string to do the splitting and joining. Because of this, the utility functions live in the class library rather than in the C# language itself. Another item to be aware of when learning C# is the difference between a static method and an instance method. Join is a static method, whereas Split is an instance method. They both come from the .NET Framework String class.

Static versus Instance

The difference between static and instance methods is where they exist conceptually, not physically. Physically, both are defined in the same class definition. Conceptually, instance methods belong to (are encapsulated inside) the string object, or instance, while static methods belong to the String class. This means that an object reference cannot be used to reach the shared static method: in C#, you must use the class name to invoke a static method (unlike Java, which allows an instance reference to invoke static methods). Listing Three demonstrates the conceptual distinction between an instance method and a static method.

In Listing Three (a), notice that asking the greeting string "hello" to join two unrelated objects (the string "\t" and a string array of fruit) would be silly and distracting. By forcing you to ask the String class to join the two objects, C# makes the code more readable and creates a mental divide between belonging to the class (static member) and belonging to the object (instance member).

Split, on the other hand, is an instance method. So, conceptually, every string object has the ability to split itself and return an array. In C#, you never ask the String class to split up a string. Instead, you ask a string object to split up its own private data; see Listing Three (b).
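The same distinction applies to classes you write yourself. The following is a minimal sketch, assuming C# 9 or later for top-level statements. The Drink class, its Count property, and its Describe method are hypothetical names invented for this illustration and are not part of the .NET Framework.

```csharp
using System;

// Static member: invoked through the class name, no object required.
Console.WriteLine(Drink.Count);        // 0

Drink coffee = new Drink("coffee");
Drink water = new Drink("water");

// Instance member: invoked through an object reference.
Console.WriteLine(coffee.Describe());  // coffee
Console.WriteLine(Drink.Count);        // 2

// Console.WriteLine(coffee.Count);     // compile-time error: Count is static
// Console.WriteLine(Drink.Describe()); // compile-time error: Describe needs an object

// Hypothetical class, invented for this illustration only.
class Drink
{
    // Shared by the whole class: one counter no matter how many objects exist.
    public static int Count { get; private set; }

    // Owned by each object: every Drink carries its own name.
    private readonly string name;

    public Drink(string name)
    {
        this.name = name;
        Count++;
    }

    public string Describe()
    {
        return name;
    }
}
```

Both members are written inside the same class definition, yet only Describe has access to a particular object's data. That is the physical versus conceptual split described above.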

Foreach

The foreach keyword is a built-in construct of both Perl and C#, and it operates in a similar fashion in each. Listing Four (a) uses the foreach construct in Perl to loop through an array. On each iteration, the loop variable $drink refers to the next item in the array, and the body of the loop prints it. Because the array has four elements, the loop invokes the print function four times. The $drink variable is, in fact, an alias for the current array element rather than a copy, and it moves forward one element per iteration.

In C#, you use a string reference to refer to the elements in the array; see Listing Four (b). Just as in Perl, the variable drink refers to an element in the array and moves forward through the array at each iteration. Because C# is strongly typed, you must declare the data type of the loop variable drink as string. The foreach statement is a bit more powerful in C# because it can iterate not only through arrays, but also through objects that implement the IEnumerable interface, such as the collection classes Queue, Stack, and Hashtable.
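To illustrate iteration over collection classes, here is a short sketch that runs foreach over a Queue, a Stack, and a Hashtable from the System.Collections namespace. It assumes C# 9 or later for top-level statements. The prices table and its values are made up for the example.

```csharp
using System;
using System.Collections;

string[] drinks = { "orange juice", "root beer", "coffee", "water" };

Queue queue = new Queue();
Stack stack = new Stack();
foreach (string drink in drinks)
{
    queue.Enqueue(drink);
    stack.Push(drink);
}

// Queue: foreach visits items in the order they were enqueued,
// so this prints orange juice, root beer, coffee, water.
foreach (string drink in queue)
{
    Console.WriteLine(drink);
}

// Stack: foreach visits the most recently pushed item first,
// so this prints water, coffee, root beer, orange juice.
foreach (string drink in stack)
{
    Console.WriteLine(drink);
}

// Hashtable: each item is a DictionaryEntry; the order is not guaranteed.
Hashtable prices = new Hashtable();
prices["coffee"] = 1.50;   // illustrative values, not from the article
prices["water"] = 1.00;
foreach (DictionaryEntry entry in prices)
{
    Console.WriteLine("{0}: {1}", entry.Key, entry.Value);
}
```

These non-generic collections store their items as object, so declaring the loop variable as string makes foreach cast each element for you. A Hashtable yields key/value pairs rather than bare values, which is why its loop variable is a DictionaryEntry.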

Regular Expressions

Both Perl and C# have regular expression facilities. Regular expressions (regex) are built into the Perl syntax, but C# uses the System.Text.RegularExpressions namespace from the .NET Framework. Some .NET classes from this namespace are: Regex, Match, Capture, and Group. Here, I'll focus on the Regex class.

Unfortunately, C# is not as terse as Perl because regular expressions are not part of the syntax, but rather a group of classes. Listing Five (a) is a Perl script that uses regular expressions. The key section of code here is the regular expression /ee/ following the match operator =~ (equals, tilde), which returns True if there is a match between the expression ee and the string $drink: if $drink =~ /ee/.

Basically, Perl's if modifier executes the statement preceding it only if the clause containing the regular expression is True. In this case, that is when two e characters appear in sequence in the $drink string, so the expression matches "root beer" and "coffee". The syntax for a regular expression in Perl is simply a forward slash, followed by characters and regex metacharacters, terminated by another slash. This example uses only literal characters. A similar example using metacharacters might be: if $drink =~ /e../. The dot (period) is a regex metacharacter that acts as a wildcard for any single character (other than a newline, by default). Therefore, this expression matches any e character followed by any two characters. The clause returns True for "orange juice" and "root beer". It does not match "coffee" because the first "e" in coffee is followed by only one character and no characters follow the second "e".

Now look at the same code in C#. First, add a using directive before the class declaration:

using System.Text.RegularExpressions;

Next, you instantiate the Regex class:

Regex regex = new Regex("ee");

Finally, you check for a match by calling an instance method on the regex object, which answers the True/False question of whether the expression matches the drink:

regex.IsMatch(drink)
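The three steps above can be combined into one fragment. As a variation, this sketch uses the metacharacter pattern e.. from the earlier Perl discussion rather than ee. It assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

string[] drinks = { "orange juice", "root beer", "coffee", "water" };

// "e.." means: an e followed by any two characters (other than a newline).
Regex regex = new Regex("e..");

// Prints "orange juice" and "root beer"; "coffee" and "water" do not match
// because no e in either word is followed by two more characters.
foreach (string drink in drinks)
{
    if (regex.IsMatch(drink))
    {
        Console.WriteLine(drink);
    }
}
```

The pattern string is the same text that sits between the slashes in Perl. Only the surrounding syntax changes.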

Listing Five (b) is C# code to replace the equivalent Perl in Listing Five (a). In Perl, the special variable $_ can be used in place of $drink. The $_ variable is implicitly used by regular expressions, the print function, and foreach loops to make the Perl source code even more terse; see Listing Five (c).

As you can see, C# is more verbose than Perl, but its regex functionality makes C# a good fit for many of the problems that Perl is suited to solve. Note also that the IsMatch method of the Regex class comes in two versions: an instance method and a static method. If you don't want to instantiate a Regex object, you can use the static method, as in Listing Six.
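As a quick check that the two versions behave the same way, the sketch below calls both on each drink and prints the results side by side. It assumes C# 9 or later for top-level statements. The names viaInstance and viaStatic are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string[] drinks = { "orange juice", "root beer", "coffee", "water" };
Regex regex = new Regex("ee");

foreach (string drink in drinks)
{
    bool viaInstance = regex.IsMatch(drink);        // pattern fixed at construction
    bool viaStatic = Regex.IsMatch(drink, "ee");    // pattern passed on every call

    // Both forms answer the same question, so the two flags always agree.
    Console.WriteLine("{0}: {1} {2}", drink, viaInstance, viaStatic);
}
```

Note the argument order of the static version: the input string comes first and the pattern second.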

DDJ

Listing One

(a)

# Perl
@fruit = ("apples", "pears", "bananas", "oranges");
# turn the array into one scalar string
$line = join("\t", @fruit);
print $line, "\n";

(b)

// C#
string[] fruit = {"apples", "pears", "bananas", "oranges"};
// turn the array into a single string
string line = String.Join("\t", fruit);
Console.WriteLine(line);

Listing Two

(a)

# Perl
$beverages = "orange juice\troot beer\tcoffee\twater";
# turn a string into an array
@drinks = split("\t", $beverages);
foreach $drink (@drinks) {
   print $drink, "\n";
}

(b)

// C#
string beverages = "orange juice\troot beer\tcoffee\twater";
char[] separator = {'\t'};
// turn a string into an array
string[] drinks = beverages.Split(separator);
foreach (string drink in drinks) {
   Console.WriteLine(drink);
}

Listing Three

(a)

// C#
string[] fruit = {"apples", "pears", "bananas", "oranges"};
string greeting = "hello";
// the following is not allowed: Join is not an instance method
greeting.Join("\t", fruit); // error

// must use the String class name to invoke the static Join method
String.Join("\t", fruit);  // OK

(b)

// C#
string beverages = "orange juice\troot beer\tcoffee\twater";
char[] separator = {'\t'};

// the following is not allowed: Split is not a static method
String.Split(separator, beverages); // error

// must use a String object reference to invoke the Split method
beverages.Split(separator);         // OK

Listing Four

(a)

# Perl
@drinks = ("orange juice", "root beer", "coffee", "water");
foreach $drink (@drinks) {
   print $drink, "\n";
}

(b)

// C#
string[] drinks = {"orange juice", "root beer", "coffee", "water"};
foreach (string drink in drinks) {
   Console.WriteLine(drink);
}

Listing Five

(a)

# Perl
@drinks = ("orange juice", "root beer", "coffee", "water");
foreach $drink (@drinks) {
   print $drink, "\n" if $drink =~ /ee/;
}

(b)

// C#
// requires: using System.Text.RegularExpressions;
string[] drinks = {"orange juice", "root beer", "coffee", "water"};
Regex regex = new Regex("ee");
foreach (string drink in drinks) {
   if (regex.IsMatch(drink)) {
      Console.WriteLine(drink);
   }
}

(c)

# Perl
@drinks = ("orange juice", "root beer", "coffee", "water");
foreach (@drinks) {
   print if /ee/;
}

Listing Six

// C#
// requires: using System.Text.RegularExpressions;
string[] drinks = {"orange juice", "root beer", "coffee", "water"};
foreach (string drink in drinks) {
   if (Regex.IsMatch(drink, "ee")) {
      Console.WriteLine(drink);
   }
}

