Dr. Dobb's is part of the Informa Tech Division of Informa PLC

On Perl as a Natural Language


April, 2004: On Perl as a Natural Language

Russell is a Perl programmer, MySQL developer, and web designer living and working on a consulting basis in New Orleans. He is also an adjunct instructor at a local college where he teaches Linux and other open-source software. He can be reached at [email protected].


With a formal education in linguistics, Larry Wall created the computer programming language Perl in 1987. Given his liberal attitudes, Wall has minimally guided the Perl community's growth for more than 15 years now. He has, however, retained control over the committee responsible for Perl's linguistic development so that he may ensure that Perl adheres to his linguistic vision. Part of Wall's linguistic philosophy appears in his famous essay entitled "Natural Language Principles in Perl." In his essay, Wall argues that Perl is superior to, and very different from, all other computer languages because it is a natural language, much like English. In this article, I will explore Wall's natural language principles as they relate to Perl, expand on them, and give code examples, as well as some analogies to English where appropriate.

Learning Perl

The first principle that Wall cites as evidence that Perl is a natural language is that Perl lets you be very expressive. "You learn a natural language once and use it many times" (Wall). Unlike the designers of BASIC, who set out to make a language that beginners could learn fully, Perl's designers haven't stripped out complexity to make the language easier to learn. Doing so would have reduced what Perl can express. For instance, if you know only English and you're trying to learn Italian, you might struggle with the various forms of the definite article (that is to say, the). In Italian, you use lo, l', il, la, gli, i, or le, depending on the gender of the associated noun, whether it's singular or plural, and the first one or two letters of the noun.

In English, it's the in all cases, regardless of quantity or gender or the starting letters of the noun that follows. Eliminating inflections from Italian would make it easier for Americans to learn the language, but it would also eliminate some of its richness and functionality.

In Perl, although a flow-control statement such as while may be a little difficult for a newcomer to programming to comprehend, once you learn it, you are able to use it many times and never outgrow it. You can feel some comfort as you begin each new script knowing that you have already acquired some skills.

The English language is immense. The second edition of the Oxford English Dictionary contains about 290,000 entries, with about 616,500 word forms. This is an enormous number of words for anyone to learn. However, "an educated person has a vocabulary of about 20,000 words and uses about 2,000 in a week's conversation" (Wilton). These kinds of numbers are common to most users of natural languages. This leads to Wall's second principle: "Nobody has ever learned any natural language completely." In Perl, there are hundreds of built-in functions and hundreds of commands and operators. Also, there are tens of thousands of extension objects and modules. CPAN, the repository of public Perl modules, boasts over 5500 modules. Each module has its own commands and methods. This adds up to well over 10,000 "words," roughly half of what an average educated person knows in a native spoken language. With these kinds of numbers, no one can be expected to learn every command and nuance of Perl. Not even Larry Wall knows it all; it's not practical. Therefore, you're not expected to learn all of Perl. To start with, you just need to learn what you need to accomplish your tasks. With Perl, if your vocabulary is minimal and your methods are simple, your code might not be very tight, but it can still communicate what you want. And that makes it valid.

The third principle of a natural language that Larry Wall suggests about Perl is related to the previous one. It has to do with the acceptance of many levels of competence in Perl programmers by the Perl community. Wall says that "if a language is designed so that you can 'learn as you go,' then the expectation is that everyone is learning, and that's okay." It is this mature attitude that accounts for the success of Perl community web sites like The Perl Monastery (http://perlmonks.org/). Perl monks of all levels (from novice to monk to saint) can post questions on the site about Perl and not feel ignorant, nor worry about being ridiculed for asking for help. Again, it's understood that we're all learning Perl and that we're all at different levels of competence, improving at our own pace.

The same attitude about many levels of competence exists to some extent in spoken human languages. People don't tend to poke fun at a child for having a small vocabulary and for writing simple sentences. Nor do people of polite company ridicule people of lesser formal education for their limited vocabulary and weak grammar skills. We are aware of these shortcomings and we react to them at times, but we don't necessarily find fault in the speakers because of them. We accept them as they are and interface with them on their terms as best we can. What's important is each individual's need to express himself.

Influences and Borrowings

A phrase that is often repeated in the Perl community is "There's more than one way to do it." This is another aspect of the expressiveness of Perl. There's no one way or "right" way to write a Perl script. Each programmer has an individual style and way of communicating through code. Some use strict and declare all variables. Some programmers are sloppier and don't allow for error checking. Some take a couple hundred lines of code to say what could be said more efficiently in fewer than 50 lines. And some use one- and two-letter names for variables while others use lengthier descriptive names. These differences stem from the flexibility inherent in Perl and from the fact that Perl and the members of the Perl community have grown out of several different programming languages and backgrounds. A programmer who learned the structured language of C before learning Perl will write a different program to accomplish the same task than someone who first learned the object-oriented language of Java. Neither method is the right way; which method is better is debatable. You have to do what works for you, what conforms to your background and your skills.

One reason for the large number of words and seemingly inconsistent pronunciations in English is that many words have been borrowed from other languages and eventually became part of English. For instance, many English words come directly from French (e.g., entrepreneur) and German (e.g., kindergarten). In Perl, it's very much the same. Perl has borrowed commands and methods from C, sed, awk, Lisp, Python, shell, and a few others, in addition to English. In fact, Perl continues to get ideas from other languages. For instance, while C has switch and case, until the Perl module Switch (by Damian Conway) added case to Perl by extension, one had to use a series of if and elsif statements to accomplish the same effects. With Version 6 of Perl, though, case will become part of the vocabulary of Perl without the use of a module. Although some of the designers felt over the years that a case command was unnecessary in Perl, an individual decided on his own to create a module to make it available to all and thereby proved the need for it to be part of standard Perl. Otherwise, "efforts to maintain the 'purity' of a language only succeed in establishing an elite class of people" (Wall) and do nothing for the development of the language or for the community that uses it. Without changes coming from the community through modules and usage, Perl would be extremely limited and probably would not survive.
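
The series of if and elsif statements mentioned above might look like the following minimal sketch. The subroutine name describe_color and the values it handles are hypothetical, chosen only for illustration; they are not taken from Wall's essay or from the Switch module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: chooses a description the way a C switch/case
# would, using only the if/elsif chain available in core Perl 5.
sub describe_color {
    my ($color) = @_;
    my $description;
    if    ($color eq 'red')   { $description = 'warm' }
    elsif ($color eq 'blue')  { $description = 'cool' }
    elsif ($color eq 'green') { $description = 'natural' }
    else                      { $description = 'unknown' }   # plays the role of "default"
    return $description;
}

foreach my $color (qw(red blue purple)) {
    my $description = describe_color($color);
    print "$color: $description\n";
}
```

Each branch compares the same variable against a different value, which is exactly the repetition that a case construct removes.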

Roughness and Ambiguity

Coming from the former French colony of New Orleans, Louisiana, I can appreciate the roughness allowed in Perl. In the old Northern European colonies, many believe that planning leads to success. Here in New Orleans, there is a cultural attitude that out of chaos comes creativity and then order and thereby greater success than one could have planned. With Perl, I can choose to use strict or not. I can ignore errors or I can stomp them out. More importantly, I can write a program in Perl that will work under basic and expected conditions to get started and I can refine the details later. Besides functionality, because there is more than one way to do things in Perl, I can write a program in a simple manner to solve an immediate problem and then go back later to tighten the code to allow for more possibilities and to improve performance. "In terms of [a written] language [like English], you say something that gets close to what you want to say, and then you start refining it around the edges" (Wall). This feature of flexibility and expandability (or rather contractability) makes Perl easier to learn and use, and it encourages true creativity.

Since a variety of styles are possible, each programmer can take pride in his programming style or he can strive strictly for functionality. "Natural languages are used by people who for the most part don't give a rip how elegant the design of their language is. Ordinary folks scatter all sorts of redundancy throughout their communication to make sure of being understood" (Wall). I know that there are times when I could use the default variable of $_, but choose to name a variable for clarity. There are times when a public module or one of my own private libraries could be cleaner and take up less code. However, I will choose to hack through my own code within the script I'm working on just to keep things more straightforward and obvious for myself and anyone who comes behind me. "Stylistic limits should be self-imposed, or at most be policed by consensus among your buddies" (Wall).
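
The choice between the default variable and a named one can be seen in a small sketch like the one below. The array @words and its contents are made up for illustration; both loops do the same work and differ only in style.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @words = qw(alpha beta gamma);

# Terse style: the loop variable and the argument to uc are both
# the implicit $_.
my @terse;
foreach (@words) {
    push @terse, uc;
}

# Explicit style: a named variable says what each element represents.
my @explicit;
foreach my $word (@words) {
    push @explicit, uc $word;
}

# Both lines print the same thing.
print "@terse\n";
print "@explicit\n";
```

Neither version is more correct. The second simply spends a few extra characters to tell the reader what is being processed.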

One flexible component of Perl that many outsiders seem to dislike is the allowing of local ambiguity. "Generally, within a natural language, ambiguity is resolved rapidly using recently spoken words and topics" (Wall). In English, if one is careful of syntactical elements (i.e., the word order and proximity), pronouns can be used and reused without risk of confusing the listener. For instance, consider this sentence: "My brother said that his boss told him that he needed him to work late even though he wanted to leave early." Without asking me any questions or reading any other related sentences, you probably know who I'm referring to each time I say "him" and "he." The same situation exists in Perl. "There are a number of pronouns in Perl: $_ means 'it', and @_ tends to mean 'them'" (Wall). Therefore, we can write a script like this:

#!/usr/bin/perl -w

use strict;

@_ = qw(one two three);
$_ = 'test';

print $_;

foreach $_(@_){
   print $_;
}

print $_;

exit;

After the obligatory initial lines, we set our initial variables. We load the default array (@_) with the words one, two, and three. We then set the default scalar variable ($_) to a value of test. Before starting the flow-control statement, the script prints the value of $_ to the screen. The foreach statement reads through each element of the array, places each in $_, and then prints it out before going on to the next element. When all of the elements of the array have been processed, the script leaves the scope of the foreach (which restores its loop variable's previous value when the loop ends) and then prints the value of $_, which has now reverted to test because of its context, without our instructing it to. The results from running this script are as follows:

testonetwothreetest

Perl understands, in both instances, that $_ outside of the foreach statement means test and that within it means the value of each element of @_. That's just like a pronoun in English and it's something that most other programming languages don't do. It's this flexibility that makes Perl better, and it's also what makes Perl difficult to learn for programmers coming from a more rigid programming language. By the way, with the exception of the assignment on the fourth line ($_ = 'test';), the $_ could be eliminated everywhere else and the script will still work:

#!/usr/bin/perl -w

use strict;

@_ = qw(one two three);
$_ = 'test';

print;

foreach (@_){
  print;
}

print;

exit;

In this tighter script, we nod to what we want and Perl knows that we mean the pronoun and knows what its value should be in each context.

Clarifications

"Part of the reason a language can get away with certain local ambiguities is that other ambiguities are suppressed by various mechanisms. English uses number and word order..." (Wall). English also uses syntax to give the listener signals as to what is meant by each use of a pronoun. "Similarly Perl has number markers on its nouns" (Wall). So $color contains one element (one color) and is singular, while the array @color potentially contains more than one element and is plural. "So $ and @ are a little like 'this' and 'these' in English" (Wall)—$color basically means this color and @color means these colors. While case is not a factor with Perl, it does make distinctions sometimes by word order. The example Wall gives in his essay is that sub use starts a subroutine named "use" and use sub calls a module named "sub."

Another signal to the listener for clarifying and reducing ambiguity is the use of topicalizers. "A topicalizer simply introduces the subject you're intending to talk about" (Wall). In English, this can be done with just a few words at the beginning of a sentence: "As for myself" and "With regard to the students" are both topicalizers to orient the listener to the context by which the words that follow will be made. In Perl, for and foreach statements are topicalizers. For instance, foreach(@color) { print; } will print the pronoun $_ based on the context of the @color elements, not what $_ meant before and after the @color topic is introduced and discussed.
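
Both signals, number markers and topicalizers, can be seen together in a short sketch. The variable names and values here ($color, @color, @shouted, $title) are hypothetical and exist only to illustrate the two ideas.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Number markers: $ is singular ("this"), @ is plural ("these").
my $color = 'red';                    # this color
my @color = qw(green blue violet);    # these colors (a separate variable)

$_ = 'topic before the loop';

# Topicalizer: "as for these colors ...". Inside the block, $_ is each
# element of @color in turn, and uc works on that pronoun.
my @shouted;
foreach (@color) {
    push @shouted, uc;
}

# for can introduce a single value as the topic, too. Inside the block,
# $_ is an alias for $title, so the substitutions edit $title itself.
my $title = '  perl  ';
for ($title) {
    s/^\s+//;    # trim leading whitespace
    s/\s+$//;    # trim trailing whitespace
}

print "This color: $color\n";
print "These colors: @shouted\n";
print "Trimmed title: [$title]\n";
print "Afterward, \$_ is again: $_\n";
```

Once each block ends, the earlier topic returns: the final line prints the value that $_ held before either loop ran.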

Bigger Pictures and Conclusion

With artificial languages, rules about discourse structure can be rigid. In Perl, the order in which code is laid out is sometimes irrelevant. For instance, you could group related lines of code and consider them to be paragraphs. These paragraphs can be dropped into different subroutines or functions. By using functions, one could greatly mix up the order of a script. "Perl tends to be pretty free about what order you put your statements, except that it's rather Aristotelian in requiring you to provide an explicit beginning and end for larger structures, using curlies" (Wall).
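
As a rough illustration of that freedom, the sketch below calls two subroutines before they are defined. The subroutine names greeting and farewell are invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The main "paragraph" comes first and calls subroutines defined later.
my $report = greeting('world') . ' ' . farewell('world');
print "$report\n";

# These definitions could be reordered, or moved above the call,
# without changing what the script does. Only the curlies that
# open and close each one are required.
sub farewell {
    my ($name) = @_;
    return "Goodbye, $name.";
}

sub greeting {
    my ($name) = @_;
    return "Hello, $name!";
}
```

Perl compiles the whole file before running it, so a subroutine called with parentheses can appear anywhere in the script.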

"Because a language is designed by many people, any language inevitably diverges into dialects" (Wall). In a sense, some of the Perl modules can represent specialized vocabularies and grammars; they can be said to be dialectal differences in Perl. For instance, Perl web developers often speak in CGI.pm terms. MySQL developers are typically well versed in DBI.pm. In some ways, this is analogous to comparing speakers of various New Orleans dialects and speakers of the dialects of New York City. Speakers of both sets of regional dialects have a commonality in English: they both know standard English. The same is true for CGI and DBI Perl programmers: While they have their specialized nouns and verbs associated with each module, they also know the basic commands of Perl (e.g., if and print), which could be considered the koine dialect. On the other hand, referring to the use of a module as a dialect may not be appropriate: "Differences in language that depend on who we are constitute dialect. Differences in language that depend on where, why, or how we are using language are matters of register" (Pyles). One could argue that a web developer uses the CGI module because she's a web developer and that would make CGI.pm a dialect. One could also argue that she uses CGI.pm because she's trying to be more efficient in web development and, therefore, the module is a register.

Ultimately, what makes a language natural is its growth that comes out of the community, out of its users diverging and creating new words and new grammar rules. This kind of growth cannot be controlled by anyone effectively. Creativity comes naturally from the chaos and anyone and everyone can have an effect. "We all contribute to the design of our language by our borrowings and coinages, by copying what we think is cool and eschewing what we think is obfuscational" (Wall). Perl is not Larry Wall's language, it's our language. Unlike a Microsoft language, its source code is open to all of us and anyone can be part of the design process for Perl. Just by using it, we are helping it to grow and are adding to its status as a natural language.

References

Pyles, Thomas and John Algeo. The Origins and Development of the English Language, 4th ed. 1993. Harcourt Brace & Company. Fort Worth, Texas.

Wall, Larry. "Natural Language Principles in Perl." http://www.wall.org/~larry/.

Wilton, David. "How many words are there in the English language?" May 2003. http://www.wordorigins.org/.

TPJ

