Dr. Dobb's is part of the Informa Tech Division of Informa PLC




Scoping: Letting Perl Do the Work for You


September, 2004: Scoping: Letting Perl Do the Work for You

Dave may be reached via e-mail at [email protected] or on http://www.perlmonks.org/ as davido.


In 1982, my parents took a comedian's advice and bought a Texas Instruments TI-99/4a computer. Two decades later, it matters less which computer I grew up with than where it led. I embraced BASIC, not because it was spectacular, but because it was what was immediately available, and there was a lot of support for it back then. Armed with articles from Compute! and a few books, I began my adventure. Despite the primitive and limited nature of early BASIC implementations, I learned a thought process and developed a passion.

Those of you who remember early BASICs probably have fond memories of developing little gee-whiz programs that presented simple games and solved trivial problems. But despite its popularity, BASIC lacked many of the refinements that the previous 20 years of computer science had developed, refinements that, in the 20 years since, have become de facto standards. Subroutines in primitive BASICs were just segments of code reached via GOSUB. There were no user-defined functions to speak of, nor were there subroutines with parameter lists. All variables were global and all code was on the same playing field, which, as you may recall, was a recipe for confusion as programs grew. It didn't help that we novice programmers commonly chose such descriptive variable names as x and a. After a few lines of code, it became difficult to remember what each symbol stood for, and even more difficult to remember which symbols were already in use. The problem was held in check somewhat by the fact that computers of the day were usually short on memory, a constraint that inhibited the creation of disastrously long BASIC programs. Languages such as Pascal overcame many of BASIC's shortcomings, but were considered beyond the reach of the hobbyists who took to BASIC.

Perl has had the benefit of growing up as the brainchild of some very bright minds from our generation of computer science. But Perl isn't too proud to borrow an idea or two from the bright minds behind other programming languages, as well. It comes as no surprise, then, that Perl implements many of the ideas of modularization and scoping found in languages like Pascal, C, Modula-II, and C++, ideas reinforced in my computer-science courses. Of course, Perl is just as willing to ignore some of the rules these languages established. But we're going to focus on a feature at which Perl has (in my opinion) truly excelled: scoping.

Scope

Perl provides a rich set of tools for controlling scope, which can be broken into three categories: package scope, dynamic scope, and lexical scope. First, let's talk about packages. Each package has its own namespace, implemented as a glorified hash (the package's symbol table). It is a common idiom for there to be a 1:1 relationship between packages and files, but this is only a convention; it's entirely possible to find packages that span files and files that contain multiple packages. None of that really matters: A package is a namespace, and that's all we really need to remember.

A variable that lives in a package's namespace is called a "package global." Often the word "package" is left off, and we talk about "global variables." We almost always mean package globals. Even the main portion of a Perl script exists within a package: package main. Perl isn't particularly strict about who can look at a package's globals; packages can inspect each other's package globals, and these variables can even be exported into other packages, including package main. But the general idea is that package globals are scoped to their package. Because of all the tricks we can play with package globals, they're incredibly useful in implementing Perl's modules. But this article is about letting scoping do the work for you. To this end, I want to shift the focus away from package globals. Their overuse also tends to lead to the sort of code we tried to leave behind 20 years ago. So let's move on to discuss the primary subject: lexicals.

I mentioned earlier that Perl provides three primary types of scope: package (the global symbol tables); dynamic (a means of temporarily giving a package global a new value with local(), which is restored when the enclosing block exits); and lexical. Lexical scope is built and controlled by blocks. Lexical scopes come in the form of files, bare { ... } blocks, loops, conditionals, eval blocks, code blocks (as in map { ... } @array or sort { $b <=> $a } @array), and subroutines. A package statement such as package Foo; does not, by itself, start a new lexical scope: a my variable declared before it stays visible until the end of the enclosing block or file. Where package globals are visible to other packages via fully qualified names (e.g., $MyPackage::varname), lexically scoped variables are not visible to the world outside of their scope. But Perl is a little smarter about how this works than a language such as C. In C, a pointer to an "auto" variable must never be passed outside of that variable's scope. But with Perl, it's perfectly OK to pass a reference to a lexically scoped variable to the world outside of that variable's scope. More on this in a minute.
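A minimal sketch of the difference between package and dynamic scope follows. The package name Counter and the subs bump() and show_count() are hypothetical, chosen only for illustration; our() declares the package global, and local() gives it a temporary value that any sub called from inside the block also sees.

```perl
use strict;
use warnings;

package Counter;
our $count = 0;              # package global: visible everywhere as $Counter::count

sub bump { return ++$count }

package main;

Counter::bump() for 1 .. 3;  # $Counter::count is now 3
print "Counter::count is $Counter::count\n";

# Hypothetical helper defined outside the block below.
sub show_count { return $Counter::count }

my $during;
{
    local $Counter::count = 100;   # dynamic scope: a temporary value...
    $during = show_count();        # ...seen even by subs called from here
}                                  # the old value is restored on block exit
my $after = show_count();
print "during local: $during, after: $after\n";
```

The key observation is that show_count() is not lexically inside the block, yet it sees 100; that run-time visibility is what makes local() dynamic rather than lexical.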

Lexical variables are declared using my(). They can be initialized at time of declaration. I'm going to assume that this is enough information on how to declare a lexical. If not, have a look at perldoc -f my and perldoc perlsub for more information. It's a good read.
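For readers who want a quick refresher anyway, here is a small sketch of common my() forms. The variable names are arbitrary; note the parentheses needed for list assignment, and that a my variable declared in a for loop's header is scoped to that loop.

```perl
use strict;
use warnings;

my $greeting = 'hello';              # scalar, initialized at declaration
my %seen = (apple => 1, pear => 2);  # hash, initialized from a list
my ($first, $second) = (10, 20);     # parentheses give list assignment

my $total = 0;
for my $i (1 .. 3) {                 # $i is scoped to this loop only
    $total += $i;
}
print "$greeting $first $second $total ", scalar(keys %seen), "\n";
```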

Reference Counting

Unlike C, Perl manages memory for you, using reference counting. When a lexical variable is declared, its reference count starts at one. When it falls out of scope, the reference count is decremented. If it drops to zero, the variable is garbage collected. This is a bit of an oversimplification, but sufficient for our discussion. Now, what happens if, in addition to the named variable, a reference to that variable also exists, perhaps at a broader scope? See the following example:

my $ref;
{                    # Create a block-bound lexical scope.
    my $var = 10;
    print $var, "\n";
    $ref = \$var;    # Store a reference to $var in a variable
                     # declared at broader scope.
}                    # Close the block-bound lexical scope.
print $var, "\n";    # Prints an empty line: the lexical $var is out of scope,
                     # so this $var is the undefined package global $main::var.
                     # (Under use strict, this line would not even compile.)
print $$ref, "\n";   # Dereference $ref, and thus, print '10'.

You can see from this example that though $var has passed out of scope, $ref, which holds a reference to $var, is keeping the contents of $var alive. $var is inaccessible by name, but by reference its value is still available. If $ref also passed out of scope, the reference count would be decremented again, and reaching zero would result in garbage collection.
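The contrast with C drawn earlier is easiest to see in a subroutine that hands back a reference to one of its own lexicals. In this sketch, make_list() is a hypothetical helper; because each returned reference keeps its array alive, every call ends up with a separate @list rather than reusing the previous one.

```perl
use strict;
use warnings;

sub make_list {
    my @list = @_;      # a lexical array declared inside the sub
    return \@list;      # safe in Perl: the reference keeps @list alive
}

my $first  = make_list(1, 2, 3);
my $second = make_list(4, 5);

push @$first, 99;       # modifies only the first array

print "@$first\n";      # 1 2 3 99
print "@$second\n";     # 4 5
```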

Maintainable Style

One of the primary advantages of lexical scoping is that it promotes a maintainable programming style. If a programmer keeps the variable scope narrow, it becomes less important to worry about whether $idx is being used elsewhere in a script, so long as any preexisting use isn't needed within the current lexical scope. For example:

use strict;
use warnings;
my $var = 10;
{
    my $var = 20;
    print $var, "\n";
}
print $var, "\n";

The output will be 20, and then 10. Within the narrower block we've declared a new $var, so whatever we do to or with $var inside that block leaves the $var declared at broader scope unaffected. A declaration at narrower scope masks any variable of the same name and type declared at a broader scope. That means each lexical scope can, if needed, become a new private namespace. my() is aptly named; just think of a lexical scope as the person talking: "My $var equals 10" (as opposed to some other scope's $var).

Lexical scopes may, of course, be nested. Narrower scopes have access to all the variables declared at broader scopes, so long as those variables haven't been masked by a declaration at the narrower scope. But completely separate scopes that aren't nested don't have such access. For example:

{
    my $this = 10;
}
{
    print $this, "\n";
}

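Here we have two separate lexical blocks. They're not nested, so the $this declared in the first block is unavailable to the second. Under use strict, Perl refuses to compile the second block, reporting that the global symbol "$this" requires an explicit package name; without strict, it silently prints the undefined package global $main::this instead, which yields an empty line.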
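If you want to see that compile-time error without aborting a script, one approach is to compile the snippet with a string eval and inspect $@. This is only a sketch for experimentation; the exact wording of the message varies somewhat between Perl versions, so only the start of the message is worth relying on.

```perl
use strict;
use warnings;

# Compile the two sibling blocks under strict via a string eval,
# so the compile-time error lands in $@ instead of killing the script.
my $code = q{
    use strict;
    { my $this = 10; }
    { print $this, "\n"; }
    1;
};

my $ok = eval $code;
print $ok ? "compiled\n" : "error: $@";
```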

Destructors

When a lexical variable passes out of scope and its reference count drops to zero, it is garbage collected or destroyed. Normally, this has the simple effect that the memory consumed by the value of the variable is relinquished back to Perl, and the name that accessed it becomes inaccessible. But sometimes there are other side effects. Lexical scoping can be put to work taking advantage of those side effects for your benefit.

One example of a destructor doing more than just reclaiming memory involves open. If the filehandle argument passed to open is a scalar with an undefined value (typically a freshly declared my $fh), Perl stores a new anonymous filehandle in it, giving you a lexical filehandle. What happens when a lexical filehandle falls out of scope? The file is closed implicitly.

my $filename = 'somefile.txt';
my $linecount = 0;
{
    open my $fh, '<', $filename or die $!;
    while( my $line = <$fh> ) {
        print $line;
        $linecount++;
    }
}
print $linecount, "\n";

This snippet is complete as written (though contrived, and not really all that useful): there is no close() because none is needed. The filehandle held in $fh is closed as soon as its enclosing lexical block is exited. This leads to a common Perl file-slurping idiom that relies on lexical scoping to close the file being read:

my @lines = do{ open my $fh, '<', $filename or die $!;
                <$fh>;
              };

Now a file has been opened, its contents slurped into @lines, and its handle implicitly closed as the do{...} block finishes. The one caveat: if you open a file for output and let the filehandle close implicitly via the magic of lexical scoping, you lose the chance to check close() with or die $!. That matters because buffered output is often only flushed at close time, so a write failure such as a full disk may first be reported there.
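Here is a sketch of the explicit alternative for output handles, assuming the core File::Temp module is available to supply a scratch file. The write handle is closed with a checked close(), while the read side uses the do-block slurp idiom and is left to close itself.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A scratch file to write to; File::Temp is a core module.
# UNLINK => 1 removes the file when the program exits.
my (undef, $path) = tempfile(UNLINK => 1);

{
    open my $out, '>', $path or die "Cannot open $path: $!";
    print {$out} "line one\n", "line two\n" or die "Cannot write: $!";
    close $out or die "Cannot close $path: $!";   # explicit, checked close
}

# Reading back with the do-block idiom; the input handle closes itself.
my @lines = do { open my $in, '<', $path or die $!; <$in> };
print scalar(@lines), " lines read\n";
```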

Destructors also apply to object-oriented programming. If you define a DESTROY() method in your class, whatever code exists in that method will be executed when the last reference to an object of that class falls out of scope. For example:

package MyPack;
sub new {
    my $class = shift;
    bless \my $self, $class;
}
sub DESTROY {
    print "Goodbye.\n";
}
1;

package main;
{
    my $obj = MyPack->new();
    print "The object was created.\n";
    print "Now we're going to let it fall out of scope.\n";
}
print "See, it was just destroyed.\n";

As you see, when the object's last reference falls out of scope, it is destroyed, and the DESTROY() method is invoked prior to the final garbage collection. This is useful any time you have cleanup that needs to take place when an object disappears from existence.
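That cleanup-on-exit behavior can be packaged as a small guard object. The ScopeGuard class below is a hypothetical sketch, not a standard module: it stores a code reference and calls it from DESTROY, so the cleanup runs when the guard's last reference goes away at the end of its block.

```perl
use strict;
use warnings;

# Hypothetical helper class: runs a callback when the guard is destroyed.
package ScopeGuard;
sub new {
    my ($class, $on_exit) = @_;
    return bless { on_exit => $on_exit }, $class;
}
sub DESTROY {
    my $self = shift;
    $self->{on_exit}->();
}

package main;

my @log;
{
    my $guard = ScopeGuard->new(sub { push @log, 'cleanup' });
    push @log, 'working';
}   # $guard's last reference disappears here, so DESTROY runs
push @log, 'after block';
print join(', ', @log), "\n";   # working, cleanup, after block
```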

DESTROY() also applies to tied entities. That means that if you use tie to tie a scalar to a class, you can define the DESTROY() method to carry out some task when the tied scalar falls out of scope.
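As a sketch, assuming a hypothetical tie class named LoggedScalar, the following records a message in DESTROY and shows that it fires when a tied lexical scalar goes out of scope. A real tie class would usually do more useful work there, such as flushing or releasing a resource.

```perl
use strict;
use warnings;

# Hypothetical tie class that logs when its tie object is destroyed.
package LoggedScalar;
our @events;
sub TIESCALAR { my ($class, $init) = @_; my $v = $init; return bless \$v, $class }
sub FETCH     { my $self = shift; return $$self }
sub STORE     { my ($self, $v) = @_; $$self = $v }
sub DESTROY   { push @events, 'destroyed' }

package main;

{
    tie my $x, 'LoggedScalar', 5;
    $x = $x + 1;                          # FETCH returns 5, STORE receives 6
    push @LoggedScalar::events, "value $x";
}   # $x goes out of scope; the tie object is released and DESTROY runs
print join(', ', @LoggedScalar::events), "\n";   # value 6, destroyed
```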

Closures

No discussion of lexical scoping would be complete without mention of closures. This topic took me a little time to pick up, but it's really not all that complicated.

A closure arises when a sub defined within a lexical scope is made available outside that scope, typically by passing a reference to it to the outside world. That closure sub still has access to the lexical variables that existed within its defining scope, even after that scope has ended. Confusing? Look at this:

my $subref;
{
    my $value = 100;
    $subref = sub { return ++$value; };
}
print $subref->(), "\n";   # prints 101
print $subref->(), "\n";   # prints 102

Now is it a little clearer? $value is inaccessible directly from outside its narrowly defined scope. Yet the reference to the sub created in that scope exists at a broader scope. The sub it refers to has full access to whatever variables existed in the scope in which it was defined. Thus, any time you call up that sub referred to by $subref to do its duty, it is able to act upon $value.
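Each time the code that creates a closure runs, it captures a fresh set of lexicals. This sketch uses a hypothetical make_counter() to produce two independent counters; calling one does not affect the other.

```perl
use strict;
use warnings;

# Each call creates a new lexical $count, and the returned
# anonymous sub closes over that particular copy.
sub make_counter {
    my $count = shift;
    return sub { return ++$count };
}

my $c1 = make_counter(100);
my $c2 = make_counter(0);

my @results = ($c1->(), $c1->(), $c2->(), $c1->());
print "@results\n";   # 101 102 1 103
```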

Accessors and Setters

In the object-oriented world, an accessor is an object method that accesses (or returns) data internal to the object. A setter is an object method that sets data internal to the object. Here's an object-oriented example:

package MyPack;

sub new {
    bless {}, shift;
}

sub setter {
    my( $self, $val ) = @_;
    $self->{VALUE} = $val;
}

sub accessor {
    my $self = shift;
    return $self->{VALUE};
}
1;

package main;

my $obj = MyPack->new();
$obj->setter("Hello world\n");
my $phrase = $obj->accessor();
print $phrase;

Though setters and accessors aren't strictly a lexical-scoping topic, I provided that example to introduce the fact that the same setter/accessor pattern can be built from closures:

my $setref;
my $getref;
{
    my $value;
    $setref = sub { $value = shift; };
    $getref = sub { return $value;  };
}
$setref->( 100 );
my $closure_val = $getref->();
print $closure_val, "\n";
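The setter/accessor pair can also be produced per instance, which gives each instance its own private value. In this sketch, make_cell() is a hypothetical constructor that returns a hash of two closures; outside code can reach $value only through them.

```perl
use strict;
use warnings;

# Hypothetical constructor: the only copy of $value lives in the
# closures, so callers cannot touch it except through get/set.
sub make_cell {
    my $value = shift;
    return {
        get => sub { return $value },
        set => sub { $value = shift; return },
    };
}

my $cell_a = make_cell('first');
my $cell_b = make_cell('second');

$cell_a->{set}->('changed');
print $cell_a->{get}->(), "\n";   # changed
print $cell_b->{get}->(), "\n";   # second
```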

Putting the Pieces Together

Lexical scoping can and should be used to constrain a variable to the narrowest useful scope. This practice will aid in writing maintainable code: code where a minor change here won't ripple into a major disaster somewhere else, and code where it is easy (or at least possible) for someone to come along after the fact and understand what use a particular variable has.
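One way to apply that advice to state that must persist between calls is to wrap a named sub and its private variable in a bare block, so nothing else in the file can see the variable. The sub next_id() is hypothetical. Note that the assignment to $next happens at run time, so the block must execute before the first call to next_id().

```perl
use strict;
use warnings;

{
    my $next = 1;              # visible only to next_id(), yet it persists
    sub next_id { return $next++ }
}

my @ids = map { next_id() } 1 .. 3;
print "@ids\n";                # 1 2 3
```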

I also hope to have illustrated that lexical scoping can be used to perform complex tasks through the use of destructors and closures. I encourage you to proceed from here to Perl's POD. In particular, perlsub and perlref will assist you in gaining a firmer grasp on what lexical scoping is all about. Finally, I hope that its use will help you to get more out of Perl.

TPJ

