Eight Million Ways to die


May, 2004: Eight Million Ways to die

Randal is a coauthor of Programming Perl, Learning Perl, Learning Perl for Win32 Systems, and Effective Perl Programming, as well as a founding board member of the Perl Mongers (perl.org). Randal can be reached at merlyn@stonehenge.com.


What is that old saying? "The best-laid plans of mice and men..." And like that old saying, sometimes your programs don't go the way you expect.

For example, a user might not enter a number between one and five, even though your prompt carefully suggests that they do. Or maybe the file you expected to create in that directory can't be created. Or the database connection fails to connect (and was it because the system was down or because you were given a bad password?). Or the module you needed for a particular part of the application failed to load (maybe because it was never installed).

Usually, when things go wrong, you want to know about it and do something in your program in response. For example, consider a program that updates a data file, incrementing a value:

open OLD, "counter";
open NEW, ">counter.tmp";
print NEW <OLD> + 1;
close OLD;
close NEW;
rename "counter.tmp", "counter";

Note that we have six lines of code, any of which could fail. Let's take the simplest ones first. If the first open fails, we'll be using a closed filehandle in the third line, which will look like a 0 to the add 1 operation, and we'll get a "1" value in the final file.

Now, this might actually make sense for this application: The first invocation of the program yields a 1 value. But, if we have warnings enabled, we'll get a warning when we attempt to read from a closed filehandle on the third line because that's generally considered bad style, if not a more serious error. We could notice the return value from that open and rewrite the code like this:

my $old_value = 0;
if (open OLD, "counter") {
  $old_value = <OLD>;
  close OLD;
}
open NEW, ">counter.tmp";
print NEW $old_value + 1;
close NEW;
rename "counter.tmp", "counter";

and this does eliminate the extraneous warning. We now take an alternate execution path if the file is not initially present, nicely sidestepping the read from an unopened filehandle.

But what if the open failure is from something more serious than "file not found"? My UNIX open(2) manpage lists about a dozen different reasons for a failure, including esoteric things such as "a symbolic link loops back onto itself." How do we distinguish those?

The error variable $! starts to look pretty interesting. For example, we can distinguish between good, file not found, and everything else with a three-way branch:

if (open OLD, "counter") {
  # good
} elsif ($! =~ /file.*not found/) {
  # not found, default to 0
} else {
  # everything else
}

Because I'm using $! in a string context, I get to see the string-ish error message. This is fairly operating-system specific, but if you're not trying to be portable across a wide variety of systems, you can get away with such matches.
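
As a hedged alternative to matching message text, the sketch below asks which error code is set by using the %! hash (backed by the core Errno module). On many UNIX systems, the text for a missing file reads "No such file or directory," so a text pattern has to be tuned per system, while the ENOENT key does not. The sub name try_counter and its return labels are made up for this illustration.

```perl
use strict;
use warnings;

# Classify an open failure by errno name rather than by message text.
# try_counter() and its three labels are hypothetical names.
sub try_counter {
    my ($path) = @_;
    if (open my $fh, '<', $path) {
        close $fh;
        return 'good';
    } elsif ($!{ENOENT}) {      # true only when $! is currently ENOENT
        return 'not found';     # caller can default to 0
    } else {
        return 'other';         # permissions, symlink loops, and so on
    }
}

print try_counter("/nonexistent-dir-$$/counter"), "\n";
```

As with the text match, %! is only meaningful immediately after a call that has just failed.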

Note that I'm testing $! only when I've had a failure and immediately afterward. This is the only time I can be sure that there's really an error in there because, although an operating-system request failure sets $!, nothing normally resets it. Thus, this code is broken:

## BAD CODE DO NOT USE
open OLD, "counter";
if ($! =~ /file.*not found/) { # file not found
  ..
} elsif ($!) { # other error
  ..
} else { # everything OK
  ..
}

We're not necessarily testing the failed open call here—any prior failed call might give us a false positive.

But now, we need to decide what to do if we get that unexpected error. There's an old joke among programmers: "Don't test for anything you aren't willing to handle," but we can no longer plead ignorance here. The most common solution is to abort the entire program and let the sysadmin on duty take care of it, and that's easy with die. Let's redesign our program so that a missing counter file is considered a bad, bad thing, and abort the program if that first open fails:

unless (open OLD, "counter") {
  die "Cannot open counter";
}

In this case, a false return value (for any reason) from open triggers the die, which aborts our program immediately. The error message is sent to STDERR (rather than STDOUT) to ensure that the message is not lost in a typical redirection to a file or pipe. In addition, the filename and the line number are automatically appended to the message, unless the message string ends in a newline. This helps us find the source of the die among many modules and files.
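
To see the trailing-newline rule in action without actually aborting, here is a small sketch that traps each die with an eval block (covered later in this column) and hands back the message die would have printed. The helper name message_from is hypothetical.

```perl
use strict;
use warnings;

# Return the text that die would have sent to STDERR.
# message_from() is a made-up helper for this demonstration.
sub message_from {
    my ($text) = @_;
    eval { die $text };
    return $@;
}

# No trailing newline: Perl appends " at FILE line N." and a newline.
print message_from("Cannot open counter");

# Trailing newline: the message comes through exactly as written.
print message_from("Cannot open counter\n");
```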

Note that the error message contains the attempted operation as well. Again, this helps with debugging, compared with the cryptic "died at line 14" of the default message. This is especially handy when the filename for the operation might have come from another source:

chomp(my $filename = <SOMEOTHERFILE>);
unless (open OLD, $filename) {
  die "Cannot open $filename";
}

Before making it a habit to include such information in my die messages, I was occasionally confused about why my program was failing because I had presumed that a variable contained something other than what it did. Always echo the input parameters in the error message!

Another thing to include is the $! I mentioned earlier. That can help us figure out the kind of failure:

unless (open OLD, "counter") {
  die "Cannot open counter: $!";
}

And finally, this is too much typing. The or operator executes its right operand only when the left operand is false, so we can shorten this to the traditional:

open OLD, "counter"
  or die "Cannot open counter: $!";

So, to fully instrument my original program, I could add or die to each of the steps that might fail:

open OLD, "counter" or die;
open NEW, ">counter.tmp" or die;
print NEW <OLD> + 1 or die;
close OLD or die;
close NEW or die;
rename "counter.tmp", "counter" or die;

Wait a second! Why am I checking the return value from print? And from close? Those can't fail, can they? Certainly they can, although this is probably one of the few times you'll see any program that tests for them. The print can fail if the filehandle is closed or if there's an I/O error, such as a disk being full. And the close can fail if the filehandle is already closed or if the final buffer being flushed at the time of the close couldn't be written (again, typically from a full disk).
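
A full disk is awkward to arrange on demand, but the closed-filehandle case is easy to reproduce. The following sketch (the sub name write_after_close is made up, and it uses a temporary file from the core File::Temp module rather than the counter file) shows both print and close returning false.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Close a handle, then try to print to it and close it a second time.
# write_after_close() is a hypothetical name for this demonstration.
sub write_after_close {
    my ($fh, $path) = tempfile(UNLINK => 1);
    close $fh or die "Cannot close $path: $!";

    no warnings qw(closed unopened);   # silence "print() on closed filehandle"
    my $print_ok = print {$fh} "1\n";  # false: the handle is closed
    my $close_ok = close $fh;          # false: there is nothing left to close
    return ($print_ok, $close_ok);
}

my ($print_ok, $close_ok) = write_after_close();
print "print succeeded? ", ($print_ok ? "yes" : "no"), "\n";
print "close succeeded? ", ($close_ok ? "yes" : "no"), "\n";
```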

This seems like a lot of typing. Can we reduce this? Sure, with the Fatal module, part of the Perl core for recent versions of Perl. We simply list the subroutines that should have an automatic or die added, and away we go:

use Fatal qw(open close rename);
open OLD, "counter";
open NEW, ">counter.tmp";
print NEW <OLD> + 1;
close OLD;
close NEW;
rename "counter.tmp", "counter";

Now we have (nearly) the same program with a lot less typing. The downside to this approach is that we don't really get to say what the error message is; we're stuck with the default message that Fatal generates. To get a bit more control, I could add :void to that argument list, and then any of those calls whose return value is explicitly tested will no longer be fatal:

use Fatal qw(:void open close rename);
open OLD, "counter" or warn "old value unavailable, presuming 0\n";
open NEW, ">counter.tmp";
print NEW <OLD> + 1;
close OLD or "ignore";
close NEW;
rename "counter.tmp", "counter";

Why didn't I list print here? Well, Fatal uses some magic behind the scenes, and print resists this magic. Oops. We'll have to do that one by hand.

The die operator is fatal to the program unless it is enclosed within an eval block (or intercepted by a __DIE__ handler, but I digress). Once safely within the eval block, any die aborts the block, not the program. Immediately following the block, we check the $@ variable, which is guaranteed to be empty if the block executed to completion; otherwise, it holds the text message that would have been sent to STDERR had the program aborted. Time for an example:

use Fatal qw(:void open close rename);
for my $file (qw(counter1 counter2 counter3)) {
  eval {
    open OLD, "$file" or warn "old value unavailable, presuming 0\n";
    open NEW, ">$file.tmp";
    print NEW <OLD> + 1;
    close OLD or "ignore";
    close NEW;
    rename "$file.tmp", "$file";
  };
  if ($@) {
    print "ignored error on $file (continuing): $@";
  }
}

Here, I've put the previous code inside the eval block, using $file in place of the literal filenames. If any of the steps within the eval block fail, we skip immediately to the end of the block. The message ends up in $@. If the message is present, we note it on STDOUT. Whether there was an error or not, we're continuing the loop.

Now, suppose we conclude that any permission-denied message inside the eval block is likely to mean we're not going to get much further on the rest of the program. We can take different actions based on the value within $@. For example:

use Fatal qw(:void open close rename);
for my $file (qw(counter1 counter2 counter3)) {
  eval {
    open OLD, "$file" or warn "old value unavailable, presuming 0\n";
    open NEW, ">$file.tmp";
    print NEW <OLD> + 1;
    close OLD or "ignore";
    close NEW;
    rename "$file.tmp", "$file";
  };
  if ($@ =~ /permission denied/i) {
    die $@; # rethrow $@
  } elsif ($@) {
    print "ignored error on $file (continuing): $@";
  }
}

If the message in $@ after the eval block matches permission denied, we rethrow the error. In this case, there's no outer eval block, so the program aborts. However, had there been an outer eval block, we'd simply pop out one more level. In turn, that outer block could handle the error or rethrow it again to the next level (if any), and so on.
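
The popping-out-one-level behavior can be sketched with two nested eval blocks. Everything named here (process, run_all, the @log array, and the messages) is hypothetical; process simply dies with whatever text it is given, standing in for the real file operations.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the per-file work: die with the given text.
sub process { my ($msg) = @_; die "$msg\n" }

# Inner level: ignore most errors, rethrow permission problems.
sub run_all {
    my ($log, @messages) = @_;
    for my $msg (@messages) {
        eval { process($msg) };
        if ($@ =~ /permission denied/i) {
            die $@;                          # rethrow to the next level out
        } elsif ($@) {
            push @$log, "ignored: $@";       # note it and keep looping
        }
    }
    return 1;
}

# Outer level: catches whatever the inner level rethrows.
my @log;
my $ok = eval { run_all(\@log, "disk full", "Permission denied", "never reached") };
push @log, "outer eval caught: $@" unless $ok;
print @log;
```

The third message is never processed, because the rethrow leaves the loop as well as the inner eval.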

Matching the specific text of error messages can be a bit problematic, especially when you have to change the text for internationalization of your program. Fortunately, modern versions of Perl permit the die parameter to be an object, not just a text message. When an object value is thrown with die, the $@ value contains that object as well. Not only does this let us pass structured data up the exception-handling logic, we can also create hierarchies of error classifications to quickly sort entire groups of errors apart.
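
Before reaching for a framework, it may help to see the bare mechanism. This is a minimal hand-rolled sketch using only core Perl; the My::Error and My::Error::File classes, their methods, and the classify sub are all invented for illustration and are not part of any module.

```perl
use strict;
use warnings;

# Hypothetical base class: a blessed hash carrying structured data.
package My::Error;
sub new     { my ($class, %args) = @_; return bless {%args}, $class }
sub throw   { my $class = shift; die $class->new(@_) }
sub message { $_[0]{message} }

# Hypothetical subclass for file-related errors.
package My::Error::File;
our @ISA = ('My::Error');
sub path { $_[0]{path} }

package main;
use Scalar::Util qw(blessed);

# Run some code and sort out what kind of error, if any, it threw.
sub classify {
    my ($code) = @_;
    return "ok" if eval { $code->(); 1 };
    my $err = $@;
    return "plain die: $err" unless blessed $err;   # old-style text message
    return "file error on " . $err->path . ": " . $err->message
        if $err->isa('My::Error::File');
    return "other error: " . $err->message if $err->isa('My::Error');
    return "unknown object";
}

print classify(sub {
    My::Error::File->throw(message => "cannot open", path => "counter")
}), "\n";
print classify(sub { die "old-style text\n" });
```

The isa tests replace the regular-expression matches, so the wording of a message can change without breaking the handler.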

The best framework I've seen for creating such error categories is Exception::Class, found on CPAN. Let's restructure our program to use exception objects rather than text testing (see Listing 1).

The first lines (invoking Exception::Class with parameters) create a hierarchy of classes, starting with my E class (selected because the name is short). From E, I break errors into two categories: user-related errors and file-related errors. File-related errors are further categorized into various file operations. The isa parameter defines the base class for each derived class, permitting the use of normal isa tests for quick categorization.

Now, inside the eval, instead of a simple die, I use the throw method of an appropriate error class, with a specific error message. I won't need to include $! here because I'll know that every error in the E::File category was system-call related, and I can put that just once in the error handler.

Finally, the error-handling logic just past the end of the eval block is also changed. If $@ is an object derived from my E, then I sort out what kind of error it might be. Note that I've chosen to handle all E::Create errors as relatively "fatal" to my loop (although they might in turn be caught by some outer eval block not shown here). User errors are distinguished from E::File errors, with the latter displaying the $! value automatically. Also note that any legacy errors (from an ordinary die or maybe a reference or object not within my hierarchy of classes) simply get rethrown as well.

This framework is actually quite flexible, permitting additional structured attributes to be carried along in the error object, as well as having objects inherit from multiple class hierarchies to distinguish multiple traits (file versus database, fatal versus recoverable, and so on). If you're building a complex application, you should definitely look into using Exception::Class or something similar. Until next time, enjoy!

TPJ



Listing 1

use Exception::Class (
  E => { description => "my base error class" },
  E::User => { description => "user-related errors", isa => qw(E) },
  E::File => { description => "file-related errors", isa => qw(E) },
  E::Open => { description => "cannot open", isa => qw(E::File) },
  E::Create => { description => "cannot create", isa => qw(E::File) },
  E::Rename => { description => "cannot rename", isa => qw(E::File) },
  E::IO => { description => "other IO", isa => qw(E::File) },
);
for my $name (@ARGV) {
  eval {
    $name =~ /^\w+$/ or E::User->throw("bad file name for $name");
    open IN, $name or E::Open->throw("reading $name");
    open OUT, ">$name.tmp" or E::Create->throw("creating $name.tmp");
    print OUT <IN> + 1 or E::IO->throw("writing $name.tmp");
    close IN or E::IO->throw("closing $name");
    close OUT or E::IO->throw("closing $name.tmp");
    rename "$name.tmp", $name or E::Rename->throw("renaming $name.tmp to $name");
  };
  if (UNIVERSAL::isa($@, "E")) { # an object error from my tree
    if ($@->isa("E::User")) {
      warn "Pilot error: $@"; # warn and continue
    } elsif ($@->isa("E::Create")) {
      $@->rethrow; # same as die $@
    } elsif ($@->isa("E::File")) { # other IO errors
      warn "File error: $@: $!";
    } else {
      warn "Uncategorized error: $@"; # warn and continue
    }
  } elsif ($@) { # a legacy die error
    die $@; # abort (possibly caught by outer eval)
  } # else everything went ok
}