Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Web Development

Web Error Reporting


October, 2003: Web Error Reporting

Web Error Reporting

The Perl Journal October, 2003

By Randal Schwartz

Randal is a coauthor of Programming Perl, Learning Perl, Learning Perl for Win32 Systems, and Effective Perl Programming, as well as a founding board member of the Perl Mongers (perl.org). Randal can be reached at [email protected].


Recently, a discussion occurred on the Perl CGI beginners mailing list about using CGI::Carp, presumably in the fatalsToBrowser configuration. This core module is a very nice tool for developing Perl CGI applications, as it can be used to redirect errors or at least make them safe to display in a web browser. For example, to make all of your fatal errors turn into a message to your browser instead, you simply add:

use CGI::Carp qw(fatalsToBrowser);

to the beginning of your program. Even syntax errors occurring later in the same file will show up in the browser, which is especially nice if you're using a shared hosting environment and have difficult or no access to the web server's error logs.

But as great as such setup is for development, it is exactly the wrong thing to do for a web site that will be accessed by the general public. An error message displayed to the browser of a random visitor is generally frowned upon (no matter how infrequently it might appear).

Worse yet, visitors are unlikely to report the problem, or if they do, their report will likely be paraphrased or incomplete, making diagnosis difficult. (Can you tell that I've had experience working in help desks?)

But the biggest problem is the potential leak of security information. I've encountered many ill-designed web sites that (upon the failure of some necessary component) showed me, in intricate detail, the pathnames being used, operating system details, values of internal program variables, and even the SQL and database table names for a query! This is better than dumpster-diving as a means of finding a method of intrusion, because the potential intruder can immediately vary the query and see how it affects the error messages.

But what else should we do? If we don't have any error trapping, we give the user the unpleasant "500 error" experience. If we trap the error, where do we write it?

One solution that has occurred to me is to log the error (even to the standard web server error log), but also generate a virtual "trouble ticket" that will be shown to the user, with a message that is essentially, "Something went wrong, we know about it, and here's our trouble ticket number if you want to reference this problem." A simple $SIG{__DIE__} or eval handler will take care of catching the error, and then a bit of massaging of the error text will ensure a clean splatting into the error log.

But we can even take this one step further. Instead of simply telling the user that an error has occurred and has been logged, let's give the user a text form to describe what they were trying to do. Using a hidden field, we can integrate the incident number into the form submission to correlate their description of the problem with our known error messages, which we log into a separate file. Later, we can create trouble tickets by correlating the log files using the ticket numbers. We can also give the user the option of including a contact e-mail address or phone number for further information, and log that as well.

The CGI script in Listing 1 has been modified to trap its errors in such a fashion. Although the code is included explicitly in this script in the first BEGIN block, you'd generally want to factor that out into a site-wide module for consistency and ease of upgrading, starting every local CGI script with something like:

use lib "/my/site/lib";
use MySite::ErrorWrapper;

But for simplicity, let's look at the code in place.

Lines 1 through 3 start nearly every CGI program I write, turning on the normal compiler restrictions, taint checking, warnings, and unbuffering STDOUT.

Lines 5 through 44 define the BEGIN block that will set up our handler. Because it's in a BEGIN block, even syntax errors occurring later in the file are handled properly.

Lines 6 through 43 define a die handler for all untrapped fatal errors. The text of the error message will be passed to this subroutine as its first parameter.

Line 7 creates an incident number. Here, I'm taking the current epoch time (integer seconds since 1-Jan-1970 midnight UTC) and the process ID, and concatenating them into a 12-character hex string. This is reasonably unique per host (unless you can cycle 30,000 processes in one second), but might collide if several web servers are working in a load-balancing situation. You might consider the File::CounterFile module from the CPAN for generating monotonically increasing identifiers, or perhaps a real database connection using DBI to get the next incident report number.

Line 9 takes the incoming error message text from the argument list. Line 12 logs this message to the web server error log, ensuring that every line of the error message is prefixed by the incident number in a relatively distinct pattern for later extraction and processing. For example, if die "one\ntwo\nthree\n" is triggered, the resulting error messages might look like:

crash 3f71df085c4f: one
crash 3f71df085c4f: two
crash 3f71df085c4f: three

with the hex string naturally containing the timestamp and process ID.

Lines 15 to 41 present the user with a simple form, consisting of a text area for the description of the problem, a text field for the e-mail address, and a submit button. In addition, a hidden text field includes the incident number so that we may later correlate this submission with the trigger.

Line 42 causes the CGI program to successfully exit after having delivered the error form. This keeps the script from generating a "500 error."

The URL for the form has to point to a second script to capture the submitted information. A sample implementation for this script is given in Listing 2. But we'll get to that in a minute.

Following the BEGIN block, we continue the rest of our program as we normally would. As a sample program for testing, I included some text that had a syntax error (line 49)—the beginning of a regular expression without the matching end.

The error reporting program in Listing 2 should be as simple as possible. In this case, I'm pulling in CGI.pm to decode the parameters and generate the response HTML. Again, lines 1 through 3 are like the previous program, and line 5 pulls in CGI.pm.

Lines 7 to 14 use the CGI.pm shortcut methods to generate a simple response using small soothing words.

Line 18 opens up an error log for appending. I put the error log here in /tmp for simplicity during my coding, but I'd probably move this to a more secure location for production, typically alongside the web error logs in the same directory.

Again, in the "keep it simple" motif, I wanted to store all the environment variables in the response, in case the user agent or source IP address or referrer page might be useful. I'm doing this by using the CGI.pm's param function to pretend that additional parameters were passed as the form values. Each param is named like the Perl-style access to that environment variable. For example, a param name of $ENV{HOME} stores the value for the HOME variable.

Finally, in line 20, the contents of the form fields are dumped in CGI.pm's dump format into the log file. By using this dump format, we'll be able to restore the data very cleanly, regardless of how many odd characters were included in the form submission or environment.

Note that it might be a good idea to flock this filehandle before writing to it, but as long as the writes are under 8 KB or so, we probably won't have any problems.

Once we've set up all of our CGI scripts with the BEGIN block from Listing 1, and the common CGI crash-reporting script in Listing 2, we're ready to watch for errors. As the errors occur, they'll be logged into the web server error log, and the user will be shown the simple form. If the user fills out and submits the form, the resulting form values (including our added %ENV values) are placed into the separate crash reporting log.

All we need to do now is extract the values from the crash reporting log and correlate those with the web server's error log. Luckily, CGI.pm can do most of the hard work for us again. In Listing 3, I have a short snippet of a program to extract the data.

Please note that this is not a CGI script and will not be placed into the CGI directories of my web server, even though it contains use CGI in line 3. I'm pulling in the CGI module only to parse the CGI dump-format file. Line 5 defines the location of that file, which is opened in line 9.

Line 7 defines a top-level data structure that will hold all of the entries in the file, so that I can dump them in lines 20 and 21.

Lines 10 through 18 process through the file by having the CGI module construct new CGI objects from the file until the file is at end-of-file (tested in line 10).

Line 11 assigns the newly constructed CGI object (pulled from the file in the CGI dump format) into the special $CGI::Q variable. This variable is the default CGI object for objectless CGI calls, so that I can just say param() later instead of $CGI:: Q->param().

Line 12 creates a hash that will hold all of the params of this particular form submission.

Lines 13 to 16 go through each param, pulling out its values into @p in line 14. If there are fewer than two values, then $p[0] is used as the final value.

Line 17 pushes a reference to this param hash onto the @all data structure.

Lines 20 and 21 use Data::Dumper to show us what the various error reports would have looked like after parsing. Had this been an actual code, the incident values of each report would be keyed into some database correlated with incident values from the web server error log. But that's beyond the scope of what I wanted to show in these examples.

So, there you have it. Don't show error messages to your web users. Put them someplace where you can get to them, but let the user annotate with their own comments to help you diagnose your problem. And as always, until next time, have the appropriate amount of fun!

TPJ



Listing 1

=1=     #!/usr/bin/perl -Tw
=2=     use strict;
=3=     $|++;
=4=     
=5=     BEGIN {
=6=       $SIG{__DIE__} = sub {
=7=         my $incident = unpack "H*", pack "NS", time, $$;
=8=     
=9=         my $message = shift;
=10=    
=11=        ## record reason for death into the error log
=12=        warn join "", map "crash $incident: $_\n", split /\n+/, $message;
=13=    
=14=        ## now give the user a place to record their experience
=15=        print <<END_OF_HTML;
=16=    Content-type: text/html
=17=    
=18=    <hr>
=19=    <h1>An error has occurred</h1>
=20=    <p>An error has occurred. Relevant details have already been 
=21=    logged for us, but if you want to tell us what you were doing at
=22=    the time, it might help us determine the cause more accurately 
        to fix the problem more easily.</p>
=23=    <form action="/cgi/crash-reporter">
=24=    <input type="hidden" name="incident" value="$incident">
=25=    <table>
=26=    <tr><th>
=27=      Email address (optional):
=28=    </th><td>
=29=      <input type="text" name="email" value="foo\@example.com" size="60"/>


=30=    </td></tr>
=31=    <tr><th>

=32=      Comments:
=33=    </th><td><textarea name="comments" rows="10" cols="60">
=34=    Describe your circumstances here.  What were you trying to do?
=35=    </textarea></td></tr></table>
=36=    <input type="submit" />
=37=    </form>
=38=    <p>Or, you can contact us at 1-800-555-5555 and reference incident
=39=    number $incident.</p>
=40=    <hr />
=41=    END_OF_HTML
=42=        exit 0;
=43=      };
=44=    }
=45=    
=46=    use CGI qw(:all);
=47=    print start_header, start_html("Buggy Program");
=48=    
=49=    / bad syntax
=50=    
=51=    print end_html;
Back to article


Listing 2
=1=     #!/usr/bin/perl -Tw
=2=     use strict;
=3=     $|++;
=4=     
=5=     use CGI qw(:all);
=6=     
=7=     print
=8=       header,
=9=       start_html("Incident Reporter"),
=10=      h1("Thank you!"),
=11=      p("Thank you for your incident report.",
=12=        "The appropriate authorities have been notified.",
=13=        "Remain calm."),
=14=      end_html;
=15=    
=16=    ## save the parameters and environment
=17=    
=18=    open ERRLOG, ">>/tmp/crash-reporter.log";
=19=    param("\$ENV{$_}", $ENV{$_}) for keys %ENV;
=20=    $CGI::Q->save(\*ERRLOG);
Back to article


Listing 3
=1=     #!/usr/bin/perl -w
=2=     
=3=     use CGI qw(param);
=4=     
=5=     my $FILE = "/tmp/crash-reporter.log";
=6=     
=7=     my @all;
=8=     
=9=     open FILE, $FILE or die;
=10=    until (eof FILE) {
=11=      $CGI::Q = CGI->new(\*FILE);
=12=      my %p;
=13=      for (param) {
=14=        my @p = param($_);
=15=        $p{$_} = @p < 2 ? $p[0] : \@p;
=16=      }
=17=      push @all, \%p;
=18=    }
=19=    
=20=    use Data::Dumper;
=21=    print Dumper \@all;
Back to article


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.