A mod_perl 2 Primer


December, 2004: A mod_perl 2 Primer

Pete works for Community Internet Ltd. (http://www.community.co.uk/) and can be contacted at pete@clueball.com.


In the beginning, there were web servers. You had a nice transactionless way to allow remote users to browse areas of your computer's filesystem. But this wasn't enough—what if you wanted to allow remote users to execute local applications? Thus, CGI was born as a protocol for allowing remote users to pass arguments to applications on your machine. Suddenly, you could write guestbook applications and all sorts of other monstrosities, and make your web site "dynamic." Great.

If your application was written in Perl, however, your web server had to fire up Perl, run your script, and tear Perl back down again each time someone wanted to run your code. This was somewhat processor and memory intensive, especially if your web server was trying to run your application for more than one user at once.

mod_perl solves this by embedding a persistent Perl interpreter into Apache, allowing you to intercept requests and handle them with a Perl module. You can therefore wrap your application up in a Perl module and execute it over and over again without having to initialize a new Perl instance, and reload all the libraries your application uses, each time you want to run some code.

mod_perl allows you to use Perl to control other parts of the Apache request process, too, such as HTTP authentication—these other uses, however, are outside of the scope of this article. This article is intended to give reasonably competent Perl programmers a quick-and-dirty introduction to mod_perl 2, and to give those already familiar with mod_perl 1 an idea of how mod_perl 2 is different.

What Is mod_perl 2?

mod_perl has been around for a while, and many people are familiar with it. However, Apache 2 includes many features not available in Apache 1, and presents an opportunity for mod_perl developers to fix concepts that were "broken" in the original mod_perl.

However, mod_perl 2 isn't finished yet. Its stability and fitness for use in a production environment are discussed briefly later, but the fact remains that the latest release (at the time of writing) is on Version 1.99_17—a dead giveaway that it's still beta. Therefore, when we refer to mod_perl 2, this is what we're talking about—hopefully, in the near future, we'll see a mod_perl with a version number of 2 or above, but that's not the case today.

Is mod_perl 2 Ready to Use?

The low version number is an oft-cited reason people aren't comfortable with using mod_perl 2 in a production environment yet. Geoffrey Young, coauthor of mod_perl Developer's Cookbook (Pearson Education, 2002) and a mod_perl developer, paints a different picture:

"mod_perl 2.0 is as stable as a piece of software can be without large-scale deployment fleshing out the issues that only large-scale deployment can expose...in our minds, 2.0 will be released when we consider the API to be frozen and immutable. Not releasing an official 2.0 is just us committing to the user-base that we promise to not mess with how the official API looks [once released]. But really, we're at the immutable stage with 99% of the stuff at this point, so unless you're doing really funky stuff (specifically stuff you couldn't do in [mod_perl 1]) you probably wouldn't notice."

We're not going to be doing anything funky here, so we should be safe.

Setting Up Apache

Fully configuring Apache is outside of the scope of this article. You can find complete details at: http://perl.apache.org/docs/2.0/user/config/config.html.

However, it's worth pointing out that all the following examples are run on my server using the following lines in the Apache configuration file, and if you're reasonably comfortable with the Apache configuration, you can probably steal and adapt these:

# Tell Apache where to find our Perl modules...

PerlSwitches -I/Users/sheriff/modules/

# Specify the location the mod_perl handler applies to

<Location /DisplayPage/>
  SetHandler perl-script

  # Specify the module to use to handle this 'location'
  PerlResponseHandler Module::Name

</Location>

When Apache invokes a handler from a request, it passes that handler a bundle of information about the request in the form of a C struct (a bit like a hash) called the "request record." This is passed to your handler in the form of an Apache::RequestRec object; you are not, of course, expected to interact directly with the request record—Apache::RequestRec's API takes care of all that for you. The function called handler in your Perl library will be invoked with this object as the first argument.

Apache::RequestIO and Apache::Const

So far we've touched briefly on Apache::RequestRec for interacting with Apache. There are two other modules that are worth knowing about before we attempt a Hello World module.

Apache::RequestRec allows us to retrieve data about the incoming request. Apache::RequestIO provides us with IO methods we can use on the request object—for example, it allows us to print data to the user. Finally, Apache::Const sets up some constants for us to use for returning HTTP status codes from our handler routine.

Here's a very basic example:

01:  package TestModules::Hello;
02:  use strict;
03:  use Apache::RequestRec (); 
04:  use Apache::RequestIO (); 
05:  use Apache::Const -compile => qw(:common);
06:  sub handler {
07:    my $r = shift;
08:    # Grab the query string... 
09:    my $query_string = $r->args;
10:    # Print out some info about this... 
11:    $r->content_type( 'text/plain'); 
12:    $r->print( "Hello world! Your query string was: $query_string\n" );
13:    return Apache::OK;
14:  } 
15:  1;

The first line declares the package name of our handler, and the second sets the strict pragma, forcing us to predeclare our variables and helping to make any mistakes that we make more visible.

The inclusion of the third line may strike some readers as odd, but we'll come back to that in a moment. For the time being, let's say (accurately) that it loads the Apache::RequestRec library. Line 4 loads the Apache::RequestIO library, and line 5 loads Apache::Const and asks it to load up the "common" set of constants.

handler is the name of the function that mod_perl calls when it wants to pass a request to your library. It passes the request object to this handler as its first argument. Traditionally, people save the request object to the scalar $r.

Line 9 retrieves the query string of the URL that led to our handler being called—the args method is provided by Apache::RequestRec. Line 11 sets the content-type of our response, and line 12 outputs some data to the user, including the query string they sent us. Finally, line 13 tells Apache that we finished what we wanted to do successfully and that an HTTP code of "200" (which indicates success) should be sent back to the user.

So if we're being passed an Apache::RequestRec object, why do we need to explicitly load this library, too? This is a somewhat controversial design decision—you're being passed an object that's blessed into a class that may not exist in memory yet. Some people may consider this a crime against nature. On the other hand, it does help to keep your code footprint down if you're not intending to use any of the methods provided by Apache::RequestRec itself—other modules, such as Apache::RequestIO, add methods to the Apache::RequestRec namespace.
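The idea of one module adding methods to another module's namespace can be tried out in plain Perl. The following is only a sketch: My::Request and My::RequestIO are hypothetical stand-ins invented for illustration, and the real mod_perl modules are implemented differently. It simply shows that an object picks up methods from whichever code has defined subs in its class.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the request class. It knows how to
# construct an object, but defines no output methods of its own.
package My::Request;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Hypothetical add-on module. Rather than defining methods in its
# own package, it defines a sub directly in My::Request's namespace.
package My::RequestIO;

sub My::Request::greet {
    my ($self) = @_;
    return "Hello from $self->{uri}";
}

package main;

my $r = My::Request->new( uri => '/asdf/' );

# The method is found through the object's class, even though the
# code that defines it lives in a different module.
print $r->greet, "\n";
```

If the add-on were never loaded, the call to greet would fail at run time with a "Can't locate object method" error, which is the practical reason for loading each module whose methods you intend to call.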

You can find documentation for all these modules at http://perl.apache.org/docs/2.0/api/index.html.

Redirect Script

So we've successfully written our first mod_perl 2 handler. However, it doesn't really do much that's very useful. Next, I'll describe a very simple redirect handler. For this, you'll need to know something about Apache::URI and APR::Table.

Headers and APR::Table

To do a simple redirect, we're going to need to read in the URL, decide how to redirect based on that, construct a new URL, add a redirect header, and send it to the user. Let's talk about setting a redirect header first.

HTTP allows you to set more than one header of the same name, which means mod_perl needs to store the headers you want to send in a way that reflects this. A simple hash-based method for storage won't work—you can't easily assign more than one value to a hash without messing around with array references. Thus, headers are represented by APR::Table objects, which hide all this behind a nice, tidy API.

Apache::RequestRec gives us the method headers_out, which returns an APR::Table object. We need to add a "Location" header, so we can use the set method on this object, which adds or overwrites a key's value. For a more comprehensive discussion of APR::Table, see http://perl.apache.org/docs/2.0/api/APR/Table.html.

Essentially, what we need to do is:

# Retrieve the out-going headers APR::Table object
my $headers = $r->headers_out;

# Set/overwrite the 'Location' key's value
$headers->set( Location => 'http://whatever/' );

Or more concisely:

$r->headers_out->set( Location => 'http://whatever/' );
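To see why a plain hash falls short for repeated headers, here is a small stand-alone sketch in ordinary Perl. The table_add, table_set, and table_get helpers are hypothetical names made up for this illustration and say nothing about how APR::Table is implemented internally. They only mimic the difference between appending a value and overwriting one.

```perl
use strict;
use warnings;

# A plain hash keeps only the last value assigned to a key.
my %hash_headers;
$hash_headers{'Set-Cookie'} = 'a=1';
$hash_headers{'Set-Cookie'} = 'b=2';    # silently replaces 'a=1'

# An ordered list of [name, value] pairs can hold repeated names.
my @table;

# Append a header, keeping any existing ones with the same name.
sub table_add {
    my ($name, $value) = @_;
    push @table, [ $name, $value ];
    return;
}

# Replace every existing header of this name with a single value.
# Header names are compared case-insensitively, as in HTTP.
sub table_set {
    my ($name, $value) = @_;
    @table = grep { lc( $_->[0] ) ne lc($name) } @table;
    push @table, [ $name, $value ];
    return;
}

# Return all values stored under a name, in insertion order.
sub table_get {
    my ($name) = @_;
    return map { $_->[1] } grep { lc( $_->[0] ) eq lc($name) } @table;
}

table_add( 'Set-Cookie' => 'a=1' );
table_add( 'Set-Cookie' => 'b=2' );
table_set( Location => 'http://whatever/' );
table_set( Location => 'http://elsewhere/' );

print scalar( keys %hash_headers ), " hash entry\n";   # 1 hash entry
print join( ', ', table_get('Set-Cookie') ), "\n";     # a=1, b=2
print join( ', ', table_get('Location') ), "\n";       # http://elsewhere/
```

For a redirect we want exactly one Location header, which is why the handler uses set. APR::Table also provides an add method for cases where repeated headers are wanted.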

So we're almost there. Now we just have to construct our URL to send out. We could just create a simple scalar and use that, but we might be hosting requests on a variety of hosts and ports, so let's be a little more intelligent about it.

Apache::URI

Apache::URI gives us a nice way to construct URLs using the requested URL as a base. Apache::URI brings to $r the construct_url method (among others; see http://perl.apache.org/docs/2.0/api/Apache/URI.html). This allows us to create a fully qualified URL from a relative path. So, for example, we could say:

my $new_url = $r->construct_url( '/foo' );

Which, assuming the request had been for "http://wherever:9000/asdf/", would give us "http://wherever:9000/foo". Excellent.
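As a rough picture of what is being assembled, the following stand-alone sketch builds the same kind of URL by hand. build_url is a hypothetical helper written for this illustration, not part of Apache::URI, and it assumes you already know the scheme, host, and port of the current request. It leaves out the port when it is the default for the scheme.

```perl
use strict;
use warnings;

# Default ports that do not need to appear in a URL.
my %default_port = ( http => 80, https => 443 );

# Join scheme, host, port, and path into a fully qualified URL.
sub build_url {
    my ($scheme, $host, $port, $path) = @_;
    my $url = $scheme . '://' . $host;
    $url .= ':' . $port
        unless $port == ( $default_port{$scheme} || 0 );
    return $url . $path;
}

print build_url( 'http', 'wherever', 9000, '/foo' ), "\n";  # http://wherever:9000/foo
print build_url( 'http', 'wherever', 80,   '/foo' ), "\n";  # http://wherever/foo
```

The advantage of construct_url is that the server supplies the scheme, host, and port for you, so the handler keeps working when it is moved to another host or port.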

Putting it all together, we get:

package TestModules::Which;
use strict;
use Apache::RequestRec ();
use Apache::URI ();    # provides construct_url
use APR::Table ();     # provides set on the headers_out table
# We only need to load the REDIRECT status constant...
use Apache::Const -compile => qw( REDIRECT );

sub handler {
  my $r = shift;
  my $url = $r->construct_url('/new_location/');
  $r->headers_out->set( Location => $url );
  return Apache::REDIRECT;
}

1;

Using CGI.pm

CGI is a rather ambiguous term in this context. CGI stands for Common Gateway Interface, and describes how to pass data to code being executed by a web server. Due to the use of the phrase "CGI script" to mean an application executed by a web server, people tend to talk about CGI or mod_perl. However, CGI is the way to get data to your mod_perl handler, so it's appropriate that recent versions of CGI.pm (2.92 and above) allow you to interact with mod_perl.

Essentially, you can use CGI.pm in your mod_perl handlers in almost the same way that you would in your CGI scripts. All you have to do is initialize your CGI object using the Apache::RequestRec object:

my $cgi = CGI->new( $r );
my $value = $cgi->param( 'key' );

CGI::Cookie can also be used in a mod_perl environment—see the CGI::Cookie documentation for more information.

Sensible Database Access

Presumably, you don't want to be reconnecting to your database through DBI for each and every request; you want a database handle that outlives any single call to the handler sub. But every Apache process has its own copy of your handler library in memory, so each process has to manage a connection of its own.

This is where Apache::DBI comes in handy. Load it before you start invoking DBI and it will transparently cache the connections opened with DBI->connect, so that each Apache process reuses its existing handle on later requests rather than reconnecting. Note that the cache is kept per process; it is not a pool shared between processes. It even works transparently with Class::DBI.
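A minimal sketch of the load order follows. It assumes a startup file, here given the hypothetical name startup.pl, that Apache is told to load with a PerlRequire directive. Your setup may well load modules some other way.

```perl
# startup.pl: a hypothetical startup file, assumed to be loaded
# from the Apache configuration with a PerlRequire directive.
use strict;
use warnings;

# Load Apache::DBI first so that it can take over DBI->connect
# before any other module starts using DBI.
use Apache::DBI ();
use DBI ();

1;
```

With this in place, handler code keeps calling DBI->connect with the same arguments as before, and Apache::DBI hands back the handle it has already opened in that process when one is available.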

Final Example

To bring all these ideas together, we'll write a handler that potentially does something useful. Sites like http://xrl.us/ and http://www.tinyurl.com/ allow you to generate short URLs that link to longer ones. We're going to write a handler that will provide redirects in this manner—when a URL is requested, we'll extract the key, search the database for a stored URL, and redirect the user as appropriate.

The only thing we've not seen yet that we're going to introduce with this example is the use of the path_info method. What exactly this returns is somewhat complicated—but if we set up a handler to match the location "/x/," and someone asks for "/x/abc," the path_info method will return "abc." The code for this is shown in Listing 1.
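The key-cleaning step can be tried on its own before it is wired into a handler. In the sketch below, key_from_path_info is a hypothetical helper written for illustration. Because everything other than lowercase letters is stripped, it gives the same key whether or not the value it receives starts with a slash.

```perl
use strict;
use warnings;

# Reduce a path_info value to a lookup key made only of lowercase
# letters. Returns undef when nothing usable is left.
sub key_from_path_info {
    my ($path_info) = @_;
    return undef unless defined $path_info;

    # Copy the value, then strip anything we're not expecting.
    ( my $key = $path_info ) =~ s/[^a-z]//g;

    return length($key) ? $key : undef;
}

for my $input ( 'abc', '/abc', '/a-b/../c', '/123/' ) {
    my $key = key_from_path_info($input);
    printf "%-12s => %s\n", $input, defined $key ? $key : '(no key)';
}
```

Stripping unexpected characters rather than rejecting the request is the same choice the listing makes. A stricter handler could instead return a "not found" status as soon as the input contains anything outside the expected alphabet.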

Hopefully, this short tutorial will get you started writing your own mod_perl handlers—for many applications, you will not need to venture outside of the toolbox presented here.

TPJ



Listing 1

package TestModules::Redirect;

  use strict;
  use warnings;

  use Apache::DBI;
  use Apache::RequestRec ();
  use APR::Table ();

# We only need two of the status code constants...

  use Apache::Const -compile => qw(
    NOT_FOUND
    REDIRECT
  );

# Connect to the database. The DSN takes the form
# dbi:Driver:database

my $dbh = DBI->connect(
  'dbi:mysql:redirect',
  'user',
  'password',
);

# Prepare our SQL query. KEY is a reserved word in
# MySQL, so the column name is quoted with backticks

my $sql = $dbh->prepare("
  SELECT url FROM redirects WHERE `key` = ?
");

# And finally our handler...

sub handler {

  my $r = shift;

# Get the part of the URL we want (see notes)

  my $key = $r->path_info;

# Strip out anything we're not expecting...

  $key =~ s/[^a-z]//g;

# Which might leave us without a key at all,
# so we check and give an error if this is the
# case

  return Apache::NOT_FOUND unless $key;

# Grab the URL in the database corresponding
# to the key. execute returns a status rather
# than a result object, so we fetch the row from
# the statement handle itself

  $sql->execute( $key );
  my $url = $sql->fetchrow_arrayref;

# If there's no entry for the key in the database,
# then let the user know

  return Apache::NOT_FOUND unless $url;

# Set the new URL, and redirect appropriately

  $r->headers_out->set( Location => $url->[0] );
  return Apache::REDIRECT;

}

1;