Designing for Pluggability


July, 2004: Designing for Pluggability

Simon is a freelance programmer and author, whose titles include Beginning Perl (Wrox Press, 2000) and Extending and Embedding Perl (Manning Publications, 2002). He's the creator of over 30 CPAN modules and a former Parrot pumpking. Simon can be reached at simon@simon-cozens.org.


Someone once said that the mark of a great piece of software is that it can be used to do things that its author never anticipated. The key to this is building sufficient flexibility into the application and, in this article, I'm going to show you some of the modules and techniques in my toolbox for doing just that, starting from relatively humble beginnings and ending with a new way of creating extensible database applications that I'm quite proud of.

UNIVERSAL::require

We begin with the UNIVERSAL::require module. This isn't so much related to extensibility itself, but it will be used as a building block for many of the other techniques we'll look at.

UNIVERSAL::require is a simple module that does a simple job. When you need to load some code at runtime (the essence of pluggable design), you find there are several ways to do it in Perl. You can use do or string eval if you know where the code is coming from, but what if you have a module name instead of a file name? You can't use use, because that happens at compile time, and you can't write require $module_name, because require with a variable, rather than a bareword, expects a file name, not a module name.

So, if we're trying to programmatically load an extension module at runtime—again, something we'll be doing a lot when developing pluggable software—we end up writing fudges like:

eval "require $module_name";

UNIVERSAL::require exists purely to tidy up this very case. It adds a require method to the UNIVERSAL namespace, meaning that we can call require on any class:

My::Module->require;

This method just does the eval "require My::Module" fudge with a little better error checking, so we can now say:

use UNIVERSAL::require;
$module_name->require;

This is the first step on the road to building our own extensible applications.
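To make the idea concrete, here is a minimal sketch of the mechanism UNIVERSAL::require provides: a require method installed into UNIVERSAL, so that any class name can load itself. This is a toy version for illustration, not the CPAN module's actual code; it demonstrates the technique on File::Spec, which ships with every Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A toy version of the UNIVERSAL::require technique: install a require()
# method into UNIVERSAL, so every class name responds to ->require.
sub UNIVERSAL::require {
    my $class = shift;
    (my $file = $class) =~ s{::}{/}g;     # Foo::Bar -> Foo/Bar.pm
    $file .= ".pm";
    my $return = eval { require $file };  # the eval-require fudge
    $UNIVERSAL::require::ERROR = $@;      # keep the error for the caller
    return $@ ? 0 : $return;
}

# File::Spec is a core module, so this should load anywhere:
my $module = "File::Spec";
$module->require or die $UNIVERSAL::require::ERROR;
print $module->catfile("a", "b"), "\n";
```

The real module does essentially this, with more careful error reporting; the point is that a string held in a variable becomes a method invocant, sidestepping require's bareword-versus-filename distinction.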

Do-It-Yourself Pluggability

The Perl module Mail::Miner analyzes a piece of e-mail for various features, which it stores in a database table. It does this by calling a set of "recognizers," which are its plug-in modules. Here's how we load up the plug-in modules:

use File::Spec::Functions qw(:DEFAULT splitdir);

my @files = grep length, map { glob(catfile($_,"*.pm"))  }
    grep { -d $_ }
    map { catdir($_, "Mail", "Miner", "Recogniser") }
    exists $INC{"blib.pm"} ? grep {/blib/} @INC : @INC;

my %seen;
@files = grep {
  my $key = $_;
  $key =~ s|.*Mail/Miner/Recogniser||;
   !$seen{$key}++
} @files;

require $_ for @files;

This is quite horrible, but it's instructive to look at. We're trying to find all the files matching "Mail/Miner/Recogniser/*.pm" in the include path, @INC, and the first @files = statement does this: It adds "Mail/Miner/Recogniser" to the end of each include path and checks whether the result is a directory. If it is, then we look for all the *.pm files in that directory.

The blib.pm bit is to be used for testing new recognizers. If we've said use blib somewhere, then we're in a test suite and we're only interested in the recognizer modules underneath the blib staging directory. This allows us to ensure that we're loading up the new modules instead of already installed ones. When we say use blib, or indeed any other module, Perl turns the module name into a short filename (blib.pm, say, or "Mail/Miner/Mail.pm") and puts this in the %INC hash with the value being the full path of the module file. Hence, looking in %INC is a good way of telling whether a particular module is loaded.
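You can see this for yourself with any loaded module; the snippet below uses File::Spec as a stand-in:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;    # any module will do for the demonstration

# After a use, %INC maps the short file name to the full path of the
# loaded file, so looking in %INC tells us whether a module is loaded.
print "File/Spec.pm => $INC{'File/Spec.pm'}\n";
print "blib loaded?  ", (exists $INC{"blib.pm"} ? "yes" : "no"), "\n";
```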

Next, we make sure we only have one copy of a given recognizer; this avoids problems when a module is installed in multiple places. Finally, we have a file name, so we can pass it to require and Perl will load the module.
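Boiled down, the whole scan amounts to something like the sketch below. The subdirectory to search is passed in as a parameter, so we can demonstrate it on File::Spec's own back-end modules, which every Perl installation has on disk; the My/App/Plugin layout mentioned in the comment is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec::Functions qw(catdir catfile);
use File::Basename qw(basename);

# The same scan, simplified: look for e.g. My/App/Plugin/*.pm under
# each @INC directory, keeping only the first copy of each module file.
sub find_plugins {
    my @subdirs = @_;
    my (%seen, @found);
    for my $inc (@INC) {
        next if ref $inc;                 # skip hooks (coderefs) in @INC
        my $dir = catdir($inc, @subdirs);
        next unless -d $dir;
        for my $file (glob catfile($dir, "*.pm")) {
            push @found, $file unless $seen{ basename($file) }++;
        }
    }
    return @found;
}

print "$_\n" for find_plugins("File", "Spec");
```

This omits the blib special case, but the shape is the same: map @INC entries to candidate directories, glob for .pm files, and de-duplicate.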

So now we have loaded up all the Mail::Miner::Recogniser::* modules that we can find on the system. That's solved one problem. The second problem is, now that we have them, what do we do with them? How do they relate to the rest of the system?

The way Mail::Miner opted to do this was to have each of the plug-in modules write into a hash when they load and supply metadata about what they do:

package Mail::Miner::Recogniser::Phone;
$Mail::Miner::recognisers{"".__PACKAGE__} =
  {
   title => "Phone numbers",
   help  => "Match messages which contain a phone number",
   keyword => "phone"
};

Now Mail::Miner can look at the packages it has available in %recognisers, and call a particular interface on each one of them:

sub modules { sort keys %recognisers };

for my $module (Mail::Miner->modules()) {
  # ...
  $module->process(%hash);
}

This way, Mail::Miner can call out to additional installed modules without the author (me) knowing what plug-ins the user (you) has installed. Anyone can write a Mail::Miner::Recogniser::Meeting module, for instance, to attempt to identify meeting locations and times in an e-mail. Once it's installed in a Perl include path, it'll be automatically picked up and its process method will be called to examine an e-mail.
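Here's the whole registry pattern in one self-contained sketch: each plug-in records metadata in a shared hash as it loads, and the driver later calls a fixed interface on every registered class. The My::Recogniser packages below are made up for the demonstration; they aren't real Mail::Miner recognizers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our %recognisers;   # the shared registry

# Two illustrative plug-ins; each registers itself at load time.
# ("" . __PACKAGE__) forces the bareword into a string, as in the article.
package My::Recogniser::Phone;
$main::recognisers{ "" . __PACKAGE__ } = {
    title   => "Phone numbers",
    keyword => "phone",
};
sub process { my ($class, %hash) = @_; return "[phone] $hash{body}" }

package My::Recogniser::Date;
$main::recognisers{ "" . __PACKAGE__ } = {
    title   => "Dates",
    keyword => "date",
};
sub process { my ($class, %hash) = @_; return "[date] $hash{body}" }

package main;
sub modules { sort keys %recognisers }

# The driver knows nothing about the plug-ins except the interface:
print $_->process(body => "lunch at noon, call 555-0199"), "\n"
    for modules();
```

In the real application the plug-ins live in separate files found by the @INC scan; here they share one file purely so the example runs on its own.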

Module::Pluggable

As I said, that's how I used to do it—until Module::Pluggable appeared. We've seen the two problems involved in developing pluggable applications: first, finding the plug-ins; and second, working out what to do with them. Module::Pluggable helps with the first. It does away with all the nasty code we saw earlier. Now, to find all the recognizers installed in the Mail::Miner::Recogniser namespace, I can rewrite my code as follows:

package Mail::Miner;
use Module::Pluggable search_path => ['Mail::Miner::Recogniser'];

This gives me a plugins method that returns a list of class names just like the modules method did in our original code. If I wanted to make it completely compatible, I could also change the name of the method to modules with the sub_name configuration parameter:

use Module::Pluggable search_path => ['Mail::Miner::Recogniser'],
                      sub_name    => "modules";

This doesn't actually cause the modules to be loaded, so we could say:

$_->require for Mail::Miner->modules;

but we can also have the modules method itself load up the plug-ins by passing another configuration parameter:

use Module::Pluggable search_path => ['Mail::Miner::Recogniser'],
                      sub_name    => "modules",
                      require     => 1;

This is a drop-in—and much simpler—replacement for all the messing about with paths and @INC we saw earlier; it even handles the test case when blib is loaded. But I haven't replaced Mail::Miner's plug-in system with this and we'll see why later.

Making Callbacks with Class::Trigger

First, though, another CPAN module that handles the second problem—knowing what to do with your plug-ins when you have them. In Mail::Miner, we called a method that was assumed to be defined in all the plug-ins—whether or not they wanted it.

Sometimes this is the right way to do it, but often, an individual plug-in will want more control about what it responds to, especially if you're going to be calling your plug-ins on several different occasions for different things.

In these cases, you might find the CPAN module Class::Trigger a better fit. Class::Trigger allows you to add "trigger points" to your objects or classes to which third parties can attach code to be called.

For instance, if we have a method that displays some status information about an object, we could declare a trigger before printing the information out:

sub display_status {
  my $object = shift;
  my $message = $object->status;
  $object->call_trigger("display_status", \$message);
  print $message;
}

Now individual plug-in modules can register with the class subroutine references to be called when the trigger is called. For instance, a module might want to modify the message because it's going to be sent out as HTML:

use HTML::Entities;
MyClass->add_trigger( display_status => sub  {
  my ($obj, $message) = @_;
  $$message = encode_entities($$message);
});

Notice how we pass in a reference to the message, so that we can modify the message. Another plug-in could provide links for all the URIs it finds in the message:

use URI::Find::Simple qw(change_uris);
MyClass->add_trigger( display_status => sub  {
  my ($obj, $message) = @_;
  $$message = change_uris( $$message, 
               sub { qq{<a href="$_[0]">$_[0]</a>} } );
});
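Class::Trigger itself is on CPAN; a toy version of the mechanism, enough to run a variant of the examples above without any non-core modules, might look like this. The status message and the two triggers are invented for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A toy trigger mechanism: add_trigger stores subroutine references per
# class and trigger name; call_trigger runs them in registration order.
package MyClass;
my %triggers;

sub add_trigger {
    my ($class, $name, $code) = @_;
    push @{ $triggers{$class}{$name} }, $code;
}

sub call_trigger {
    my ($class, $name, @args) = @_;
    $_->($class, @args) for @{ $triggers{$class}{$name} || [] };
}

sub display_status {
    my $class = shift;
    my $message = "status: OK";
    $class->call_trigger("display_status", \$message);
    return $message;
}

package main;

# One plug-in upper-cases the message, another appends to it; note that
# the final result depends entirely on registration order.
MyClass->add_trigger(display_status => sub { ${ $_[1] } = uc ${ $_[1] } });
MyClass->add_trigger(display_status => sub { ${ $_[1] } .= " [checked]" });

print MyClass->display_status, "\n";   # STATUS: OK [checked]
```

Passing a scalar reference, as in the article's examples, is what lets each trigger modify the message in place.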

And now we come to a problem—how do we know that the "mark up URIs" trigger is going to be loaded after the "escape HTML entities" trigger? If we can't guarantee the ordering of the two triggers, we could end up with our link tags denatured by the entity escaping.

This was a problem that I came up against, albeit from a slightly different angle.

Pluggable Callbacks in Email::Store

You see, the reason I haven't rewritten Mail::Miner in the new plug-in style with Module::Pluggable is that I've been working on a much more extensible and advanced framework for storing and data-mining e-mail, which I've called Email::Store. In the way Email::Store works, pretty much everything is a plug-in.

When you store e-mail, Email::Store itself loads up the Email::Store::Mail plug-in, which sets up a placeholder database row for the mail. Then Email::Store::Mail calls all of the other plug-ins to examine the mail and file away the things they want to note about it—what mailing lists it came from, what attachments it has, and so on.

However, we also want these plug-ins to specify some kind of relative order in which they're called. For example, it's more efficient if the attachment handler strips the e-mail of its attachments before other plug-ins poke around in the e-mail body because, once you've gotten rid of the attachments, there's less e-mail body to poke around in.

All great ideas have been had before, of course, and this made me think of the UNIX System V init process. When a UNIX machine starts up, it consults files in an "rc" directory to start up particular services. These files are named in a particular way so that, when the initialization process looks at the directory, it sees the services in the order that they should be started up. For instance, S10sysklogd means "start the system logger at position 10," and S91apache means "start the Apache web server at position 91;" the logger gets started first and Apache later. Now, this isn't perfect because there can be several things in position 10, and they get run in alphabetical order; and besides, nobody's policing the numbers anyway. If you think S01foo means "very early" and someone else comes along and installs S00bar, theirs gets run first. But it gives you a rough way of providing an order to the process.
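The asciibetical ordering is easy to check for yourself, using the script names from the paragraph above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# rc-directory ordering in miniature: names sort asciibetically, so the
# two-digit position field gives a rough global order -- and S00bar
# really does sneak in ahead of S01foo.
my @scripts = qw(S91apache S10sysklogd S00bar S01foo);
print "$_\n" for sort @scripts;
```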

What I wanted to do was give my plug-ins a similar rough ordering: Attachment handling had to happen at position 1; working out the mailing list an e-mail came from was a low priority task and could happen at position 90 towards the end; everything else could go somewhere in the middle.

I also didn't really like the Class::Trigger approach of specifying a subroutine reference to be called. I prefer just writing methods. So, plug-ins that want to influence the way an e-mail gets indexed can provide two methods:

package Email::Store::Summary;

sub on_store_order { 80 }
sub on_store {
  my ($self, $mail) = @_;
  # ...
}

on_store_order is the position in which we'll be called by the indexing process; on_store is what we do when we get called. This is implemented in the ::Mail class like so:

use Module::Pluggable::Ordered search_path => ["Email::Store"];

sub store {
  my ($class, $rfc822) = @_;
  my $simple = Email::Simple->new($rfc822);
  my $msgid = $class->fix_msg_id($simple);
  my $self;
  $self = $class->create ({ message_id => $msgid,
                message    => $rfc822,
                simple     => $simple });
  $class->call_plugins("on_store", $self);
  $self;
}

Module::Pluggable::Ordered provides the same functionality as Module::Pluggable but also provides a call_plugins method: You give it a name of a trigger and some parameters and it looks through your plug-ins, finds those that provide that method, orders them by their positions, and then calls them. In our normal Email::Store case, that one line would be the equivalent of:

Email::Store::Attachment->on_store($self);
Email::Store::Entity->on_store($self);
Email::Store::Summary->on_store($self);
Email::Store::List->on_store($self);

As new modules are developed and dropped into place, they're ordered by their on_store_order if they provide an on_store method and then placed into the list of on_store calls—all without Email::Store::Mail needing to know about them. The single call_plugins line combines both locating plug-ins and calling triggers to provide a facility for extending the indexing process.
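The dispatch logic behind call_plugins can be sketched in a few lines of plain Perl. Here a hard-coded list of invented Email::Store::Fake packages stands in for Module::Pluggable::Ordered's namespace search; everything else — sort by the _order method, filter with can, call — mirrors the behavior described above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fake plug-ins with the on_store/on_store_order interface; each one
# records that it was called so we can inspect the dispatch order.
package Email::Store::Fake::Attachment;
sub on_store_order { 1 }
sub on_store { push @main::calls, __PACKAGE__ }

package Email::Store::Fake::List;
sub on_store_order { 90 }
sub on_store { push @main::calls, __PACKAGE__ }

package Email::Store::Fake::Summary;
sub on_store_order { 80 }
sub on_store { push @main::calls, __PACKAGE__ }

package main;
our @calls;

# The heart of call_plugins: find plug-ins that provide the method,
# sort them by their <method>_order, and call each in turn.
sub call_plugins {
    my ($method, @args) = @_;
    my @plugins = qw(Email::Store::Fake::List
                     Email::Store::Fake::Attachment
                     Email::Store::Fake::Summary);
    my $order = $method . "_order";
    for my $p (sort { $a->$order <=> $b->$order }
               grep { $_->can($method) } @plugins) {
        $p->$method(@args);
    }
}

call_plugins("on_store");
print "$_\n" for @calls;   # Attachment (1), Summary (80), List (90)
```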

Mixing Plug-Ins with Databases

Let's now go on to write the rest of the Email::Store::Summary class that we looked at earlier. This is going to store summary information about an e-mail so that it can be displayed in a friendly way—we'll store the subject of the mail and the first line of original content; that is, the first thing we see after removing an attribution and a quote. These will go in the summary database table, so we need to inherit from Email::Store::DBI, the Class::DBI class that knows about the current database, and we need to tell it about the table's columns:

package Email::Store::Summary;
use base 'Email::Store::DBI';
Email::Store::Summary->table("summary");
Email::Store::Summary->columns(All => qw/mail subject original/);
Email::Store::Summary->columns(Primary => qw/mail/);

We'll use Text::Original, a module extracted from the code of the Mariachi mail archiver, which hunts out the first piece of original text in a message body:

use Text::Original qw(first_sentence);

sub on_store_order { 80 }
sub on_store {
  my ($self, $mail) = @_;
  my $simple = $mail->simple;
  Email::Store::Summary->create({
    mail => $mail->id,
    subject => scalar($simple->header("Subject")),
    original => first_sentence($simple->body)
  });
}

When e-mail is indexed, the on_store callback is called and it receives a copy of the Email::Store::Mail object that's being indexed. The simple method returns an Email::Simple object, which we use to extract the subject header and the body of the e-mail. Then we create a row in the summary table for this e-mail.

Next, for this to be useful, we need to tell Email::Store::Mail how this summary information relates to the mail:

Email::Store::Summary->has_a(mail => "Email::Store::Mail");
Email::Store::Mail->might_have( 
  summary => "Email::Store::Summary" => qw(subject original) 
);

Now an Email::Store::Mail object has two new methods—which, of course, we'll highlight in the documentation for our module. subject will return the first subject header and original will return the first sentence of original text. We use might_have to consider the summary table an extension of the mail table.

But now comes the clever bit. If this is truly to be a drop-in plug-in module, where is the summary table going to come from? It's one thing to be able to add concepts to a database-backed application, but these new concepts have to be supported by tables in the database. For the plug-in module to be completely self-contained, it must also contain information about the table's schema. And this is precisely what Email::Store plug-ins do. In the DATA section of Email::Store::Summary, we'll put:

__DATA__
CREATE TABLE IF NOT EXISTS summary (
  mail varchar(255) NOT NULL PRIMARY KEY,
  subject varchar(255),
  original text
);

There's a mix-in module called Class::DBI::DATA::Schema, used by Email::Store::DBI (and hence by anything that inherits from it), which provides the run_data_sql method. As its name implies, this method runs any SQL it finds in the DATA section of a class. So all we need to do is go through all of our plug-ins and call run_data_sql on them to create their tables:

sub setup {
  for (shift->plugins()) {
    $_->require or next;
    if ($_->can("run_data_sql")) {
      warn "Setting up database in $_\n";
      $_->run_data_sql;
    }
  }
}

With this in place, a plug-in module is truly self-contained: It specifies what to do at trigger points like on_store, it specifies the relationships that tie it in to the rest of the Email::Store application, and it specifies how to create the database table that it relates to.

There's one more slight niggle—because the end user specifies what SQL database to use and because not all databases use the same variant of SQL, what if the schema in a DATA section isn't appropriate for what the end user is using? Class::DBI::DATA::Schema handles this, too, by using SQL::Translator to automatically translate the schema to a different variant of SQL. We can say

use Class::DBI::DATA::Schema (translate => [ "MySQL" => "SQLite"] );

and write our DATA schemas in MySQL's SQL—except that we don't know at compile time that the end user is going to choose SQLite for his database; in fact, we don't know until the database is set up. So we end up doing something like this:

package Email::Store::DBI;
use base 'Class::DBI';
require Class::DBI::DATA::Schema;

sub import {
  my ($self, @params) = @_;
  if (@params) {
    $self->set_db(Main => @params);
    Class::DBI::DATA::Schema->import( translate => 
      [ "MySQL" => $self->__driver ]
    );
  }
}

When I say use Email::Store 'dbi:SQLite:mailstore.db', Email::Store::DBI first sets up the database, and then it imports Class::DBI::DATA::Schema, telling it to translate between MySQL and SQLite, the __driver for our database. The reality is slightly more complex than this because DBD::Pg's driver name is "Pg," while SQL::Translator expects it to be called "PostgreSQL." But the basics are there. See the source to Email::Store::DBI for the full story.
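The deferral trick itself — an import method that inspects its use-time arguments and configures the class accordingly — can be sketched without any of the database machinery. My::Store and its %config hash below are invented for the demonstration; the DSN parsing imitates what Class::DBI's __driver would report.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A stripped-down sketch of configuring a class at use time: import()
# receives the caller's arguments and defers decisions (like which SQL
# dialect to target) until the application names its database.
package My::Store;
our %config;

sub import {
    my ($self, $dsn) = @_;
    return unless defined $dsn;   # a bare "use My::Store;" configures nothing
    $config{dsn} = $dsn;
    # derive the driver name from the DSN, e.g. "dbi:SQLite:..." -> "SQLite"
    ($config{driver}) = $dsn =~ /^dbi:([^:]+):/;
}

package main;

# Normally Perl calls import for us via "use My::Store '...'"; since
# everything lives in one file here, we call it directly.
My::Store->import("dbi:SQLite:mailstore.db");
print "driver is $My::Store::config{driver}\n";   # driver is SQLite
```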

We've looked at various tools to increase the pluggability of our applications, from merely requiring classes at runtime to using modules to help us find plug-ins and provide trigger points or callbacks for extensions to influence the behavior of a process. We put all these together in Module::Pluggable::Ordered, which also allows us to specify a rough ordering for the extension modules, and we added the concept of extending a database-based application by using Class::DBI::DATA::Schema to allow us to write fully self-contained database-backed plug-ins.

Making your applications pluggable is an excellent way of reducing the complexity of a design—Email::Store::Mail hardly does anything itself but delegates to plug-ins for almost all of its functionality. Module::Pluggable::Ordered and the database techniques we've looked at provide a low-effort way of doing that and allow your applications to be stretched and expanded in ways you might not imagine!

TPJ

