Dr. Dobb's is part of the Informa Tech Division of Informa PLC


Scripts as Modules


November, 2004: Scripts as Modules

brian has been a Perl user since 1994. He is the founder of the first Perl Users Group, NY.pm, and of Perl Mongers, the Perl advocacy organization. He has been teaching Perl through Stonehenge Consulting for the past five years, and has been a featured speaker at The Perl Conference, Perl University, YAPC, COMDEX, and Builder.com. Contact brian at [email protected].


Lately, I've been writing scripts that look more like modules. Some of this comes naturally from writing so many modules, but I've also discovered that these scripts are easier to manage and test. The result is a Perl file that acts like a module when I use it like a module, and acts like a script when I use it like a script.

When I first started writing Perl, I mostly wrote the usual type of script: I started at the top of the page and wrote statements that I expected to execute in order as I moved down the page. I could read it like a movie script, going from one line to the next.

Later, as I learned more Perl and got more programming experience (even with other languages such as C, Java, and Smalltalk), I started using more functions. The code, however, was still very procedural and I could follow it linearly down the page, even if I had to look at a function definition every so often.

In the past couple of years, I've gotten on the testing bandwagon. I want to test everything. The Test::More module made that very easy, and there has been an explosion in the number of Test::* modules to check various things (and I'm responsible for a couple of them).

Scripts are hard to test, though. I can choose the input at the start, such as the environment variables, command-line arguments, and standard input. After that I have to wait for the script's output to see if things went right. If things didn't go right, I have to figure out where, between the first line and the last line, things went wrong.

A module is relatively easy to test. A good module author breaks everything down into methods (or plain old functions) that each do one task or a small part of a task. As long as the methods don't depend on side effects (such as reading global variables, including Perl's special variables when they haven't been localized), I can give each method some arguments and check its return values. Once I've tested each method, I can use them confidently, knowing that they do what I expect. When things go wrong, I have a lot less to search through to find the problem.
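To make that concrete, here is a small test in the Test::More style. The function squeeze_whitespace() is hypothetical, loosely modeled on _normalize() from Listing 1. Because it works only on its argument and returns a value, the test just feeds it strings and compares the results.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;

# squeeze_whitespace() is a hypothetical helper, loosely modeled on
# _normalize() in Listing 1. It reads only its argument and returns a
# new string, so a test can call it directly and check the result.
sub squeeze_whitespace {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;   # trim leading and trailing whitespace
    $text =~ s/\s+/ /g;        # collapse inner runs to a single space
    return $text;
}

is( squeeze_whitespace("  Regex   Arcana \n"), 'Regex Arcana', 'trims and collapses' );
is( squeeze_whitespace('brian d foy'),         'brian d foy',  'leaves clean text alone' );

done_testing();
```

No environment variables, command-line arguments, or standard input are involved: the inputs and expected outputs sit right next to each other in the test.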

The Core Structure

To create this sort of script, which I have been calling a "modulino," I take Perl back a step. Remember the main() function of a C program, and how Perl made the whole script file main() and called it main::? I need that back, but I'm going to call it the run() method:

#!/usr/bin/perl
package Local::Modulino;

__PACKAGE__->run( @ARGV ) unless caller();

sub run { print "I'm a script!\n" }

__END__

My modulino no longer assumes that it is in the main:: namespace and that the whole file is the script. I need to put everything that I want to do in the run() method, just like I would do with C's main(). As a start, my modulino just prints a short message.

The script-or-module magic happens in the __PACKAGE__->run() line, which checks the result of Perl's built-in caller() function. If another Perl file loads this one with use() or require(), caller() (in scalar context) returns the name of the package that contains that use() or require(), which is a true value. If I run my modulino as a script, nothing loads it, so caller() returns undef. When caller() returns a false value, I execute the __PACKAGE__->run() method (the Perl compiler replaces the __PACKAGE__ token with the current package name).
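A quick way to see both behaviors of caller() in one file is the sketch below. At the top level of the program that perl runs, caller() has nothing to report; inside a subroutine it returns the calling package. The package name Local::Callee is made up for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# At the top level of the file perl was asked to run, nothing called
# this code, so caller() in scalar context returns undef.
my $top = caller();
print defined $top ? "loaded by $top\n" : "run as a script\n";

# Inside a subroutine, caller() in scalar context returns the package
# name of the calling code. Local::Callee is hypothetical.
package Local::Callee;
sub who_called_me { my $package = caller(); return $package }

package main;
print Local::Callee::who_called_me(), "\n";   # prints "main"
```

The same rule covers files loaded with require(): the top-level code of the loaded file has a caller, so the check in the modulino is true and run() is skipped.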

That's it. That's the core of the dual-duty Perl modulino. Everything else is just programming.

I save this file as Modulino.pm and execute it in a couple of different ways. From the command line, I can call it like a script. The caller() expression returns undef, which is false, so Perl executes run(), which prints my short message:

prompt% perl Modulino.pm
I'm a script!
prompt%

When I load the file as a module using the -M switch (which works like use()), the caller() expression returns a true value (extra credit for knowing what the actual value is), so the unless modifier skips the method call. The run() method never executes, and I don't get any output:

prompt% perl -I. -MModulino -e 1
prompt%

I can still get output though—I just need to call the run() method myself:

prompt% perl -I. -MModulino -e 'Local::Modulino->run()'
I'm a script!
prompt%
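The same experiment can be run from inside Perl. This sketch, which uses only the core modules File::Temp and File::Spec, writes a copy of the modulino to a temporary directory, loads it with require() (the loading step behind use() and -M), and then calls run() by hand. The $ran counter and the trailing 1; are additions for the demo, not part of the modulino above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Write the modulino to a temporary file. $ran is a demo-only counter
# that records how many times run() has been called.
my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'Modulino.pm' );

open my $fh, '>', $file or die "Can't write $file: $!";
print {$fh} <<'MODULINO';
package Local::Modulino;
our $ran = 0;
__PACKAGE__->run( @ARGV ) unless caller();
sub run { $ran++; print "I'm a script!\n" }
1;
MODULINO
close $fh or die "Can't close $file: $!";

# Loading the file gives its top-level code a caller, so run() is skipped.
require $file;
print "after require, run() has fired $Local::Modulino::ran time(s)\n";

# Calling the class method explicitly still works.
Local::Modulino->run();
print "after calling run(), it has fired $Local::Modulino::ran time(s)\n";
```

This mirrors the two command lines above: loading alone prints nothing from the modulino, and only the explicit Local::Modulino->run() call produces the message.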

The Rest of the Story

Now that I have the basic structure of the modulino, I need to apply it to something useful, but that's probably too long for the space I have left in this article. Well, maybe not. For a while, I've wanted a little tool to download the RSS feed from The Perl Journal and print a table of contents. Unlike the PDF files I get for each issue, I always know where the RSS file is: It's the same URL every time.

I wrote a modulino to download, parse, and display the table of contents of The Perl Journal. I could have written this as a script and gone through each of those steps in sequence, but with a modulino, I have a bit more flexibility; and when I decide to test it, I should be able to find problems more easily and quickly.

In Local::Modulino::RSS (see Listing 1), I display the table of contents as text, which works just fine for me in my terminal window. However, since I structured the code as a module, I could very easily do something else. Perhaps I want to convert the table of contents into HTML so I can display it on my personal home page. Since only my run() knows anything about the data presentation, I just have to override it, which I show later.

The rest of the modulino is a collection of very short functions, each doing a very specific task. I can easily write some testing code to make sure each of the small functions does what I think it should. I skip that part here since the topic has been covered so well in other articles.

On line 1, I start with a shebang line. If I want to run this as a Perl script without specifying "perl" on the command line, the operating system needs to know which interpreter I intend to use.

Next, I define the package name and invoke the run() method if I call the file as a script. If I use this file as a module, caller() returns true and I don't call the run() method.

On line 9, I define my run() method. On line 11, I take the first argument, which is the package name, off the argument list. Each method does this, so I can subclass the task. I call each function as a class method so inheritance works out right: the methods always know which class they were called through, even if it is a derived package.
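Here is a small sketch of why the class-method calls matter. Local::Base and Local::Derived are hypothetical classes: run() calls its helpers through $class, so a subclass's override is picked up, while run_as_functions() calls them as plain functions and always gets the base versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical classes for illustration only.
package Local::Base;

# Calls helpers through $class, so method lookup starts in whatever
# class run() was invoked on.
sub run {
    my $class = shift;
    return join ' ', $class->greeting, $class->target;
}

# Calls helpers as plain functions: these always mean
# Local::Base::greeting and Local::Base::target, with no method lookup.
sub run_as_functions {
    return join ' ', greeting(), target();
}

sub greeting { 'Hello' }
sub target   { 'world' }

package Local::Derived;
our @ISA = ('Local::Base');    # the inheritance that "use base" sets up
sub target { 'Perl Journal' }

package main;
print Local::Derived->run(),              "\n";   # Hello Perl Journal
print Local::Derived->run_as_functions(), "\n";   # Hello world
```

Only the version that shifts off the class name and calls through it lets Local::Derived change a single step without rewriting run().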

Most of the complexity of the task is hidden behind functions. The fetch_items() method is composed of the get_url(), get_data(), and get_items() methods that do most of the actual work. My run() method simply gets the parts it needs. This way, when I want to write another run() method, I won't have to do so much work.
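Because each step is a separate class method, a test can replace just one of them. The sketch below assumes a pared-down, hypothetical Local::Feed class that keeps only the fetch_items() chain from Listing 1 (with a placeholder URL). The test subclass Local::Feed::Canned overrides get_data() to return canned XML, so nothing touches the network.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A hypothetical stand-in for Local::Modulino::RSS that keeps only
# the fetch_items() chain: get_url -> get_data -> get_items.
package Local::Feed;
sub get_url  { 'http://example.com/feed.xml' }   # placeholder URL
sub get_data {
    my ( $class, $url ) = @_;
    require LWP::Simple;                          # only loaded for real fetches
    my $data = LWP::Simple::get($url);
    return defined $data ? \$data : undef;
}
sub get_items {
    my ( $class, $xml ) = @_;
    return $xml =~ m|<item>(.*?)</item>|sig;      # every <item> body
}
sub fetch_items {
    my $class = shift;
    my $data  = $class->get_data( $class->get_url ) or return;
    return $class->get_items($$data);
}

# Test subclass: replaces only the network step with fixed XML.
package Local::Feed::Canned;
our @ISA = ('Local::Feed');
sub get_data {
    my $xml = '<rss><item>September 2004</item><item>August 2004</item></rss>';
    return \$xml;
}

package main;
my @items = Local::Feed::Canned->fetch_items;
print scalar(@items), " items: @items\n";
```

The parsing code that the test exercises is exactly the inherited get_items() and fetch_items(); only the data source differs.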

On line 13, I go through each item and extract the information for that issue. The get_issue() function returns the title of the item, which turns out to be the month and year of publication, followed by the issue's articles as a list of anonymous arrays, each holding an article title and its author. Up to this point I'm working only with data, so I can still do just about anything I like with it. Once I have the title and the articles for each issue, I simply print them to the terminal as plain text, as I show in Listing 2.
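To show the shape of that data, this sketch runs a simplified version of the splitting steps in get_articles() on a made-up description and then prints the pairs with run()'s printf format. The description markup is inferred from the split patterns in Listing 1, not copied from the real feed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A made-up description, laid out the way get_articles() in Listing 1
# expects: <br> between lines, <br><br> between articles, and the
# author on each article's last line. The real feed's markup may differ.
my $description = '<b>Regex Arcana</b><br>Jeff Pinyan<br><br>'
                . '<b>XML Subversion</b><br>Curtis Lee Fulton';

my @articles;
foreach my $chunk ( split /<br>\s*<br>/, $description ) {
    my @bits   = split /<br>/, $chunk;
    my $author = pop @bits;                 # last line is the author
    my $title  = join ' ', @bits;           # the rest is the title
    s|</?b>||g for $title, $author;         # strip the bold tags
    push @articles, [ $title, $author ];    # one [ title, author ] pair
}

# Print the issue the same way run() in Listing 1 does.
my $issue = 'August 2004 PDF';              # what get_title() would supply
print "\n$issue\n------------------\n";
printf "%-45s %-30s\n", @$_ foreach @articles;
```

Each element of @articles is an array reference, which is why run() dereferences it with @$_ before handing it to printf.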

Some of you might have noticed the start of a model-view-controller (MVC) design (although I don't have much controller going on). The data handling and the presentation don't depend on each other. The MVC design, which may sound fancy or exotic, naturally pops up when I use a lot of small functions to do single tasks. The only part of my script that deals with the presentation of the data is the run() method, and that's easy to override with a subclass.

In fact, it's so easy to subclass that I might as well do it here. In Listing 3, I create the Local::Modulino::RSS::HTML modulino, which overrides only the run() method by defining its own version. The use base declaration tells it that it is a subclass of Local::Modulino::RSS, so Perl looks in that class for methods it does not define itself, such as fetch_items() and get_issue(). I also require "RSS.pm" by file name because I didn't bother to install these files as proper modules, so there is no Local/Modulino/RSS.pm for Perl to find. I show the new output format in Listing 4.

Code

By creating a modulino, I get my Perl scripts to do double duty as scripts and as modules. Because I structure the code as a module, I can reuse and override it just like a module. Since I broke everything down into small functions instead of using a procedural style, I also make things easier to test.

TPJ



Listing 1

  1 #!/usr/bin/perl
  2 package Local::Modulino::RSS;
  3 
  4 __PACKAGE__->run() unless caller();
  5 
  6 use HTML::Entities;
  7 use Data::Dumper;
  8 
  9 sub run
 10   {
 11   my $class = shift;
 12   
 13   foreach my $item ( $class->fetch_items )
 14     {
 15     my( $title, @articles ) = $class->get_issue( $item );
 16     
 17     print "\n$title\n------------------\n";
 18     printf "%-45s %-30s\n", @$_ foreach ( @articles );
 19     }
 20     
 21   }
 22 
 23 sub fetch_items
 24   {
 25   my $class    = shift;
 26   
 27   my $url      = $class->get_url();
 28   my $data     = $class->get_data( $url ) or return;   # fetch failed: no items
 29   return $class->get_items( $$data );
 30   }
 31   
 32 sub get_issue
 33   {
 34   my $class    = shift;
 35   
 36   my $title    = $class->get_title( $_[0] );
 37   my @articles = $class->get_articles( $_[0] );
 38   
 39   return ( $title, @articles );
 40   }
 41   
 42 sub get_articles
 43   {
 44   my $class    = shift;
 45   
 46   my $d = $class->get_description( $_[0] );
 47 
 48   my @b = split /<br>\s*<br>/, $d;
 49   my @articles = ();
 50   
 51   foreach my $b ( @b )
 52     {
 53     my @bits = split /<br>/, $b;
 54     my $author = pop @bits;
 55     
 56     my $title = join " ", @bits;
 57     
 58     $class->_normalize( $author, $title );
 59     push @articles, [ $title, $author ];
 60     }
 61     
 62   @articles;
 63   }
 64   
 65 sub get_description { $_[0]->_field( $_[1], 'description' ) }
 66 sub get_title       { $_[0]->_field( $_[1], 'title'       ) }
 67 sub get_items       { $_[0]->_field( $_[1], 'item'        ) }
 68 
 69 sub _normalize 
 70   { 
 71   my $class    = shift;
 72   
 73   foreach ( 0 .. $#_ )
 74     {
 75     $_[$_] =~ s/^\s*|\s*$//g;
 76     $_[$_] =~ s|</?b>||g;
 77     $_[$_] =~ s|\s+| |g;
 78     }
 79   }
 80   
 81 sub _field
 82   {
 83   my $data = $_[1];
 84     
 85   HTML::Entities::decode_entities( $data );
 86   
 87   my @matches = $data =~ m|<\Q$_[2]\E>(.*?)</\Q$_[2]\E>|sig;
 88     
 89   wantarray ? @matches : $matches[0];
 90   }
 91   
 92 sub get_data 
 93   { 
 94   my $class    = shift;
 95   
 96   require LWP::Simple; 
 97   my $data = LWP::Simple::get( $_[0] );
 98   defined $data ? \$data : $data;
 99   }
100   
101 sub get_url { 
102   "http://syndication.sdmediagroup.com/" .
103     "feeds/public/the_perl_journal.xml"
104     }
    


Listing 2
September 2004 PDF
------------------
Objective Perl: Objective-C-Style Syntax And Runtime for Perl Kyle Dawkins                  
Scoping: Letting Perl Do the Work for You     David Oswald                  
Secure Your Code With Taint Checking          Andy Lester                   
Detaching Attachments                         brian d foy                   
Unicode in Perl                               Simon Cozens                  
PLUS Letter from the Editor Perl News         Source Code Appendix          

August 2004 PDF
------------------
Regex Arcana                                  Jeff Pinyan                   
XML Subversion                                Curtis Lee Fulton             
OSCON 2004 Round-Up                           Andy Lester                   
Molecular Biology in Perl                     Simon Cozens                  
Pipelines and E-mail Addresses                brian d foy                   
PLUS Letter from the Editor Perl News         Source Code Appendix          


Listing 3
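#!/usr/bin/perl
package Local::Modulino::RSS::HTML;

# RSS.pm isn't installed as Local/Modulino/RSS.pm, so load it by file
# name. Doing it in BEGIN loads the parent class before "use base"
# looks for it. The "./" is needed on Perl 5.26 and later, which no
# longer search the current directory; run this from RSS.pm's directory.
BEGIN { require "./RSS.pm" }
use base qw( Local::Modulino::RSS );

__PACKAGE__->run() unless caller();

sub run
  {
  my $class    = shift;
  
  foreach my $item ( $class->fetch_items )
    {
    my( $title, @articles ) = $class->get_issue( $item );
    
    print "\n<h3>$title</h3>\n\n<ul>\n";
    printf "<li><b>%s</b>, %s\n", @$_ foreach ( @articles );
    print "</ul>\n";
    }
  }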


Listing 4
<h3>September 2004 PDF</h3>

<ul>
<li><b>Objective Perl: Objective-C-Style Syntax And Runtime for Perl</b>, Kyle Dawkins
<li><b>Scoping: Letting Perl Do the Work for You</b>, David Oswald
<li><b>Secure Your Code With Taint Checking</b>, Andy Lester
<li><b>Detaching Attachments</b>, brian d foy
<li><b>Unicode in Perl</b>, Simon Cozens
<li><b>PLUS Letter from the Editor Perl News</b>, Source Code Appendix
</ul>

<h3>August 2004 PDF</h3>

<ul>
<li><b>Regex Arcana</b>, Jeff Pinyan
<li><b>XML Subversion</b>, Curtis Lee Fulton
<li><b>OSCON 2004 Round-Up</b>, Andy Lester
<li><b>Molecular Biology in Perl</b>, Simon Cozens
<li><b>Pipelines and E-mail Addresses</b>, brian d foy
<li><b>PLUS Letter from the Editor Perl News</b>, Source Code Appendix
</ul>