Playing with Pod

By brian d foy, December 01, 2005

Using Pod::Simple, the recently reworked pod tools module from Sean Burke, Allison Randal wrote a simple module to manipulate O'Reilly's PseudoPod format. brian presents it here.

The Perl Journal: Playing with Pod

brian has been a Perl user since 1994. He is founder of the first Perl Users Group, NY.pm, and Perl Mongers, the Perl advocacy organization. He has been teaching Perl through Stonehenge Consulting for the past five years, and has been a featured speaker at The Perl Conference, Perl University, YAPC, COMDEX, and Builder.com. Contact brian at [email protected].

Plain Old Documentation, or simply Pod, is a simple text format for embedded documentation. The perlpod man page describes the format, so if you aren't already familiar with it, read that first. With programs that come with Perl, you can translate pod to man pages, LaTeX, rich text, and several other formats. You don't have to stop there, though: you can write your own pod translator. In this article, I'll write a simple translator to illustrate the process. The trick is to start with something that somebody has already done.

Sean Burke recently rewrote the pod tools as modules, making it extremely easy to write your own translator. His Pod::Simple module handles all of the parsing. If you want to do something different, you subclass Pod::Simple by following the instructions in Pod::Simple::Subclassing. Sean certainly went the extra mile by providing three modules to parse pod: event-based with Pod::Simple::Methody, token-based with Pod::Simple::PullParser, and tree-based (much like an XML parse tree) with Pod::Simple::SimpleTree. One of these is likely to work for you, and if none does, there are some specialized subclasses you may want to use, but I don't discuss those here.
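To get a feel for the token-based interface, here's a minimal sketch using Pod::Simple::PullParser, which ships with Pod::Simple. It pulls tokens one at a time and collects the text of each =head1. The sample pod string and the @headings variable are made up for illustration.

```perl
use strict;
use warnings;
use Pod::Simple::PullParser;

# A small made-up document to parse.
my $pod = <<'POD';
=head1 NAME

Demo - a tiny document

=head1 DESCRIPTION

Some I<italic> text.

=cut
POD

my $parser = Pod::Simple::PullParser->new;
$parser->set_source( \$pod );    # read from a string instead of a file

# Walk the token stream, collecting the text inside each head1.
my ( @headings, $heading );
while ( my $token = $parser->get_token ) {
    if ( $token->is_start && $token->tagname eq 'head1' ) {
        $heading = '';
    }
    elsif ( $token->is_text && defined $heading ) {
        $heading .= $token->text;
    }
    elsif ( $token->is_end && $token->tagname eq 'head1' ) {
        push @headings, $heading;
        undef $heading;
    }
}

print "$_\n" for @headings;
```

The same loop works on a file if you hand set_source() a filename instead of a reference to a string.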

Randal Schwartz, Tom Phoenix, and I recently updated Learning Perl to its fourth edition. Since Randal is the original author, the sources are in Pod, although they are in a special sort of Pod called PseudoPod that O'Reilly Media uses. It has extra features to handle footnotes (Learning Perl has quite a few of those), cross-references, index entries, and a few other things described in Pod::PseudoPod::Tutorial. O'Reilly editor Allison Randal wrote the Pod::Simple subclass Pod::PseudoPod as a base class for PseudoPod translators.

Let's start with a basic PseudoPod document. I've taken this pod directly from the Learning Perl sources. It's part of Chapter 11, which covers the file test operators. Notice the N<> sequence, which denotes a footnote.

	# $Id: ch11.pod 108 2005-04-04 21:31:46Z brian $
	
	=pod
	
	=head0 File Tests
	
	Earlier, we showed how to open a filehandle for output. Normally, that
	will create a new file, wiping out any existing file with the same
	name.  Perhaps you want to check that there isn't a file by that name.
	Perhaps you need to know how old a given file is. Or perhaps you want
	to go through a list of files to find which ones are larger than a
	certain number of bytes and not accessed for a certain amount of time.
	Perl has a complete set of tests you can use to find out information
	about files.
	
	=head1 File Test Operators
	
	The third example is more complex. Here, let's say that disk space is
	filling up and rather than buy more disks, we've decided to move any
	large, useless files to the backup tapes. So let's go through our list
	of filesN<It's more likely that, instead of having the list of files
	in an array, as our example shows, you'll read it directly from the
	filesystem using a glob or directory handle, as we show in Chapter 12.
	Since you haven't seen that yet, we'll just start with the list and go
	from there.> to see which of them are larger than 100 K. But even if a
	file is large, we shouldn't move it to the backup tapes unless it
	hasn't been accessed in the last 90 days (so we know that it's not
	used too often):N<There's a way to make this example more efficient,
	as you'll see by the end of the chapter.>
	
	=cut

I want to translate this to something else. If I want to translate it to HTML, like I did when I wanted to provide the reviewers with something a bit easier to read, most of my work is already done. I use Pod::PseudoPod::HTML, set a few options, and tell it where to send the output.

	#!/usr/bin/perl
	use strict;
	use warnings;
	
	use Pod::PseudoPod::HTML;
	
	foreach my $file ( @ARGV )
		{
		unless( -e $file )
			{
			warn "Skipping '$file': $!\n";
			next;
			}
	
		my $parser = Pod::PseudoPod::HTML->new();
	
		$parser->no_errata_section(1); # don't put errors in doc output
		$parser->complain_stderr(1);   # output errors on STDERR instead
	
		$parser->output_fh( \*STDOUT );
		$parser->parse_file( $file );
		}

Using the basic script, I get some simple HTML. It's nothing fancy, but it gets the job done. Notice that there isn't an HTML <HEAD> section or an opening <BODY> tag, the footnotes are inline with the body text, and nothing closes the document at the end (Pod::PseudoPod::HTML has options to fix this, but I'm going to change all of that anyway, so I'll skip them).

	<h1>File Tests</h1>
	
	<p>Earlier, we showed how to open a filehandle for output. Normally, that
	will create a new file, wiping out any existing file with the same name.
	Perhaps you want to check that there isn't a file by that name. Perhaps you
	need to know how old a given file is. Or perhaps you want to go through a
	list of files to find which ones are larger than a certain number of bytes
	and not accessed for a certain amount of time. Perl has a complete set of
	tests you can use to find out information about files.</p>
	
	<h2>File Test Operators</h2>
		
	<p>The third example is more complex. Here, let's say that disk space is
	filling up and rather than buy more disks, we've decided to move any large,
	useless files to the backup tapes. So let's go through our list of files
	(footnote: It's more likely that, instead of having the list of files in an
	array, as our example shows, you'll read it directly from the filesystem
	using a glob or directory handle, as we show in Chapter 12. Since you
	haven't seen that yet, we'll just start with the list and go from there.)
	to see which of them are larger than 100 K. But even if a file is large, we
	shouldn't move it to the backup tapes unless it hasn't been accessed in the
	last 90 days (so we know that it's not used too often): (footnote: There's
	a way to make this example more efficient, as you'll see by the end of the
	chapter.)</p>

Now I want to change the output. I don't want what Pod::PseudoPod::HTML gives me, so I need to override some of its behavior. First, I want to change the header and the footer. I'll create my own subclass, Pod::PseudoPod::MyHTML, to do this. My class inherits from Pod::PseudoPod::HTML and replaces just the bits that I want to change. Anything I don't replace in my subclass still works the Pod::PseudoPod::HTML way.
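If subclassing is new to you, this tiny pure-Perl sketch shows the idea in isolation. The two classes are hypothetical stand-ins, not part of Pod::PseudoPod: My::Formatter replaces header(), extends footer() by calling the parent's version through SUPER::, and inherits body() untouched.

```perl
use strict;
use warnings;

# Hypothetical base class standing in for a formatter like Pod::PseudoPod::HTML.
package Base::Formatter;
sub new    { my $class = shift; return bless { out => '' }, $class }
sub header { $_[0]{out} .= "<base header>\n" }
sub body   { $_[0]{out} .= "<base body>\n" }
sub footer { $_[0]{out} .= "<base footer>\n" }
sub render {
    my $self = shift;
    $self->$_ for qw(header body footer);    # calls resolve through inheritance
    return $self->{out};
}

# Hypothetical subclass: replace one method, extend another, inherit the rest.
package My::Formatter;
our @ISA = ('Base::Formatter');              # what "use base" sets up
sub header { $_[0]{out} .= "<my header>\n" }
sub footer {
    my $self = shift;
    $self->SUPER::footer();                  # run the parent's version first
    $self->{out} .= "<extra footer line>\n";
}

package main;

my $out = My::Formatter->new->render;
print $out;
```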

Two methods, start_Document and end_Document, decide the beginning and ending portions of Pod::PseudoPod::HTML's output. The module uses event-based processing: each event has its own handler method, and the start and end of the document count as events too. I pulled these method sources directly from Pod::PseudoPod::HTML 0.12. Each method adds text to a scratchpad called 'scratch', then sends it to the output channel by calling emit(), which also clears the scratchpad.

	# Pod::PseudoPod::HTML
	sub start_Document { 
	  my ($self) = @_;
	  if ($self->{'body_tags'}) {
		$self->{'scratch'} .= "<html>\n<body>";
		$self->{'scratch'} .= "\n<link rel='stylesheet' href='style.css' type='text/css'>" if $self->{'css_tags'}; 
		$self->emit('nowrap');
	  }
	}
	
	sub end_Document   { 
	  my ($self) = @_;
	  if ($self->{'body_tags'}) {
		$self->{'scratch'} .= "</body>\n</html>";
		$self->emit('nowrap');
	  }
	}
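The document start and end events above aren't specific to Pod::PseudoPod::HTML. Pod::Simple::Methody, which comes with Pod::Simple, provides the same event-to-method dispatch, so you can try the pattern without Pod::PseudoPod installed. In this minimal sketch, the class name My::Outline, the 'my_out' and 'my_in_head' hash keys, and the outline format are all made up; start_Document and end_Document bracket the output just like the methods above, and only heading text is kept.

```perl
package My::Outline;
use strict;
use warnings;
use base 'Pod::Simple::Methody';    # turns parser events into start_*/end_* calls

# Document events bracket everything else.
sub start_Document { $_[0]{'my_out'} = "BEGIN\n" }
sub end_Document   { $_[0]{'my_out'} .= "END\n" }

# Heading events set a flag so handle_text knows to keep the text.
sub start_head1 { my $self = shift; $self->{'my_in_head'} = 1; $self->{'my_out'} .= '* ' }
sub end_head1   { my $self = shift; $self->{'my_in_head'} = 0; $self->{'my_out'} .= "\n" }
sub start_head2 { my $self = shift; $self->{'my_in_head'} = 1; $self->{'my_out'} .= '  - ' }
sub end_head2   { my $self = shift; $self->{'my_in_head'} = 0; $self->{'my_out'} .= "\n" }

# All text arrives here; paragraph text is simply dropped.
sub handle_text {
    my ( $self, $text ) = @_;
    $self->{'my_out'} .= $text if $self->{'my_in_head'};
}

package main;

my $pod = <<'POD';
=head1 File Tests

Some text that the outline ignores.

=head2 File Test Operators

More text.

=cut
POD

my $parser = My::Outline->new;
$parser->parse_string_document($pod);
my $outline = $parser->{'my_out'};
print $outline;
```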

I'll change start_Document to output something that's a bit better. I'll include a document type declaration, a proper <HEAD> section, and some other goodies.

	# Pod::PseudoPod::MyHTML
	sub start_Document 
		{ 
		my ($self) = @_;
		
		$self->{'scratch'} .= <<"HTML";
	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
	<html>
	
	<head>
	<title>This is a page</title>
	</head>
	
	<body>
	HTML
			
		$self->emit('nowrap');
		}

I'll change end_Document so I can add a "last modified" and copyright statement to the end of each page.

	# Pod::PseudoPod::MyHTML
	sub end_Document   
		{ 
		my ($self) = @_;
			
		$self->{scratch} .= "<hr />\n";
	
		$self->{scratch} .= "Last Modified: " . localtime() . "<br/>\n";
		$self->{scratch} .= "Copyright (c) brian d foy\n";
	
		$self->{scratch} .= "</body></html>\n";
		
		$self->emit('nowrap');
		}

Good enough. I changed a couple of short methods to do what I wanted, and I get different output. I can do the same for the other pod parts I run into. For instance, by default, Pod::PseudoPod::HTML turns the =head directives into the equivalent HTML heading tags. Text after =head1 is wrapped in an <H2> tag in the output (and yes, that's off by one: by convention an HTML document has a single <H1> tag, but Pod usually has many =head1 directives, so everything moves down a level for the HTML, and PseudoPod's =head0 gets the <H1>). The start_head1() method starts a fresh scratchpad with the opening tag, handle_text() appends the heading text, and end_head1() adds the closing tag and emits the result.

	# Pod::PseudoPod::HTML
	sub start_head1 { $_[0]{'scratch'} = '<h2>' }

	sub end_head1   { $_[0]{'scratch'} .= '</h2>'; $_[0]->emit() }

Instead of a plain <H2> tag, I want to add some stylesheet information. It doesn't really matter how I add it: if you don't like my way, you now know how to make it happen your way.

	# Pod::PseudoPod::MyHTML
	sub start_head1 { $_[0]{'scratch'} = '<h2 class="main_header">' }
	
	sub end_head1   { $_[0]{'scratch'} .= '</h2>'; $_[0]->emit() }
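If you want every heading level handled the same way, you don't have to write each pair of methods by hand. This sketch, built on the core Pod::Simple::Methody rather than Pod::PseudoPod::HTML, generates start_headN and end_headN for levels 1 through 4 in a loop, moves each level down one (=head1 becomes <h2>), and adds the same class attribute. The class name, the 'my_scratch' and 'my_out' keys, and the _flush() helper are hypothetical, and the escaping covers only &, <, and >.

```perl
package My::ShiftedHeads;
use strict;
use warnings;
use base 'Pod::Simple::Methody';

# Move the scratch text plus a closing tag into the finished output.
sub _flush {
    my ( $self, $close ) = @_;
    $self->{'my_out'} .= $self->{'my_scratch'} . $close . "\n";
    $self->{'my_scratch'} = '';
}

# Generate start_head1..start_head4 and end_head1..end_head4:
# pod level N maps to HTML <h(N+1)>.
for my $level ( 1 .. 4 ) {
    my $tag = 'h' . ( $level + 1 );
    no strict 'refs';
    *{ __PACKAGE__ . "::start_head$level" } =
        sub { $_[0]{'my_scratch'} = qq{<$tag class="main_header">} };
    *{ __PACKAGE__ . "::end_head$level" } = sub { $_[0]->_flush("</$tag>") };
}

sub start_Document { $_[0]{'my_out'} = ''; $_[0]{'my_scratch'} = '' }
sub start_Para     { $_[0]{'my_scratch'} = '<p>' }
sub end_Para       { $_[0]->_flush('</p>') }

# Escape the characters that would otherwise be read as markup.
sub handle_text {
    my ( $self, $text ) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $self->{'my_scratch'} .= $text;
}

package main;

my $parser = My::ShiftedHeads->new;
$parser->parse_string_document(
    "=head1 Tests & Operators\n\nSome text.\n\n=head2 Details\n\n=cut\n"
);
my $html = $parser->{'my_out'};
print $html;
```

Generating the handlers keeps the level shift and the class name in one place, so changing either later means editing a single line.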

That works for the pod directives, but what about the formatting sequences like E<>, L<>, and so on? PseudoPod defines a new N<> sequence for footnotes. When the parser encounters that sequence, the start_N() method gets control. When the sequence ends, the end_N() method takes over. Neither method emits anything, because the sequence sits in the middle of a paragraph's text; the handler for the enclosing paragraph decides when to emit it. By default, Pod::PseudoPod::HTML simply puts the footnotes inline with the text.

	# Pod::PseudoPod::HTML
	sub start_N {
	  my ($self) = @_;
	  $self->{'scratch'} .= '<font class="footnote">' if ($self->{'css_tags'});
	  $self->{'scratch'} .= ' (footnote: '; 
	}
	sub end_N {
	  my ($self) = @_;
	  $self->{'scratch'} .= ')'; 
	  $self->{'scratch'} .= '</font>' if $self->{'css_tags'};
	}

I want to put the footnotes at the end of the page instead. I already know how to change the ending of the document, so I can handle the footnotes there. To make them show up at the end, I need to store them until I'm ready for them. I'll create a new object data member to hold the footnotes (reaching into the object's internals like this breaks encapsulation, so I probably shouldn't, but it's quick). I'll initialize this accumulator in the constructor and add a method to store each footnote.

	# Pod::PseudoPod::MyHTML
	sub new {
	  my $self = shift;
	  my $new = $self->SUPER::new();

	  $new->{'footnotes'} = [];
	  
	  return $new;
	}

	sub push_footnote { 
		my $self = shift;
		
		push @{ $self->{'footnotes'} }, $self->{'footnote_text'};
		}

Once I have that, I modify start_N() and end_N() to set a flag telling me that I'm in the middle of a footnote. In start_N(), I insert a footnote link into the scratchpad. It gets a bit tricky here since handle_text() gets control inside the N<> sequence, but I need to handle the paragraph and footnote text separately. I check the footnote flag in handle_text() to decide where the text goes, and do that without interfering with normal paragraph processing: I leave the paragraph handling as is and build up the footnote in a separate scratchpad. When the N<> sequence ends, I push the footnote onto my list for later and clear the footnote flag and scratchpad.

	sub start_N 
		{
		my ($self) = @_;
	
		$self->{'footnote_flag'} = 1;
		my $fn = ++$self->{'footnote_count'};
	
		$self->{'scratch'} .= qq|<sup><a href="#f$fn">[$fn]</a></sup>|; 
		}
	
	sub end_N 
		{
		my ($self) = @_;
	
		$self->push_footnote();
	
		$self->{'footnote_flag'} = 0;
		$self->{'footnote_text'} = '';
		}
		
	sub handle_text 
		{
		my $scratch = $_[0]{footnote_flag} ? 'footnote_text' : 'scratch';
		
		$_[0]{$scratch} .= $_[0]{'in_verbatim'} ? encode_entities( $_[1] ) : $_[1]
		}
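Here's the same footnote-diverting idea as a self-contained sketch on the core Pod::Simple::Methody, so you can experiment without Pod::PseudoPod. Plain Pod::Simple doesn't know N<>, so the constructor calls accept_codes('N') to let it through. The class name and the 'my_*' keys are made up. One deliberate difference from the code above: all output goes through a single _append() helper that picks the current buffer, so the tags from a code nested inside a footnote, such as C<>, also land in the footnote. In the version above, only handle_text() checks the flag, so a nested code's tags would presumably still go to the main scratchpad.

```perl
package My::Footnotes;
use strict;
use warnings;
use base 'Pod::Simple::Methody';

sub new {
    my $class = shift;
    my $self  = $class->SUPER::new(@_);
    $self->accept_codes('N');          # allow the PseudoPod-style N<> code
    $self->{'my_body'}      = '';
    $self->{'my_footnotes'} = [];
    $self->{'my_in_note'}   = 0;
    return $self;
}

# Append to the current footnote while inside N<>, otherwise to the body.
sub _append {
    my ( $self, $text ) = @_;
    if   ( $self->{'my_in_note'} ) { $self->{'my_footnotes'}[-1] .= $text }
    else                           { $self->{'my_body'} .= $text }
}

sub start_N {
    my $self = shift;
    push @{ $self->{'my_footnotes'} }, '';    # open a new, empty footnote
    my $n = @{ $self->{'my_footnotes'} };
    $self->_append(qq{<sup><a href="#f$n">[$n]</a></sup>});    # link goes in the body
    $self->{'my_in_note'} = 1;
}
sub end_N { $_[0]{'my_in_note'} = 0 }

sub start_C     { $_[0]->_append('<code>') }
sub end_C       { $_[0]->_append('</code>') }
sub end_Para    { $_[0]->_append("\n") }
sub handle_text { $_[0]->_append( $_[1] ) }

package main;

my $pod = <<'POD';
=pod

Move the big filesN<Use C<glob> in real code.> to tape.

=cut
POD

my $parser = My::Footnotes->new;
$parser->parse_string_document($pod);
print $parser->{'my_body'};
print "[$_] $parser->{'my_footnotes'}[$_ - 1]\n" for 1 .. @{ $parser->{'my_footnotes'} };
```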

Finally, when I'm at the end of the document and ready to print footnotes, I use format_footnotes() to format the data I saved in the footnote stack. I modify my end_Document() method to call my footnote formatter and wrap some text around it.

	sub format_footnotes 
		{
		$_[0]{'scratch'} .= "<h2>Footnotes</h2>\n\n<ol>\n";
			
		my $fn = 0;
	
		foreach my $footnote ( @{ $_[0]{'footnotes'} } )
			{
			$fn++;		
			$_[0]{'scratch'} .= qq|\t<li><a name="f$fn">$footnote</a></li>\n|;
			}
			
		$_[0]{'scratch'} .= "</ol>\n\n";
		}
	
	sub end_Document   
		{ 
		my ($self) = @_;
	
		$self->{scratch} .= "<hr />\n";
		
		$self->format_footnotes;
	
		$self->{scratch} .= "<hr />\n";
	
		$self->{scratch} .= "Last Modified: " . localtime() . "<br/>\n";
		$self->{scratch} .= "Copyright (c) brian d foy\n";
	
		$self->{scratch} .= "</body></html>\n";
		
		$self->emit('nowrap');
		}
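The formatting step can also live in a plain function, which makes it easy to test by itself. This sketch is a hypothetical helper, not part of Pod::PseudoPod::HTML. It builds the same list as format_footnotes(), with anchors f1, f2, and so on that match the href="#fN" links written where each footnote appeared, except that it returns an empty string when there are no footnotes, so pages without any don't get an empty Footnotes heading.

```perl
use strict;
use warnings;

# Hypothetical helper: collected footnote texts in, HTML block out.
# Numbering starts at 1 in document order, matching the in-text links.
sub footnotes_html {
    my @notes = @_;
    return '' unless @notes;    # no footnotes, no empty section

    my $html = "<h2>Footnotes</h2>\n\n<ol>\n";
    my $n    = 0;
    for my $note (@notes) {
        $n++;
        $html .= qq{\t<li><a name="f$n">$note</a></li>\n};
    }
    return $html . "</ol>\n\n";
}

my $block = footnotes_html( 'First note.', 'Second note.' );
print $block;
```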

Putting all of that together gives me my little Pod::PseudoPod::MyHTML module.

	package Pod::PseudoPod::MyHTML;
	use strict;
	
	use base 'Pod::PseudoPod::HTML';
	
	use HTML::Entities qw(encode_entities);
	
	sub new 
		{
		my $self = shift;
		my $new = $self->SUPER::new();
	
		$new->{'footnotes'} = [];
	  
		return $new;
		}
	
	sub push_footnote { 
		my $self = shift;
		
		push @{ $self->{'footnotes'} }, $self->{'footnote_text'};
		}
	
	sub format_footnotes 
		{
		$_[0]{'scratch'} .= "<h2>Footnotes</h2>\n\n<ol>\n";
			
		my $fn = 0;
	
		foreach my $footnote ( @{ $_[0]{'footnotes'} } )
			{
			$fn++;		
			$_[0]{'scratch'} .= qq|\t<li><a name="f$fn">$footnote</a></li>\n|;
			}
			
		$_[0]{'scratch'} .= "</ol>\n\n";
		}
		
	sub start_Document 
		{ 
		my ($self) = @_;
		
		$self->{'scratch'} .= <<"HTML";
	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
	<html>
	
	<head>
	<title>This is a page</title>
	</head>
	
	<body>
	HTML
			
		$self->emit('nowrap');
		}
	
	sub end_Document   
		{ 
		my ($self) = @_;
	
		$self->{scratch} .= "<hr />\n";
		
		$self->format_footnotes;
	
		$self->{scratch} .= "<hr />\n";
	
		$self->{scratch} .= "Last Modified: " . localtime() . "<br/>\n";
		$self->{scratch} .= "Copyright (c) brian d foy\n";
	
		$self->{scratch} .= "</body></html>\n";
		
		$self->emit('nowrap');
		}
	
	sub start_head1 { $_[0]{'scratch'} = '<h2 class="main_header">' }
	
	sub end_head1   { $_[0]{'scratch'} .= '</h2>'; $_[0]->emit() }
	
	sub start_N 
		{
		my ($self) = @_;
	
		$self->{'footnote_flag'} = 1;
		my $fn = ++$self->{'footnote_count'};
	
		$self->{'scratch'} .= qq|<sup><a href="#f$fn">[$fn]</a></sup>|; 
		}
	
	sub end_N 
		{
		my ($self) = @_;
	
		$self->push_footnote();
	
		$self->{'footnote_flag'} = 0;
		$self->{'footnote_text'} = '';
		}
	
	sub handle_text 
		{
		my $scratch = $_[0]{footnote_flag} ? 'footnote_text' : 'scratch';
		
		$_[0]{$scratch} .= $_[0]{'in_verbatim'} ? encode_entities( $_[1] ) : $_[1]
		}
	
	1;

And here's my little script that uses my module. It's almost identical to the one I showed you before, except for the parser's class name and how it loads the module. All of the good stuff is in Pod::PseudoPod::MyHTML.

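	#!/usr/bin/perl
	use strict;
	use warnings;
	
	require "./MyHTML.pm"; # explicit ./ path, since . may not be in @INC
	
	foreach my $file ( @ARGV )
		{
		unless( -e $file )
			{
			warn "Skipping '$file': $!\n";
			next;
			}
	
		# a fresh parser per file, so the footnote list and
		# footnote numbers don't carry over between documents
		my $parser = Pod::PseudoPod::MyHTML->new();
	
		$parser->no_errata_section(1);
		$parser->complain_stderr(1);
	
		$parser->output_fh( \*STDOUT );
	
		$parser->parse_file( $file );
		}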

Everything else works the same way: whatever you want to change, find the method that handles it and override it. It's SMOP (a Simple Matter of Programming). Pod::Simple and its subclasses have already done all of the hard work for you. Now that you've made it this far in the article, you should be able to modify a Pod::Simple subclass to do just about anything you want. Good luck!

PS: In my last article I mentioned a one-argument form of open(). Did anyone bother to find out what it was? Without an explicit filename, open() looks for the filename in the package scalar variable with the same name as the filehandle (which means you have to use a bareword filehandle rather than a variable, and the scalar can't be a lexical my variable). For instance, if you say open FILE;, perl looks in the scalar variable $FILE for the filename. Remember all that one-liner magic for reading from files? It uses the same pairing: the current filename shows up in $ARGV, and perl reads it through the ARGV filehandle.
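Here's a small sketch of the one-argument form in action. It uses File::Temp (a core module) to create a scratch file; the bareword filehandle FILE and the package variable $FILE are the pairing the one-argument open() relies on. Note the our declaration: a my variable wouldn't work, because open() looks for a package variable.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Make a throwaway file with two lines; UNLINK removes it at exit.
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "first line\nsecond line\n";
close $tmp_fh;

# One-argument open: perl takes the filename from the package
# variable with the same name as the bareword filehandle, $FILE.
our $FILE = $tmp_name;
open FILE or die "Could not open $FILE: $!";
my @lines = <FILE>;
close FILE;

print scalar(@lines), " lines read from $FILE\n";
```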

TPJ
