Dr. Dobb's | Shelling the Pod | December 12, 2006

Shelling the Pod

David uses PHP to dynamically create web pages that describe podcasts—and specific episodes of the podcast.

December 12, 2006
URL: http://www.drdobbs.com/web-development/shelling-the-pod/196603517

David works as a musician, writer, and software engineer. You can contact him at www.summersong.net.

It used to be that aspiring stars had to rely on a long line of producers, directors, and the like to achieve their ambitions. With advances in technology, ways of gaining 15 minutes of fame are proliferating like well-fed rabbits. Companies and individuals with products to sell are also looking for new ways to communicate with their current and potential customers. On the flip side, those of us seeking entertainment, plain old information, or a combination of both can satisfy our desires in more different ways than there are programming languages. One of the most effective ways of communicating is "podcasting."

Podcasting is an unfortunate term for this phenomenon, since it has nothing specific to do with Apple's iPods. I do use iTunes on a PC, but even that is not required. There are many different podcast clients available, and Yahoo has an online mechanism for listening and subscribing to podcasts (podcasts.yahoo.com).

Podcasters present their content via RSS feeds—XML files that describe the podcast, as well as individual episodes of podcasts. At this writing, the current version is RSS 2.0. Its specification is maintained by the Berkman Center for Internet & Society at Harvard Law School (blogs.law.harvard.edu/tech/rss).

With the popularity of podcasting comes the inevitable problem of separating the wheat from the chaff, or of simply finding podcasts that meet your interests. From the podcaster's point of view, the challenge is to make their offering stand out. In this article, I use PHP to dynamically create a web page that describes podcasts, and a separate page that describes any particular episode of the podcast. Because these web pages get their information from the RSS feed, the pages will automatically update when the feed is updated with changes or additions of new episodes.

The code listings include the RSSInfo class that retrieves all the required information from an RSS feed, and two HTML/PHP pages that demonstrate how to use RSSInfo. The first web page describes the podcast itself and displays a list of the individual episodes in the form of web links. When a link is selected, users are taken to the second page, which describes the selected episode.

Feeeeed Meee

RSS feeds are XML files that conform to the RSS specification. I think of RSS feeds as if they're describing a TV channel. The feed provides information about the channel itself, as well as information about individual programs that are available on the channel.

This way of thinking about the RSS feed is facilitated by the CHANNEL tag, which encompasses all the other tags in the file. Carrying the analogy further, each program on your TV channel is described via the ITEM tag. In podcasting terms, the ITEM tag encompasses all the information about a particular episode of the podcast.

The top-level RSS tag lets you declare XML namespaces as attributes. Each namespace is identified by a URI; the iTunes namespace URI happens to point to an XML Document Type Definition (DTD) file, which is why these namespaces are often referred to as DTDs.

At this writing, there are two namespaces you almost certainly want to include in your feed: one from iTunes and the other from Yahoo. If you want to add your podcast to the Yahoo podcast directory, you must declare the Yahoo Media namespace as an attribute of the RSS tag, even if you are not using any of its tags. Otherwise, Yahoo won't accept your feed for inclusion in its directory.
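To make the layout concrete, here is a minimal sketch of a feed skeleton held in a PHP string. The titles and example.com URLs are placeholders, and the two xmlns URIs are the commonly published iTunes and Yahoo Media RSS identifiers, which I am stating as assumptions rather than quoting from either specification. The code checks only that the XML is well formed; it is no substitute for a real feed validator.

```php
<?php
// Hypothetical minimal feed: every title and URL below is a placeholder.
// The xmlns URIs are assumed to be the published iTunes and Yahoo Media
// RSS namespace identifiers.
$sampleFeed = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
     xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Example Podcast</title>
    <link>http://www.example.com/podcast/</link>
    <description>A placeholder description of the podcast.</description>
    <item>
      <title>Episode 1</title>
      <guid>http://www.example.com/podcast/episode1.mp3</guid>
      <pubDate>Tue, 12 Dec 2006 08:00:00 GMT</pubDate>
      <enclosure url="http://www.example.com/podcast/episode1.mp3"
                 length="1234567" type="audio/mpeg"/>
    </item>
  </channel>
</rss>
XML;

// Check only that the document is well formed; this is not RSS validation.
$parser = xml_parser_create();
$isWellFormed = (xml_parse($parser, $sampleFeed, true) === 1);

echo $isWellFormed ? "well formed\n" : "not well formed\n";
```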

You can place your RSS feed anywhere you like on your site. When deciding on a location, keep in mind that the URL to your feed will not only be used by the podcast aggregators but may also be passed around by your growing legion of listeners, so you probably want to keep the feed URL as short as possible.

When you've uploaded your initial feed and after every update to the feed, it makes sense to validate the entire feed. Fortunately, this is made easy by an online RSS feed validator at feedvalidator.org. Simply type in the feed URL, and the validator lets you know if you're ready to roll.

All You Need Is Feed

PHP by default includes XML parser functions (see us2.php.net/manual/en/ref.xml.php) that let you create a parser and specify callback functions that are called when the parser reaches the start tags, data within a tag, and the end tags in the XML file.
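As a small illustration of that callback model, separate from Listing One, the following sketch parses a short XML string and records each event in the order the parser reports it. The handler names (onStart, onEnd, onData) and the sample XML are my own inventions for this example.

```php
<?php
// Records every parser event so the calling order is visible.
$events = array();

// Called at each start tag; $name arrives uppercased by default.
function onStart($parser, $name, $attrs)
{
    global $events;
    $events[] = "start:$name";
}

// Called at each end tag.
function onEnd($parser, $name)
{
    global $events;
    $events[] = "end:$name";
}

// Called with character data found between tags; whitespace-only
// chunks are skipped to keep the log short.
function onData($parser, $data)
{
    global $events;
    if (trim($data) !== '') {
        $events[] = "data:$data";
    }
}

$parser = xml_parser_create();
xml_set_element_handler($parser, 'onStart', 'onEnd');
xml_set_character_data_handler($parser, 'onData');
xml_parse($parser, '<channel><title>Demo</title></channel>', true);

print_r($events);
```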

To encapsulate the process of obtaining information from the RSS feed, I've created the RSSInfo class (Listing One). The member data consists of two arrays—one for CHANNEL data and one for ITEM data. The ITEM data will be a 2D array representing the 0 or more items that may be present in an RSS feed.

<?php
// Get RSS info using passed in feed URL.
class RSSInfo
{
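  // Constructor: parse the XML file at the passed-in URL.
  // Uses the PHP 5 and later __construct syntax; a method named after
  // the class is not treated as a constructor in PHP 8.
  function __construct($url)
  {
    $xml_parser = xml_parser_create();
    xml_set_element_handler($xml_parser, "StartTag", "EndTag");
    xml_set_character_data_handler($xml_parser, "TagData");
    // Stop with a message if the feed cannot be opened.
    $fp = fopen($url, "r") or
      die(sprintf("Could not open XML input from file %s.", $url));
    // Feed the parser 4 KB at a time, reporting the first XML error.
    while ($data = fread($fp, 4096))
    {
      xml_parse($xml_parser, $data, feof($fp)) or
        die(sprintf("XML error: %s at line %d",
                    xml_error_string(xml_get_error_code($xml_parser)),
                    xml_get_current_line_number($xml_parser)));
    }
    fclose($fp);
    xml_parser_free($xml_parser);
    // Copy the results gathered by the callbacks into the object.
    global $channel;
    global $allItems;
    $this->channel = $channel;
    $this->allItems = $allItems;
  } // end constructor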
  
  // Member data, to hold channel and item data.
  // item data will be a 2D array.
  var $channel = array();
  var $allItems = array();
}
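// globals for call backs
$channel = array();
$allItems = array();
$currentItem = 0;
$inItem = 0;
$inImage = 0;
$currentTag = '';  // name of the most recently opened tag
$allData = '';     // character data collected for the current tag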
// XML parser call backs
// Start tag.
function StartTag($parser,$tagName,$tagAttributes)
{
  global $currentTag; 
  $currentTag = $tagName;
  global $inItem;
  global $inImage;
  if($tagName == 'ITEM')
    $inItem = 1;
  elseif($tagName == 'IMAGE')
    $inImage = 1;
  // Get the URL attribute of the enclosure tag.
  // This will be used for the link to the podcast audio file.
  if($inItem && $currentTag == 'ENCLOSURE')
  {
    global $allItems;

    global $currentItem;
    $allItems[$currentItem]['URL'] = $tagAttributes['URL'];
  }
}
// End tag.
function EndTag($parser,$tagName)
{
  global $allData;
  // remove any white space at the beginning and end (optional)
  $allData = trim($allData);
  // If we are leaving an ITEM tag, note that and bump up the item count.
  global $inItem;  
  global $inImage;
  global $currentItem;
  if($tagName == 'ITEM')
  {
    $inItem = 0;
    $currentItem++;
  }
  elseif($tagName == 'IMAGE')
    $inImage = 0;
  // We are not interested in any tags with no data.
  if($allData != '')
  {
    global $channel;
    global $allItems;
    global $currentTag; 
    
    if($inItem)
      $allItems[$currentItem][$currentTag] = $allData;
    else  // channel data
    {
      if($inImage)
        $currentTag = 'IMAGE:' . $currentTag;
      $channel[$currentTag] = $allData;
    }
  }
  $allData = '';
  $currentTag = '';
}
// Get the data between the start and end tags.
function TagData($parser,$data)
{
  global $currentTag;
  if($currentTag != '')
  {
    global $allData;
    $allData .= $data;
  }
}
?>

Listing One

The RSSInfo constructor takes a URL that is the RSS feed. Some of the code in this constructor is derived from the XML documentation available in the PHP manual. The constructor creates the parser, specifies the callback functions, parses the RSS feed, and frees the parser. After the RSS parsing is complete, I transfer the global variables to the class data members, channel and allItems.

Where Am I? (or Call You Right Back)

Much of the code I have in the callback functions keeps track of where the parser is in the XML file when the function is called. This is so the RSS data can be stored in the correct place. By default, "case folding" is enabled in the XML parser functions, so tag names are passed to the callbacks in uppercase. Therefore, all the string comparisons use uppercase strings.

StartTag() is called when the parser reaches any start tag in the RSS file. For this project, I'm only concerned with whether the parser is in an ITEM. If it's not, it's parsing CHANNEL data. If, as you're parsing, you want to keep closer track of where you are in the XML hierarchy, you might use a stack. Push the current element in the start function and pop it in the end function.
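The stack idea can be sketched as follows. This is not part of Listing One; the handler names (pushTag, popTag), the $paths log, and the sample XML are hypothetical and exist only to show the push and pop steps.

```php
<?php
$tagStack = array();  // currently open elements, outermost first
$paths    = array();  // every element path seen, kept for illustration

// Start handler: push the element and note the full path to it.
function pushTag($parser, $name, $attrs)
{
    global $tagStack, $paths;
    $tagStack[] = $name;
    $paths[] = implode('/', $tagStack);
}

// End handler: the element being closed is always on top of the stack.
function popTag($parser, $name)
{
    global $tagStack;
    array_pop($tagStack);
}

$parser = xml_parser_create();
xml_set_element_handler($parser, 'pushTag', 'popTag');
xml_parse(
    $parser,
    '<rss><channel><image><url>x</url></image>'
    . '<item><title>t</title></item></channel></rss>',
    true
);

print_r($paths);
```

With a stack in place, a test such as "am I inside an ITEM?" becomes a check on the stack contents instead of a dedicated flag per tag.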

I also want to know if I'm entering the CHANNEL's image tag, because I want to scope the data inside that tag as coming from within the image tag. Also, if the parser is in the ITEM's enclosure tag, I want to store the URL attribute of that tag as part of my item data.

EndTag() is called when the parser reaches any end, or closing, tag in the XML file. Here is where I'm going to store the data retrieved from between the start and end tags in the correct global variables. I choose to trim any whitespace that may have been in the feed file. Then, if the parser has come to the end of an item, I note that I'm not currently in an item and bump up my item count. The item count is kept for indexing the 2D array of items. I also want to know if I've come to the end of an image tag.

Now, if any data from between the start/end tags has been retrieved, I store it in the proper place. It's either ITEM data or CHANNEL data. CHANNEL data is further scoped if it came from within the image tag. When storing the data, I take advantage of PHP's associative array feature. I use the tag name as the array key. Users of the class can then easily access any data just by knowing its tag name and whether it's CHANNEL or ITEM data they're looking for. The function ends after resetting the global variables to accept new input.
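To show what that access looks like, here is a short sketch using hand-built stand-ins for the two arrays. All values are placeholders; the key names follow from the code in Listing One (uppercase tag names, an IMAGE: prefix for tags inside the channel image, and URL for the enclosure attribute).

```php
<?php
// Hand-built stand-ins for the arrays RSSInfo fills in.
// Every value here is a placeholder.
$channel = array(
    'TITLE'     => 'Example Podcast',
    'IMAGE:URL' => 'http://www.example.com/podcast/cover.jpg',
);
$allItems = array(
    array(
        'TITLE' => 'Episode 2',
        'GUID'  => 'http://www.example.com/podcast/episode2.mp3',
        'URL'   => 'http://www.example.com/podcast/episode2.mp3',
    ),
);

// Channel data is keyed by tag name alone.
$podcastTitle = $channel['TITLE'];
// Tags inside the channel's image tag carry the IMAGE: prefix.
$imageUrl = $channel['IMAGE:URL'];
// Item data needs the item index first, then the tag name.
$firstEpisode = $allItems[0]['TITLE'];

echo "$podcastTitle: $firstEpisode\n";
```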

Last, the TagData() function simply appends to the global variable that stores tag data. For some reason that's not made clear in the PHP documentation, the callback specified in xml_set_character_data_handler() (in this case, TagData) is sometimes called more than once with parts of the character data. This seems to happen when there are linefeeds and special XML characters (like quotes) in the data. That's why I concatenate the data to the global.
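A small experiment along these lines makes the behavior easy to observe. The handler name collectData and the sample text are hypothetical. The number of calls depends on the underlying XML library, so the sketch prints the count rather than assuming it; what matters is that appending always yields the complete text.

```php
<?php
$text  = '';  // character data gathered so far
$calls = 0;   // how many times the handler was invoked

function collectData($parser, $data)
{
    global $text, $calls;
    $text .= $data;  // append, never assign: data may arrive in pieces
    $calls++;
}

$parser = xml_parser_create();
xml_set_character_data_handler($parser, 'collectData');
// The sample contains an entity and a linefeed, the kinds of
// characters that tend to split the data across several calls.
xml_parse($parser, "<title>Rock &amp; Roll\nRadio</title>", true);

echo "handler calls: $calls\n";
echo $text, "\n";
```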

Pod Plugging

Since I developed this idea while setting up my father's podcast pages, I'm going to use his podcast feed in my example. To demonstrate the application of the RSSInfo class, I've included two "barebones" web pages (Listings Two and Three, available electronically; see "Resource Center," page 5). They're barebones in the sense that there are no graphics, other than the picture I get from the feed. I've done this so you can upload these pages to your web space along with the rss_info.php file (Listing One), substitute the feed URL with your own, and immediately see the results from your own RSS feed with no broken dependencies, such as missing image files.

I use the main podcast page (Listing Two, available electronically) as the index.htm page in the podcast directory of the site, <MySite>/podcast/index.htm. I use the ".htm" extension rather than ".php" in case I want to remove the PHP at some later time and other pages have already linked to my podcast page. (I add an ".htaccess" file to the podcast directory to tell the server to treat the ".htm" file as a ".php.")

The rss_info.php file also resides in <MySite>/podcast. I use the episode description page (Listing Three, available electronically) as the index.htm file in a subdirectory called "notes", <MySite>/podcast/notes/index.htm.

Figure 1 shows the output from the main podcast page in Listing Two. (I've removed some of the episodes in the podcast for space.) Looking at Listing Two, you'll see that I create an instance of the RSSInfo class, passing it the feed URL, in the HEAD section of the HTML file. (If you've uploaded the listings to your site, try substituting your own feed.) With my $rssInfo object now fully loaded, I place the title of the podcast in the caption bar of the browser.


Figure 1: Main podcast page.

In the BODY section of the HTML file, I display the podcast title, author's name, podcast description, and the image. I then provide a link to subscribe to the podcast in iTunes. Apple has made this easy by letting you substitute the "http" in the feed URL with "itpc." I also include some instructions on how to deal with the browser's reaction when iTunes is attempting to launch. In addition, I provide a link to the feed itself that the user can input into the "Advanced | Subscribe to Podcast..." menu item in iTunes or use in another podcasting client.
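The scheme substitution is a one-line string operation. Here is a sketch, with a hypothetical helper name (toItpcUrl) and a placeholder feed URL; it is not taken from Listing Two.

```php
<?php
// Swap a leading "http://" for "itpc://" so the link opens in iTunes.
// Any URL that does not start with http:// is returned unchanged.
function toItpcUrl($feedUrl)
{
    return preg_replace('#^http://#i', 'itpc://', $feedUrl);
}

$feedUrl = 'http://www.example.com/podcast/feed.xml';  // placeholder

// Escape the URL before placing it in an HTML attribute.
$subscribeLink = '<a href="' . htmlspecialchars(toItpcUrl($feedUrl))
               . '">Subscribe in iTunes</a>';

echo $subscribeLink, "\n";
```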

Describing the different ways of obtaining podcast episodes to users who are probably somewhat Internet savvy but may know little or nothing about podcasting presents a challenge for the web content provider. This is a subject that is probably worthy of an entire article.

Below the Subscription section (Figure 1) I have an Episode section that lists all the episodes in the podcast. I invite the user to click on an episode title for more information about that particular episode. This list of episodes runs from the latest to the earliest, but only because that's the order in which I've entered them in my podcast feed. The other feeds I've seen do this the same way, but to be sure the list is ordered this way for any feed, you would have to implement some kind of sorting mechanism, probably based on the PUBDATE tag in each item.
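One possible sorting mechanism is sketched below. It assumes each item carries a PUBDATE string that strtotime() can parse, which is normally true of the RFC 822 dates used in RSS. The function names (pubTime, sortNewestFirst) and the sample items are hypothetical, not part of the listings.

```php
<?php
// Convert an item's PUBDATE to a Unix timestamp; items with a missing
// or unparseable date get 0 and therefore sort last.
function pubTime(array $item)
{
    return isset($item['PUBDATE']) ? (int) strtotime($item['PUBDATE']) : 0;
}

// Return a copy of the items ordered from latest to earliest.
function sortNewestFirst(array $items)
{
    usort($items, function ($a, $b) {
        return pubTime($b) - pubTime($a);
    });
    return $items;
}

// Placeholder items, deliberately out of order.
$items = array(
    array('TITLE' => 'Episode 2', 'PUBDATE' => 'Mon, 04 Dec 2006 08:00:00 GMT'),
    array('TITLE' => 'Episode 3', 'PUBDATE' => 'Tue, 12 Dec 2006 08:00:00 GMT'),
    array('TITLE' => 'Episode 1', 'PUBDATE' => 'Mon, 27 Nov 2006 08:00:00 GMT'),
);

$sorted = sortNewestFirst($items);
foreach ($sorted as $item) {
    echo $item['TITLE'], "\n";
}
```

Because usort() reindexes the array, any code that relies on item positions should sort first and look up indexes afterward.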

The episodes in the list are linked to the podcast description page, with the GUID from the episode acting as a query string. In RSS land, a GUID is a string, which runs contrary to our software developer's COM-induced expectation. However, the GUID does act as a unique ID for the item. You can examine the RSS 2.0 spec to discern the exact role of the GUID, including the responsibilities of an aggregator in using the GUID tag, or just use the ITEM URL in the GUID tag and thank me later.

The episode description page is generated using the code in Listing Three (available electronically). Here, after creating an RSSInfo object in the HEAD of the HTML file as before, I use the query string to find the index of the particular episode in the array of items. If users have somehow come to this page without a query string, I default to the first episode in the list (in my case, the latest episode). I then put the episode title as well as the podcast title in the browser caption bar.
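The lookup can be sketched like this. Listing Three is not reproduced here, so the function name findEpisodeIndex, the sample items, and the assumption that the whole query string is the URL-encoded GUID are mine.

```php
<?php
// Return the index of the item whose GUID matches, or 0 (the first
// item in the feed) when there is no match or no GUID was supplied.
function findEpisodeIndex(array $items, $guid)
{
    foreach ($items as $index => $item) {
        if (isset($item['GUID']) && $item['GUID'] === $guid) {
            return $index;
        }
    }
    return 0;
}

// Placeholder items standing in for the allItems array.
$items = array(
    array('TITLE' => 'Episode 2', 'GUID' => 'http://www.example.com/podcast/episode2.mp3'),
    array('TITLE' => 'Episode 1', 'GUID' => 'http://www.example.com/podcast/episode1.mp3'),
);

// Assumption: the entire query string is the URL-encoded GUID.
$guid = isset($_SERVER['QUERY_STRING'])
      ? urldecode($_SERVER['QUERY_STRING'])
      : '';

$episode = $items[findEpisodeIndex($items, $guid)];
echo $episode['TITLE'], "\n";
```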

Browsers such as Firefox that support Live Bookmarks use the ITEM's TITLE tag to display the items and the ITEM's LINK tag as a destination when the user clicks on an ITEM title. Some podcasters use the ITEM's LINK tag to link to a place on their website. I use it to link to an MP3 stream file.

Putting the audio stream file (an ".m3u" file) in the LINK tag not only lets fans who have bookmarked the podcast be one click away from listening to any episode, but also lets me easily implement a Listen button on the website for any episode.

In the BODY section of the episode description page, I show the podcast title and author. When displaying the episode title, I link the title to the stream file simply to provide another way for the user to hear the episode. Below that is the episode subtitle and description. Then I provide explicit Listen and Download links.

To finish out the episode description page, I give users a way to return to the main podcast page and also display the date the episode was released (leaving off the time) as well as the podcast copyright information. If you'd like to see a sample of this podcast display mechanism in a "production" setting, go to www.dicksummer.net/podcast.
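Dropping the time from the release date might look like the following sketch. The function name releaseDate and the output format are assumptions of mine; the date is rendered in GMT, so a PUBDATE given in another time zone could display as a neighboring day.

```php
<?php
// Turn an RSS PUBDATE into a date with no time of day.
// If the value cannot be parsed, fall back to showing it unchanged.
function releaseDate($pubDate)
{
    $timestamp = strtotime($pubDate);
    if ($timestamp === false) {
        return $pubDate;
    }
    return gmdate('F j, Y', $timestamp);  // month name, day, year in GMT
}

echo releaseDate('Tue, 12 Dec 2006 08:00:00 GMT'), "\n";
```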

That's a Wrap

As podcasting continues to gain popularity, your company will likely be interested in making use of this flexible and efficient means of communication. You may even want to start your own podcast. The information I've presented here should start you on your way toward letting the world know about your new podcast.