Creating RSS Files with XML::RSS


Dec02: Creating RSS Files with XML::RSS

Derek lives in New York City and works for azurance.com, an open source and security consulting firm that he cofounded. He is the author of the upcoming book, Managing RAID on Linux (O'Reilly and Associates, 2003), and can be contacted at derek@azurance.com.


In my article, "Parsing RSS Files with XML::RSS," (TPJ, Fall 2002), I covered using XML::RSS to locate, parse, and reuse dynamic content found on the World Wide Web. But what if you want to provide original material for others? XML::RSS can also be used to generate a properly formatted RSS file. You can create feeds on the fly using a database back end, or generate a static RSS file that gets updated at regular intervals.

The <channel> Element

RSS files are composed of various elements that describe a channel (or feed) and its dynamic content. Each item within a channel must contain a <title> and a <link> element and may also contain the optional <description> element. Likewise, the <channel> element, which stores metadata about the channel, has its own <title>, <link>, and <description> subelements. The <channel> structure also contains an <items> subelement that provides a table of contents for the RSS document. So a bare-bones RSS <channel> element might look something like Example 1.

In that example, the RSS feed contains three items (typical feeds contain about 10), and they are indexed by URL; in RSS-speak, by each item's <link> element. Begin creating your RSS file by initializing a new RSS object and using the channel() method to populate the required subelements of the <channel> element, as in Example 2.
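A minimal sketch of that initialization step, assuming RSS 1.0 output (the RDF-based format the listings in this article use). The title, link, and description values are taken from Listing 1; the final print is only there so you can inspect the result.

```perl
use strict;
use warnings;
use XML::RSS;

# Create a new RSS object. Version 1.0 is an assumption based on the
# rdf:RDF output shown in the listings.
my $rss = XML::RSS->new(version => '1.0');

# Populate the three required <channel> subelements.
$rss->channel(
    title       => "azurance.com",
    link        => "http://www.azurance.com",
    description => "Open Source and Security Consulting",
);

# Serialize the (still item-less) feed to standard output.
print $rss->as_string;
```

At this stage the channel has no items, so the table of contents in the output is empty.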

Only a title, link, and description are required, but other elements are also available. For example, <image> and <textinput> may be used to provide links to a site logo or newsletter subscription form. In addition to these elements, several modules that extend the base RSS schema are available. These extensions, which are published as modules alongside the RSS specification, provide elements that describe a site's topic, authors, and update frequency. The Dublin Core module (http://web.resource.org/rss/1.0/modules/dc/), for example, includes provisions for information about copyright, publisher, publication date, and language. The Syndication module (http://web.resource.org/rss/1.0/modules/syndication/) provides elements that describe how often a feed is updated. I'll cover a few elements from each module; the specification for each module contains a comprehensive list of options. A complete list of modules is available from http://web.resource.org/rss/1.0/.

Use second-level hashes to compartmentalize RSS module metadata. In Listing 1, I have added a few elements from the Dublin Core and Syndication modules to my channel element.
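As a sketch of what those second-level hashes can look like in a channel() call: the dc and syn keys name the module, and each nested hash holds that module's elements. The values mirror Listing 1, except that the copyright string is simplified here to avoid the entity.

```perl
use strict;
use warnings;
use XML::RSS;

my $rss = XML::RSS->new(version => '1.0');

$rss->channel(
    title       => "azurance.com",
    link        => "http://www.azurance.com",
    description => "Open Source and Security Consulting",

    # Dublin Core elements live in a nested hash keyed by the module prefix.
    dc => {
        language  => "en-us",
        rights    => "Copyright 1999-2002, Azurance.com",  # simplified from Listing 1
        publisher => "Azurance",
        creator   => 'derek@azurance.com',  # single quotes: no @ interpolation
        subject   => "Open Source, Security",
    },

    # Syndication elements: four updates per hour, counted from updateBase.
    syn => {
        updatePeriod    => "hourly",
        updateFrequency => "4",
        updateBase      => "1999-11-05T09:00:00-05:00",
    },
);

print $rss->as_string;
```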

The Dublin Core elements that I have added are straightforward, but the Syndication elements require a short explanation. The <syn:updatePeriod> element specifies the time interval over which updates are counted. In this case, like many RSS feeds, I have chosen hourly. Possible values are hourly, daily, weekly, monthly, and yearly. The <syn:updateFrequency> element specifies how many times the feed is updated during each period, so in this example the feed is updated four times per hour, or every 15 minutes. The <syn:updateBase> element, though it looks a bit confusing, simply represents the first time the feed was published: in this case, November 5, 1999 at 9:00am Eastern Standard Time. This information, combined with the update frequency and update period, allows users and applications to determine a publishing schedule.
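To make the arithmetic concrete, here is a small sketch of how a consumer might turn the period and frequency into an interval between updates. The update_interval function and the %period_seconds table are illustrative names of my own, not part of XML::RSS, and the month and year lengths are approximations.

```perl
use strict;
use warnings;

# Seconds in each <syn:updatePeriod> value (monthly and yearly are approximate).
my %period_seconds = (
    hourly  => 60 * 60,
    daily   => 24 * 60 * 60,
    weekly  => 7 * 24 * 60 * 60,
    monthly => 30 * 24 * 60 * 60,
    yearly  => 365 * 24 * 60 * 60,
);

# Return the expected number of seconds between two updates.
sub update_interval {
    my ($period, $frequency) = @_;
    die "unknown updatePeriod: $period\n"
        unless exists $period_seconds{$period};
    die "updateFrequency must be a positive integer\n"
        unless $frequency =~ /^[1-9]\d*$/;
    return $period_seconds{$period} / $frequency;
}

my $seconds = update_interval("hourly", 4);
printf "Feed updates every %d minutes\n", $seconds / 60;   # prints 15 minutes
```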

Some module extensions may be applied to individual items in addition to the <channel> element. For example, specifying a <dc:creator> for each item is useful for sites that have articles written by more than one author. Add a second-level hash to each item for the modules and subelements that you want to include. Just follow the same examples I used for the <channel> element.
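A sketch of a per-item module hash, following the same pattern as the channel. The second author address is a hypothetical placeholder; the item titles and links come from Listing 2.

```perl
use strict;
use warnings;
use XML::RSS;

my $rss = XML::RSS->new(version => '1.0');
$rss->channel(
    title       => "azurance.com",
    link        => "http://www.azurance.com",
    description => "Open Source and Security Consulting",
);

# Each item carries its own Dublin Core hash, so authors can differ per item.
$rss->add_item(
    title => "Baltimore launches Trusted Business apps",
    link  => "http://www.theregister.co.uk/content/55/27734.html",
    dc    => { creator => 'derek@azurance.com' },
);

$rss->add_item(
    title => "FBI investigates major web slowdown",
    link  => "http://www.vnunet.com/News/1136204",
    dc    => { creator => 'guest@example.com' },   # hypothetical second author
);

print $rss->as_string;
```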

Adding Items

Next, I'll add some <item> elements using the add_item() method (see Listing 2).

In compliance with the RSS specification, the add_item() method requires a title and link, but may also include an optional description. Notice how each item corresponds to an entry in the <rdf:Seq> metadata from my previous examples. That information is automatically generated by XML::RSS as items are added to the RSS object.
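One way to see that bookkeeping at work is to add a couple of items and then look at both the stored items and the serialized feed. This sketch assumes the items are kept as hash references under $rss->{items}, which is how XML::RSS exposes them when parsing; the regular expression is just a quick inspection aid, not a real XML parser.

```perl
use strict;
use warnings;
use XML::RSS;

my $rss = XML::RSS->new(version => '1.0');
$rss->channel(
    title       => "azurance.com",
    link        => "http://www.azurance.com",
    description => "Open Source and Security Consulting",
);

$rss->add_item(
    title => "Baltimore launches Trusted Business apps",
    link  => "http://www.theregister.co.uk/content/55/27734.html",
);
$rss->add_item(
    title => "FBI investigates major web slowdown",
    link  => "http://www.vnunet.com/News/1136204",
);

# Each stored item is a hash reference holding the fields passed to add_item().
for my $item (@{ $rss->{items} }) {
    print "$item->{title}: $item->{link}\n";
}

# Pull the resource URLs of the generated <rdf:li> entries back out.
my @toc = $rss->as_string =~ /<rdf:li\s+rdf:resource="([^"]+)"/g;
print "Table of contents entry: $_\n" for @toc;
```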

Combining DBI and XML::RSS

At this point, it's probably obvious that generating an RSS file using Perl is not much easier than creating one by hand using a text editor. Therefore, creating a reusable program that can automatically generate and update your RSS feed is desirable. While you could use nearly anything on your system as the data source (like text files, DB files, an LDAP server, or a combination of sources), using a back-end SQL database is a popular choice. My rss_items function queries a SQL back end (MySQL in my case) and calls XML::RSS's add_item() method to populate the RSS object (see Listing 3).

rss_items takes a positive integer as input, which determines how many entries are added to the RSS object. Since I want to extract rows from the end of the table, I count the rows in the table and use that count to generate an offset (lines 20-23) for the first row that I want to return. The LIMIT portion of my second query (line 24) uses that offset and returns rows between that number ($offset) and the end of the table (-1). Depending on which back-end RDBMS you are using and what your schema looks like, you might need to perform some different steps to achieve this effect; later MySQL releases, for instance, no longer accept -1 as the row count, so an explicit count is needed there.
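The offset calculation is easy to get wrong when the table holds fewer rows than the number requested, because the subtraction then goes negative. A small sketch of the same arithmetic with that case handled; tail_offset is a hypothetical helper name, not something from DBI or XML::RSS.

```perl
use strict;
use warnings;

# Return the zero-based offset of the first row to fetch so that at most
# $item_count rows from the end of a table with $row_count rows are returned.
sub tail_offset {
    my ($row_count, $item_count) = @_;
    my $offset = $row_count - $item_count;
    return $offset < 0 ? 0 : $offset;   # clamp: never return a negative offset
}

print tail_offset(25, 10), "\n";   # prints 15: skip the first 15 rows
print tail_offset(3, 10), "\n";    # prints 0: fewer rows than requested
```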

Finally, the while loop (line 27) iterates through each row of data returned and calls the XML::RSS add_item() method using the title and link that were returned from my database.

Generating the File

Now I can call the as_string() or save() methods to print the data to standard output or write it to a file. For example:

    print $rss->as_string;
    $rss->save("/var/www/azurance.rss");

Calling either as_string() or save() results in the output shown in Listing 4.
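If the static file is regenerated at regular intervals while a web server is serving it, you may prefer to write the new feed to a temporary file and rename it into place, so that readers never fetch a half-written file. This is a sketch of one possible approach rather than anything XML::RSS requires; the temporary directory stands in for the web root, and the .tmp suffix is my own convention.

```perl
use strict;
use warnings;
use XML::RSS;
use File::Temp qw(tempdir);

my $rss = XML::RSS->new(version => '1.0');
$rss->channel(
    title       => "azurance.com",
    link        => "http://www.azurance.com",
    description => "Open Source and Security Consulting",
);
$rss->add_item(
    title => "PGP poised for major comeback",
    link  => "http://www.pcw.co.uk/News/1136211",
);

# Stand-in for the web server's document root (the article itself writes
# to /var/www/azurance.rss).
my $dir    = tempdir(CLEANUP => 1);
my $target = "$dir/azurance.rss";
my $tmp    = "$target.tmp";

# Write the complete feed first, then swap it into place with a rename.
$rss->save($tmp);
rename($tmp, $target) or die "Cannot rename $tmp to $target: $!";

print "Wrote $target\n";
```

A scheduler such as cron can then run the script as often as the feed's update schedule calls for.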

Checking Your Work

After you're done creating a feed, you might want to check whether it complies with current RSS standards. Mark Pilgrim and Sam Ruby have made available an RSS validator (http://feeds.archive.org/validator/check), an invaluable tool that lets you enter the URL of an RSS file to be checked for errors.

TPJ

Listing 1

<channel rdf:about="http://www.azurance.com">
 <title>azurance.com</title>
 <link>http://www.azurance.com</link>
 <description>Open Source and Security Consulting</description>
 <dc:language>en-us</dc:language>
 <dc:rights>Copyright &amp;copy; 1999-2002, Azurance.com</dc:rights>
 <dc:publisher>Azurance</dc:publisher>
 <dc:creator>derek@azurance.com</dc:creator>
 <dc:subject>Open Source, Security</dc:subject>
 <syn:updatePeriod>hourly</syn:updatePeriod>
 <syn:updateFrequency>4</syn:updateFrequency>
 <syn:updateBase>1999-11-05T09:00:00-05:00</syn:updateBase>
 <items>
  <rdf:Seq>
   <rdf:li rdf:resource="http://www.theregister.co.uk/content/55/27734.html" />
   <rdf:li rdf:resource="http://www.vnunet.com/News/1136204"/>
   <rdf:li rdf:resource="http://www.internetnews.com/infra/article.php/1486121"/>
  </rdf:Seq>
 </items>
</channel>

Listing 2

$rss->add_item(
    title => "Baltimore launches Trusted Business apps",
    link  => "http://www.theregister.co.uk/content/55/27734.html"
);

$rss->add_item(
    title => "FBI investigates major web slowdown",
    link  => "http://www.vnunet.com/News/1136204"
);

$rss->add_item(
    title => "Cisco Boosts Security, Caters To Small Business",
    link  => "http://www.internetnews.com/infra/article.php/1486121"
);

Listing 3

1    sub rss_items {
2
3    use DBI;
4
5    my $itemCount = shift @_;
6    my ($dsn, $dbh, $sth, $rv, @row, @count, $offset);
7
8    my $driver   = "mysql";
9    my $database = "rss_news";
10   my $hostname = "localhost";
11   my $port     = "3306";
12   my $user     = "username";
13   my $pw       = "password";
14   my $table    = "news";
15
16   $dsn = "DBI:$driver:database=$database;host=$hostname;port=$port";
17   $dbh = DBI->connect($dsn, $user, $pw) or die "Cannot connect: $DBI::errstr";
18   $dbh->{PrintError} = 0; # turn off DBI's warnings; we'll deal with errors ourselves
19
20   $sth = $dbh->prepare("SELECT COUNT(*) FROM $table");
21   $rv = $sth->execute;
22   @count = $sth->fetchrow_array;
23   $offset = $count[0] > $itemCount ? $count[0] - $itemCount : 0; # never negative
24   $sth = $dbh->prepare("SELECT title, link FROM $table LIMIT $offset, -1");
25   $rv = $sth->execute;
26
27   while (@row = $sth->fetchrow_array) {
28
29       my ($title, $link) = @row;
30
31       $rss->add_item(    # $rss is the XML::RSS object created earlier
32
33           title   => $title,
34           link    => $link
35       );
36   }
37 }

Listing 4

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns="http://purl.org/rss/1.0/"
 xmlns:dc="http://purl.org/dc/elements/1.1/"
 xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/"
 xmlns:syn="http://purl.org/rss/1.0/modules/syndication/">


 <channel rdf:about="http://www.azurance.com">
  <title>azurance.com</title>
  <link>http://www.azurance.com</link>
  <description>Open Source and Security Consulting</description>
  <dc:language>en-us</dc:language>
  <dc:rights>Copyright &amp;copy; 1999-2002, Azurance.com</dc:rights>
  <dc:publisher>Azurance</dc:publisher>
  <dc:creator>derek@azurance.com</dc:creator>
  <dc:subject>Open Source, Security</dc:subject>
  <syn:updatePeriod>hourly</syn:updatePeriod>
  <syn:updateFrequency>4</syn:updateFrequency>
  <syn:updateBase>1999-11-05T09:00:00-05:00</syn:updateBase>
  <items>
   <rdf:Seq>
    <rdf:li rdf:resource="http://www.infoworld.com/articles/hn/xml/02/10/24/021024hnnpcwest.xml?s=IDGNS" />
    <rdf:li rdf:resource="http://www.idg.net/ic_959380_1794_9-10000.html" />
    <rdf:li rdf:resource="http://www.infoworld.com/articles/hn/xml/02/10/25/021025hnsecurelinux.xml?s=IDGNS" />
    <rdf:li rdf:resource="http://www.cnn.com/2002/TECH/internet/10/23/net.attack/index.html" />
    <rdf:li rdf:resource="http://www.infoworld.com/articles/hn/xml/02/10/23/021023hnopteron.xml?s=IDGNS" />
    <rdf:li rdf:resource="http://www.businessweek.com/technology/cnet/stories/963054.htm" />
    <rdf:li rdf:resource="http://www.itweb.co.za/sections/internet/2002/0210240947.asp?A=HOME&amp;O=FPIN" />
    <rdf:li rdf:resource="http://www.internetwk.com/security02/INW20021023S0001" />
    <rdf:li rdf:resource="http://www.pcw.co.uk/News/1136211" />
    <rdf:li rdf:resource="http://zdnet.com.com/2100-1105-963087.html" />
   </rdf:Seq>
  </items>
 </channel>

 <item rdf:about="http://www.infoworld.com/articles/hn/xml/02/10/24/021024hnnpcwest.xml?s=IDGNS">
  <title>Network chip makers focus on security</title>
  <link>http://www.infoworld.com/articles/hn/xml/02/10/24/021024hnnpcwest.xml?s=IDGNS</link>
 </item>

 <item rdf:about="http://www.idg.net/ic_959380_1794_9-10000.html">
  <title>'The Golden Age of Hacking rolls on'</title>
  <link>http://www.idg.net/ic_959380_1794_9-10000.html</link>
 </item>

 <item rdf:about="http://www.infoworld.com/articles/hn/xml/02/10/25/021025hnsecurelinux.xml?s=IDGNS">
  <title>Secure Linux maker teams with IBM in U.S.</title>
  <link>http://www.infoworld.com/articles/hn/xml/02/10/25/021025hnsecurelinux.xml?s=IDGNS</link>
 </item>

 <item rdf:about="http://www.cnn.com/2002/TECH/internet/10/23/net.attack/index.html">
  <title>FBI seeks to trace massive Net attack</title>
  <link>http://www.cnn.com/2002/TECH/internet/10/23/net.attack/index.html</link>
 </item>

 <item rdf:about="http://www.infoworld.com/articles/hn/xml/02/10/23/021023hnopteron.xml?s=IDGNS">
  <title>RSA, AMD team up on security for Opteron chips</title>
  <link>http://www.infoworld.com/articles/hn/xml/02/10/23/021023hnopteron.xml?s=IDGNS</link>
 </item>

 <item rdf:about="http://www.businessweek.com/technology/cnet/stories/963054.htm">
  <title>Encryption method getting the picture</title>
  <link>http://www.businessweek.com/technology/cnet/stories/963054.htm</link>
 </item>

 <item rdf:about="http://www.itweb.co.za/sections/internet/2002/0210240947.asp?A=HOME&amp;O=FPIN">
  <title>Internet banking security revolutionised with SMS-based cross-checking</title>
  <link>http://www.itweb.co.za/sections/internet/2002/0210240947.asp?A=HOME&amp;O=FPIN</link>
 </item>

 <item rdf:about="http://www.internetwk.com/security02/INW20021023S0001">
  <title>Vendor Warns Of New IE Holes; Microsoft Calls Reports Irresponsible</title>
  <link>http://www.internetwk.com/security02/INW20021023S0001</link>
 </item>

 <item rdf:about="http://www.pcw.co.uk/News/1136211">
  <title>PGP poised for major comeback</title>
  <link>http://www.pcw.co.uk/News/1136211</link>
 </item>

 <item rdf:about="http://zdnet.com.com/2100-1105-963087.html">
  <title>P2P hacking bill may be rewritten</title>
  <link>http://zdnet.com.com/2100-1105-963087.html</link>
 </item>
</rdf:RDF>

