Dr. Dobb's | Automating Release Notifications

Automating Release Notifications

Scott presents an automated approach for notifying users about updates to libraries and frameworks.

February 13, 2008
URL:http://www.drdobbs.com/jvm/automating-release-notifications/206503286

Scott works for Secure Passage LLC as a Senior Software Engineer. He can be contacted at [email protected].

With the proliferation of open-source libraries and frameworks, it is easy to find solutions to common programming needs. However, given the rapid release rate of many open-source projects, it can be difficult to ensure that applications always ship with the latest libraries. There are many ways to receive notifications of library updates, ranging from subscribing to e-mail distribution lists or RSS feeds, to periodically visiting project websites or blogs. But all of these activities require "manual information processing." In this article, I present an automated approach to notification.

While many applications have built-in online update capabilities, the drawback is that users aren't notified of updates until they run the application! Running applications periodically to merely check for updates can be time consuming, especially if you wish to keep all of your installed applications updated.

Luckily, there are tools that can help automate the update notification and installation process. Like the ubiquitous Windows Update, these tools scan your system to detect out-of-date software and facilitate the downloading and installation of needed updates. These tools include:

Version Tracker (www.versiontracker.com)
Google Pack's Google Updater (pack.google.com)
FileHippo's Update Checker (filehippo.com/updatechecker/)

But what if the software you wish to track isn't monitored by one of these tools; it might be an in-house application, a single-purpose domain-specific library, or the like? What if the software you wish to track doesn't have an auto update feature? You're back where you started—manually checking websites, blogs, and e-mails.

There are websites and utilities that monitor web pages for changes. A quick Google search for "monitor website changes" shows many promising options. However, you still may not necessarily have control of what part of a page is monitored and how frequently this page is monitored. Another potential problem is monitoring pages that aren't publicly available (intranet websites, for instance).

It has been said that good programmers are lazy programmers. In other words, good programmers work hard to automate tasks. Such is the case with checking websites for changes. After frequently checking multiple websites daily for new releases of software I use in my projects, I decided to automate.

Command-Line Example

I initially began the automation process at the command line. A variety of tools can be used to retrieve web pages from the command line, including:

Once you retrieve a page, you can parse its content using the ever-useful combination of sed/awk/grep and regular expressions.

So suppose you want to be notified when a new version of Microsoft's Process Explorer is released. Figure 1 is the web page for Process Explorer showing the version information you wish to monitor (v11.04) (www.microsoft.com/technet/sysinternals/utilities/processexplorer.mspx). The following one-line script determines the current version of Microsoft's Process Explorer. I ran this from a cygwin (www.cygwin.com) bash session:


wget -qO- http://www.microsoft.com/technet/
sysinternals/utilities/processexplorer.mspx | sed
's/<[^>]*>//g' - | grep -o -m1 -P "v[\d\.]{4,}"

[Click image to view at full size]

Figure 1: Web page for Process Explorer.

The output (at this writing) is as expected: v11.04. Note that:

The wget -qO- option causes the retrieved page to be sent to stdout instead of a file.
The sed 's/<[^>]*>//g' command deletes HTML tags (any text beginning with "<" and ending with ">"). Because you're after the page content, you dutifully ignore HTML tags.
The grep -o -m1 -P "v[\d\.]{4,}" command outputs the first match of the given Perl regular expression. You look for the "v" character followed by four or more digits or "." characters.

The script relies on a common editorial pattern: The most current information should appear first, hence the ability to simply use the first match. If the page listed multiple releases in chronologically ascending order, then additional processing would be required to grab the last match instead of the first match.

The command-line approach works well and can be readily incorporated into other scripts to provide additional functionality (automatic downloading, sending e-mail notifications, and so on). However, there are inherent limitations to this command-line approach:

Checking and processing multiple websites must be done sequentially.
A nonresponsive website can slow down the entire process if you check multiple websites sequentially.
Reviewing, sorting, and filtering results requires additional work.
Skipping a specific website from a set of websites to monitor is problematic.

UI Example

With all this in mind, I created a graphical Java application to overcome the limitations I encountered when doing command-line update checking. This allows for checking/parsing multiple websites concurrently, easy sorting and filtering of results, and the like. It lets me add, edit, and remove websites and their associated matching rules easily. For example, I can validate regular expression syntax when the regular expression is added instead of troubleshooting garbled results caused by an invalid regular expression at runtime. I also get a side-by-side view of the previous match with the current match. The tool, App2Date (www.app2date.com), is free for personal use and by nonprofit organizations. App2Date uses the same techniques as the aforementioned command-line approach: It retrieves web pages and applies a series of regular expressions to determine the relevant data (version number, release date, and the like). Figure 2 is the tool, while Figure 3 shows the rules used to determine the current version of the Process Explorer.

[Click image to view at full size]

Figure 2: App2Date tool.

[Click image to view at full size]

Figure 3: Rules used to determine current version of the Process Explorer.

Java Tools for Web Page Retrieval and Text Extraction

The Jakarta Commons HTTPClient class can be used to efficiently retrieve web pages. Listing One is code to get web page content. Java has built-in support for parsing HTML text. The HTMLEditorKit.ParserCallback class allows for text extraction. The Jericho HTML Parser (jerichohtml.sourceforge.net/doc/index.html) is another option. Regular expression parsing can be done using Java's java.util.regex.Matcher and Pattern classes. If full Perl 5 syntax support is required, the Jakarta ORO classes can be used. Multithreading to retrieve multiple website pages concurrently is greatly simplified by the java.util.concurrent package and HTTPClient's MultiThreadedHttpConnectionManager class. Java 6 introduced additional features that aid in the UI implementation: results are stored using Java 6's built-in JAXB support. The TableRowSorter and RowFilter classes are used for sorting and filtering the table of results.

Last of all, the Substance Look and Feel (https://substance.dev.java.net) is used to give the application a polished appearance.

public static String getContent(HttpClient httpClient, String url)
   {
   logger.info("getting content from url: " + url);
   GetMethod get = new GetMethod(url);
   get.setFollowRedirects(true);
   // impersonate browser
   get.setRequestHeader("User-Agent", "Mozilla/4.0 (compatible; MSIE 7.0; Windows
NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0; .NET CLR
2.0.50727; .NET CLR 3.0.04506.30)");
   get.setRequestHeader("Accept", "image/gif, image/x-xbitmap, image/jpeg,
image/pjpeg, application/x-shockwave-flash, application/xaml+xml,
application/vnd.ms-xpsdocument, application/x-ms-xbap,  application/x-msapplication,
*/*");
   get.setRequestHeader("Accept-Language", "en-us");
   get.setRequestHeader("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7");
   try
   {
      int statusCode = httpClient.executeMethod(get);
      logger.info("http status code: " + statusCode);
      InputStream is = get.getResponseBodyAsStream();
      ByteArrayOutputStream baos = new ByteArrayOutputStream(2048);
      byte[] buff = new byte[2048];
      int bytesRead = 0;
      while ( (bytesRead = is.read(buff)) != -1)
      {
         baos.write(buff, 0, bytesRead);
      }
         is.close();
         baos.close();
         String text = baos.toString();
         logger.debug("raw html=" + text);
         return text;
      }
        catch (Exception ex)
     {
        logger.error(ex);
     }
     finally
     {
        get.releaseConnection();
     }
     return null;
}

Listing One