Dr. Dobb's is part of the Informa Tech Division of Informa PLC




XML and Pig Poop: Agribusiness Online


February 2000: Features: XML and Pig Poop: Agribusiness Online

A homegrown engine for assessing the environmental and community impact of large livestock operations tests the promise of XML and distributed objects.

I’ll say it up front: I love my job. I’m one of two software engineers supporting a small team of scientists doing community outreach and research on agriculture and environmental issues. Working with the Agricultural Systems and Informatics Group (ASIG) is kind of like working for a start-up, except that our group is swaddled in a big research organization: the University of Wisconsin-Madison Department of Soil Science. And nobody second-guesses our technical decisions. Bliss!

Our outreach mission mainly involves decision support for growers and their communities. In some cases, we deliver raw information. For example, we publish daily maps derived from NASA satellite data which show sunlight intensity across the country. Other products run simulation models to forecast things like frost formation in cranberry bogs, or the likelihood of "late blight" in potatoes, and recommend how to get good yield with minimal environmental impact. In the past, we’ve used a variety of technologies: DOS and Windows applications written in Pascal or Paradox, automated e-mail, faxes and even a text-to-speech phone-response system (to reach the real low-tech diehards). Most of our work today, however, is delivered over the World Wide Web. The overhead of producing, delivering and maintaining custom applications devours resources far out of proportion to the results. With the World Wide Web, we have most of the same graphical and user-interface power available, but we can release as often as we like, and we never have to produce CD-ROMs. Also, our models primarily use data stored on our own servers, so it makes sense to run them locally and just put the user interface on the user’s machine.

Here’s the story of one of those projects. It’s got distributed objects. It’s got XML. It’s got design patterns. Plus, it’s got kilotons of pig poop. What more could you want?

Caution: Big Hogs Upwind

Bear with me for a little background.

In the past, university agricultural research primarily concerned technology and economics as seen from the grower’s standpoint. At ASIG, we’re trying to broaden that mission toward providing easily-accessible information to all stakeholders.

The past 10 years have seen a remarkable trend toward farm consolidation, especially livestock operations. When they get really big (I’m thinking 1,000 hogs, here), we call it a concentrated livestock operation or CLO. In Wisconsin, a new CLO usually means a political controversy. Often those for and against paint in broad strokes:

"Jobs for everybody!" the promoters crow.

"The stench will peel the paint off my house–not in my backyard, buddy."

Our purpose isn’t to wade into that fight. These are difficult socioeconomic trade-offs that the community has to resolve. On the other hand, at least some of the issues are amenable to scientific analysis. We aim to give growers, interested residents and scientists access to those numbers. Then we might see less heat and more light, and the debate has a hope of moving on to the hard questions.

Initially proposed in January 1998 and currently in alpha test, CELLO (Communities, Ecosystems and Large Livestock Operations) is designed to estimate a CLO’s net change in nitrogen and phosphorus (important determinants of ground- and surface-water quality), its economic and infrastructure impacts, and its likely odors.

To get started, we needed to chat with Bill the Science Guy. Also known as the Big Dog, Dr. William Bland, associate professor of soil science, is our boss. Over the past five years or so, his interests have evolved from biophysical modeling to addressing societal questions about soil and land use with those models. (I poke some fun at him in this article, but Bill really does get it. He jumped on Java when it was brand new, then was smart enough to drop it until the language stabilized.) A series of talks with him yielded some critical requirements. CELLO must:

  • Be accessible by anyone with a web browser: homemakers concerned about water quality, growers proposing a CLO or zoning board members looking at the impact on the local road network.

  • Support varying levels of user interest, from a quick peek at the bottom line to a detailed analysis, with optional scenarios to explore. (What if farmers use less fertilizer, replacing it with manure from the CLO? What economic impacts will be seen from various salary structures for CLO employees?)

  • Be simple enough to use that no external documentation or support is required.

  • Require no downloads or installation.

  • Analyze each CLO proposal in a "notebook," with a separate area detailing the calculations, assumptions and raw data common to all notebooks.

  • Allow new notebooks to be produced quickly as new CLOs are proposed, with minimal programming.

In addition, CELLO had to fit within our budget for tools and hardware (minimal) and for programmer time (part-time for two people). And the faster we could get something up, the more useful it would be. Finally, we hoped to develop technology that we could easily reuse in future projects.

This led us to some process and design criteria right off the bat:

  • A spiral development model would let us try risky technologies early and give us a constantly improving prototype for soliciting feedback.

  • We would use open-source or public-domain tools wherever possible.

  • We would deliver via the Web, using HTML and Java 1.1 for our user interface.

One Applet To Rule Them All

At first blush, getting to a prototype sounded easy. In August 1998, we selected the nitrogen/phosphorus model as our first target. "That model’s trivial," said Bill. "You just take the nitrogen imported and subtract the nitrogen exported."

Paul Kaarakka, the other software engineer on the project, comes to programming via zoology and biomedical engineering. He specializes in model implementation. He started turning Bill’s equations into Java classes. I opened JBuilder, flung down some tabbed dialogs and tacked on some code. Within a day I had an applet to show. Bill made approving noises.

But he wanted to "own" and edit the supporting text. In fact, he wanted rather a lot of text, with tables, formatting, graphics and embedded links. That sounded like a job for HTML, not Java.

Then I tested the nascent applet on Internet Explorer. "What’s this ‘ClassNotFoundException?’ Aaaarghh!" I screamed. I had been seduced by all the nifty user-interface widgets in Swing. But the browsers didn’t include it, and if I shipped it myself (via the applet tag’s "archive" attribute), the download time became staggering. There had to be a better way.

Bucket O’ Applets

Okay, so the models had to live on the server. And I was still twitching and muttering from trying to force Java to do HTML’s job. So how about combining HTML with a gang of applets? We’d run the models and make the results available over the net ... somehow. Each browser applet would be just smart enough to (somehow) get a single number off the server and display it.

Bill could use HTML formatting and its easy-to-use tools, and just throw in a little black-box applet wherever he wanted a model-result number. The numbers would "automagically" update (somehow) whenever new ones were available. We could use the same technique (somehow) to send user input back to the server.

Of course, we’d need to turn "somehow" into a real protocol.

I went for some walks (it’s a lovely campus). I talked to myself a lot. I riffled through books (especially the classic Design Patterns: Elements of Reusable Object-Oriented Software, Gamma et al., Addison-Wesley, 1995), and saw that the Observer pattern fit like a glove. Each applet could register its interest in a particular number (a topic) with a registry on the server (a whiteboard). Whenever a recalculation occurred, the applet would get the latest number. Conversely, when the user supplied some input (for example, changing the CLO’s pig population with a slider applet), the model on the server would get notified, recalculate and send out the new numbers. (See "RemoteObserver–the Whiteboard" below for more details on the pattern and our implementation.)

So we had to implement the Observer pattern over a remote communication channel. And the server would have to maintain state for simultaneous users, and ensure that only "their" numbers propagated to their applets.

The first problem was relatively simple. I spelunked the Java Development Kit source code for the Observer classes, and shamelessly lifted the good ideas for a Remote Method Invocation-based RemoteObserver package. Next, I wrote RemoteTopicApplet (the display mechanism) and RemoteControl (an abstract user-interface control class for information going the other way).

I didn’t know it at the time, but this infrastructure–a central whiteboard and its observers–would turn out to be a key to the project’s success. Its simple semantics kept Observers and Observables loosely coupled, enabling us to rip out and replace entire subsystems without disturbing anything else. From this point on, Paul could work on implementing the model mathematics, while I polished the plumbing.

The other problem (keeping track of session state) was a little trickier. How could the applets discover what session they belonged to? Every solution we proposed foundered on the same rock: we couldn’t cram the whole website onto a single page, but only applets on the same page can talk to each other. To get something running, I settled on an ugly hack: using a Perl CGI script to filter the HTML pages into a temporary directory, customizing each applet tag with session-specific parameters as we went. By using relative URLs for interpage references, we could contain a user session within the prefiltered pages.

It wasn’t pretty, but it all worked. The applets woke up, looked around and started snagging their numbers from the server. Hurray! Bill gave demos. We got kudos (and, more importantly, feedback). The project lived, and Paul and I saw the sun, for a change.

But there were some nagging little problems, which on further reflection turned out to be showstoppers. For one thing, I had seriously offended the gods of software simplicity. By now our prototype of a "trivial" model involved a web server calling a Perl script which connected to a Java object (which instantiated some session-specific model objects), then filtered Java applet tags in HTML, RMI running one way, sockets over there ... I’ll spare you the rest of the details. But when I explained them to Paul, there was a long silence. "It’s very clever," he said, "but it’s got to go."

Worse, the server logs showed that every applet instance was being downloaded separately. Fifty numbers on the page? Fifty "gets" of RemoteTopicApplet.class, thank you. Worse still, if you moved to a new page, the browser cheerfully slew all the applets on the old page, so if you clicked "Back," you got to wait for another 50 gets. (And that subdirectory business … eeww!)

HTML Filtering With Servlets

Well, maybe gangs of applets weren’t such a hot idea. Hey, we’re already filtering the HTML. Why not just stuff the numbers in and shoot the HTML directly to the user’s browser?

The next turn around the spiral used CGI to call a Java application which inserted the numbers. That worked, but it was a resource hog (and ugly besides). We quickly replaced that with a servlet, which was faster for the user and easier on the server. Losing the applets meant losing live-update capability, but clicking the "reload" button was a small price to pay for a much more robust system.

In fact, since servlets have session management built in, we’d found a better way to solve that problem, too. We could ditch the filter-to-a-directory nonsense. Best of all, the servlet could register as an Observer, just like the applets used to, and the rest of the project–including all of Paul’s code–would be completely untouched.

XML to the Rescue

All this time, Paul was churning Bill’s equations into Java. It was just arithmetic, but there were hundreds of numbers, and each had to get into the whiteboard. Paul’s habitually clean Java was getting caked with cruft–fully 30 percent of it was just whiteboard-stuffing overhead. For example, just to publish the number of cows in the study area, we needed:

roi.notifyObservers("SADairyCows", new Float(areaInfo.getValue("Dairy", dataScale)));

And even a simple calculation was worse. This example just multiplies a couple of numbers:

roi.notifyObservers("SAMilkProduction",
    new Float(animalData.getValue("Dairy", "Target")
        * ((Float) roi.getUpdate("SADairyCows")).floatValue()));

Clearly, my friend was working way too hard.

Then I started reading about XML. Aha! If I could create a little dialect that knew about topics, the whiteboard and arithmetic, a single general-purpose parser could build models, run calculations, and update the whiteboard. And the dialect would be so simple, even a scientist could use it.

It turned out to be surprisingly easy, since so much of the work is done for you by a good XML parser. I downloaded IBM’s XML4J, hoovered through The XML Companion (Neil Bradley, Addison-Wesley, 1999) and The XML Black Book (Natanya Pitts-Moultis and Cheryl Kirk, The Coriolis Group, 1999), and got to work. Two days later, I showed a toy model to Bill and Paul, and they were off and running.

Which brings the history up to date. Time to dig a little deeper into…

Architectural Details

The Observer pattern involves two partners, Observable and Observer. Observable publishes two methods: addObserver() and removeObserver(). Observer publishes one method: update(). An Observer obtains a reference to an Observable through the RMI registry, and calls Observable.addObserver(this) to register itself.

When an event of interest occurs, the Observable trundles through its list of Observers, calling each one’s update() method. Since update() takes a Serializable parameter, you can pass almost anything along, just so long as Observers and Observables adhere to the same convention. (At this point, CELLO uses a stone-simple convention: everything’s a Float.) Of course, the Observer can do whatever it likes in the update() method–store a result for later, put up a dialog, print the number to a stream, and so on.
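To make that contract concrete, here is a minimal Java sketch of what RMI-flavored Observer and Observable interfaces could look like. The names RemoteObserver and RemoteObservable come from the article, but the method signatures, the SimpleRemoteObservable class, and its habit of dropping an Observer whose update() call fails are assumptions, not CELLO’s actual code. As written the object runs in-process; serving it over RMI would additionally mean exporting it (for example with UnicastRemoteObject.exportObject) and binding it in a registry.

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interfaces modeled on the description above.
interface RemoteObserver extends Remote {
    // Called by the Observable when an event of interest occurs.
    void update(RemoteObservable source, Serializable arg) throws RemoteException;
}

interface RemoteObservable extends Remote {
    void addObserver(RemoteObserver o) throws RemoteException;
    void removeObserver(RemoteObserver o) throws RemoteException;
}

// In-process implementation; export it to make it reachable over RMI.
class SimpleRemoteObservable implements RemoteObservable {
    private final List<RemoteObserver> observers = new ArrayList<>();

    public synchronized void addObserver(RemoteObserver o) {
        if (!observers.contains(o)) observers.add(o); // ignore duplicate registrations
    }

    public synchronized void removeObserver(RemoteObserver o) {
        observers.remove(o);
    }

    // Trundle through a snapshot of the list, calling each Observer's update().
    void notifyObservers(Serializable arg) {
        List<RemoteObserver> snapshot;
        synchronized (this) { snapshot = new ArrayList<>(observers); }
        for (RemoteObserver o : snapshot) {
            try {
                o.update(this, arg);
            } catch (RemoteException e) {
                // Assumption: an unreachable Observer (say, a closed applet) is dropped.
                removeObserver(o);
            }
        }
    }
}
```

Copying the list before calling out means a slow or unreachable remote Observer never holds the Observable’s lock while its call is in flight.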

We added three twists to the pattern. First, we implemented the pattern over Java RMI (see Figure 1), so that any of the partners could exist in another process or on another machine altogether (such as an applet out in a browser somewhere).

Figure 1. The Observer Pattern

Second, most Observers in CELLO currently just want to grab a number without waiting for the Observable to cue them, so we broke update() into two phases: notify() and getUpdate(). Design Patterns fans will recognize the "pull" model.

Third, since CELLO publishes hundreds of numbers, making each an Observable would be too hard to manage. Instead, we introduced the notion of topics. A topic is an individual result (say, the total phosphorus production due to pig manure). A collection of related topics is published via a single Observable. When registering itself, the Observer specifies the topic of interest.
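The pull variant and per-topic registration can be sketched together. In this hypothetical, in-process version, an Observer registers for a single topic on a TopicWhiteboard, receives only a cue through notify(), and fetches the value with getUpdate() when it is ready. The class names and the publish() method are illustrative assumptions; CELLO’s own whiteboard runs over RMI.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical topic-keyed whiteboard illustrating the "pull" model.
interface TopicObserver {
    // A cue only: the Observer learns which topic changed, not its value.
    // (This overloads Object.notify(); it does not clash with it.)
    void notify(TopicWhiteboard source, String topic);
}

class TopicWhiteboard {
    private final Map<String, Serializable> values = new HashMap<>();
    private final Map<String, List<TopicObserver>> observers = new HashMap<>();

    // Register interest in one topic rather than in the whole Observable.
    synchronized void addObserver(String topic, TopicObserver o) {
        observers.computeIfAbsent(topic, t -> new ArrayList<>()).add(o);
    }

    // Store a new value, then cue only the Observers of that topic.
    void publish(String topic, Serializable value) {
        List<TopicObserver> toCue;
        synchronized (this) {
            values.put(topic, value);
            toCue = new ArrayList<>(observers.getOrDefault(topic, List.of()));
        }
        for (TopicObserver o : toCue) o.notify(this, topic);
    }

    // Observers pull the latest value whenever they like (null if never published).
    synchronized Serializable getUpdate(String topic) {
        return values.get(topic);
    }
}
```

Because the cue carries no value, an Observer such as a servlet can skip notify() entirely and just call getUpdate() at page-rendering time.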

As I mentioned above, this pattern was critical to CELLO’s success. By creating a lingua franca for communicating between software modules, we ensured loose coupling between them. Once we had this foundation in place, the architectural upheavals detailed above could go on without invalidating existing work.

A Sample CELLO Session

CELLO is composed of a number of loosely confederated calculation engines called models. Each is responsible for a particular area of interest, such as the nitrogen/phosphorus budget or odors. When a user contacts the web server, requesting any CELLO page, she sets off a cascade of events (see Figure 2):

  • Apache rewrites the innocent-looking URL into a call to a servlet; the disk path to the requested document is passed to the servlet.

  • If necessary, the servlet loads and initializes, creating the singleton ModelManager and a new user session.

  • The servlet parses through the requested HTML page, looking for topic tags. Each topic tag contains a model ID and a topic ID.

  • The servlet asks ModelManager for a reference to the requested model, which is returned as a RemoteObservable object. If necessary, ModelManager creates an instance of the model and adds it to this user’s session.

  • The servlet calls getUpdate() for the needed topic, and replaces the topic tag with the result.
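The substitution step in the servlet might look something like the following. The tag syntax `<topic model="..." id="..."/>` and the TopicTagFilter class are hypothetical (the article says only that each tag carries a model ID and a topic ID), and the lookup function stands in for asking ModelManager for the session’s model and calling getUpdate(). It uses plain java.util.regex rather than the Servlet API so that it runs on its own.

```java
import java.util.function.BiFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of replacing topic tags with model results.
class TopicTagFilter {
    // Assumed tag form: <topic model="npbudget" id="TotalPInPigFeed"/>
    private static final Pattern TOPIC_TAG =
        Pattern.compile("<topic\\s+model=\"([^\"]+)\"\\s+id=\"([^\"]+)\"\\s*/?>");

    // lookup(modelId, topicId) stands in for ModelManager plus getUpdate().
    static String fill(String html, BiFunction<String, String, Object> lookup) {
        Matcher m = TOPIC_TAG.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Object value = lookup.apply(m.group(1), m.group(2));
            // quoteReplacement keeps '$' or '\' in a value from being misread.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(value)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

A regular expression is adequate when the tag format is tightly controlled, as it is when the team writes every page itself; pages from arbitrary sources would call for a real HTML parser.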

Figure 2. Sequence Diagram

Certain web pages will have applets embedded in them which connect (also via RemoteObserver) to one of the models. If the user fiddles with a control in the applet, an RMI call to update() occurs, and the model performs whatever calculations are necessary. As each topic’s value is calculated, the new value is published to all its Observers; some might be applets in the web pages, and others might be models. Eventually the chain of update() methods ends, the new values are propagated everywhere, and a new cycle can begin.

Modeling in XML

The XML dialect we use is quite simple. A model has a name (which is just for readability) and a whiteboard ID, and can contain topics and links. (Note that models only loosely correspond to whiteboards: a model might put its topics into a single whiteboard, or several. Conversely, a whiteboard could contain topics from several models.) Topics have a name, a whiteboard ID (optional, defaulting to the same as the model’s), and either an arithmetic operator or a constant. Arithmetic-operator topics can contain subtopics (for example, a "plus" topic contains a list of addends). Links have a whiteboard ID and a topic name, and can plug in numbers from elsewhere in the model, or from a different whiteboard altogether.

A SAX (Simple API for XML) parser reads the XML files, constructs trees of Operator objects and registers each with the appropriate whiteboard. If an Operator has any operands, each one registers the parent Operator as an Observer of itself (see Figure 3).
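Here is a compact sketch of the SAX approach, using the SAX parser that ships with the JDK (javax.xml.parsers). The dialect is a guess at the shape described above, not CELLO’s actual schema: `<topic>` elements carry either a `value` attribute (a constant) or an `op` attribute (`plus` or `times`), and `<link>` elements name a `whiteboard` and a `topic`. For brevity the sketch builds and evaluates the tree but leaves out whiteboard registration and Observer wiring; links are resolved through a caller-supplied function. ModelNode and ModelHandler are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical tree node: a constant, an operator ("plus"/"times") or a link.
class ModelNode {
    final String name, op, linkKey; // linkKey is "whiteboard/topic" for links
    final double constant;
    final List<ModelNode> operands = new ArrayList<>();

    ModelNode(String name, String op, double constant, String linkKey) {
        this.name = name; this.op = op; this.constant = constant; this.linkKey = linkKey;
    }

    // Evaluate recursively; links are resolved by the caller.
    double eval(ToDoubleFunction<String> links) {
        switch (op) {
            case "const": return constant;
            case "link":  return links.applyAsDouble(linkKey);
            case "plus":  return operands.stream().mapToDouble(n -> n.eval(links)).sum();
            case "times": return operands.stream().mapToDouble(n -> n.eval(links)).reduce(1.0, (a, b) -> a * b);
            default: throw new IllegalStateException("unknown operator: " + op);
        }
    }
}

// SAX handler: a stack of open <topic> elements; each new node joins the top one.
class ModelHandler extends DefaultHandler {
    final Map<String, ModelNode> topics = new HashMap<>();
    private final Deque<ModelNode> open = new ArrayDeque<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes a) {
        ModelNode node;
        if (qName.equals("topic")) {
            String value = a.getValue("value");
            node = (value != null)
                ? new ModelNode(a.getValue("name"), "const", Double.parseDouble(value), null)
                : new ModelNode(a.getValue("name"), a.getValue("op"), 0, null);
            topics.put(node.name, node);
        } else if (qName.equals("link")) {
            node = new ModelNode(null, "link", 0, a.getValue("whiteboard") + "/" + a.getValue("topic"));
        } else {
            return; // <model> and anything else: nothing to build
        }
        if (!open.isEmpty()) open.peek().operands.add(node);
        if (qName.equals("topic")) open.push(node);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("topic")) open.pop();
    }

    static ModelHandler parse(String xml) {
        ModelHandler h = new ModelHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), h);
        } catch (Exception e) {
            throw new IllegalArgumentException("bad model XML", e);
        }
        return h;
    }
}
```

Keeping a stack of open `<topic>` elements is the usual way to rebuild a tree from SAX’s flat stream of start and end events.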

Figure 3. Class Diagram for the Phosphorus in Pig Feed Model

For example, suppose we’re interested in the phosphorus contained in pig feed. The calculation is simply:

(total phosphorus in feed) = (number of pigs) * (feed/pig) * (phosphorus/pound of feed).

TotalPInPigFeed is an Observer of NumPigs, PigFeedRequired and PInPigFeed. The latter two are links to topics in another whiteboard. Once the tree has been constructed, it waits for some event to fire the operator. For example, the filter servlet might request the TotalPInPigFeed value from the N-P Budget model. That Operator, in turn, requests the values of each of its operands, multiplies them and returns the result (caching it so that subsequent queries don’t trigger the whole shebang all over again).

But suppose a user is playing "what-if" with the pig feeding model and inputs his own number for phosphorus in pig feed. When the operand’s value changes, it notifies its Observers. Among them is TotalPInPigFeed, which recalculates with its operands’ latest values. In turn, TotalPInPigFeed notifies its own Observers, if any, that its value has changed.

In this way, the model can accommodate changes to any value in the tree, without performing unneeded recalculations.
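The recalculation behavior just described can be sketched in-process as follows. Topic, Constant and Operator are hypothetical class names: each Operator caches its result, subscribes to its operands when constructed, and on a change notice recalculates immediately and passes the notice up to its own Observers. The evaluations counter exists only to make the caching visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleBinaryOperator;

// Hypothetical in-process sketch of a self-recalculating operator tree.
abstract class Topic {
    private final List<Operator> observers = new ArrayList<>();

    void addObserver(Operator o) { observers.add(o); }

    // Tell dependent Operators that this topic's value has changed.
    protected void notifyObservers() {
        for (Operator o : observers) o.operandChanged();
    }

    abstract double getValue();
}

class Constant extends Topic {
    private double value;

    Constant(double value) { this.value = value; }

    double getValue() { return value; }

    // A user's "what-if" input: store it, then notify dependents.
    void set(double v) { value = v; notifyObservers(); }
}

class Operator extends Topic {
    private final DoubleBinaryOperator op;
    private final double identity;       // 0 for sums, 1 for products
    private final List<Topic> operands = new ArrayList<>();
    private Double cached;                // null until first calculated
    int evaluations;                      // illustration only: counts recalculations

    Operator(DoubleBinaryOperator op, double identity, Topic... inputs) {
        this.op = op;
        this.identity = identity;
        for (Topic t : inputs) { operands.add(t); t.addObserver(this); }
    }

    // Answer from the cache when possible, so repeat queries cost nothing.
    double getValue() {
        if (cached == null) recalculate();
        return cached;
    }

    private void recalculate() {
        double acc = identity;
        for (Topic t : operands) acc = op.applyAsDouble(acc, t.getValue());
        cached = acc;
        evaluations++;
    }

    // An operand changed: recalculate with its latest value, then pass the news up.
    void operandChanged() {
        recalculate();
        notifyObservers();
    }
}
```

Only the chain of Operators above a changed input is recomputed; everything else keeps answering from its cache.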

Ideas For The Future

As we progress toward implementing the less "trivial" models of CELLO (economic impact, infrastructure impact, and odor prediction), I’m filing a list of potential improvements. For one thing, I’ve never gotten over the live-update idea. It just bugs me to have to click "Reload" to see recalculated numbers. Someday we might have an applet that can slurp our HTML off the server, inserting up-to-date numbers as it goes and rendering the result. The applet would be a RemoteObserver, of course, so that whenever a number changed, the HTML would be rerendered with the new value. Or perhaps we could use server push to get the same effect.

Also, while our little XML dialect serves our needs, it’s easy to imagine extensions. It wouldn’t be hard to represent a database query in XML, for example; the parser could plug the results right into the model. We could also extend the dialect with arbitrary calculations by using Java’s class-loader facility; an attribute of the XML element would contain the name of the class file to load.
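The class-loader idea might look like this in Java. The Calculation interface, the CalculationLoader helper and the MeanCalculation example are hypothetical; the XML would carry a fully qualified class name in an attribute, and the loader instantiates that class through reflection after checking that it implements the expected interface.

```java
// Hypothetical plug-in contract for custom calculations named in the XML,
// e.g. <topic name="..." class="some.package.SomeCalculation"> (illustrative only).
interface Calculation {
    double compute(double[] operands);
}

class CalculationLoader {
    // Load the named class, confirm it implements Calculation,
    // and create an instance with its no-argument constructor.
    static Calculation load(String className) {
        try {
            Class<? extends Calculation> c =
                Class.forName(className).asSubclass(Calculation.class);
            return c.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("cannot load calculation " + className, e);
        }
    }
}

// An example plug-in: the arithmetic mean of its operands (0 if there are none).
class MeanCalculation implements Calculation {
    public double compute(double[] operands) {
        double sum = 0;
        for (double d : operands) sum += d;
        return operands.length == 0 ? 0 : sum / operands.length;
    }
}
```

Because the XML names the code to run, a loader like this should only be fed model files the team controls; if users could edit the XML, they could make the server instantiate any class on its classpath.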

Finally, we added XML support to make CELLO models easier to write. It has occurred to us that we could also run the same XML code through a stylesheet, transforming it into HTML. If we could work that trick, CELLO users could see the actual mathematics in any model, without our having to write and maintain separate documentation. (In fact, the model authors are already clamoring for this, saying that the XML is too hard to read!) The challenge there will be deciding exactly what form the output HTML should take for maximum readability. Ideally, the tree should be rendered as a series of equations.
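As a rough illustration, a standard XSLT 1.0 stylesheet run through the JDK’s javax.xml.transform API can already turn an operator tree into one line per equation. The stylesheet and the ModelRenderer class are hypothetical, and so is the dialect they assume: `<topic>` elements with an `op` attribute of `times` or `plus`, constant subtopics with a `name`, and `<link>` elements with a `topic` attribute.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical renderer: one <p> per operator topic, written as an equation.
class ModelRenderer {
    static final String STYLESHEET = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='html' indent='no'/>",
        "  <xsl:template match='/model'>",
        "    <div><xsl:apply-templates select='.//topic[@op]'/></div>",
        "  </xsl:template>",
        "  <xsl:template match='topic'>",
        "    <p><xsl:value-of select='@name'/><xsl:text> = </xsl:text>",
        "      <xsl:for-each select='topic|link'>",
        "        <xsl:if test='position() > 1'>",
        "          <xsl:choose>",
        "            <xsl:when test=\"../@op = 'times'\"><xsl:text> * </xsl:text></xsl:when>",
        "            <xsl:when test=\"../@op = 'plus'\"><xsl:text> + </xsl:text></xsl:when>",
        "            <xsl:otherwise><xsl:text> </xsl:text><xsl:value-of select='../@op'/><xsl:text> </xsl:text></xsl:otherwise>",
        "          </xsl:choose>",
        "        </xsl:if>",
        "        <xsl:value-of select='@name|@topic'/>",
        "      </xsl:for-each></p>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    static String toHtml(String modelXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(modelXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("cannot render model", e);
        }
    }
}
```

Picking the right symbol for each operator is a small piece of the "what form should the HTML take" question; this sketch simply prints `*` and `+` and falls back to the operator’s name.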

Another possibility is more radical: provide a mechanism for the user to modify the XML code. We already let them tweak selected numbers–why not the calculations themselves? Perhaps it wouldn’t be a useful feature for CELLO, but we’re already thinking of other projects in which we can reuse CELLO’s technology.

Epilogue

As planned, we turned the model-creation business over to the scientists, where it belonged. I installed a validating XML editor on Bill’s Mac, and spent an hour explaining the syntax. "Here’s where you fill in the arithmetic, see?" He nodded. I rose. "Should be trivial, hey?" I said, and fled.

