A homegrown engine for assessing the environmental and community impact of large livestock operations tests the promise of XML and distributed objects.
I'll say it up front: I love my job. I'm one of two software engineers supporting a small team of scientists doing community outreach and research on agriculture and environmental issues. Working with the Agricultural Systems and Informatics Group (ASIG) is kind of like working for a start-up, except that our group is swaddled in a big research organization: the University of Wisconsin-Madison Department of Soil Science. And nobody second-guesses our technical decisions. Bliss!
Our outreach mission mainly involves decision support for growers and their communities. In some cases, we deliver raw information. For example, we publish daily maps derived from NASA satellite data which show sunlight intensity across the country. Other products run simulation models to forecast things like frost formation in cranberry bogs, or the likelihood of "late blight" in potatoes, and recommend how to get good yield with minimal environmental impact. In the past, we've used a variety of technologies: DOS and Windows applications written in Pascal or Paradox, automated e-mail, faxes and even a text-to-speech phone-response system (to reach the real low-tech diehards). Most of our work today, however, is delivered over the World Wide Web. The overhead of producing, delivering and maintaining custom applications devours resources far out of proportion to the results. With the World Wide Web, we have most of the same graphical and user-interface power available, but we can release as often as we like, and we never have to produce CD-ROMs. Also, our models primarily use data stored on our own servers, so it makes sense to run them locally and just put the user interface on the user's machine.
Here's the story of one of those projects. It's got distributed objects. It's got XML. It's got design patterns. Plus, it's got kilotons of pig poop. What more could you want?
Caution: Big Hogs Upwind
Bear with me for a little background.
In the past, university agricultural research primarily concerned technology and economics as seen from the grower's standpoint. At ASIG, we're trying to broaden that mission toward providing easily accessible information to all stakeholders.
The past 10 years have seen a remarkable trend toward farm consolidation, especially livestock operations. When they get really big (I'm thinking 1,000 hogs, here), we call it a concentrated livestock operation or CLO. In Wisconsin, a new CLO usually means a political controversy. Often those for and against paint in broad strokes:
"Jobs for everybody!" the promoters crow.
"The stench will peel the paint off my house. Not in my backyard, buddy."
Our purpose isn't to wade into that fight. These are difficult socioeconomic trade-offs that the community has to resolve. On the other hand, at least some of the issues are amenable to scientific analysis. We aim to give growers, interested residents and scientists access to those numbers. Then we might see less heat and more light, and the debate has a hope of moving on to the hard questions.
Initially proposed in January 1998 and currently in alpha test, CELLO (Communities, Ecosystems and Large Livestock Operations) is designed to estimate a CLO's net change in nitrogen and phosphorus (important determinants of ground- and surface-water quality), its economic impact and its infrastructure impact, and to predict odor.
To get started, we needed to chat with Bill the Science Guy. Also known as the Big Dog, Dr. William Bland, associate professor of soil science, is our boss. Over the past five years or so, his interests have evolved from biophysical modeling to addressing societal questions about soil and land use with those models. (I poke some fun at him in this article, but Bill really does get it. He jumped on Java when it was brand new, then was smart enough to drop it until the language stabilized.) A series of talks with him yielded some critical requirements. CELLO must:
- Be accessible by anyone with a web browser: homemakers concerned about water quality, growers proposing a CLO or zoning board members looking at the impact on the local road network.
- Support varying levels of user interest, from a quick peek at the bottom line to a detailed analysis, with optional scenarios to explore. (What if farmers use less fertilizer, replacing it with manure from the CLO? What economic impacts will be seen from various salary structures for CLO employees?)
- Be sufficiently simple to use so that no external documentation or support would be required.
- Require no downloads or installation.
- Analyze each CLO proposal in a "notebook," with a separate area detailing the calculations, assumptions and raw data common to all notebooks.
- Allow new notebooks to be produced quickly as new CLOs are proposed, with minimal programming.
In addition, CELLO had to fit within our budget for tools and hardware (minimal) and for programmer time (part-time for two people). And the faster we could get something up, the more useful it would be. Finally, we hoped to develop technology that we could easily reuse in future projects.
This led us to some process and design criteria right off the bat:
- A spiral development model would let us try risky technologies early and give us a constantly improving prototype for soliciting feedback.
- We would use open-source or public-domain tools wherever possible.
- We would deliver via the Web, using HTML and Java 1.1 for our user interface.
One Applet To Rule Them All
At first blush, getting to a prototype sounded easy. In August 1998, we selected the nitrogen/phosphorus model as our first target. "That model's trivial," said Bill. "You just take the nitrogen imported and subtract the nitrogen exported."
Paul Kaarakka, the other software engineer on the project, comes to programming via zoology and biomedical engineering. He specializes in model implementation. He started turning Bill's equations into Java classes. I opened JBuilder, flung down some tabbed dialogs and tacked on some code. Within a day I had an applet to show. Bill made approving noises.
But he wanted to "own" and edit the supporting text. In fact, he wanted rather a lot of text, with tables, formatting, graphics and embedded links. That sounded like a job for HTML, not Java.
Then I tested the nascent applet on Internet Explorer. "What's this ClassNotFoundException? Aaaarghh!" I screamed. I had been seduced by all the nifty user-interface widgets in Swing. But the browsers didn't include Swing, and if I shipped it myself (via an "archive" parameter in the applet tag), the download time became staggering. There had to be a better way.
Bucket O' Applets
Okay, so the models had to live on the server. And I was still twitching and muttering from trying to force Java to do HTML's job. So how about combining HTML with a gang of applets? We'd run the models and make the results available over the net ... somehow. Each browser applet would be just smart enough to (somehow) get a single number off the server and display it.
Bill could use HTML formatting and its easy-to-use tools, and just throw in a little black-box applet wherever he wanted a model-result number. The numbers would "automagically" update (somehow) whenever new ones were available. We could use the same technique (somehow) to send user input back to the server.
Of course, we'd need to turn "somehow" into a real protocol.
I went for some walks (it's a lovely campus). I talked to myself a lot. I riffled through books (especially the classic Design Patterns: Elements of Reusable Object-Oriented Software, Gamma et al., Addison-Wesley, 1995), and saw that the Observer pattern fit like a glove. Each applet could register its interest in a particular number (a topic) with a registry on the server (a whiteboard). Whenever a recalculation occurred, the applet would get the latest number. Conversely, when the user supplied some input (for example, changing the CLO's pig population with a slider applet), the model on the server would get notified, recalculate and send out the new numbers. (See "RemoteObserver: the Whiteboard" below for more details on the pattern and our implementation.)
So we had to implement the Observer pattern over a remote communication channel. And the server would have to maintain state for simultaneous users, and ensure that only "their" numbers propagated to their applets.
The first problem was relatively simple. I spelunked the Java Development Kit source code for the Observer classes, and shamelessly lifted the good ideas for a Remote Method Invocation-based RemoteObserver package. Next, I wrote RemoteTopicApplet (the display mechanism) and RemoteControl (an abstract user interface control class for information going the other way).
I didn't know it at the time, but this infrastructure (a central whiteboard and its observers) would turn out to be a key to the project's success. Its simple semantics kept Observers and Observables loosely coupled, enabling us to rip out and replace entire subsystems without disturbing anything else. From this point on, Paul could work on implementing the model mathematics, while I polished the plumbing.
The other problem (keeping track of session state) was a little trickier. How could the applets discover what session they belonged to? Every solution we proposed foundered on the same rock: we couldn't cram the whole website onto a single page, but only applets on the same page can talk to each other. To get something running, I settled on an ugly hack: a Perl CGI script filtered the HTML pages into a temporary directory, customizing each applet tag with session-specific parameters as it went. By using relative URLs for interpage references, we could contain a user session within the prefiltered pages.
It wasn't pretty, but it all worked. The applets woke up, looked around and started snagging their numbers from the server. Hurray! Bill gave demos. We got kudos (and, more importantly, feedback). The project lived, and Paul and I saw the sun, for a change.
But there were some nagging little problems, which on further reflection turned out to be showstoppers. For one thing, I had seriously offended the gods of software simplicity. By now our prototype of a "trivial" model involved a web server calling a Perl script which connected to a Java object (which instantiated some session-specific model objects), then filtered Java applet tags in HTML, RMI running one way, sockets over there ... I'll spare you the rest of the details. But when I explained them to Paul, there was a long silence. "It's very clever," he said, "but it's got to go."
Worse, the server logs showed that every applet instance was being downloaded separately. Fifty numbers on the page? Fifty "gets" of RemoteTopicApplet.class, thank you. Worse still, if you moved to a new page, the browser cheerfully slew all the applets on the old page, so if you clicked "Back," you got to wait for another 50 gets. (And that subdirectory business: eeww!)
HTML Filtering With Servlets
Well, maybe gangs of applets weren't such a hot idea. Hey, we're already filtering the HTML. Why not just stuff the numbers in and shoot the HTML directly to the user's browser?
The next turn around the spiral used CGI to call a Java application which inserted the numbers. That worked, but it was a resource hog (and ugly besides). We quickly replaced that with a servlet, which was faster for the user and easier on the server. Losing the applets meant losing live-update capability, but clicking the "reload" button was a small price to pay for a much more robust system.
In fact, since servlets have session management built in, we'd found a better way to solve that problem, too. We could ditch the filter-to-a-directory nonsense. Best of all, the servlet could register as an Observer, just like the applets used to, and the rest of the project (including all of Paul's code) would be completely untouched.
XML to the Rescue
All this time, Paul was churning Bill's equations into Java. It was just arithmetic, but there were hundreds of numbers, and each had to get into the whiteboard. Paul's habitually clean Java was getting caked with cruft: fully 30 percent of it was just whiteboard-stuffing overhead. For example, just to publish the number of cows in the study area, we needed:
roi.notifyObservers("SADairyCows",new Float(areaInfo.getValue ("Dairy",dataScale)));
And even a simple calculation was worse. This example just multiplies a couple of numbers:
roi.notifyObservers("SAMilkProduction",
    new Float(animalData.getValue("Dairy", "Target") *
        ((Float) roi.getUpdate("SADairyCows")).floatValue()));
Clearly, my friend was working way too hard.
Then I started reading about XML. Aha! If I could create a little dialect that knew about topics, the whiteboard and arithmetic, a single general-purpose parser could build models, run calculations, and update the whiteboard. And the dialect would be so simple, even a scientist could use it.
It turned out to be surprisingly easy, since so much of the work is done for you by a good XML parser. I downloaded IBM's XML4J, hoovered through The XML Companion (Neil Bradley, Addison-Wesley, 1999) and The XML Black Book (Natanya Pitts-Moultis and Cheryl Kirk, The Coriolis Group, 1999), and got to work. Two days later, I showed a toy model to Bill and Paul, and they were off and running.
Which brings the history up to date. Time to dig a little deeper into
Architectural Details
The Observer pattern involves two partners, Observable and Observer. Observable publishes two methods: addObserver() and removeObserver(). Observer publishes one method: update(). An Observer obtains a reference to an Observable through the RMIRegistry classes, and calls Observable.addObserver(this) to register itself.
When an event of interest occurs, the Observable trundles through its list of Observers, calling each one's update() method. Since update() takes a Serializable parameter, you can pass almost anything along, just so long as Observers and Observables adhere to the same convention. (At this point, CELLO uses a stone-simple convention: everything's a Float.) Of course, the Observer can do whatever it likes in the update() method: store a result for later, put up a dialog, print the number to a stream, and so on.
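To make the two roles concrete, here is a minimal sketch in Java. It is not CELLO's actual RemoteObserver package: the class names SimpleRemoteObservable and RecordingObserver, the countObservers() helper, and the choice to drop an Observer whose remote call fails are all assumptions for illustration. The interfaces extend java.rmi.Remote so their methods could be invoked across processes, but the sketch exercises them inside one JVM without exporting anything to an RMI registry.

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.List;

// Observer role: one method, called when something of interest happens.
interface RemoteObserver extends Remote {
    void update(RemoteObservable source, Serializable arg) throws RemoteException;
}

// Observable role: Observers register and deregister themselves here.
interface RemoteObservable extends Remote {
    void addObserver(RemoteObserver o) throws RemoteException;
    void removeObserver(RemoteObserver o) throws RemoteException;
}

// Hypothetical in-process implementation. A deployed version would also be
// exported (e.g. by extending java.rmi.server.UnicastRemoteObject) and bound
// in an RMI registry so that remote Observers could look it up.
class SimpleRemoteObservable implements RemoteObservable {
    private final List<RemoteObserver> observers = new ArrayList<>();

    public synchronized void addObserver(RemoteObserver o) {
        if (!observers.contains(o)) observers.add(o);
    }

    public synchronized void removeObserver(RemoteObserver o) {
        observers.remove(o);
    }

    public synchronized int countObservers() {
        return observers.size();
    }

    // Trundle through the Observer list, calling update() on each one. The list
    // is copied first so an Observer may deregister from inside update().
    public void notifyObservers(Serializable arg) {
        List<RemoteObserver> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<>(observers);
        }
        for (RemoteObserver o : snapshot) {
            try {
                o.update(this, arg);
            } catch (RemoteException e) {
                // Assumption: an unreachable Observer (say, a closed browser) is dropped.
                removeObserver(o);
            }
        }
    }
}

// A local Observer that records what it hears; by convention, everything's a Float.
class RecordingObserver implements RemoteObserver {
    final List<Float> received = new ArrayList<>();

    public void update(RemoteObservable source, Serializable arg) {
        received.add((Float) arg);
    }
}
```

Because update() takes a Serializable, anything RMI can marshal could travel through it; only RecordingObserver commits to the everything-is-a-Float convention.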
We added three twists to the pattern. First, we implemented the pattern over Java RMI (see Figure 1), so that any of the partners could exist in another process or on another machine altogether (such as an applet out in a browser somewhere).
Figure 1. The Observer Pattern
Second, most Observers in CELLO just want to grab a number without waiting for the Observable to cue them, so we broke update() into two phases: notify() and getUpdate(). Design Patterns fans will recognize this as the "pull" model.
Third, since CELLO publishes hundreds of numbers, making each an Observable would be too hard to manage. Instead, we introduced the notion of topics. A topic is an individual result (say, the total phosphorus production due to pig manure). A collection of related topics is published via a single Observable. When registering itself, the Observer specifies the topic of interest.
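The second and third twists can be sketched together as a topic-keyed whiteboard. This is an in-process illustration, not CELLO's implementation: the class name Whiteboard, the method publish(), and the callback topicChanged() (standing in for notify(), renamed here so it is not confused with Object.notify()) are assumptions. Observers are cued with only the topic name and then pull the value with getUpdate() whenever they like; in the distributed version these types would extend java.rmi.Remote, as described above.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Pull-model Observer: told only *which* topic changed, not its new value.
interface TopicObserver {
    void topicChanged(String topic);
}

// One Observable publishing many related topics, each a single result.
class Whiteboard {
    private final Map<String, Serializable> values = new HashMap<>();
    private final Map<String, List<TopicObserver>> observers = new HashMap<>();

    // Observers register interest in one topic, not in the whole whiteboard.
    public synchronized void addObserver(String topic, TopicObserver o) {
        observers.computeIfAbsent(topic, t -> new ArrayList<>()).add(o);
    }

    public synchronized void removeObserver(String topic, TopicObserver o) {
        List<TopicObserver> list = observers.get(topic);
        if (list != null) list.remove(o);
    }

    // Store a new value, then cue only the Observers of that topic.
    public void publish(String topic, Serializable value) {
        List<TopicObserver> toCue;
        synchronized (this) {
            values.put(topic, value);
            toCue = new ArrayList<>(observers.getOrDefault(topic, Collections.emptyList()));
        }
        for (TopicObserver o : toCue) {
            o.topicChanged(topic);
        }
    }

    // Pull side: fetch the latest value at any time, cued or not (null if unset).
    public synchronized Serializable getUpdate(String topic) {
        return values.get(topic);
    }
}
```

Splitting the cue from the fetch is what lets a servlet ignore notifications entirely and simply call getUpdate() while it builds a page.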
As I mentioned above, this pattern was critical to CELLO's success. By creating a lingua franca for communicating between software modules, we ensured loose coupling between them. Once we had this foundation in place, the architectural upheavals detailed above could go on without obviating existing work.
A Sample CELLO Session
CELLO is composed of a number of loosely confederated calculation engines called models. Each is responsible for a particular area of interest, such as the nitrogen/phosphorus budget or odors. When a user contacts the web server, requesting any CELLO page, she sets off a cascade of events (see Figure 2):
- Apache rewrites the innocent-looking URL into a call to a servlet; the disk path to the requested document is passed to the servlet.
- If necessary, the servlet loads and initializes, creating the singleton ModelManager object and a new user session.
- The servlet parses through the requested HTML page, looking for topic tags. Each topic tag contains a model ID and a topic ID.
- The servlet asks ModelManager for a reference to the requested model, which is returned as a RemoteObservable object. If necessary, ModelManager creates an instance of the model and adds it to this user's session.
- The servlet calls getUpdate() for the needed topic, and replaces the topic tag with the result.
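The servlet's work in these steps boils down to a per-session model lookup plus a search-and-replace over the page. The sketch below leaves out the Servlet API and RMI so that it runs on its own; the topic-tag syntax (`<topic model="..." id="..."/>`), the TopicSource interface (standing in for the RemoteObservable the servlet really gets back) and the factory registration are hypothetical, not CELLO's actual format or classes. ModelManager is a singleton that lazily creates one model instance per session and model ID.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a model's RemoteObservable: pull a topic's value.
interface TopicSource {
    Float getUpdate(String topicId);
}

// Singleton that hands out model instances per user session, creating them on demand.
final class ModelManager {
    private static final ModelManager INSTANCE = new ModelManager();
    private final Map<String, Supplier<TopicSource>> factories = new ConcurrentHashMap<>();
    // sessionId -> (modelId -> that session's model instance)
    private final Map<String, Map<String, TopicSource>> sessions = new ConcurrentHashMap<>();

    private ModelManager() {}

    static ModelManager getInstance() {
        return INSTANCE;
    }

    void registerModel(String modelId, Supplier<TopicSource> factory) {
        factories.put(modelId, factory);
    }

    TopicSource getModel(String sessionId, String modelId) {
        Map<String, TopicSource> models =
            sessions.computeIfAbsent(sessionId, s -> new ConcurrentHashMap<>());
        return models.computeIfAbsent(modelId, id -> {
            Supplier<TopicSource> factory = factories.get(id);
            if (factory == null) throw new IllegalArgumentException("unknown model: " + id);
            return factory.get();
        });
    }
}

// Replaces each topic tag in a page with the current value for this session.
final class TopicFilter {
    // Hypothetical tag syntax: <topic model="NPBudget" id="TotalPInPigFeed"/>
    private static final Pattern TOPIC_TAG =
        Pattern.compile("<topic\\s+model=\"([^\"]+)\"\\s+id=\"([^\"]+)\"\\s*/?>");

    static String fill(String html, String sessionId) {
        Matcher m = TOPIC_TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            TopicSource model = ModelManager.getInstance().getModel(sessionId, m.group(1));
            Float value = model.getUpdate(m.group(2));
            String text = (value == null) ? "n/a" : value.toString();
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Keying models by session is what keeps one user's what-if numbers from leaking into another user's pages.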
Figure 2. Sequence Diagram
Certain web pages will have applets embedded in them which connect (also via RemoteObserver) to one of the models. If the user wonks on a control in the applet, an RMI call to update() occurs, and the model performs whatever calculations are necessary. As each topic's value is calculated, the new value is published to all its Observers; some might be applets in the web pages, and others might be models. Eventually the chain of update() methods ends, the new values are propagated everywhere, and a new cycle can begin.
Modeling in XML
The XML dialect we use is quite simple. A model has a name (which is just for readability) and a whiteboard ID, and can contain topics and links. (Note that models only loosely correspond to whiteboards: a model might put its topics into a single whiteboard, or several. Conversely, a whiteboard could contain topics from several models.) Topics have a name, a whiteboard ID (optional, defaulting to the same as the model's), and either an arithmetic operator or a constant. Arithmetic-operator topics can contain subtopics (for example, a "plus" topic contains a list of addends). Links have a whiteboard ID and a topic name, and can plug in numbers from elsewhere in the model, or from a different whiteboard altogether.
A SAX (Simple API for XML) parser reads the XML files, constructs trees of Operator objects and registers each with the appropriate whiteboard. If an Operator has any operands, each one registers the parent Operator as an Observer for itself (see Figure 3).
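The parsing step can be sketched with the JDK's built-in SAX support (javax.xml.parsers), standing in here for XML4J. The element and attribute names (model; topic with name plus either op or value; link with whiteboard and topic) are a hypothetical rendering of the dialect described above, not CELLO's actual schema. To stay short, the sketch evaluates the tree directly instead of registering Operators with whiteboards, and resolves links against a plain map keyed "whiteboard/topic".

```java
import java.io.StringReader;
import java.util.*;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical dialect, for example:
// <model name="N-P Budget" whiteboard="NPBudget">
//   <topic name="TotalPInPigFeed" op="times">
//     <topic name="NumPigs" value="1000"/> <link whiteboard="PigFeeding" topic="PInPigFeed"/>
//   </topic></model>
abstract class ModelNode {
    final String name;
    final List<ModelNode> operands = new ArrayList<>();
    ModelNode(String name) { this.name = name; }
    // Links are looked up in 'published', keyed "whiteboard/topic".
    abstract double value(Map<String, Double> published);
}

class ConstantNode extends ModelNode {
    private final double v;
    ConstantNode(String name, double v) { super(name); this.v = v; }
    double value(Map<String, Double> published) { return v; }
}

class LinkNode extends ModelNode {
    LinkNode(String key) { super(key); }
    double value(Map<String, Double> published) {
        Double d = published.get(name);
        if (d == null) throw new IllegalStateException("unresolved link: " + name);
        return d;
    }
}

class OperatorNode extends ModelNode {
    private final String op;
    OperatorNode(String name, String op) {
        super(name);
        if (op == null || !op.matches("plus|times"))
            throw new IllegalArgumentException("unknown op: " + op);
        this.op = op;
    }
    double value(Map<String, Double> published) {
        boolean times = op.equals("times");
        double acc = times ? 1.0 : 0.0;
        for (ModelNode n : operands) {
            double x = n.value(published);
            acc = times ? acc * x : acc + x;
        }
        return acc;
    }
}

// SAX handler: builds the tree as elements open and close.
class ModelBuilder extends DefaultHandler {
    final Map<String, ModelNode> topics = new LinkedHashMap<>();
    private final Deque<ModelNode> open = new ArrayDeque<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("model")) return;
        ModelNode node;
        if (qName.equals("topic")) {
            String v = atts.getValue("value");
            node = (v != null)
                ? new ConstantNode(atts.getValue("name"), Double.parseDouble(v))
                : new OperatorNode(atts.getValue("name"), atts.getValue("op"));
            topics.put(node.name, node);
        } else if (qName.equals("link")) {
            node = new LinkNode(atts.getValue("whiteboard") + "/" + atts.getValue("topic"));
        } else {
            throw new IllegalArgumentException("unknown element: " + qName);
        }
        if (!open.isEmpty()) open.peek().operands.add(node); // attach to enclosing operator
        open.push(node);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (!qName.equals("model")) open.pop();
    }

    static ModelBuilder parse(String xml) {
        ModelBuilder builder = new ModelBuilder();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), builder);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse model", e);
        }
        return builder;
    }
}
```

A stack of open elements is all SAX needs here: each new node becomes an operand of whatever operator element encloses it.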
Figure 3. Class Diagram for the Phosphorus in Pig Feed Model
For example, suppose we're interested in the phosphorus contained in pig feed. The calculation is simply:
(total phosphorus in feed) = (number of pigs) * (feed/pig) * (phosphorus/pound of feed).
TotalPInPigFeed is an Observer of NumPigs, PigFeedRequired and PInPigFeed. The latter two are links to topics in another whiteboard. Once the tree has been constructed, it waits for some event to fire the operator. For example, the filter servlet might request the TotalPInPigFeed value from the N-P Budget model. That Operator, in turn, requests the values of each of its operands, multiplies them and returns the result (caching it so that subsequent queries don't trigger the whole shebang all over again).
But suppose a user is playing "what-if" with the pig feeding model and inputs his own number for phosphorus in pig feed. When the operand's value changes, it notifies its Observers. Among them is TotalPInPigFeed, which then recalculates with its operands' latest values. In turn, TotalPInPigFeed notifies its own Observers, if any, that its value has changed.
In this way, the model can accommodate changes to any value in the tree, without performing unneeded recalculations.
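Here is a minimal sketch of that behavior, assuming hypothetical classes Operand, Input and Product (only multiplication is shown, and none of these are CELLO's real class names). Each Product registers itself as an Observer of its operands, caches its result on first request, and, when an operand changes, recalculates from its operands' current values and passes the news up the tree only if its own value actually changed. The recalculations counter exists purely to make the caching visible.

```java
import java.util.ArrayList;
import java.util.List;

// Callback for anything that depends on another value.
interface ValueObserver {
    void valueChanged(Operand source);
}

// Base class: a named value that can notify its Observers.
abstract class Operand {
    final String name;
    private final List<ValueObserver> observers = new ArrayList<>();

    Operand(String name) { this.name = name; }

    void addObserver(ValueObserver o) { observers.add(o); }

    protected void notifyObservers() {
        for (ValueObserver o : observers) o.valueChanged(this);
    }

    abstract double value();
}

// A leaf value a user might override while playing "what-if".
class Input extends Operand {
    private double v;

    Input(String name, double v) { super(name); this.v = v; }

    double value() { return v; }

    void set(double newValue) {
        if (newValue != v) {
            v = newValue;
            notifyObservers();
        }
    }
}

// Multiplies its operands, caching the result and observing each operand.
class Product extends Operand implements ValueObserver {
    private final List<Operand> operands = new ArrayList<>();
    private Double cached;      // null until first requested
    int recalculations;         // instrumentation only

    Product(String name, Operand... ops) {
        super(name);
        for (Operand op : ops) {
            operands.add(op);
            op.addObserver(this);   // this Product becomes an Observer of each operand
        }
    }

    double value() {
        if (cached == null) recalculate();
        return cached;
    }

    private void recalculate() {
        double acc = 1.0;
        for (Operand op : operands) acc *= op.value();
        cached = acc;
        recalculations++;
    }

    public void valueChanged(Operand source) {
        Double before = cached;
        recalculate();                                   // pull operands' latest values
        if (!cached.equals(before)) notifyObservers();   // propagate only real changes
    }
}
```

With made-up inputs of 1000 pigs, 4 pounds of feed per pig and 0.25 pounds of phosphorus per pound of feed, TotalPInPigFeed comes to 1000 * 4 * 0.25 = 1000; changing only the phosphorus input to 0.5 triggers exactly one recalculation of the product and of anything above it.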
Ideas For The Future
As we progress toward implementing the less "trivial" models of CELLO (economic impact, infrastructure impact, and odor prediction), I'm filing a list of potential improvements. For one thing, I've never gotten over the live-update idea. It just bugs me to have to click "Reload" to see recalculated numbers. Someday we might have an applet that can slurp our HTML off the server, inserting up-to-date numbers as it goes and rendering the result. The applet would be a RemoteObserver, of course, so that whenever a number changed, the HTML would be rerendered with the new value. Or perhaps we could use server push to get the same effect.
Also, while our little XML dialect serves our needs, it's easy to imagine extensions. It wouldn't be hard to represent a database query in XML, for example; the parser could plug the results right into the model. We could also extend the dialect with arbitrary calculations by using Java's class-loader facility; an attribute of the XML element would contain the name of the class file to load.
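The class-loader idea might look something like this minimal sketch, assuming a hypothetical Calculation interface that plug-in classes implement and a hypothetical XML attribute carrying the class name. Class.forName(), Class.asSubclass() and getDeclaredConstructor().newInstance() are standard reflection calls; the interface, the loader and the MaxCalculation example are illustrative only.

```java
import java.util.List;

// Hypothetical plug-in contract for calculations named in the XML.
interface Calculation {
    double compute(List<Double> operands);
}

final class CalculationLoader {
    // Load and instantiate the class named in the (hypothetical) XML attribute.
    static Calculation load(String className) {
        try {
            Class<?> c = Class.forName(className);
            return c.asSubclass(Calculation.class).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("cannot load calculation: " + className, e);
        }
    }
}

// Example plug-in: the largest of its operands.
class MaxCalculation implements Calculation {
    public double compute(List<Double> operands) {
        double max = Double.NEGATIVE_INFINITY;
        for (double d : operands) max = Math.max(max, d);
        return max;
    }
}
```

Since the XML would then decide which code runs, a real deployment would probably restrict the loadable classes, for example to a known package.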
Finally, we added XML support to make CELLO models easier to write. It has occurred to us that we could also run the same XML code through a stylesheet, transforming it into HTML. If we could work that trick, CELLO users could see the actual mathematics in any model, without our having to write and maintain separate documentation. (In fact, the model authors are already clamoring for this, saying that the XML is too hard to read!) The challenge there will be deciding exactly what form the output HTML should take for maximum readability. Ideally, the tree should be rendered as a series of equations.
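The JDK's built-in XSLT support (javax.xml.transform) is enough to try the idea. This minimal sketch assumes a hypothetical markup: model elements containing topic elements with a name and either an op or a value attribute, plus link elements with whiteboard and topic attributes. That markup and the stylesheet are assumptions, not CELLO's format. For brevity it emits plain text, one equation per operator topic; an HTML version would wrap each line in markup.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ModelRenderer {
    // Hypothetical stylesheet: one line per operator topic, such as "A = B * C".
    private static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/model'>"
        + "<xsl:for-each select='.//topic[@op]'>"
        + "<xsl:variable name='sym'>"
        + "<xsl:choose><xsl:when test=\"@op='times'\"> * </xsl:when>"
        + "<xsl:otherwise> + </xsl:otherwise></xsl:choose>"
        + "</xsl:variable>"
        + "<xsl:value-of select='@name'/><xsl:text> = </xsl:text>"
        // Each operand is a subtopic (shown by name) or a link (shown by topic).
        + "<xsl:for-each select='*'>"
        + "<xsl:if test='position() &gt; 1'><xsl:value-of select='$sym'/></xsl:if>"
        + "<xsl:value-of select='@name | @topic'/>"
        + "</xsl:for-each>"
        + "<xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String render(String modelXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(modelXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("could not render model", e);
        }
    }
}
```

Applied to a "times" topic named TotalPInPigFeed with operands NumPigs, PigFeedRequired and PInPigFeed, this prints TotalPInPigFeed = NumPigs * PigFeedRequired * PInPigFeed.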
Another possibility is more radical: provide a mechanism for the user to modify the XML code. We already let them tweak selected numbers, so why not the calculations themselves? Perhaps it wouldn't be a useful feature for CELLO, but we're already thinking of other projects in which we can reuse CELLO's technology.
Epilogue
As planned, we turned the model-creation business over to the scientists, where it belonged. I installed a validating XML editor on Bill's Mac, and spent an hour explaining the syntax. "Here's where you fill in the arithmetic, see?" He nodded. I rose. "Should be trivial, hey?" I said, and fled.