Database

Programming Paradigms

By Michael Swaine, July 01, 1994

Michael wraps up his mushroom-identification programming project for the Newton MessagePad.

JUL94: PROGRAMMING PARADIGMS

PROGRAMMING PARADIGMS

Mushroom Programming, the Sequel

Michael Swaine

A generation or two ago, schoolmarms with their hair in buns taught that good handwriting really mattered. Who knew then that these teachers were anticipating the arrival of the Newton MessagePad?

Hmm. That's not going to work. I can blame the MessagePad's unsatisfactory handwriting recognition on bad writing skills, but then how do I alibi its year-long lack of cellular communications or its sluggish sales?

My view is that Newton is not really about handwriting or wireless communication and that ultimately the sales of Newton devices will be out of Apple's hands, as other hardware manufacturers build Newton devices.

Newton is just not the same kind of platform as a PC, and it isn't subject to the same old marketing homilies. In particular, I don't think that the size of the installed base matters very much. When the hardware platform costs less than a lot of PC software packages, I think it's clear that a different model applies. In my view, Newton is mainly about vertical applications of certain types, and the right vertical application can be sold to its target market directly; the hardware can be bundled with the application.

We used to talk about VisiCalc or Lotus 1-2-3 selling computers; in the case of Newton I think that the idea of the software selling the hardware may become the norm.

Anyway, that's my theory. Thus this non-handwriting-dependent, non-communications-oriented, vertical-application project. It's more an elucidation of my theory than a demonstration of my programming skills.

Our Friend, the Fungus

Last month, I presented part of a Newton-Script programming project I'm working on: a field guide to identifying mushrooms. I should emphasize two terms: "project" and "field guide." This endeavor is for my own education; definite identification of mushrooms can require a microscope, while indefinite identification can be dangerous, to say the least. This program (see Figure 1) is intended, at most, as a guide to deciding which mushrooms to throw in the basket to take back to the house for later, more accurate identification. That, anyway, is the disclaimer I'd use if it were a real product.

What I supplied last month was the user interface. That column was really an exercise in using the Newton Toolkit (NTK), since a whole slew of user-interface templates are built into the ROM, and the NTK lets you build a UI by dragging these around on screen and setting their attributes via menu selections. If you think that sounds like using custom controls in Visual Basic, you've got the idea, although the NextStep environment is probably an even better analogy.

I did do some coding in building the interface, but not much. This month's effort was much more of an exercise in NewtonScript programming, since I wrote the program's internals, for which there are no handy templates, in ROM.

Last month, in a flight of rhetoric, I referred to these internals as an "expert system." What I've written is really just a database and a matching routine, but it arguably follows the basic structure of an expert system as laid down in The Handbook of Artificial Intelligence, Volume IV by Avram Barr, Paul R. Cohen, and Edward A. Feigenbaum (Addison-Wesley, 1989). That is, it has a knowledge base of facts about the domain of knowledge and a relatively simple program, called an "inference engine," for reasoning about that knowledge base.

And in fact, I started to design a somewhat less-minimal expert system, only to be redirected in my efforts by a principle of object-oriented design. Some of you may be interested in that process, so I'm reporting it here.

Many of you, though, will find the programming insights obvious and the design trivial. They are, and it is, but perhaps my efforts will demonstrate some interesting features of this fascinating new object-oriented language.

There are a Lot of Mushrooms

The so-called "knowledge base" of this application is intended to hold all the knowledge necessary for identifying mushrooms on the basis of their attributes. In a conventional expert system, this knowledge might be stored as rules, as in Example 1.

Last month's user interface lets the user enter attributes of the found mushroom (the exemplar) such as color (white, buff, grey, yellow, brown, black, and so on); size (in centimeters); gill presence, appearance, and mode of attachment to the stem; and five other attributes.

This list is way too short. There are thousands of mushroom species. To tell one species from another with any reasonable certainty might require knowledge of attributes such as: cap_presence, cap_color, cap_color_change, cap_surface, cap_shape, cap_size, cap_texture, plus a couple dozen others; the stem, flesh, and gills; where the mushroom was found (what part of the country, presence of trees nearby, whether the mushroom was growing on the ground or on wood); and the time of year.

In addition to the many species, attributes, and possible values (color, for example), other complications afflict this data. First, some attributes are contingent on others: Mushrooms that lack gills have no meaningful gill-appearance or gill-attachment attribute values, for example.

Then there is the question of the appropriate level at which to make the identification. Identifying the species may not be precise enough, as some species (Boletus Edulis, for example) have distinct varieties. You might want to identify the variety, or you might just want to use certain information to shorten your search by eliminating, for example, an East Coast USA variety if you're on the West Coast.

But species can also be too precise an identification. Out in the field, you might be satisfied to know simply that the mushroom is the genus boletus; species identification can be postponed until you get home.

All these considerations affect the structure of the knowledge base, of course, but they also affect the inference engine. In expert systems, the inference engine needs to crank through the knowledge base efficiently.

What do You Mean by Identify?

The scientific name of a mushroom is one kind of identification, but there are others. It might be enough to know if it is poisonous, good to eat, or hallucinogenic.

Knowledge bases are usually structured to facilitate answering useful questions. Other criteria that could influence the knowledge-base structure include:

The desire to identify the most-common mushrooms easily and quickly. This would argue for emphasizing those attributes most useful for distinguishing these common mushrooms, rather than generally distinctive or very salient attributes (like size and color).
The desire to make things natural and easy for the user. The user will recognize and enter salient attributes like color whether or not they happen to be the most useful attributes for identifying the current mushroom.

As it happens, I didn't take any of these things into account in building the knowledge base. I just implemented the scientific hierarchy of variety, species, genus, and the like, and let that dictate the order in which the knowledge base was accessed. I did this for two reasons:

"The expert system is a_model of the expert's model of the domain," according to Bruce Buchanan in The Handbook of AI. The real experts are the botanists who sort these fungi into Linnean categories.
"A good plan is to study the physical system you are trying to model and create the classes of objects it has," is Actor author Chuck Duff's advice on deciding what objects to create in an object-oriented system.

Well, the classes that mushrooms have, in the model of reality that botanists use, are: kingdom (fungi), division, subdivision, class, order (agaricales), family (boletacaea), genus (boletus), species (edulis).

This structure submits willingly to object-oriented exploitation. Each rule is simply a description of a species (or variety or genus or whatever) of mushroom in terms of defining attributes. Inheritance comes into play naturally: Species inherit from genus, and so on. Family boletacaea is defined by the lack of gills, so the gill_type slot in all its genera and species inherit this value unless they explicitly override it.

This structure imposes a hierarchy on the search, so the inference engine doesn't have to examine the entire knowledge base and can be much more efficient. Another nice feature is that the inference engine could be designed to stop at the level of family or genus or keep cranking down to species or variety, although I haven't implemented this yet.

The Knowledge Base and the Inference Engine

The knowledge base consists of frames, each defining a mushroom species, or genus, or order, or whatever. The inference engine always starts with the same frame, corresponding to the highest level in this hierarchy of fungi; see Example 2(a).

It compares the values of this frame's slots with the values of corresponding slots in the frame named Observations, which gets built as the user enters data about the mushroom to be identified; see Example 2(b). In this version of the program, only one slot in the agaricales frame, namely stem_position, has its value compared with that of the corresponding slot in Observations. If there is a match, the inference engine then recurses over the children of this frame. The single quote in front of the names in the array children indicates that these are symbols, which is what they need to be to appear as names of frames; see Example 2(c).

The inference engine then does its comparison with each of these children. If it finds a match, it continues down the tree further, examining that frame's children. The frame above is a child of agaricales, and it has children suillus, boletus, and so on. Each of these children is a genus, and each genus has child frames that represent species; that's where the recursion stops and an identification is returned. As Example 2(d) shows, the inference engine hands back to the user the genus, species, and possibly other information picked up along the way (poisonous, delicious, attracts maggots, or whatever). The key to the inference engine is its matching algorithm. I may seem to be glossing over its detail here, but I'm not: As currently implemented, it really does no more than what I've described. Well, it does shift gears when looking at the size slot, using the size value to create a range and checking to see if the observation's size value is in that range. But there's a serious flaw in the model I've used.

Although the inheritance mechanism in NewtonScript does allow a child to inherit from a parent (for example, all frames inherit the slot stem_position with value "central" from frame agaricales), and it allows the child to override that value (you could give boletus a stem_position of "eccentric" if you wished), the overriding does no good, because this inference engine will never compare an eccentric-stemmed sample with the frame for boletus. It will have bailed out with the very first comparison.

The problem is that nature is not neat. Agaricales consists of mushrooms that generally, but not always, have centrally placed stems, while on the other hand, always have pores rather than gills. Some of the characteristics actually do allow cutting off entire branches of the search tree, while others merely indicate which branches are less promising than others.

What I apparently need are two things: 1. a flag attached to each slot indicating whether it is a required, typical, rare, or prohibited characteristic for this particular species, genus, and the like; and 2. a matching algorithm that uses this information to manage the search tree, cutting off a branch whenever a required characteristic is missing or a prohibited one is found, and reordering branches to follow promising ones first whenever typical or rare characteristics are found or missed. I'm working on it.

That's probably enough of this. My real point was to give an example of what I consider an appropriate Newton application: a vertical, handwriting-independent application that requires transportability and gives quick answers.

How many people will be asking the questions that my program is prepared to answer is an issue, of course, and uncertainty about the size of the market does give me pause. There is already at least one mushroom-identification program on the market, though. It runs on, er, um, the Amiga.

Figure 1 The mushroom-identification program in action.

Example 1: Storing knowledge as rules.

if stem_position = "central"
and gill_type = "absent"
then return "boletus"

Example 2: (a) The knowledge base starts at the highest level; (b) comparing the values of this frame's slots with the values of corresponding slots in the frame; (c) the single quote in front of the names in the array children indicates that these are symbols; (d) the inference engine hands back to the user the genus, species, and other information.

(a)
  agaricales : {
  // Agaricales is an order of fungi,
  // usually characterized by a centrally placed stem.
  // It contains several families of fungi.
  level:"order",
  children:['boletacaea,...],
  stem_position:"central"}
(b)
  Observations : {
  color : "brown",
  size : "15",
  cap_shape : "",
  cap_surface : "dry",
  gill_type : "absent",
  gill_attachment : "",
  stem_position : "central",
  stem_surface : "smooth",
  veils : ""}
(c)
  boletacaea : {
  // Boletacaea is a family of fungi,
  // characterized by pores rather than gills.
  // It contains several genera of fungi.
  level:"family",
  children:
    ['suillus, 'boletus, 'boletellus, 'gyroporus, 'pulveroboletus],
  gill_type:"absent"}
(d)
  suillus : {
  // Suillus is a genus of fungi,
  // typically characterized by a viscid cap surface.
  // It contains several species of fungi.
  level:"genus",
  children:[],
  cap_surface:"viscid"}
boletus : {
  // Boletus is a genus of fungi,
  // typically characterized by a dry cap surface.
  // It contains several species of fungi.
  level:"genus",
  children:['flaviporus,'satanas,'pulcherrimus,'edulis],
  cap_surface:"dry"}
flaviporus : {
  // Flaviporus is a species of fungi,
  // usually yellow with a smooth stem.
  level:"species",
  children:[],
  color:"yellow",
  stem_surface:"smooth"}
edulis : {
  // Edulis is a species of fungi,
  // usually large and brown, with a scaly stem.
  level:"species",
  children:[],
  color:"brown",
  size:15,
  stem_surface:"scaly"}

Previous 1 2 3

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

Database

Programming Paradigms

PROGRAMMING PARADIGMS

Mushroom Programming, the Sequel

Michael Swaine

Our Friend, the Fungus

There are a Lot of Mushrooms

What do You Mean by Identify?

The Knowledge Base and the Inference Engine

Example 1: Storing knowledge as rules.

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Database Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

Database

Programming Paradigms

PROGRAMMING PARADIGMS

Michael Swaine

Related Reading

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Database Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content