Designing Complex Datacentric Applications

MAR93: DESIGNING COMPLEX DATACENTRIC APPLICATIONS

HyperChem brings client/server architecture to computational chemistry

Paul is the former director of software development for Hypercube Inc. He writes the "Windows Q&A" column for the Windows/DOS Developer's Journal and can be contacted through the DDJ offices.

HyperChem is a software tool for designing, visualizing and analyzing molecular structures. This molecular modeling tool runs on 386 PCs (and better) under Microsoft Windows and on Silicon Graphics workstations. With HyperChem (which is marketed by Autodesk), you can create three-dimensional atomic structures, visualize and manipulate their structural relationships, and perform classical and semi-empirical quantum mechanical calculations.

HyperChem is implemented in about 500,000 lines of C code. This article focuses on some of the key architectural features necessary in a product of this complexity, and presents one of the algorithms for processing a data structure representing atoms and molecules. Hopefully, the strategies discussed here are transferable to the design and implementation of other large GUI systems, such as CAD and IC design systems.

Top-level Architecture

HyperChem is implemented as a client/server system. The client, or front end, presents the Windows- or Motif-based user interface. The server, or back end, performs scientific computations such as molecular-dynamics simulations.

There are several benefits to this architectural strategy. By using a well-defined communication protocol, HyperChem is not restricted to providing the user with only one type of computation. Different back ends can be substituted for particular computational requirements. Four back ends are included with release 2.0.

The Windows version uses dynamic data exchange (DDE) for interprocess communication; the UNIX-hosted SGI package uses sockets. Because the communication protocol is independent of the communication medium, a future version of the Windows product can utilize an SGI-hosted back end in the presence of a TCP/IP network. Thus, a network of PCs will be able to utilize the raw computational power of an expensive workstation.

Two of the back ends utilize parallel algorithms. On multiple-processor SGI machines, a back end can be split into multiple processes, one per processor, with a speedup nearly proportional to the number of processors. A typical calculation will run nearly twice as fast on a dual-processor machine as on a single-processor machine. Because Windows NT supports multiprocessor machines, a future Windows NT version will implement a similar scheme. Thus, the client/server model allows the back ends to be scalable.

Another benefit of this client/server design is it allows the unique computational needs of the client and the server to be addressed. The back end benefits from maximum utilization of the processor and memory. The front end requires a tight coupling to the GUI windowing system (X/Motif for the SGI product) to achieve snappy user response.

We used Watcom's 32-bit Windows compiler (C8.5 and now C9.0) in implementing the back ends for the Windows version. As a result, the back ends run an average of four times faster than when compiled with a 16-bit compiler. Using 32-bit code in 16-bit Windows does have its drawbacks. The big problem is that, in general, Windows 3 does not know how to handle USE32 segments. Thus, all data must either be copied to a USE16 segment, or to the bottom 64K of a USE32 segment, when copying data to or from a 32-bit application. If the frequency of calls to the Windows API is high, the benefits of 32-bit code are lost due to the overhead of the 16/32-bit mapping layer. In fact, the Windows front end is implemented as 16-bit code. A native 32-bit environment such as Windows NT does not suffer these problems when running a 32-bit application. In general, a 32-bit front end running on Windows NT runs slightly faster than a 16-bit front end on Windows 3.1. This is significant, since at the time of this writing the overhead of the Windows NT operating system is greater than that of Windows 3.1.

External Interfaces

In addition to the user interface supplied by the front end, HyperChem also exposes an ASCII-string scripting interface. The interface is divided into menu equivalents, state variables, and commands. Menu equivalents use a naming scheme to correspond to each HyperChem menu item. For example, the script-menu equivalent for File/New is menu-file-new. State variables provide an interface for examining and/or manipulating the state of HyperChem. The script interface allows a user to set or query a state variable. State variables have read and write attributes; current-file-name is such a read-only variable--you must actually read a file to change the current filename. In addition to the set of static hardcoded state variables, there are also dynamic ones. For instance, atom-count refers to a particular molecule, because multiple molecules can be present in the system. Therefore, atom-count is qualified with the index of a molecule. Changing the value of a writable state variable will cause HyperChem to perform the appropriate action to make sure the internal state remains consistent. For example, setting the state variable show-stereo will cause HyperChem to immediately switch to stereo viewing mode.

Finally, script commands cause HyperChem to take an action that is not expressible by the change of a single state variable, but frequently has a UI analogue. An example is the command add-amino-acid, which behaves as if the user pressed an amino-acid push button on the Amino Residues dialog box. The result is the creation of a new amino-acid residue, attached to a growing chain if present. Several state variables are affected by the change, including the number of atoms, the total weight of the system, and the state of the newly created atoms.

One important command does not have a user-interface equivalent. Notify-on-update takes as an argument the name of a state variable, and instructs HyperChem to generate a message whenever the value of that variable changes. So notify-on-update atom-count will cause a message to be generated each time the number of atoms in the system is changed, either through user interaction, an executing script, or another application talking to HyperChem via DDE or sockets. Because DDE already has this concept embedded in the DDE Request message, notify-on-update is implemented with WM_DDE_REQUEST for the Windows product. HyperChem can support multiple simultaneous conversations with client applications. By using the notify-on-update, arbitrarily complex systems can be built using two or more applications, with HyperChem mediating the flow of information among them.

Internal Architecture

Internally, HyperChem is a datacentric application. Because it is presenting a model of chemical systems, at its heart lies a representation of that data in the form of a tree of undirected graphs. The tree maintains a hierarchy of objects, with each level containing unique properties and acting as a container for the level below it. There are four levels to this hierarchy.

The atom class is at the lowest level. Atom properties include coordinates, atomic number (for example, 6 for carbon), electrical charge, velocity (this is used for molecular dynamics calculations where the user perceives the system in motion), the number of other atoms to which it is chemically bonded (neighbors), and a neighbor list.

The second level is the residue class. A residue consists of at least one atom, and provides a way to group atoms into a repetitive unit. Currently, HyperChem supports two predefined residue types, amino-acid and nucleic-acid residues (the building blocks of proteins and DNA, respectively). However, the user can specify user-defined residue groups via an ASCII template file. Because residues are commonly chained together, an important residue property specifies the angles to use to connect a residue to its residue neighbor.

The third level contains molecule objects. A molecule is composed of at least one atom and residue. Every neighbor of an atom of a given molecule must be a member of the same molecule. Another way of saying this is that no atom of a molecule has a neighbor in a different molecule. A strict hierarchy has not been implemented, since a molecule can have both atoms and residues as descendants. A molecule is a very simple object that has a name, an atomic weight (the combined mass of all atoms it contains), and a count of the number of its descendants that are residues.

At the very top of the tree is the system class. A system is a collection of molecules, and its main property is the overall temperature of all the molecules. Usually there is only one system present at one time. However, during operations such as loading a new system from a file, the new system is created as a sibling of the current system. Only after the entire file has been successfully loaded is the old system deleted and replaced with the new one.

The tree provides a convenient and natural way to contain and access the really important data, the atoms. Each atom contains a list of its neighbors; a collection of atoms represents an undirected graph. So each molecule contains a graph, and a system can be thought of as a collection of graphs. Because molecules in nature frequently contain rings of atoms, these graphs are not acyclic; they cannot generally be represented by trees.

This leads to interesting graph-manipulative algorithms inside HyperChem. As an example, I recently had to write an import dynamic link library (DLL) (similar to the use of DLLs by Word for Windows in importing foreign file formats) to read a foreign molecule file format. This particular file format consisted of a list of atoms and a list of bonds, where each bond is represented as a pair of atoms. What is missing is a way of collecting those atoms belonging to unique molecules, so one of the tasks of the DLL was to group atoms into molecules. One way to approach this is to grow a spanning tree. Such a tree will contain every atom of a molecule.

This method can be used to implement a molecule-grouping routine, as follows. For each atom, test to see if it has been marked as belonging to a molecule. If so, skip it and move on to the next atom. If not, grow a spanning tree with this atom as the root of the tree, and mark each atom of the tree with a new molecule identifier. The challenge is to write the code to generate a spanning tree that is efficient in both execution speed and memory utilization. Listing One (page 102) presents a sanitized version of the code that does this. It implements a depth-first, nonrecursive search.

It is very important to avoid recursion in the Windows environment because the theoretical limit of a stack is 64K. (In practice it is much less than this.) Depending on the size of the stack frame and the size of the molecule being searched, a recursive solution could exceed the size of the stack. In the interest of clarity, the Walk() routine of Listing One accepts an array of atoms to walk (instead of a system node). There are numerous other interesting algorithms for dealing with the atom tree, but these lie outside the scope of this article.



_DESIGNING COMPLEX DATACENTRIC APPLICATIONS_
by Paul Bonneau

[LISTING ONE]

<a name="00b9_0008">

/**************************************************************************
 MARK.C -- determines the molecules in an array of atoms
***************************************************************************/
#define catmMax        6        /* Maximum number of neighbors. */
#define NULL           0

typedef struct atm
{
    struct atm *    patmParent;         /* Parent atom during walk. */
    struct atm *    patmRoot;           /* Root atom. */
    int             catm;               /* Number of of neighbors. */
    struct atm *    rgpatm[catmMax];    /* Neighbor list. */
} ATM;                                  /* AToM. */
/*  Prototypes */
void    Mark(ATM *, int);
ATM *   PatmNextChild(ATM * rgatm, ATM * patm);
void    Walk(ATM *, ATM *);

/**************************************************************************/
/* void Mark(ATM * rgatm, int catm)                                       */
/*    Mark each atom in the array with its parent molecule.               */
/*    Upon return, each atom will point to the root atom of the molecule  */
/*    containing it via the patmRoot field.                               */
/*        rgatm    : The array of atoms to mark.                          */
/*        catm    : The number of atoms in the array.                     */
/**************************************************************************/

void Mark(ATM * rgatm, int catm)
{   ATM *    patmLim  = rgatm + catm;    /* The limit of the array. */
    ATM *    patm;                       /* An array iterator. */

    /* First, make sure the mark fields are zeroed out. */
    for (patm = rgatm; patm < patmLim; patm++)
        patm->patmParent = patm->patmRoot = NULL;

    /* Now mark the atoms.  If an atom has already been marked, skip it. */
    /* If an atom has not been marked, then start a new molecule, and */
    /* grow a spanning tree and mark each atom in the tree with the new */
    /* molecule.  We conveniently use the first atom as the root to mark */
    /* all atoms in the tree with. */
    for (patm = rgatm; patm < patmLim; patm++)
        if (patm->patmRoot)
            Walk(rgatm, patm);
}
/**********************************************************************/
/* void Walk(ATM * rgatm, ATM * patm)                                 */
/*    Mark all atoms in this molecule with the root atom.             */
/*    Depth first non-recursive marking.                              */
/*    rgatm    : Array of all atoms.                                  */
/*    patm    : First atom in molecule.                               */
/**********************************************************************/
void Walk(ATM * rgatm, ATM * patm)
{   ATM *    patmRoot;

    patmRoot = patm->patmRoot = patm;    /* Remember root atom. */

    /* Mark the parent of the root atom with a special value, so we can */
    /* detect it is the root atom when encountered during the walk. */
    patm->patmParent = (ATM *)-1;

    for (;;)    /* Walk the tree. */
    {
        ATM *    patmNext;
        /* Given the current atom, get the next one to visit. */
        if (!(patmNext = PatmNextChild(rgatm, patm)))
        {
            /* No more kids, move back to parent.  If parent is */
            /* the root, stop. */
            if ((patm = patm->patmParent) == (ATM *)-1)
                break;                     /* We're done. */
        }
        else
        {
            patm = patmNext;               /* Move to "child". */
            patm->patmRoot = patmRoot;     /* Mark child.      */
        }
    }
}
/**********************************************************************/
/* ATM * PatmNextChild(ATM * rgatm, ATM * patm)                       */
/*    Return the next unvisited neighbor.                             */
/*    Return null if all neighbors have been visited.                 */
/*        rgatm    : Array of all atoms.                              */
/*        patm    : Atom to find an unvisited neighbor of.            */
/**********************************************************************/
ATM * PatmNextChild(ATM * rgatm, ATM * patm)
{   ATM **    ppatmLim;    /* Limit of atom's neighbor list. */
    ATM **    ppatm;        /* Neighbor list iterator. */

    /* Loop over all neighbors of this atom, looking for an unmarked one. */
    ppatmLim = patm->rgpatm + patm->catm;
    for (ppatm = patm->rgpatm; ppatm < ppatmLim; ppatm++)
        if (!(*ppatm)->patmParent)
        {
            (*ppatm)->patmParent = patm;
            return *ppatm;
         }
    return NULL;
}

Database

Designing Complex Datacentric Applications

Top-level Architecture

External Interfaces

Internal Architecture

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Database Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

Database

Designing Complex Datacentric Applications

Related Reading

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Database Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content