C/C++

The Spatial Aggregation Language

Feng Zhao, Chris Bailey,Kellogg and and Ivan Ordonez

, April 01, 2001

The Spatial Aggregation Language (SAL) is a C++ library for supporting rapid prototyping of data analysis and control applications for distributed physical systems.

Apr01: The Spatial Aggregation Language

Feng, Christopher, and Ivan are researchers at Xerox PARC, Dartmouth College, and Ohio State University, respectively. They can be reached at fz@alum .mit.edu, [email protected], and [email protected].

Trajectory Bundling

The Spatial Aggregation Language (SAL) is a programming environment and C++ library supporting rapid prototyping of data analysis and control applications for distributed physical systems. Weather prediction is an example of analysis of a distributed physical system. The weather charts you see on TV result from computation-intensive analysis on massive amounts of weather data, including pressure, temperature, humidity, and wind velocity measurements collected from weather sensors across the continent. Other examples might be the tasks of describing mixtures of chemical species in a reaction tank, understanding the phenomenon of glycolysis in living organisms, or predicting population variations in a predator-prey environment. Interestingly, diffusion-reaction models from nonlinear dynamics can describe all these phenomena. Analysis of these models results in rich spatio-temporal data exhibiting striking patterns, some resembling zebra stripes and others polka dots (see Figure 1). In all these applications, the core programming tasks involve representing and abstracting spatio-temporal data into concise descriptions suitable for explanation, design, prediction, and other analyses.

Recent advances in the fabrication of low-cost, large-scale arrays of microelectromechanical systems (MEMS) have enabled the engineering of so-called "smart matter" systems that integrate sensing, computation, and actuation at a massive scale never before possible. For example, an array of distributed cilia-like actuators manipulate and sort parts; see Figure 2. As another example, structural elements (steel beams, for instance) coated with piezoelectric patches sense and counteract buckling and hence greatly increase load-bearing ability. Coordinating actions and analyzing associated sensor data in such smart matter systems require scalable data representations similar to those in the weather prediction problem. Rather than twisting existing programming constructs to manipulate distributed spatio-temporal data, it would be preferable if you could leverage a core set of data types and operators supporting these common computational patterns.

Challenges in Modeling Distributed Data

A number of challenges face the development of programming tools for mining distributed data sets. First of all, in many distributed data analysis applications, spatial features such as topology and geometry are important properties of the data. For example, temperature fields are influenced by the geometric shape of a region, spatial variations in conductivity, and boundary conditions. Unlike C/C++ and MATLAB that provide vectors and arrays for representing regularly distributed data, SAL provides spatial objects as a generalized data type to model a much richer set of interesting topologies and geometries, whether regularly or irregularly distributed. SAL 1.2 has been successfully compiled on Sun UNIX workstations running SunOS and Solaris, and PCs running Redhat Linux 5.2 and newer. Compiling SAL 1.2 on other platforms requires minor tweaking of system library paths in various makefiles.

Secondly, global computational methods manipulating entire data sets soon reach computational limitations as the size of the data sets increases. Instead, applications must rely on local methods that manipulate separate subsets of the data relatively independently and then iteratively combine the results at multiple levels of spatio-temporal granularity. For example, domain decomposition methods for solving partial differential equations (PDEs) form subregions of a discretization and iteratively combine and refine independent solutions for the subregions. Data reduction lets meteorologists work with structures such as isobars, pressure troughs, and pressure cells, reasoning about underlying data at a higher level of abstraction. To support these approaches, SAL provides a set of generic, local-to-global data transformation operators.

SAL has its roots in an imagistic reasoning style of computation. Imagistic reasoning applies vision-like routines to manipulate multilayer geometric and topological structures in spatially distributed data. Physical properties such as continuity and locality give rise to regions of uniformity in data, making it possible to uncover and exploit multilevel spatio-temporal structures in the data. The Spatial Aggregation theory proposes a uniform mechanism utilizing explicit representations of such physical knowledge, expressed as metrics, adjacency relations, and equivalence predicates.

SAL supports the construction of applications in this style with a large language library of appropriate data types and operators, and a graphical, interpreted environment for rapid prototyping of application programs. In a nutshell, SAL's data constructs and operators provide a good "impedance match" to a programmer's physical intuitions about a distributed data interpretation or control task, and facilitate the exploitation of these intuitions in transforming data from local to global and back. In this article, we'll examine SAL's main features using examples to illustrate its utility in a difficult data interpretation task — extraction of high-level spatio-temporal descriptions of diffusion-reaction systems.

Trajectory Bundling Example

An example of how SAL organizes computation in the imagistic reasoning style is presented in the accompanying text box entitled "Trajectory Bundling." The example is a simple trajectory bundler, based on Yip's KAM program for analysis of dynamical systems. The KAM program is an expert system for automatically analyzing dynamical systems (see "Spatial Aggregation: Theory and Applications," by K. Yip and F. Zhao, Journal of Artificial Intelligence Research, 1996). In this application, the input is a set of points representing states of a dynamical system as in Figure 6 (a) in "Trajectory Bundling" (points in a two-dimensional plane in this example). Over time, the system's state evolves, defining a mapping from one point to the next. The goal of the trajectory bundler is to find states that, in the limit, have the same behavior. This is done by uncovering structures at two levels of abstraction — trajectory curves of points in a sequence, and trajectory bundles of trajectory curves with similar shapes (the figures are SAL interpreter plots).

SAL Features

Notice the similarity in computational structure for aggregating points into trajectories and trajectories into bundles: A uniform set of operators uses different knowledge at different levels of abstraction. This structure, the basis for Spatial Aggregation, also applies across a variety of other domains. Figure 3 shows the key steps of this data flow and can be described as follows:

Aggregation explicates a neighborhood relation indicating adjacent spatial objects. For example, in an image analysis program, a point's neighbors could be the four or eight adjacent points. In the trajectory interpretation application, an object's neighbors could belong to a minimal spanning tree or Delaunay triangulation. These are are two commonly used neighborhood graphs that connect nodes to their neighbors in some spatial domain.
Classification forms equivalence classes of neighboring spatial objects, with respect to a predicate encoding domain-specific knowledge about how continuity and spatio-temporal scales give rise to apparent "objects" in physical data. For example, isotherms arise in temperature data based on similarity of temperature values; regions arise in computer vision tasks based on similarity in gray-scale value.
Redescription abstracts equivalence classes of objects at one level as single higher-level objects at the next level. In the trajectory interpretation example, this produces a single trajectory object from each such equivalence class so that aggregate properties such as curvature of a trajectory become meaningful and the aggregation process can be repeated at a higher level. The inverse of redescription is localization, which opens up a higher-level object as an equivalence class of constituent objects at the lower level.
Additional operations search neighborhood graphs, systematically transform or filter values, compute geometric and topological properties, and so forth.

SAL has been implemented as a C++ library and interactive programming environment to support exploration of structures in spatially distributed physical data. The primitives in the language are spatial objects; the compounds are (possibly structured) collections of spatial objects such as neighborhood graphs and equivalence classes; the means of abstraction convert collections at one level to single objects at a higher level. The programming environment lets users explore data and structures derived from it, testing the effects of different predicates and graphically interacting with representations of the structures. Table 1 summarizes some of the current library members.

Imagistic Programming Style

SAL facilitates a particular style of programming for data interpretation tasks. This style can be summarized as follows: Use knowledge of locality, continuity, and spatio-temporal scales to extract and exploit multilayer structures in spatially distributed physical data.

Recall that Spatial Aggregation is an instance of imagistic reasoning, where computations are structured around perception-like operations on image-like representations of data. A key part in programming a SAL application is to adopt the "imagistic stance" by encoding a task in terms of geometric and topological structures in image-like representations. For example, configuration space-based mechanical mechanism analysis views the configuration of a mechanism (such as a car transmission gear box) as a point with coordinates for the mechanism's degrees of freedom. The analysis then uncovers regions of that space corresponding to feasible configurations of the mechanism. Adopting this stance yields the following input/output characteristics for SAL-based applications:

Input: spatially-distributed data described as a field.
Output: abstract geometric/topological structures uncovered in the data.

The goal of SAL application design is to identify the abstraction hierarchy linking the input field to the output description, types of structures that are to be manipulated at various levels in the hierarchy, and requirements necessary for jumping from one level to the next. The following steps specify these details. An actual design process might iterate, alternating between partial solutions to these steps.

1. Continuity and different spatio-temporal scales give rise to regions of uniformity at multiple levels of abstraction. Identify appropriate structural descriptions for such regions. In the trajectory bundling example, structures at different scales include sample points, trajectories, and bundles of trajectories.

2. Based on the relationships between these structures, choose abstraction levels. For example, a substructure/structure relation can group points into curves into bundles or points into regions into bodies. An adjacency relation can merge triangles into polygons. Specification of an abstraction hierarchy can proceed both bottom-up and top-down to bridge the gap between type of input and type of desired output. For example, in the trajectory bundling example, an abstraction hierarchy could be specified bottom-up by noticing that points can be grouped into curves which can be grouped into bundles. Alternatively, the hierarchy could be specified top-down by noticing that bundles are comprised of curves which are comprised of points.

3. Identify how groups of objects at one level are to be redescribed as single objects at a higher level. Identify the sources of uniformity, the preconditions necessary for objects to be grouped, the abstraction transformation to be performed on the groups, and the postconditions that will be true for the higher-level objects. The sources of uniformity in the trajectory bundling example include nearness of points and similarity in curvature of trajectories. For the abstraction of points into trajectories to be applicable, the points must be linearly connected. For the abstraction of trajectories into bundles to be applicable, the trajectories must be adjacent and have similar limit behavior.

SAL captures engineering intuitions and mathematical methods about distributed data at the appropriate levels of abstraction for application design. Just as MATLAB, for example, provides matrices and vectors to conveniently organize computation on regular data sets, SAL provides spatial objects and operators, along with relevant knowledge specifications (metrics, continuity, locality, and so forth) for computation on spatio-temporally varying data.

An Application

An interesting class of problems arising from the study of biological organisms, population dynamics, and physical chemistry are described by so-called "diffusion-reaction" systems. The behaviors of these systems can exhibit qualitatively distinct patterns that emerge and vary in seemingly unpredictable ways. Alan Turing first computationally analyzed them in 1952; hence the name "Turing patterns." Allen Turing speculated that the phenomenon of morphogenesis, the generation of macroscopic shape in living organisms, might be governed by similar systems. Figure 4 shows the temporal evolution of a Turing pattern for one parameter; Figure 5 shows variation in patterns over a set of parameters. Scientists are interested in understanding what types of behaviors are possible under which conditions. The answer to the question relies on a careful analysis of the qualitative distinctions of the structures in the data.

SAL has been used to prototype a program for analyzing an important family of diffusion-reaction models in two dimensions, the Gray-Scott (GS) model of glycolysis (see "Complex Patterns in a Simple System," by J. Pearson, Science, 1993). The task here is to automatically recognize, track, and catalog qualitative structures in the spatio-temporal data sets generated from the model.

The SAL program carries out a systematic simulation of the temporal evolutions of the GS model. Events in the data such as births, deaths, collisions, and fusions of objects as well as their shape changes are identified and tracked using a Delaunay triangulation-based clustering supported in SAL. A "history" is a sequence of such events. Multiple histories generated as above are compared, and then classified according to their behavioral similarity. The SAL program groups systems with different parameterizations into equivalence classes, each of which comprises members that exhibit qualitatively similar behaviors. The classification automatically generated by the SAL program is comparable to those produced by scientists through carefully hand-tuned computational experiments and reported in Pearson's article.

Conclusion

We have illustrated how SAL supports rapid prototyping of application programs for distributed data analysis via a set of carefully designed data types and generic operators. An application to the analysis of diffusion-reaction systems has shown that SAL programs can be effectively constructed from a small set of common data types. In addition, we have gained valuable experience in applying the software tool to other applications. A SAL program analyzed weather data sets downloaded from the National Weather Services to identify location and shapes of pressure troughs and thermal packing regions (see "Relation Based Aggregation: Finding Objects In Large Spatial Datasets," by X. Huang and F. Zhao, Intelligent Data Analysis, 2000). Other application areas we've studied include distributed control optimization and fluid flow analysis.

The C++ implementation of SAL library permits SAL to be used either as a standalone tool or as a component within existing program development environments. The SAL interpreter lets you quickly and interactively experiment with core program fragments written in SAL style and visualize the results. We expect SAL to complement existing MATLAB and C/C++ development environments in applications that explicitly represent and exploit structures in distributed data.

Additional Resources

Source code for SAL 1.2, a command-line interpreter, and example programs and documentation is available for research use at http://www.parc.xerox.com/zhao/sal.html or http://www.cs .dartmouth.edu/~cbk/sal.html or http:// www.cis.ohio-state.edu/insight/sal-code.html.

DDJ

Listing One

// (1.1) Read input points.
points = read_points(infile)

// (1.2) Aggregate into a Delaunay triangulation; find the 
//  minimal spanning tree.
point_trig = aggregate(points, make_mesh_delaunay())
point_mst = mst(point_trig)

// (1.3) Classify, keeping close-enough neighbors.
good_adj = function(adj) {
  nearby_ngraph = reachable_ngraph(point_mst, adj, depth)
  nearby_edges = faces(make_ngraph_geom(nearby_ngraph), 1)
  edge_lengths = map(nearby_edges, volume)
  m = average(edge_lengths)
  sigma = stddev(edge_lengths, m)
  distance(from(adj), to(adj)) < m + num_devs*sigma
}
point_classifier = classify(points, make_classifier_transitive(point_mst,good_adj))
// (2.1) Redescribe point classes as trajectories.
points_to_traj = redescribe(classes(point_classifier),                    make_redescribe_op_path_nline(point_mst))
trajs = high_level_objects(points_to_traj)

// (2.2) Aggregate trajectories based on point ngraph.
traj_ngraph = aggregate(trajs, make_ngraph_connected_substructure(point_mst, 
                                          points_classifier, points_to_traj))
// (2.3) Classify based on similarity of tangent vectors.
same_limit = function(adj) {
  t1 = from(adj); t2 = to(adj)
  p1a = end1(t1); p1b = end2(t1)
  t2_points = localize(points_to_traj, t2)
  t2_corresponding_p1a = intersection(t2_points, neighbors(point_trig, p1a))
  t2_corresponding_p1b = intersection(t2_points, neighbors(point_trig, p1b))
  if (size(t2_corresponding_p1a) == 0 || size(t2_corresponding_p1b) == 0) {
    false
  } else {
    tan_1a = tangent(t1, p1a); tan_1b = tangent(t1, p1b)
    tan_2a = space_centroid(map(p in t2_corresponding_p1a,{ tangent(t2, p) }))
    tan_2b = space_centroid(map(p in t2_corresponding_p1b,{ tangent(t2, p) }))
    ((dot(tan_1a, tan_2a)>=angle_thresh} && dot(tan_1b,tan_2b)>=angle_thresh}) ||
     (-dot(tan_1a, tan_2a)>=angle_thresh} && -dot(tan_1b, tan_2b)>=angle_thresh}))
  }
}
traj_classifier = classify(trajs, make_classifier_transitive(traj_ngraph, same_limit))

1 2 3 4 5 6 7 8 9 10 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

C/C++

The Spatial Aggregation Language