DEC91: GRAPHICAL DATA VISUALIZATION

Implementing an interactive GUI

Marian G. Williams and Peter D. Varhol

Peter is a freelance writer and assistant professor of computer science and mathematics at Rivier College in New Hampshire. Marian is a research scientist at the Center for Productivity Enhancement at the University of Massachusetts at Lowell. They can be contacted through DDJ.

As professionals trained in the traditional numerical methods of data analysis, we've come to be impressed with the potential of various data visualization techniques in filling in some of the blind spots inherent in these methods. In particular, numerical methods are notoriously poor at taking highly multivariate data and identifying interesting relationships worthy of further investigation.

This area, known generically as exploratory data analysis, is important when theories have not yet caught up with experimentation, or when large numbers of interrelationships among variables may confound straightforward predictions. For example, it is well known that there is a relationship between smoking and lung disease, but the relationship is complicated by other aspects of a person's lifestyle, socio-economic standing, other health considerations, and even ethnic and racial background. Determining which factors affect which others can be difficult unless the researcher knows exactly what to look for.

Furthermore, few software applications do this type of data analysis well. We would like to discuss both a technique for exploratory data analysis using visualization and an applications development environment that we found useful in prototyping it.

Taking the Solution at Face Value

A unique method of performing exploratory data analysis was introduced in 1973 by a Stanford University statistician named Herman Chernoff. His idea was conceptually simple but very sensible. He proposed a straightforward caricature of the human face as an innovative tool for the visualization of multivariate data -- long before the term "data visualization" came into common use. The face itself represents different aspects of the data. In practice, it looks much like the so-called "happy face" that has adorned posters for years.

Chernoff's goal, and the goal of data visualization in general, is to develop a graphical method of representing data points in n-dimensional space that allows an observer to easily recognize strong and complex relationships in the data. The observer can then perform statistical analyses to study the relationships.

Chernoff's premise is that a person spends years learning how to read other people's facial expressions. Why not make use of this highly developed skill for data visualization? Use one face to represent each data point in n-dimensional space, and let the facial features be determined by the fields of that data point. Let one field determine the curve of the smiling or frowning mouth, another determine the size of the pupils, and so forth. Then arrange all of the faces in a display. A person can spot regularities and peculiarities in the data by recognizing similarities and differences among the faces.

Chernoff's faces are potentially useful for cluster analysis and for detecting changes in time series. Chernoff used faces for cluster analysis of Eocene Yellow Limestone fossils and for discrimination analysis of mineral data from core drillings. In the fossil example, each fossil was represented by a face. Chernoff showed how the fossils could be sorted by grouping the faces according to similarity of expression. There may be a measure of subjectivity involved, however, as a colleague of Chernoff's could not entirely replicate Chernoff's own groupings. In the core drilling example, a sequence of faces represented data from specimens taken sequentially along a core. Changes in expression indicated the places where critical changes occurred in the core.

The usefulness of a display of faces depends very much on which data field is matched with which facial feature, yet there is no algorithm for doing the matching. In a recent article about the pace of life in various American cities, the editors of American Scientist used Chernoff's faces to represent the author's data. They deliberately mapped the data fields onto the facial features in such a way that the cities with a slow pace of life (Los Angeles, for instance) generated smiling, relaxed expressions, while cities with a fast pace of life (such as Boston and New York) were represented by scowling faces. A different mapping of data fields onto facial features could have resulted in a less meaningful representation of the data.

Because the right way, by which we mean the most informative way, to match data fields to facial features may not be known in advance of drawing the faces, software for generating displays of faces needs to have an interface that lets the observer experiment interactively. This was one of the goals of our graphical prototype, discussed shortly.

Chernoff says of his faces, "This approach is an amusing reversal of a common one in artificial intelligence. Instead of using machines to discriminate between human faces by reducing them to numbers, we discriminate between numbers by using the machine to do the brute labor of drawing faces and leaving the intelligence to the humans, who are still more flexible and clever."

There are a number of visualization research projects that exploit basic perceptual capabilities in ways similar to Chernoff's ideas. The Exvis project at the University of Massachusetts at Lowell (formerly the University of Lowell) is one such project. The researchers are studying the use of texture perception, sound perception, and color perception in creating representations that help scientists to locate the important information in their data. Instead of faces, they use small stick figures that have graphical, sound, and color attributes.

A Graphical Implementation of Chernoff's Faces

While the concept seems difficult to implement using traditional software development tools, recent advances in object-oriented technologies have made the development of highly graphical user interfaces far easier than in the past. In particular, when screen objects have a corresponding object representation in the development system, implementing an interactive graphical interface is conceptually clean.

We used VZ Programmer for Windows and OS/2 to implement a rapid prototype of a data visualization tool based on Chernoff's faces. VZ Programmer lets you use a graphical object editor to draw a picture of the user interface of your application. Each component of the picture is an instance of an object class in the VZ class library, and has certain characteristic data and behaviors associated with it. Furthermore, through VZ's implementation of a C interpreter/incremental compiler, the behaviors of instances can be modified through the introduction of new methods and data attributes.

We chose to represent a single data object as an individual Chernoff's face. Six different characteristics of the face and its position were used to represent up to six different variables, or measurements, from each data source. Each measurement is represented on a Cartesian plane by one of the following characteristics: position on the X-axis, position on the Y-axis, length of mouth, position of mouth, size of eyes, and distance between eyes.
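To make the six characteristics concrete, here is a minimal sketch in standard C++11 rather than in VZ Programmer's dialect. The names `Feature`, `Face`, and `makeFace` are hypothetical and are not part of the VZ class library. The sketch also assumes that every measurement has already been scaled to lie between 0 and 1.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// The six face characteristics named in the text, in the order listed.
enum Feature {
    PositionX = 0,
    PositionY,
    MouthLength,
    MouthPosition,
    EyeSize,
    EyeSpacing,
    FeatureCount   // number of characteristics, not a characteristic itself
};

// One face: one slot per characteristic, each holding a value in [0, 1].
struct Face {
    std::array<double, FeatureCount> value{};  // all slots start at 0.0
};

// Build a face from one record of six measurements (hypothetical helper).
// featureFor[i] names the characteristic that displays measurement i.
Face makeFace(const std::array<double, FeatureCount>& record,
              const std::array<Feature, FeatureCount>& featureFor)
{
    Face face;
    for (std::size_t i = 0; i < record.size(); ++i) {
        assert(record[i] >= 0.0 && record[i] <= 1.0);  // expects scaled input
        face.value[featureFor[i]] = record[i];
    }
    return face;
}
```

Keeping the variable-to-characteristic assignment in a separate array, instead of fixing it in the structure, is what would let an observer try a different assignment without touching the data.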

The initial interface in VZ Programmer is highly graphical, and uses a file cabinet metaphor. Clicking on a drawer of the file cabinet displays the files in that drawer as file folders. Opening one of these file folders takes the programmer into the development environment, which centers around the object editor.

VZ Programmer provides the object editor for drawing the user interface. This editor includes a tools palette, which lets you draw virtually any shape on the screen, including lines, circles, controls, and scroll bars. Each of these shapes is a VZ object, and can have methods and data associated with it to define its behavior.

Attributes are used to store methods and data within an object. Data attributes simply have a name and an untyped value. Methods include a script of C++ code, such as that shown in Example 1. For example, for a button to open a window, a method has to be associated with that button to accomplish that task.

Example 1: (a) Entering and error checking a data value; (b) using a menu to call up new data analysis window

  (a)

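  EndEdit(  void  )
  {
    int index    = int(textString);     /* used below to index chartData */
    int newValue = this=>(textString);  /* value just entered in the field */

    Erase(0);
    if (newValue >= 0) {
      /* Accept the value and echo it back to the field. */
      chartData[index]   = newValue;
      this=>(textString) = newValue;
      }
    else {
      /* Reject the value: restore the stored one and warn the user. */
      this=>(textString) = chartData[index];
      NoticeBox("Range Check Error",
                "Data must be zero or larger.", 0);
      }
    Draw(0);
    }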

  (b)

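  MenuCommand(  short id  ) {
    string          pageName;
    VZ_PAGEWINDOW** window = 0;   /* stays null for an unknown menu id */
    VZ_PAGE*        page;

    /* Map the menu item to a page name and its MDI child window slot. */
    switch(id) {
      case 100:
        pageName = "faces 1";
        window   = &mdiChild1;
        break;
      case 200:
        pageName = "faces 2";
        window   = &mdiChild2;
        break;
      case 300:
        pageName = "faces 3";
        window   = &mdiChild3;
        break;
      }

    /* Open the window only if the id was recognized, the window is not
       already open, and the page exists. */
    if (window && !(*window) && (page = FindPage(pageName))) {
      *window = new VZ_PAGEWINDOW (page);
      if (*window) {
        (*window)->Show();
        }
      }
    }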

The first consideration in our application was data entry. While the most convenient method, the spreadsheet metaphor, is not normally thought of as object oriented, VZ Programmer provides two different types of text-field object classes that can be readily adapted to the task. Therefore, setting up a limited-functionality spreadsheet interface took only a few minutes. The columns corresponded to variables within a single data object, while the rows represented new data objects. Each field had a variable attribute to intercept the entered value.
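The row-and-column layout, together with the kind of range check shown in Example 1(a), can be sketched in standard C++ as follows. This is only an illustration under our own assumptions: the class `DataTable` and its members are hypothetical names, not VZ text-field classes, and the rule that negative values are rejected is borrowed from Example 1(a).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rows are data objects and columns are variables, as in the text.
class DataTable {
public:
    DataTable(std::size_t rows, std::size_t columns)
        : columns_(columns), cells_(rows * columns, 0.0) {}

    // Store a value if it passes the range check. Otherwise leave the cell
    // unchanged and report failure so the caller can restore the display.
    bool setCell(std::size_t row, std::size_t column, double value)
    {
        assert(column < columns_ && row * columns_ + column < cells_.size());
        if (value < 0.0) {
            return false;
        }
        cells_[row * columns_ + column] = value;
        return true;
    }

    // Read back the stored value of one cell.
    double cell(std::size_t row, std::size_t column) const
    {
        assert(column < columns_ && row * columns_ + column < cells_.size());
        return cells_[row * columns_ + column];
    }

private:
    std::size_t columns_;
    std::vector<double> cells_;  // row-major storage, initially all 0.0
};
```

A caller would create the table with one row per data object, call `setCell` from each field's edit handler, and show a notice box whenever it returns false.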

We then created a library of faces using the object editor. The library consisted of a faces object class, with subclasses representing each of the different components of the face. All our faces were therefore predefined in these classes, so that displaying a face was simply a matter of creating a new instance of the appropriate class or classes.

Ideally, it would have been better to have the application draw each face from scratch at runtime, using the input data provided. However, in addition to simplifying the implementation, this approach recognizes that there are limitations to both the resolution of the screen and the ability of the data analyst to discern differences in similar values. We believe that we did not lose a great deal of physical or perceptual resolution by grouping similar values together into the same graphical representation.

New classes are created from within the object editor, by selecting the Edit Class item from the Utilities menu. After creating the new class, we defined the appearance of the faces classes by drawing them from within the object editor. By working with class attributes, we also set the global behaviors associated with these classes.

Putting the Pieces Together

The application came together in the following way. Each value associated with a data object in the input spreadsheet was normalized so that it represented a value between 0 and 1. Each subclass of the faces class had a corresponding range of values between 0 and 1. The components for the faces were selected out of the class libraries according to this value, and the resulting instances were assembled on the screen.
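The two steps just described, scaling each value into the range 0 to 1 and then choosing the predrawn component whose range contains it, might look like the following in standard C++11. The text does not say which normalization was used, so min-max scaling over a column is an assumption here, as is the idea that the predrawn variants divide the range into equal parts. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Rescale one spreadsheet column to [0, 1] (min-max normalization, assumed).
// A column whose values are all equal maps to 0.5, an arbitrary choice.
std::vector<double> normalizeColumn(const std::vector<double>& column)
{
    assert(!column.empty());
    const double lo = *std::min_element(column.begin(), column.end());
    const double hi = *std::max_element(column.begin(), column.end());
    std::vector<double> result;
    result.reserve(column.size());
    for (double v : column) {
        result.push_back(hi > lo ? (v - lo) / (hi - lo) : 0.5);
    }
    return result;
}

// Pick which of variantCount predrawn component variants covers a
// normalized value, assuming the variants split [0, 1] into equal ranges.
std::size_t variantIndex(double normalized, std::size_t variantCount)
{
    assert(variantCount > 0 && normalized >= 0.0 && normalized <= 1.0);
    std::size_t index = static_cast<std::size_t>(normalized * variantCount);
    return std::min(index, variantCount - 1);  // 1.0 belongs to the last range
}
```

With four mouth variants, for instance, a normalized value of 0.5 would select the third variant (index 2), and the largest value in the column would always select the last one.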

Furthermore, the user can select the facial component used to represent a particular variable. We used a dialog box for the user to name both the variable and the corresponding component class. Both menus and dialog boxes are created within the VZ object editor, using the predefined control classes from the palette. Of course, both also need the appropriate code to enable menu and dialog selections to accomplish the necessary actions.
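The bookkeeping behind such a dialog box can be sketched in standard C++11 as a table from variable names to component class names. The names `Mapping` and `assignComponent` are hypothetical. The rule that a component can display only one variable at a time is our assumption, since the text does not say how conflicting choices were handled.

```cpp
#include <cassert>
#include <map>
#include <string>

// Variable name -> name of the facial component class that displays it.
using Mapping = std::map<std::string, std::string>;

// Record the user's choice. If another variable already uses the chosen
// component, that variable is unassigned so no component shows two variables.
void assignComponent(Mapping& mapping,
                     const std::string& variable,
                     const std::string& component)
{
    assert(!variable.empty() && !component.empty());
    for (Mapping::iterator it = mapping.begin(); it != mapping.end(); ) {
        if (it->second == component && it->first != variable) {
            it = mapping.erase(it);   // erase returns the next iterator
        } else {
            ++it;
        }
    }
    mapping[variable] = component;
}
```

Because the table is ordinary data, the observer can change an assignment and redraw the display as often as needed, which is the kind of experimentation the interface is meant to support.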

The result is that the user brings up the spreadsheet with a menu selection in order to enter data, then brings up the dialog box to associate each variable with a facial component. Completing this step automatically normalizes the data and matches the normalized data with the appropriate facial components. The application then opens a new window that displays Chernoff faces on a coordinate plane. Each face comes from the same class library and takes up the same amount of space. The facial components come from separate libraries, and are added to the facial objects.

Figure 1, for instance, shows an example Chernoff's faces display that contains demographic data from the most recent US Census. The faces represent 49 different cities and towns in Massachusetts. The X- and Y-axes position them according to their relative geographic location (X, north and south; Y, east and west). The smile represents population growth over the last ten years: the larger the smile, the greater the relative growth. The space between the eyes represents growth in per capita income over the same period, with more income growth depicted by more space.

Is this a useful way of representing data for the purpose of discovering new relationships between variables? It almost seems too tongue-in-cheek to be of real use, but it may inspire data analysts to look at other nonconventional ways of visualizing data. While the concept itself is intuitively appealing, it's difficult to tell at this time if the approach has value in exploratory data analysis. With graphical prototype building tools such as VZ Programmer, examining other approaches can be done quickly and easily.

How VZ Programmer Contributed to the Task

VZ Corporation refers to its development language as C++. That is something of a misnomer, because it is not completely compliant with either ANSI C or AT&T C++ 2.0. In both cases there is a workable subset of the language. However, even under the best of circumstances, it is unlikely that a significant amount of code can be ported into or out of the environment. The problem is that the code is used to supplement the object class library, especially for graphical screen objects. No standard C++ class library exists, so true code compatibility is out of the question. This is likely to be the case in any object-oriented C++ development effort until that standard class library does exist.

As a result of its class library, a VZ Programmer executable is not a stand-alone application. It must be executed either in the VZ Programmer development environment, or along with a runtime module that contains essential class bindings. The methods in a VZ executable file are actually compiled C code, with hooks for predefined screen objects defined in the runtime module. The executable file also contains the complete definitions of any user-defined classes, such as our faces classes.

Versions of VZ Programmer run on both Windows and OS/2 Presentation Manager. In our tests, compatibility between the two versions appears to be very good. All of the code and graphical objects developed under the Windows version ran without modification on the OS/2 version. We worked in MS Windows using version 2.0 of VZ Programmer. In this environment, it requires Windows running in standard or enhanced mode, and 2 Mbytes of memory. It takes up about 1.5 Mbytes of disk space.

Without a graphical development tool such as VZ Programmer, this project would have required months of dedicated coding, and we probably would not have undertaken the effort at all. The application described took a bit more than a month of part-time effort, a portion of which was spent learning VZ Programmer itself. While the resulting prototype was satisfactory for experimental uses, it may not be everyone's idea of a general distribution product. There is still resistance from some developers to bundling a runtime module with an application, but as object-oriented development with predefined class libraries continues to grow in popularity, the need to include class bindings as a runtime component will grow.

References

Chernoff, Herman. "The Use of Faces to Represent Points in n-Dimensional Space Graphically." Journal of the American Statistical Association (June 1973).

Levine, Robert V. "The Pace of Life." American Scientist (Sept./Oct. 1990).

Williams, Marian G., Stuart Smith, and Giampiero Pecelli. "Experimentally Driven Visual Language Design: Texture Perception Experiments for Iconographic Displays." Proceedings of the 1989 IEEE International Workshop on Visual Languages (Oct. 1989).

Products Mentioned

VZ Programmer
VZ Corporation
57 West South Temple Street
Salt Lake City, UT 84101
800-627-8851 or 801-595-1352
