C/C++

SourceMonitor: Expose Your Code

By James F. Wanner, March 01, 2000

SourceMonitor is a programmer's metrics tool that Jim wrote to expose the size and quality of his source code. To parse source code, Jim used Sandstone's Visual Parse++; to create reports, he turned to Stingray's Objective Chart.

Mar00: Programmer's Toolchest

Jim is a programmer who has been writing code for many years, starting in assembler and Fortran and working more recently in C++ and Java. He can be contacted at [email protected].

Most software metrics tools have been developed to serve the needs of managers or academic researchers, rather than working programmers. I'd have been happy to use such tools if I could have found one that suited my needs. In a nutshell, what I needed was a metrics tool that would expose the size and quality of my source code -- a program that would quickly scan my code, accumulate some simple metrics counts (such as physical lines and statements), and point me to code that might be hard to maintain due to its complexity. I didn't need anything fancy, but the tool had to be easy to use or I probably wouldn't bother using it.

A quick search turned up two kinds of products -- simple ones that dumped output to the DOS command line (or to a printed report), and managerial ones that cost far more than any programmer can afford. Most products attempted to make sophisticated assessments based on some theory of software complexity and/or correctness. As a result, they tended to be slow and difficult to configure. Few maintained a history of their results so I could see if I was winning or losing. In a way, what I found was not "good enough" metrics but rather "too good" metrics. Maybe that's what managers want, but it's not what programmers need.

Finally, I decided that if I couldn't buy the tool I needed, then I would have to build a metrics tool specifically designed for programmers. I call my better mouse trap "SourceMonitor." It is simple but fast -- it processes more than 15,000 lines of C++ code per second on a 550-MHz Pentium III. I use it to expose code that is of low quality, to track progress on a project, and to assess code written by others (code I'm asked to review or to modify). It sits on my Tools menu in the Microsoft Visual Studio IDE so I can check the file I have open or assess my current Visual Studio project.

SourceMonitor is written using Microsoft Visual C++, MFC, Visual Parse++ from Sandstone Software (http://www.sand-stone.com/), and Objective Chart from Stingray Software (http://www.stingray.com/). All in all, SourceMonitor uses the entire parse engine from Visual Parse++ and about a dozen files from Objective Chart. Normally, you would have to purchase a license (as I did) for access to these files. However, both Sandstone and Stingray have graciously allowed me to distribute these files with this article for noncommercial use. Thanks to the generosity of these vendors, the complete source code to SourceMonitor is available electronically; see "Resource Center," page 7. Although SourceMonitor itself is implemented in C++, the Visual Parse++ parsing engine is controlled by a separate parse table for each language, so the languages it can measure are not limited to C++. SourceMonitor has parse tables for C/C++, Java, Delphi, Visual Basic, and HTML.

I've read about lines of code and function points as well as other metrics that have helped others manage software development projects (see "Automated Metrics and Object-Oriented Development," DDJ, March 1998). While these metrics may work for project managers, many of them are inappropriate for working stiffs trying to look at their own work. For me, speed of execution was as important as measurement sophistication. If collecting metrics weren't quick and painless, I probably wouldn't bother to do it very often. My goal was to create a tool that measured as much useful information as possible in the shortest time.

Of course, speed always has its price. SourceMonitor measures source code one file at a time in a single pass, so sophisticated system-wide parameters are ignored. For C++, where class code is typically split across two files (.h and .cpp), some information, such as inline methods, will be lost. Still, file-by-file measurements can be summarized across a project to provide an overview with considerable value. SourceMonitor measures both quantity and quality metrics, as shown in Tables 1 and 2. Not all metrics are collected for all languages, as noted in the tables. Selection of metrics for each language was somewhat arbitrary; a language that is easily parsed, such as Delphi's Object Pascal, tempted me to include more metrics. In addition to the metrics themselves, SourceMonitor records the size and location of the biggest method or subroutine and the line number of the maximum nesting depth. These are used in the source viewer window so users can go directly to these elements.

What Is Quality?

Dividing metrics into quantity and quality categories is somewhat arbitrary. In addition, one's definition of quality will vary based on experience. SourceMonitor offers no rules for good metrics -- it just dishes up the numbers for you to use as you see fit. For example, I once worked in a group that had software audited by a potential customer. Some of the code had an average block depth around 2, but some registered over 10. A quick look at the latter revealed serious quality problems. That's what I was after -- a quick way to find bad code.

Coding style has an important impact on some of these metrics as well. For example, the percentage of lines with comments can vary significantly from one programmer to the next. When I review code written by others, I look for extremes rather than specific values. If someone's code has very few comments, I tend to think I'd have trouble working with it (your take may be different). In general, I look for the following in my code:

Percent comments: 10 to 50 percent.
Percent branches: 10 to 30 percent.
Average methods per class: 0 to 20.
Average statements per method: 1 to 25.
Average block depth: Below 1.8 (below 2.8 for Java).

I measured the following metrics for the code used to build SourceMonitor:

Percent comments: 30.4 percent.
Percent branches: 18.5 percent.
Average methods per class: 7.53.
Average statements per method: 12.5.
Average block depth: 1.44.
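
As a rough illustration of how these ranges could be applied mechanically, the following C++ sketch checks a set of measured values against the limits listed above. It is not taken from SourceMonitor, which by design offers no rules of its own; the names MetricRange, inRange(), cppRanges(), and outOfRange() are hypothetical, and "below 1.8" is modeled as an exclusive upper bound starting at zero.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One metric together with the range considered acceptable for it.
struct MetricRange {
    std::string name;
    double low;
    double high;
    bool highInclusive;  // false models "below X"
};

// True when value lies inside the range described by r.
bool inRange(const MetricRange& r, double value) {
    if (value < r.low) return false;
    return r.highInclusive ? value <= r.high : value < r.high;
}

// The ranges from the list above (non-Java block depth limit).
std::vector<MetricRange> cppRanges() {
    return {
        {"Percent comments", 10.0, 50.0, true},
        {"Percent branches", 10.0, 30.0, true},
        {"Average methods per class", 0.0, 20.0, true},
        {"Average statements per method", 1.0, 25.0, true},
        {"Average block depth", 0.0, 1.8, false},
    };
}

// Returns the names of all metrics whose value falls outside its range.
// values must be given in the same order as ranges.
std::vector<std::string> outOfRange(const std::vector<MetricRange>& ranges,
                                    const std::vector<double>& values) {
    assert(ranges.size() == values.size());
    std::vector<std::string> flagged;
    for (std::size_t i = 0; i < ranges.size(); ++i) {
        if (!inRange(ranges[i], values[i])) flagged.push_back(ranges[i].name);
    }
    return flagged;
}
```

Feeding in the five values measured for SourceMonitor itself (30.4, 18.5, 7.53, 12.5, 1.44) flags nothing, since each lies inside its range.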

Test Drive

SourceMonitor measures metrics, displays the results in tables, charts them in graphs, prints them, saves them to a file for later review, and exports them for further manipulation in spreadsheets or databases. In addition, SourceMonitor produces a detailed display of the metrics for any one file or project checkpoint.

As an interesting (if somewhat artificial) example of how SourceMonitor works, I collected several versions of the source code for Sun's Java SDK and Microsoft's MFC. I then created a project for each and created a checkpoint for each release of each product. (Normally, I would create a checkpoint as each release of my code is completed, but in this case, all of the checkpoints were created on the same day.) The Java project contains the three main releases of the Java SDK classes, as shown in Figure 1. The Java metrics are shown in the list view headers and summary results for all files in the checkpoint are displayed for each checkpoint in the project.

The power of persisted checkpoints is immediately obvious -- the size of the Java class library has grown rapidly. SDK 1.1.8 is approximately 10 times larger than SDK 1.0.2 and SDK 1.2.2 is almost double the size of SDK 1.1.8. The quality has changed far less as the class library has grown. Checkpoints for the MFC project, shown in Figure 2, reveal a similar growth over time. Figure 3 shows a graph of the number of classes in each MFC checkpoint (SourceMonitor can display a graph of each metric).

In addition to the metrics summary for each checkpoint, SourceMonitor saves the metrics for each file as well. These are displayed in a second window as illustrated in Figure 4. By clicking on a column header, you can sort the files by the column's metrics data (a second click on the column header reverses the sort order). For example, the files in Figure 4 have been sorted by the descending number of "Statements." The pop-up menu in Figure 4 provides access to additional information for any file in the checkpoint. For example, Figure 5 shows the File Metrics Detail dialog and Figure 6 the file viewer for the selected file. The viewer is handy because it has buttons to take you to the first statement in the deepest nested block and to the biggest method.

Charts that summarize all the files in a checkpoint are available to help you assess the wealth of data produced by SourceMonitor. For example, Figure 7 shows the frequency of average block depth metrics for the files in the checkpoint in Figure 4. This graph shows that although there are a few files in the checkpoint with very deep (and therefore complex) block nesting, overall the code is well structured. Of course, with experience you can develop your own tolerances for what is acceptable and what is too complex and thus too difficult to maintain.
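
The data behind a frequency chart like this is a simple histogram. The sketch below shows one way to bin per-file average block depths; it is only an illustration, not the Objective Chart code SourceMonitor uses, and the function name histogram() and the choice of fixed-width bins starting at zero are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many values fall into each of binCount bins of width binWidth,
// with the first bin starting at 0. Values beyond the last bin edge are
// counted in the final bin; negative values are skipped as bad data.
std::vector<int> histogram(const std::vector<double>& values,
                           double binWidth, std::size_t binCount) {
    assert(binWidth > 0.0 && binCount > 0);
    std::vector<int> bins(binCount, 0);
    for (double v : values) {
        if (v < 0.0) continue;  // a block depth is never negative
        double index = v / binWidth;
        std::size_t i = (index >= static_cast<double>(binCount))
                            ? binCount - 1
                            : static_cast<std::size_t>(index);
        ++bins[i];
    }
    return bins;
}
```

With a bin width of 0.5, a checkpoint whose files mostly land in the bins below 2.0, with a short tail in the last bin, would match the "well structured with a few deep files" pattern described above.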

In addition to visual presentations of metrics in charts and list views, SourceMonitor can print any chart or copy any chart image to the Windows clipboard. The contents of any list view can be printed in the form of a report or exported to a file in comma-separated value format suitable for import into spreadsheets or databases.
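
For readers unfamiliar with the comma-separated value format, here is a minimal sketch of exporting a list view's rows. It is not SourceMonitor's export code; csvField(), writeCsvRow(), and toCsv() are hypothetical names, and the quoting rule (wrap a field in double quotes when it contains a comma, quote, or line break, and double any embedded quotes) is the common convention that most spreadsheets accept.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Quotes a field only when it contains a comma, a quote, or a line break.
std::string csvField(const std::string& s) {
    if (s.find_first_of(",\"\r\n") == std::string::npos) return s;
    std::string out = "\"";
    for (char c : s) {
        if (c == '"') out += '"';  // embedded quotes are doubled
        out += c;
    }
    out += '"';
    return out;
}

// Writes one row, separating fields with commas.
void writeCsvRow(std::ostream& os, const std::vector<std::string>& fields) {
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i > 0) os << ',';
        os << csvField(fields[i]);
    }
    os << '\n';
}

// Renders a header row plus data rows as CSV text.
std::string toCsv(const std::vector<std::string>& header,
                  const std::vector<std::vector<std::string>>& rows) {
    std::ostringstream os;
    writeCsvRow(os, header);
    for (const auto& row : rows) {
        assert(row.size() == header.size());  // every row fills every column
        writeCsvRow(os, row);
    }
    return os.str();
}
```

A file name that happens to contain a comma is the typical case where the quoting matters; without it, a spreadsheet would split the name across two columns.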

What to Do With All This Data?

SourceMonitor is one of those tools I almost forget about until I push back the keyboard and think about the bigger picture. If I seem to be losing control of a project, I run a checkpoint on it and look for the complex code. Often, some simple restructuring and repartitioning is indicated by classes with too many methods or blocks nested too deep. I usually find that just one or two classes need some attention. When I clean them up I often find everything else gets easier. Maybe it's magic but it works for me.

Another time I reach for SourceMonitor is when I'm asked to do a project similar to one I've already done. I can look at the number of statements in the old project and get a much better idea of how long the new project will take. Finally, I always run code through SourceMonitor as part of my preparation for a code review (you do have code reviews in your shop, don't you?).

Under the Hood

I created SourceMonitor with Microsoft Visual C++ 6.0 and MFC. The program is a standard MDI application with two document classes -- one for projects and one for checkpoints -- with corresponding view classes (see Figure 1 and Figure 4). Additional view classes were created to display the charts. I elected static binding of the MFC and run-time code so I could avoid hassles with DLL versions (this added about 500 KB to the executable but eliminated the distribution of the much larger MFC DLLs). To keep this tool tidy, I did not use a database to persist the file and checkpoint data but instead implemented the MFC Serialize logic. All data for a project is saved in a single file that contains serialized objects. The project file for the MFC project displayed in Figure 2 occupies about half a megabyte.

The persistent classes are diagrammed in Figure 8. Only the primary class properties are shown. Each project file contains a version object followed by a project object. Within the project object, one or more checkpoint objects are embedded, each of which contains one or more file objects. The version object is persisted separately so changes in the other persisted objects can be handled by the Serialize() method as necessary if a new version should change their structure. I used MFC containers instead of STL here because of their support for MFC serialization.
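
The idea of persisting a version marker ahead of the data can be sketched without MFC. The example below swaps MFC's Serialize() and CArchive for plain standard-library string streams, so it shows the technique rather than SourceMonitor's actual code. FileRecord, kCurrentVersion, save(), and load() are hypothetical, as is the assumption that a maxDepth field was added in version 2.

```cpp
#include <cassert>
#include <sstream>
#include <string>

const int kCurrentVersion = 2;  // hypothetical current format version

struct FileRecord {
    std::string name;    // assumed to contain no whitespace in this text format
    int statements = 0;
    int maxDepth = 0;    // assumed to have been added in version 2
};

// Writes the version first, then the fields of the current format.
std::string save(const FileRecord& r) {
    assert(!r.name.empty() &&
           r.name.find_first_of(" \t\n") == std::string::npos);
    std::ostringstream os;
    os << kCurrentVersion << ' ' << r.name << ' '
       << r.statements << ' ' << r.maxDepth << '\n';
    return os.str();
}

// Reads the version first and uses it to decide which fields to expect.
// Returns false on malformed input or an unknown (newer) version.
bool load(const std::string& data, FileRecord& r) {
    std::istringstream is(data);
    int version = 0;
    if (!(is >> version) || version < 1 || version > kCurrentVersion) return false;
    FileRecord tmp;
    if (!(is >> tmp.name >> tmp.statements)) return false;
    if (version >= 2 && !(is >> tmp.maxDepth)) return false;  // version 1 had no maxDepth
    r = tmp;
    return true;
}
```

Because the version is read before anything else, data written by an older release still loads: the missing field simply keeps its default value.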

The basic SourceMonitor MDI application collects and displays metrics using the same logic for all source-code languages. Except for a switch statement in the CDoc::OnNewDocument() method, the user interface code never needs to know which language a project uses. I achieved this code reuse with James Coplien's exemplar idiom (see Advanced C++ Programming Styles and Idioms, Addison-Wesley, 1992). This idiom extends run-time polymorphism to the object level so that an object can take on more than one behavior.

Each language has its own class based on a common SMLanguage base class, and each implements a MakeNew() method that creates the correct language-specialized object and initializes it as appropriate. In effect, creation of a language object is a two-step process: Create the language exemplar, then ask it to create the desired language object. You could say that the exemplar is a class factory for language objects of a certain type. In fact, the normal constructors for all of these classes are protected so the only way to create a language object is to call the MakeNew() method. Once the language exemplar is installed in a project object, all other logic is the same for all languages.

In SourceMonitor, exemplars are used to hide the different metrics collected for each language while maintaining a common interface to the code that invokes metrics data collection and display. What amounts to an empty language object, a language exemplar, is embedded in each project object (see Figure 8). This exemplar's MakeNew() method is called to acquire a properly initialized language object for each checkpoint object and for each of a checkpoint object's file objects. These language objects are designed to hold either the summary metrics for a checkpoint or the metrics for an individual file. The power of the exemplar is that all of the specialized logic regarding the metrics for a language is encapsulated inside the objects created by the exemplar. This encapsulation greatly simplifies all of the other SourceMonitor code. It also simplifies the process of adding languages to SourceMonitor.
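
A compact sketch of the exemplar arrangement follows. Only the names SMLanguage and MakeNew() come from the description above; the derived class names, the static Exemplar() functions, the Project class, and the single statement counter are assumptions made for illustration, and modern smart pointers stand in for whatever ownership scheme the real code uses.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Common base class. Constructors are protected, so outside code can
// obtain language objects only through an exemplar's MakeNew().
class SMLanguage {
public:
    virtual ~SMLanguage() = default;
    // Creates a fresh, initialized object of the same concrete language.
    virtual std::unique_ptr<SMLanguage> MakeNew() const = 0;
    virtual std::string Name() const = 0;
    void AddStatements(int n) { statements_ += n; }
    int Statements() const { return statements_; }
protected:
    SMLanguage() = default;
private:
    int statements_ = 0;  // stand-in for the real per-language metrics
};

class SMCppLanguage : public SMLanguage {
public:
    // Hypothetical entry point that hands out the exemplar itself.
    static std::unique_ptr<SMLanguage> Exemplar() {
        return std::unique_ptr<SMLanguage>(new SMCppLanguage);
    }
    std::unique_ptr<SMLanguage> MakeNew() const override {
        return std::unique_ptr<SMLanguage>(new SMCppLanguage);
    }
    std::string Name() const override { return "C++"; }
protected:
    SMCppLanguage() = default;
};

class SMJavaLanguage : public SMLanguage {
public:
    static std::unique_ptr<SMLanguage> Exemplar() {
        return std::unique_ptr<SMLanguage>(new SMJavaLanguage);
    }
    std::unique_ptr<SMLanguage> MakeNew() const override {
        return std::unique_ptr<SMLanguage>(new SMJavaLanguage);
    }
    std::string Name() const override { return "Java"; }
protected:
    SMJavaLanguage() = default;
};

// The project holds one exemplar and never inspects which language it is.
class Project {
public:
    explicit Project(std::unique_ptr<SMLanguage> exemplar)
        : exemplar_(std::move(exemplar)) {
        assert(exemplar_ != nullptr);
    }
    // Used for both checkpoint summaries and per-file metrics.
    std::unique_ptr<SMLanguage> NewMetricsObject() const {
        return exemplar_->MakeNew();
    }
private:
    std::unique_ptr<SMLanguage> exemplar_;
};
```

Once a Project is constructed with, say, SMJavaLanguage::Exemplar(), every call to NewMetricsObject() yields a Java metrics object, and the calling code never needs to know which language is in use. Adding a language means adding one derived class.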

Parsing Source Code

The parser does the real computational work in SourceMonitor, even though the UI code appears to introduce most of the complexity. I used Visual Parse++ to create parsing tables for each supported language. These tables are stored as custom resources inside the SourceMonitor executable file. I then added the Visual Parse++ parsing engine to SourceMonitor and passed it the proper language table at run time. This engine calls a reduce() method defined in a file generated by Visual Parse++ when it compiles a grammar. I added logic to this method to accumulate counts and capture other data such as method names and source-code line numbers.

The Visual Parse++ parser is not only very fast at run time, it also comes with a visual IDE that speeds up grammar definition. Once a grammar is defined, you can test it in the IDE and watch the parser recognize lexemes and fire productions as it moves through a source-code file. I elected not to use any of the full language grammars supplied with Visual Parse++; instead, I defined a simple grammar for each language that identifies only those elements I wished to count. It is likely that some unusual source-code constructs will fool my grammars and I will miss a count or two. However, I am willing to accept an occasional error in exchange for the parsing speed of a simpler grammar. If the parser chokes on any source code, SourceMonitor displays the file, line, and column where the parser had trouble, then keeps going.
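
To give a feel for how little needs to be recognized when the goal is counting rather than compiling, here is a hand-written single-pass scanner for C-like source. It does not use Visual Parse++ or its grammar tables; it is a swapped-in illustration of the "recognize only what you count" trade-off. The names Counts and scan() are hypothetical, and treating every semicolon as one statement is a deliberate simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct Counts {
    int statements = 0;    // semicolons outside comments and literals
    int maxDepth = 0;      // deepest brace nesting seen
    int maxDepthLine = 0;  // 1-based line where maxDepth was first reached
};

// Scans C-like source once, skipping comments and string/char literals.
Counts scan(const std::string& src) {
    enum State { Code, LineComment, BlockComment, String, Char };
    State st = Code;
    Counts c;
    int depth = 0;
    int line = 1;
    for (std::size_t i = 0; i < src.size(); ++i) {
        const char ch = src[i];
        const char next = (i + 1 < src.size()) ? src[i + 1] : '\0';
        if (ch == '\n') ++line;
        switch (st) {
        case Code:
            if (ch == '/' && next == '/') { st = LineComment; ++i; }
            else if (ch == '/' && next == '*') { st = BlockComment; ++i; }
            else if (ch == '"') st = String;
            else if (ch == '\'') st = Char;
            else if (ch == ';') ++c.statements;
            else if (ch == '{') {
                if (++depth > c.maxDepth) { c.maxDepth = depth; c.maxDepthLine = line; }
            }
            else if (ch == '}') { if (depth > 0) --depth; }  // tolerate stray braces
            break;
        case LineComment:
            if (ch == '\n') st = Code;
            break;
        case BlockComment:
            if (ch == '*' && next == '/') { st = Code; ++i; }
            break;
        case String:
        case Char:
            if (ch == '\\') {                 // skip the escaped character
                if (next == '\n') ++line;
                ++i;
            } else if ((st == String && ch == '"') || (st == Char && ch == '\'')) {
                st = Code;
            }
            break;
        }
    }
    assert(depth >= 0);  // invariant: depth never goes negative
    return c;
}
```

Like the simplified grammars, this scanner can be fooled: the two semicolons in a for statement header are counted as statements, and preprocessor tricks with braces will skew the depth. That is the kind of occasional miscount traded for speed.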

DDJ
