Bulletproofing C++ Code

Sergei examines techniques that improve code stability and reduce the risk of errors in C and C++ development.


January 09, 2007
URL:http://www.drdobbs.com/cpp/bulletproofing-c-code/196802351

Sergei is the C/C++ Solution Manager at Parasoft. He can be reached at [email protected].


C++ turned 22 this year—yes, it was 22 years ago that Bjarne Stroustrup published the first specification of the language. Moreover, C has another dozen or so years on C++. Despite all predictions that Java and .NET languages will make C++ obsolete, C and C++ development remains healthy. More than 3 million programmers maintain and enhance the C and C++ applications currently in use and continue to write significant amounts of new code.

C/C++ programmers face somewhat different challenges than programmers working in other languages and domains. Today's C++ projects often involve modifying or extending existing systems, refactoring available code for new applications, or integrating existing modules in new ways. Most commonly, a team inherits a pile of C or C++ code from a different group, an outside vendor/contractor, or the open-source community. Such projects inevitably require modifying some amount of unfamiliar code (while getting to understand it in the process) and maintaining it going forward.

In this article, I examine several techniques that, when properly and consistently applied and monitored, reduce the risks of errors when diverse teams of programmers develop code on top of existing code bases, as well as improve the stability and quality of the project's evolving code base.

Step 1: Use Static Analysis

Finding a bug, or at least gaining some understanding of the code's properties, without executing the code has always mattered a great deal to programmers. In fact, this is a cornerstone of code reviews—using a human brain to read and understand the code, looking for defects with a fresh eye. Peer code reviews remain the best approach for finding code defects. On average, 60 percent of defects can be removed via code reviews; see the "Software Defect Reduction Top 10 List" by Barry Boehm and Victor R. Basili (www.cebase.org/www/resources/reports/usc/usccse2001-515.pdf). This should not be surprising because code reviews use the finest analysis instrument—the human brain. The quest to automate parts of code review, so that reviewers could concentrate on tasks of higher intelligence rather than on validating conformance to coding conventions, drove the development of automated static-analysis tools, which have made tremendous strides in the last few years.

Properly implemented, automated static analysis effectively identifies the code's soft spots and possible bugs in a relatively short time—something that can't be achieved manually. For example, a commonly encountered error is failing to recognize that floating-point numbers should not be compared with the "==" operator, unlike, say, integers or classes that implement operator==; see the control expression of the if statement in Listing One.

bool SetClient::contains(double d, SetType& h)
{
    for (SetType::const_iterator it = h.begin(); it != h.end(); ++it) {
        // Pointer to a double of value d?
        if (*((double*)((*it)->getValue())) == d) {
            return true;
        }
    }
    return false;
}
Listing One

This is one of the more common checks supported by static-analysis tools. More sophisticated algorithms are required to spot data- or control-dependent errors, such as dereferencing NULL pointers to objects or files, or leaking locally allocated memory. For example, if Domino's failed to deliver a pizza, the following code's logic would no longer work, and the pizza-consumption program would crash at the attempt to determine whether the pizza was sliced:

Pizza* cheesePizza = Dominos::deliver();
if (!cheesePizza->sliced())
    Slice(cheesePizza);

More advanced static-analysis tools can analyze the source code of all functions involved (as long as they are available in source form). For our pizza example, they will raise a red flag because the cheesePizza pointer is dereferenced without first checking whether deliver() can return NULL under some conditions.
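
A minimal guarded version, reusing the Dominos and Pizza declarations from the snippet above, satisfies such a checker by testing the pointer before it is dereferenced:

Pizza* cheesePizza = Dominos::deliver();
if (cheesePizza != NULL && !cheesePizza->sliced())
    Slice(cheesePizza);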

Two prevalent types of static analysis today are pattern matching and dataflow analysis. The pattern-matching approach supports implementation of a set of coding practices, or a coding policy. For C++, many such "best practices" are defined in books by Scott Meyers, Herb Sutter, and Andrei Alexandrescu; the "Gotchas" series by Dan Saks and Steve Dewhurst; de facto industry standards such as Ellemtel and MISRA; and common guidelines for 32- and 64-bit portability. In addition to supporting the common C++ guidelines, static-analysis tools should also provide the ability to customize rules to suit your specific application and implementation context, as well as help you identify code patterns indicative of application-specific defects.
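
For instance, a commonly recommended guideline is to declare single-argument constructors explicit so that they cannot be used for silent implicit conversions; a pattern-matching rule can flag the first constructor below and accept the second. (The Buffer class here is purely illustrative.)

class Buffer {
public:
    Buffer(int size);                   // violation: allows the silent conversion  Buffer b = 42;
    explicit Buffer(const char* name);  // compliant: the conversion must be written explicitly
};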

Dataflow analysis is based on the idea that program code can be compiled into graph-like representations, which can be simulated to determine whether certain execution paths can (in principle) lead to structural bugs such as memory leaks or (the favorite bug of the Java pundits) array bounds errors. This bug-prevention technique is powerful because it does not depend on user input to identify bugs that, in reality, are data dependent. This technique can only find bugs in available source code (binary libraries will present the equivalent of an impenetrable fence), but this limitation does not detract from its high ROI.
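
As an illustration, consider the hypothetical function below; a dataflow engine can walk the error path and report that the allocation leaks when parse() fails, without the code ever being executed. (Config, parse(), and registerConfig() are invented names used only for this sketch.)

bool loadConfig(const char* fileName)
{
    Config* cfg = new Config(fileName);
    if (!cfg->parse()) {
        return false;          // memory leak: cfg is never deleted on this path
    }
    registerConfig(cfg);       // ownership handed off on the successful path
    return true;
}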

Effective application of static analysis goes beyond buying the appropriate tools; it also requires the careful application and monitoring of processes. To start, the team architect (or other designated leader) defines a coding policy that team members understand and respect, then configures the tool to automatically check that coding policy. If, like most C++ development teams, you're charged with extending and maintaining large inherited code bases, the next step is typically to scan that code base and resolve all reported violations. This removes a significant number of hidden defects in the code you will be building upon, improving the stability of the foundation code.

After cleaning the violations in the legacy code, establish a process for when the team starts building upon that code base. Ideally, this involves two levels of automated static analysis: a quick check on each developer's desktop before code is committed to source control, and a batch scan of the entire code base (a nightly build, for instance) that catches anything that slipped through.

As additional defects surface (reported by developers or testers) and are fixed, the team should not miss the opportunity to analyze their root causes and try to identify a set of rules that prevents the same bugs from recurring, then configure the static-analysis tool to automatically check the entire codebase against these rules.

Step 2: Establish a Reliable Regression Base

Whenever new capabilities are added or bugs are fixed in existing code, regression bugs are a common yet unintended side effect. These bugs occur when a certain software function no longer works as expected as a result of some other code changes (see en.wikipedia.org/wiki/Regression_testing). In fact, studies have shown that, on average, 25 percent of software defects are introduced while programmers are changing and fixing existing code during maintenance; see Successful Software Process Improvement by Robert B. Grady (Prentice Hall, 1997). Regression testing is the primary technique for containing such risks. With regression testing, the software behavior captured by previously run tests is verified by rerunning the tests and comparing the current results with those from the originally captured "golden set."

A challenging situation arises when the inherited code to be modified lacks regression tests. In such cases, try to resist the temptation to "hack at it anyway," and first create some regression tests before making any code changes. This safety net will help verify that local code changes do not break something else in the code. Otherwise, attempting to maintain the integrity of the previous functionality while introducing new features becomes a high-stakes crapshoot.

There are two complementary approaches to creating such a test suite. A manual approach involves writing some tests at the highest abstraction level of the code. Because writing tests by hand is generally expensive, it's best to minimize that effort up front while still capturing the reference behavior. This can be accomplished by identifying the module-under-test's high-level APIs and exercising them first. The tests should follow the module's "positive" behavior and produce some code coverage for expected conditions. This approach is beneficial because a few tests aimed at the high-level API exercise many of the module's lower-level functions. The resulting suite can then be amended with lower-level tests as required to raise coverage to a comfortable level before you change the code.

Another way to approach this problem is by creating tests from the bottom up, starting from the leaf-level functions. Such tests typically exercise more paths through the code. However, the bottom-up approach is much more labor intensive because many more tests need to be created. To get results in a reasonable time, you need an automated tool capable of auto-generating API tests (either with semirandom inputs or using specified value ranges) and capturing their results. Some commercial testing tools offer this capability; for public domain software, see tools such as CxxTest (cxxtest.sourceforge.net).
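
As a sketch of what such a leaf-level test might look like in CxxTest (the StringUtil class and its trim() function are hypothetical, used only for illustration):

#include <string>
#include <cxxtest/TestSuite.h>
#include "StringUtil.h"   // hypothetical header for the function under test

class StringUtilTest : public CxxTest::TestSuite
{
public:
    void testTrimRemovesSurroundingSpaces() {
        TS_ASSERT_EQUALS(std::string("abc"), StringUtil::trim("  abc  "));
    }
    void testTrimLeavesInnerSpacesAlone() {
        TS_ASSERT_EQUALS(std::string("a b"), StringUtil::trim(" a b "));
    }
};

CxxTest's generator script turns such suites into an executable runner, and its captured output becomes part of the regression baseline.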

The primary method for assessing the completeness of such a regression test suite is to track test coverage—a cumulative measurement of how much code the tests actually exercise. Although the commonly used line-coverage metric is certainly better than nothing, it is by no means sufficient. Line coverage indicates what code was executed, but it largely ignores which conditions led to executing it. In general, programs can be thought of as giant state machines with code paths representing (sequences of) transitions between the states. Real paths are formed by control branches activated by appropriate values for conditions. Hence, in order to measure how much of the real behavior of a software system is captured by tests, it is typically necessary to track path, branch, and condition coverage metrics. A number of commercial tools are able to support this level of coverage analysis.
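
A small (hypothetical) example shows why: a single test that calls finalPrice(100.0, 10.0) executes every line of the function below, giving 100 percent line coverage, yet the branch in which the discount is not applied is never taken, so half of the function's behavior remains untested.

double finalPrice(double total, double discount)
{
    if (discount > 0.0)
        total -= discount;   // the only test takes this branch...
    return total;            // ...the "no discount" branch is never exercised
}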

Once the test suite is in place, its execution must be automated so that it can be run on a regular and frequent basis. This is an absolute requirement for reliably checking existing functionality. Use scripts, crontab, CruiseControl, or any other automated execution fixture to run the test suite automatically, and collect coverage from the runs. In addition, I recommend using a runtime memory-error detection tool during test suite execution. These tools accurately detect runtime bugs, cover both your source and third-party libraries, and generally check for errors that can't currently be detected via static analysis.
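
The defects these tools catch typically depend on runtime data. In the hypothetical snippet below, strncpy() leaves buffer without a terminating NUL whenever name is 16 characters or longer, and the subsequent printf() reads past the end of the array; a regression run under a memory-error detector reports the overread at the exact point it happens.

#include <cstdio>
#include <cstring>

void printName(const char* name)
{
    char buffer[16];
    strncpy(buffer, name, sizeof(buffer));   // no NUL terminator if name fills the buffer
    printf("name: %s\n", buffer);            // overread for long names, silent for short ones
}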

At this point, you should be prepared to start changing the code.

Step 3: Develop Unit and Regression Tests

Once you begin writing code or fixing issues in existing code, there's no excuse for neglecting to properly test the code you wrote, even if the rest of the code lacks such diligently created test cases. This is the time to apply Test Driven Development (TDD) or any other technique that keeps test cases at the front of—or at least tightly linked to—the development process.

Depending on the nature of the code change, functional-level (application-level) regression tests and/or unit-level regression tests may be appropriate. Each has its own benefits. The functional-level tests are preferred in cases when tests can feasibly be set up to represent a high-level requirement to implement—or a specific bug that should be addressed—and the tested program can produce sufficient output to effectively validate the code's behavior (via the direct outputs it generates, log messages, debug messages, and the like). Unit tests are appropriate when test setup is complicated because the error condition itself is not a common scenario, when the changed code is buried deep in the module hierarchy, or when it's difficult to validate the test outcome at the functional level (for example, the program lacks debugging facilities). This is frequently the case with corner-case conditions or error-handling functions, which are supposed to trigger only in exceptional circumstances that are difficult to reproduce at the application level but straightforward to simulate in a unit test.

For example, take a real-life problem discovered in the file-processing utility in Listing Two. One of the functions of this file utility is "normalization" of the file path. If the file foo.c is said to be in directory /a/b/c/../d, the utility transforms this into /a/b/d/foo.c. However, when a path begins with consecutive .. (dot-dot) components, the second .. incorrectly cancels the first in the line annotated with the comment // BUG, and the leading components are discarded; thus ../../a/b/c is transformed into /a/b/c instead of being left unchanged. Listing Three eliminates this problem.

void Path::normalize(PathInfo& input)
{
    PathElements result;
    PathElements::const_iterator it  = input.elements.begin();
    PathElements::const_iterator end = input.elements.end();
    for (; it != end; ++it) {
        if (*it == "..") {
            if (result.empty()) {   // BUG
                result.push_back(*it);
            } else {
                result.pop_back();
            }
        } else if (*it != ".") {
            result.push_back(*it);
        }
    }
    input.elements.swap(result);
}
Listing Two

void Path::normalize(PathInfo& input)
{
    PathElements result;
    PathElements::const_iterator it  = input.elements.begin();
    PathElements::const_iterator end = input.elements.end();
    for (; it != end; ++it) {
        if (*it == "..") {
            if (result.empty()
                || result.back() == "..") {   // BUG fixed
                result.push_back(*it);
            } else {
                result.pop_back();
            }
        } else if (*it != ".") {
            result.push_back(*it);
        }
    }
    input.elements.swap(result);
}
Listing Three

Additionally, tests have to be created to verify the fix. Here, the choice of tests is influenced by the fact that this function is part of a common library, and that both the defect and the fix are completely contained within the function. Rather than create a functional test for the file utility, it's easier to write a unit test for the fix using an appropriately crafted file-path string:

void PathTest::testGetNormalized()
{
    Path test("../../a/../b/c/./d/../e");
    Path result = test.getNormalized();
    CPPTEST_ASSERT_CSTR_EQUAL("../../b/c/e",
        prepareNormalizedPathString(result.toString()).c_str());
}

This unit test can be complemented by the functional test of the file utility, as long as it can reliably capture the utility's output for the specific file and verify that it is correct.

No matter what type of tests you create for a new code fragment, they must be added to the established regression suite and be verified on a regular basis. This assures that if subsequent changes break this functionality, the problem will not go unnoticed.

Step 4: Static Analysis and Code Review

To complete the loop, any new code needs to be analyzed by the same static-analysis process that was established in Step 1. Ideally, programmers perform static analysis on their desktops so that they can check and correct code before committing it to source control. Even if this is not possible, the new code—once it is committed to source control—is still subject to static analysis when the batch process checks all code available in the code base.

Once the new code passes all the checks, including a clean bill of health from static analysis and regression testing with sufficient coverage, it should be finalized by a team code review. It does not really matter how the code review is conducted—it can be integrated into the code-authoring process (as in pair programming), distributed to programmers with automatic notification based on source-control commits, or performed in the old-fashioned manner by gathering people into a room with code printouts. The key is to have the code reviewed by humans.

Conclusion

This four-step process illustrates how C++ development teams can approach the common task of managing and modifying unfamiliar legacy code methodically and with high-quality results. Although each of the techniques I introduce is a good practice on its own, combining them in an automated process framework helps teams effectively address their productivity and code-quality goals, as well as confidently manage development risks. Further gains can be realized by monitoring and continually improving the process.
