Dr. Dobb's | Detecting Bugs in Safety-Critical Code

When software is used for safety-critical applications, bugs aren't just expensive annoyances—they can kill.

When software is used for safety-critical applications, bugs aren't just expensive annoyances—they can kill. Faced with such dire consequences, developers of safety-critical systems go to great lengths to prevent bugs from making it into the field. These measures are undeniably effective at reducing risk. Although there have been some famous catastrophic failures over the years, if medical devices or flight-control systems failed as often as most software fails, the headlines would be much grimmer.

So how do they do it, and how can those of us who do not write safety-critical code emulate their success? Well, there are many strategies, but two stand out as being key and offer important lessons for other developers—static analysis and rigorous testing.

Historically, static analysis had been used to enforce standards or style rules, and do some superficial syntactic checks for patterns that might indicate flaws. While helpful, especially as standards such as Misra C (misra.org.uk) or JSF C++ (www.research.att.com/~bs/JSF-AV-rules.pdf) are widely used by safety-critical software developers, these old-style tools have been difficult to use effectively, not least because of their high false-positive rate. Recently, a new breed of "advanced" static-analysis tools has emerged. These are capable of finding serious bugs such as buffer overruns, null pointer dereferences, resource leaks, and race conditions. They can also highlight inconsistencies or contradictions in the code, such as unreachable code, useless assignments, and redundant conditions, all of which often indicate programmer confusion, and correlate well with bugs. In Gerald Holzmann's "Ten Rules for Writing Safety-Critical Code" (www.spinroot.com/p10), rule 10 specifies that advanced static-analysis tools should be used proactively all through the development process.

Systematic testing is the other prong. As well as being a good idea, often it's also the law. Regulators such as the FAA specify strict rules about how code is tested before it can be deployed in a safety-critical device. In some cases, developers must demonstrate that test suites achieve full coverage of the code. The trouble with this is that it can be enormously expensive to develop these test suites. However, advanced static-analysis tools can help reduce the cost by steering developers away from futile or unnecessary work.

In this article, I focus on advanced static analysis, how it complements traditional testing, and how it can be used for both bug finding and for reducing testing costs.

Advanced Static Analysis

With roots in esoteric techniques such as model checking and abstract interpretation, advanced static analysis is not your father's technology. Instead of complaining endlessly about stylistic discrepancies or superficial syntactic problems, this new breed of tools can find serious coding flaws. They integrate easily with most build systems, report their results in a reasonable time, create easy-to-understand reports, and have a manageable false-positive rate.

There is no substitute for rigorous unit, regression, and integration testing. Ideally, your tests would feed in as many combinations of inputs and conditions as possible such that all parts of the code are exercised thoroughly. Statement coverage tools can help you develop a test suite that makes sure that every line of code is executed at least once. But as all programmers know, just because a statement executes correctly once doesn't mean it will always do so—it may trigger an error only under a very unusual set of circumstances. There are tools that will measure condition coverage and even path coverage. These are all helpful for exercising these corner cases, but achieving full coverage for nontrivial programs is extraordinarily time consuming and expensive.

This is where static analysis shines. Static analysis examines paths and considers conditions and program states in the abstract. By doing so, it can achieve much higher coverage of your code than is usually feasible with testing. Best of all, it does all this without requiring you to write any test cases.

This is the first major way in which static analysis reduces the cost of testing. The cheapest bug is the one you find earliest. Because static analysis is a compile-time process, it can find bugs before you even finish writing the program. This is usually less expensive than if you have to find them by writing a test case or debugging a crash.

How Tools Perform Analysis

You can think of an advanced static-analysis tool as doing a "pretend" or abstract execution of your program. Instead of variables containing actual concrete values, they contain abstract values. Values that are inputs are given symbolic names, too. The analysis then pretends to execute the program using these symbols by following paths through the code. As this execution proceeds, the analysis may learn facts about the variables and how they relate to each other. For example, because of the conditions on the path leading to a statement, the analysis may determine that if variable x is greater than 10, then pointer p may be NULL. As this execution proceeds, the analysis looks for anomalies. For example, if p is dereferenced at that statement, then the analysis issues a report that a null pointer dereference may occur. A good tool explains the reasoning the tool used to arrive at that conclusion, which is essential to help you determine whether the report is a true or false positive.

This is the essential difference between testing and static analysis: Testing uses real inputs and concrete values, whereas static analysis uses symbolic inputs and abstract values. The appeal of the latter is that because each abstract value represents a wide range of possible concrete values, static analysis can take into account many possible program states simultaneously, and so achieve much greater coverage than testing.

Static analysis finds places where violations of the fundamental rules of the language or libraries might lead to errors. The following illustrates some of the most important classes. The first class is the most serious—bugs that either cause the program to terminate abnormally or result in highly unpredictable behavior. These include buffer overrun and underrun, null pointer dereference, division by zero, and use of uninitialized variables. Memory allocation errors are those that result from the misuse of malloc or new. These can be tricky to debug because the erroneous behavior may only show up long after the event that caused the error. Such errors include double free, use after free, and memory leak. Concurrency bugs may be caused by misuse of the threads library. Double locks or unlocks, race conditions, and futile attempts to lock are among the checks that are available.

A second class of check is for inconsistencies or redundancies. These are not bugs per se, but are often indicators that a programmer misunderstood something. For example, if you check the return value of a function 99 percent of the time, then that 1 percent that is left unchecked may indicate such a problem. This class includes redundant conditions, useless assignments, and checking whether a pointer is null after it has already been dereferenced.

Finally, some tools let you author your own checks. The mechanism for this differs widely between tools. CodeSonar (the tool I work on) lets you write little C code stubs that use a simple API to check for properties. These stubs are never actually executed: They are fed to the analysis alongside your own source code and are then symbolically executed. Statements that are analogous to asserts are used to signal that a warning should be reported.

Static analysis can't check all properties of your program. If your code contains a logic error that causes your program to produce the wrong result, then static analysis will usually be of little help finding that bug. That's what testing is for.

Code Sample

To illustrate the kinds of flaws that advanced static analysis is capable of detecting, take a look at the report in Figure 1. All examples are distilled from real flaws found in production software.

The code in red is on the path to the error and the column on the left shows the values that the conditions that must hold for the error to occur. Code with a yellow background is where the error occurs, and code with a green background indicates that something directly relevant to the error occurred at that line. In this report, the analysis has found that the variable named state may be null (actually <= 4095 indicates that it is in the 0th page of virtual memory). Following the path back, you can see that this was returned by a call to acquire_state on line 24. Note also that the conditional on line 26 was not true, indicating that the analysis has deduced that the value of acquire_err is REG_NOERROR along this path. That variable also gets assigned to in the call to acquire_state. To see what happens in there, refer to Figure 2 where you can see that on the path taken, *err (which gets passed back as acquire_err at the call site) is indeed REG_NOERROR, and the return value is NULL. Had this path been tested, it would definitely have resulted in a program crash.

[Click image to view at full size]

Figure 1: Null pointer dereference report from CodeSonar.

[Click image to view at full size]

Figure 2: Snippet from the function called.

False Positives

It is trivially easy to write a static-analysis tool that finds all the bugs in your program—one that reports all lines as bugs satisfies this criterion. Similarly, it is trivial to write a tool that never reports a false positive—one that tells you that all lines are bug free will suffice. Obviously, neither tool is useful. The real measure of the effectiveness of a static-analysis tool is how well it simultaneously balances the false-positive rate with the false-negative rate. For almost all nontrivial programs, all serious static-analysis tools that attempt to find bugs report some false-positive results, and none are capable of finding all of the bugs in such programs.

Too many false positives means that you may spend too much time sifting through the chaff looking for the real bugs. This robs your development effort of resources that might be better-spent finding bugs through other methods. A high false-positive rate has a subtle psychological implication too: As it increases, users are less likely to trust the results, and are more likely to erroneously tag a true positive as a false positive.

Different kinds of bugs merit different levels of effort to find, depending on the amount of risk they carry. A buffer overrun in a medical device may be life threatening, whereas a leak in a game controller that means it must be reset once a day is very low risk. The amount of risk determines the false-positive rate that users are prepared to accept. In practice, we have found that for the serious class of flaws, such as buffer overruns and null pointer dereferences, users are often prepared to accept a false-positive rate of as much as 75-90 percent. For less risky classes, a false-positive rate of more than 50 percent is usually considered unacceptable.

Static Analysis and Systematic Testing

Again, safety-critical developers are required to demonstrate coverage. There are of course different kinds of coverage, and the risk the code carries dictates which kind of coverage is required. In the DO-178B Standard for aviation, the riskiest code requires 100-percent modified condition/decision coverage (MCDC). The next two most risky classes require 100-percent decision coverage and statement coverage, respectively. The least risky code, such as the inflight entertainment system, has no coverage requirements at all.

Most developers are familiar with statement coverage, and anyone who has tried to test with 100-percent statement coverage knows how difficult it is to achieve. Decision coverage is more stringent—it requires that all branches in the control flow are taken. You must make test cases that force all your conditional expressions to evaluate both true and false. MCDC coverage is stronger still, and means that all subexpressions in a conditional should be evaluated to both true and false. Consider what this means in terms of the number of test cases for this code snippet:

Table 1 shows the number of test cases required to achieve full coverage for each coverage criterion, and example values required of the inputs.

Coverage	a	b	c
Statement (1 case)	1	—	—
Decision (2 cases)	1	—	—
	0	0	0
MCDC (4 cases)	1	—	—
	0	1	—
	0	0	1
	0	0	0

Table 1: The number of test cases required for full coverage.

Clearly, achieving full coverage is nontrivial. What can make it extremely frustrating is if it is fundamentally impossible. If your program contains unreachable code, then you cannot even get full statement coverage, never mind decision or MCDC. Similarly, if your code contains redundant conditions (conditions that are either always true or always false), then you might achieve statement coverage, but not decision coverage.

A developer may spend hours trying to develop a test case before it becomes evident that it is impossible. If you know going in that this is the case, then you don't have to waste time on it.

In Figure 3, the value of variable rest (which is an unaliased automatic integer) must be at least 3 by the time you get to line 12. The decrement means it is at least 2 at the comparison, so the condition never evaluates to false. The following line 13 is also redundant by the same reasoning, and appears in a different report. This is a fairly simple example, and many humans would have spotted this in a code review because all the relevant expressions are in the same vicinity. If the bodies of the conditionals had been larger or if the conditions themselves had been embedded in a function or a macro, then the weakness would not have been as easy to spot. However, the static-analysis tool would not have been fooled.

[Click image to view at full size]

Figure 3: Redundant condition warning from CodeSonar.

These kinds of issues correlate well with bugs, too. Consider Figure 4, in which the column on the left shows the condition that cannot be satisfied: pb->type can never be equal to 3. This value comes from a call to get_type(); see Figure 5. The error is on line 30 and is clearly a typo—the programmer missed the final X.

[Click image to view at full size]

Figure 4: Fragment of a report from CodeSonar illustrating a redundant condition.

[Click image to view at full size]

Figure 5: The body of get_type, showing that there is an error on line 30.

Conclusion

Bugs are a plague on our profession and bad for business. That's why advanced static-analysis tools should be in every programmer's toolbox, and not just for safety-critical systems. They help find real errors, and reduce cost.