Advanced Static Analysis
With roots in esoteric techniques such as model checking and abstract interpretation, advanced static analysis is not your father's technology. Instead of complaining endlessly about stylistic discrepancies or superficial syntactic problems, this new breed of tools can find serious coding flaws. They integrate easily with most build systems, report their results in a reasonable time, create easy-to-understand reports, and have a manageable false-positive rate.
There is no substitute for rigorous unit, regression, and integration testing. Ideally, your tests would feed in as many combinations of inputs and conditions as possible such that all parts of the code are exercised thoroughly. Statement coverage tools can help you develop a test suite that makes sure that every line of code is executed at least once. But as all programmers know, just because a statement executes correctly once doesn't mean it will always do soit may trigger an error only under a very unusual set of circumstances. There are tools that will measure condition coverage and even path coverage. These are all helpful for exercising these corner cases, but achieving full coverage for nontrivial programs is extraordinarily time consuming and expensive.
This is where static analysis shines. Static analysis examines paths and considers conditions and program states in the abstract. By doing so, it can achieve much higher coverage of your code than is usually feasible with testing. Best of all, it does all this without requiring you to write any test cases.
This is the first major way in which static analysis reduces the cost of testing. The cheapest bug is the one you find earliest. Because static analysis is a compile-time process, it can find bugs before you even finish writing the program. This is usually less expensive than if you have to find them by writing a test case or debugging a crash.
How Tools Perform Analysis
You can think of an advanced static-analysis tool as doing a "pretend" or abstract execution of your program. Instead of variables containing actual concrete values, they contain abstract values. Values that are inputs are given symbolic names, too. The analysis then pretends to execute the program using these symbols by following paths through the code. As this execution proceeds, the analysis may learn facts about the variables and how they relate to each other. For example, because of the conditions on the path leading to a statement, the analysis may determine that if variable x is greater than 10, then pointer p may be NULL. As this execution proceeds, the analysis looks for anomalies. For example, if p is dereferenced at that statement, then the analysis issues a report that a null pointer dereference may occur. A good tool explains the reasoning the tool used to arrive at that conclusion, which is essential to help you determine whether the report is a true or false positive.
This is the essential difference between testing and static analysis: Testing uses real inputs and concrete values, whereas static analysis uses symbolic inputs and abstract values. The appeal of the latter is that because each abstract value represents a wide range of possible concrete values, static analysis can take into account many possible program states simultaneously, and so achieve much greater coverage than testing.
Static analysis finds places where violations of the fundamental rules of the language or libraries might lead to errors. The following illustrates some of the most important classes. The first class is the most seriousbugs that either cause the program to terminate abnormally or result in highly unpredictable behavior. These include buffer overrun and underrun, null pointer dereference, division by zero, and use of uninitialized variables. Memory allocation errors are those that result from the misuse of malloc or new. These can be tricky to debug because the erroneous behavior may only show up long after the event that caused the error. Such errors include double free, use after free, and memory leak. Concurrency bugs may be caused by misuse of the threads library. Double locks or unlocks, race conditions, and futile attempts to lock are among the checks that are available.
A second class of check is for inconsistencies or redundancies. These are not bugs per se, but are often indicators that a programmer misunderstood something. For example, if you check the return value of a function 99 percent of the time, then that 1 percent that is left unchecked may indicate such a problem. This class includes redundant conditions, useless assignments, and checking whether a pointer is null after it has already been dereferenced.
Finally, some tools let you author your own checks. The mechanism for this differs widely between tools. CodeSonar (the tool I work on) lets you write little C code stubs that use a simple API to check for properties. These stubs are never actually executed: They are fed to the analysis alongside your own source code and are then symbolically executed. Statements that are analogous to asserts are used to signal that a warning should be reported.
Static analysis can't check all properties of your program. If your code contains a logic error that causes your program to produce the wrong result, then static analysis will usually be of little help finding that bug. That's what testing is for.