Static analyzers examine a program's code without running it, looking for weaknesses that could be triggered accidentally or exploited by intruders. A report from the National Institute of Standards and Technology (NIST) entitled Static Analysis Tool Exposition (SATE), edited by Vadim Okun, Romain Gaucher, and Paul Black, documents the exposition: an exercise in which NIST and static analyzer vendors worked to improve the performance of these tools.
The static analyzers in the study, with the languages each analyzed, were Aspect Security ASC 2.0 (Java), Checkmarx CxSuite 2.4.3 (Java), Flawfinder 1.27 (C), Fortify SCA 5.0.0.0267 (C, Java), GrammaTech CodeSonar 3.0p0 (C), HP DevInspect 5.0.5612.0 (Java), SofCheck Inspector for Java 2.1.2 (Java), University of Maryland FindBugs 1.3.1 (Java), and Veracode SecurityReview (C, Java).
NIST's Vadim Okun calls SATE a long-overdue idea. "Most modern software is too lengthy and complex to analyze by hand," he says. "Additionally, programs that would have been considered secure ten years ago may now be vulnerable to hackers. We're trying to focus on identifying what in a program's code might be exploitable."
Although the SATE 2008 process was not designed to compare the performance of the participating tools, it succeeded in revealing some of their capabilities across a wide variety of weaknesses. SATE also demonstrated that results from multiple tools can be combined into a single database that supports further analysis. The tools' backtrace explanations proved useful, but the study concluded that the evaluation would likely have been more efficient and less error-prone had it been closely integrated with the tools' own navigation and visualization capabilities.
Future studies should plan for the possibility that the tools will generate more warnings than the evaluators can review. When not every warning can be examined, consistent criteria for selecting which warnings to evaluate are needed so that limited analysis resources still produce clean data. Clear definitions of true positives and false positives should be established from the outset, although producing consistent evaluations may still involve subtle difficulties. Finally, any comparative analysis will require normalizing the warnings to account for tool-specific differences in how warnings are reported and counted.
Okun believes that a good deal of research remains to be done. The effort was not only highly demanding but also showed that some goals may be out of reach. Users want static analyzers to find every problem in a piece of software while raising no false alarms, but "that's not achievable," Okun says. "We want to show people that this isn't a trivial process, but the tools are improving and it makes good sense to use them."


