Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

JVM Languages

Java & Static Analysis


July, 2005: Java & Static Analysis

Amit is a technical architect in Software Engineering and Technology Labs at Infosys Technologies and can be reached at [email protected].


Error detection in software is usually done via code reviews, unit testing, system testing, integration testing, and user-acceptance testing. The first possible error-detection stage subsequent to the build phase is the code review, which helps in early defect detection by enforcing language-specific programming standards and best practices. The probability of errors during the build phase are therefore reduced. These rules are typically verified manually as part of the code review process. However, the process can become cumbersome and unmanageable, especially when undertaken at the end of the build phase by senior programmers or architects. In short, manual code review of large projects is neither efficient nor always possible, as the effort required for manual code review from senior staff is huge.

Static analysis provides a mechanism for tool-based automated code reviews to find code defects early in the build phase. Static analysis at early stages not only keeps bugs out of the code, but helps in locating bugs even before programs run. Static analysis ensures early bug detection and remediation by comparing source code with predefined language patterns, improving development productivity and end-product reliability. Static analysis also helps in enforcing coding conventions, thus making maintainability easier. Among the common methodologies used by static analysis are:

  • Syntactic analysis is done by determining the structure of the input Java code and comparing it with predefined patterns. Common defects found using this methodology are coding conventions such as naming standards, or always having default clauses in switch statements. For instance, the absence of a default clause in Example 1 would hide potential bugs that could have been caught by the default clause.
  • Data-flow analysis tracks objects (variables) and their states (data values) in a particular execution segment of a program or method. This methodology monitors the state of variables in all possible flows, thus predicting things such as null pointer exceptions or database objects that aren't closed in all possible flows. Example 2 shows the database connection object con2 that is not closed in all flows of the method process. This is critical, as there is no guarantee that garbage collection will happen in a timely manner. Moreover, if the database connections are a limited resource, running short of them could trigger more problems.
  • Flow-graph analysis, also known as "Cyclomatic Complexity," checks the complexity of an operation's body. Cyclomatic complexity counts every decision point (if, while, for, or case statements) in a method. Cyclomatic complexity's strict academic definition from graph theory is: CC=E-N+P, where E represents the number of edges on a graph, N the number of nodes, and P the number of connected components.
  • Another way of calculating cyclomatic complexity is CC=Number of decision points+1. Example 3, for instance, shows the method checkCC with cyclomatic complexity as "3;" two decision points (if and else) plus "1" are added for a method entry point.
  • Reflection gives Java classes the ability to look inside dynamically loaded classes. The Java Reflection API provides a mechanism to fetch class information (super classes, method names, and the like) that is used to implement rules like inheritance hierarchy, object coupling, and the maximum number of methods in a class.

Static Analysis

In general, large projects spend more than half of their lifecycle effort on code reviews and defect prevention. This effort can be significantly reduced by automating the code review process. Automation also helps in achieving consistency in terms of coding standards and best practices across project teams and organizations. In this article, I refer to coding standards and best practices enforced by static-analysis tools as "rules."

Here, I focus on Java in exploring the different techniques used to achieve code review automation. In the process, I also examine parameters and techniques required for Java static analysis. Finally, I showcase and analyze the quantitative data from real-world projects to underscore the benefits of static analysis.

Where does static analysis fit in the software lifecycle for maximum benefit? Figure 1 illustrates the role of static analysis in the software development lifecycle and the participants involved in the process:

  • Developers are responsible for writing code and performing static analysis to identify and fix defects early.
  • Architects are responsible for the selection of static-analysis tools and configuration of rules.
  • Software Quality Advisors are responsible for defect analysis and prevention.

In Figure 1, static analysis does not undermine the importance of senior programmers and architects, because the selection of reliable and appropriate static-analysis tools is critical. Project managers who understand organizational quality standards and architects who understand architecture and language requirements are critical.

Additionally, the automation of static analysis involves developers who ensure defect detection and remediation in the development stage. The process illustrates how automated static analysis involves programmers whose code has to go through the tool before any testing is done on their programs. This ensures that program bugs are caught while testing.

The process lifecycle clearly illustrates how software quality advisors (SQA) collect defect reports after several rounds of feedback and correction in the source code.

In short, all participants want highly reliable, automated review processes that let project team members concentrate on other important tasks to fulfill functional requirements. Before getting into the details of various techniques used to automate code review, it is important to understand the basic parameters required for automated code review tools from a Java standpoint. Tools should have the capability of scanning Java and Java bytecode in a structured way and should provide APIs to simplify the implementation of rules.

In this article, I look at various techniques commonly used for Java static analysis. These techniques can be categorized into two categories—analysis performed on Java source (Java files) or Java compiler-generated bytecode (class files).

Scanning Java Source

Tools taking Java source as input first scan through Java source files using parsers, then execute predefined rules on that source code. A thorough understanding of the language is a must for any tool to be able to identify bugs in that particular language program. There are numerous Java language parsers that adhere to the Java Language Specifications (JLS), including JavaCC (https://javacc.dev.java.net/) and ANTLR (http://www.antlr.org/).

Java parsers and tools also simplify Java source by converting source code into tree-like structures. There are numerous Java parsers that simplify the code into a tree-like structure known as "Abstract Syntax Tree." Listing One shows an abstract syntax tree (AST) generated for Java code. I used JavaCC and the JJTree parser (https://javacc.dev.java.net/) to build a program that recognizes matches to grammar specifications. In addition to this, it also generates an AST of the Java source file. I use the PMD rule designer (http://pmd.sourceforge.net/) in Figure 2 to show an AST generated by JavaCC and JJTree from Java source. The parsed and structured source helps tool you in implementing rules that not only involves syntactic errors, but also errors requiring data-flow analysis of source code.

The generated AST is scanned by predefined patterns applied in rules by using parser-generated AST APIs. I illustrate this technique by implementing a rule to catch a common mistake and a potential performance issue committed by inexperienced developers by doing string concatenation in a loop instead of using StringBuffer. The StringConcatInForRule class in Listing Two shows the core implementation of this rule. The Visitor design pattern can be used to execute individual rules with AST being passed as argument. All the classes imported for AST blocks are generated using the JavaCC parser and JJTree tool. The complete source code implementation of various predefined rules with details of Visitor pattern and AST blocks can be found at the PMD Sourceforge repository.

Scanning Java Bytecode

The scanning Java bytecode technique scans through Java bytecode in Java class files. This approach utilizes bytecode libraries to access Java bytecode and implements predefined patterns and rules using these libraries. Java bytecode libraries help you in accessing bytecode by providing interfaces for source-level abstraction. You can read a class file without detailed knowledge of bytecode. For rules requiring pattern matching and understanding of Java bytecode, the java -p javaclassname command helps in analyzing bytecode and subsequently implementing rules using bytecode engineering libraries.

Listing Three is a sample Java program and the bytecode generated using the javap -c (JDK1.4) command. Rules can be implemented using Java bytecode libraries that provide APIs to scan bytecode in a class.

Listing Four presents a rule implementation to catch a design defect using BCEL Java bytecode library (http://bcel .sourceforge.net/). The rule counts the depth of the inheritance hierarchy for a particular class, meaning the number of classes and interfaces a class is inherited from. The program raises an alert whenever it finds a class inherited from more than MAX_INHERETANCE_DEPTH, which is set to "5" in Listing Four. A high value indicates that the class is quite specialized and tightly coupled with other classes. Most tools provide configuration mechanism to set this limit as per project requirements; for instance, Java Swing classes have high inheritance depth.

FindClassInheritanceDepth takes a jar file as an argument and scans through all the class files present in that jar file. It gets all hierarchical super classes and interfaces information for these classes, compares the total number of hierarchy (by adding super classes and interfaces) with the predefined inheritance depth limit, and raises an alert if hierarchy exceeds this limit. The Java classpath should be set to include all jar files used to compile input classes in order to avoid exceptions. For instance, if the classes zipped in an input jar file are using the testing framework JUnit (http://www.junit.org/), the JUnit jar file should be added to the classpath before running FindClassInheritanceDepth. I have caught ClassNotFoundException and intentionally added printStackTrace() in the catch block to cause class information to be added to the classpath.

Guidelines for Tool Selection

There are no hard-and-fast rules for selecting static-analysis tools and techniques. Tool selection primarily depends on the requirements of the project and organization. The reason I have implemented two different varieties of rules in this article is to demonstrate the strength of each technique. Tools using Java source code for analysis are good at checking code conventions, like Sun Code Conventions (http://java.sun.com/docs/codeconv/). These tools use techniques such as syntactic pattern matching and data-flow analysis to implement these rules. These rules are also implemented in tools using the bytecode approach, but extending or modifying these tools to write your own rules is cumbersome. Tools using the AST technique are more extensible when it comes to new organizational standards and to ensure that the tool does not become obsolete.

Bytecode-analysis tools are good for implementing rules related to class-level and class-relationship checks, such as coupling between objects, inheritance depth (Listing Four), and Halstead's software science metrics. The disadvantage of scanning bytecode is that the process ignores defects that are fixed due to compiler optimization, hence the defect never gets fixed in original source code.

Tools using the bytecode technique are more useful in large maintenance projects where class files are already available for review, while Java code scanner tools are ideal for developers during the build phase in development projects.

Open-source tools such as PMD and CheckStyle (http://checkstyle.sourceforge.net/) are good examples of Java code scanning using AST technique. FindBugs (http://findbugs.sourceforge.net/) and JLint (http://jlint.sourceforge.net/) use bytecode-scanning technique for static analysis. Most of these tools provide plug-ins for Eclipse, JBuilder, and Emacs, thus providing a uniform total development environment.

Each approach and tool has some pluses and minuses. For instance, some tools tend to miss instances of incorrect code in their analysis that are caught by other tools. I recommend that projects go with multiple tools, based on their requirements as there's not much effort required in executing these tools, while the benefits justify the effort.

Static-Analysis Benefits

Qualitative benefits from static analysis include a significant improvement in the overall reliability and quality of the software. For instance, Figure 3 illustrates the defects found in a development project using static-analysis tools. The automation tools primarily catch defects in categories such as coding standards, potential performance issues, potential bugs, and design defects. It is obvious from Figure 3 that more than 50 percent of the defects caught during static analysis are cosmetic (Coding Standards) in nature. However, don't forget that without static analysis, detecting these cosmetic defects also requires a cumbersome manual code review process. It is hard to ignore the rest of the defects that are caught, although they are small in number compared to cosmetic defects. Any potential bugs and performance issues caught before the application is run reduces testing and bug fixing at later stages. As shown in Figure 3, the second highest category of defects fall into an irrelevant category; these are also known as "false positives." Such defects are either not applicable to the particular project or are the result of too much automation. These false positives can be suppressed to an extent by the right tool selection and rules configuration.

Figure 4 shows the comparison of automated static analysis with a manual code review process. I took quality statistical data from five live projects using static-analysis techniques and running at CMM Level 4 with average function points around 700. The same data is collected from projects relying only on manual code review. The comparative study of data collected implies that projects using static-analysis techniques are distinctively ahead of other projects. The number of delivered defects in projects using static-analysis tools is significantly lower compared to projects doing manual code review. Figure 4 also points out productivity gains, while overall defects per person hour and line count have come down.

Conclusion

Making review process more manageable and predictable by using static analysis improves software development productivity and end-product reliability. However, the biggest benefit of all is the ability to identify defects at the coding stage, thus directly impacting the reliability of software systems.

There are plenty of tools on the market for Java static analysis, both commercial and freely available, open source and proprietary. In this article, I examined four open-source tools as part of another research project and concluded that none could supplement or supercede each other in terms of functionality. It is imperative that static analysis be applied with the right tools, rules configuration, and well-timed usage to be a potent mechanism for overall quantitative measurable benefits in projects and organizations.

DDJ



Listing One

public class  GenerateAST {
  private String printFuncName() {
    System.out.println(funcName + "Generate AST"); 
  }
}
CompilationUnit
 TypeDeclaration
  ClassDeclaration:(public)
   UnmodifiedClassDeclaration(GenerateAST)
    ClassBody
     ClassBodyDeclaration
      MethodDeclaration:(private)
       ResultType
        Type
         Name:String
       MethodDeclarator(printFuncName)
        FormalParameters
       Block
        BlockStatement
         Statement
          StatementExpression
           PrimaryExpression
            PrimaryPrefix
             Name:System.out.println
            PrimarySuffix
             Arguments
              ArgumentList
               Expression
                AdditiveExpression:+
                 PrimaryExpression
                  PrimaryPrefix
                   Name:funcName
                 PrimaryExpression
                  PrimaryPrefix
                   Literal:"Generate AST"
Back to article


Listing Two
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import com.ddj.ast.ASTAdditiveExpression;
import com.ddj.ast.ASTBlockStatement;
import com.ddj.ast.ASTExpression;
import com.ddj.ast.ASTForStatement;
import com.ddj.ast.ASTLocalVariableDeclaration;
import com.ddj.ast.ASTMethodDeclaration;
import com.ddj.ast.ASTName;
import com.ddj.ast.ASTVariableDeclarator;
import com.ddj.ast.ASTVariableDeclaratorId;

public class StringConcatInForRule {
    static int lineNo = 0;

    public Object visitRule(ASTForStatement node, Object data) {
        ASTMethodDeclaration method = (ASTMethodDeclaration) node
                .getFirstParentOfType(ASTMethodDeclaration.class);
        List localvars = new ArrayList();
        if (method != null) {
            method.findChildrenOfType(ASTLocalVariableDeclaration.class,
                    localvars, true);
            for (Iterator lvars = localvars.iterator(); lvars.hasNext();) {
                ASTLocalVariableDeclaration localvar = 
                      (ASTLocalVariableDeclaration) lvars.next();
                if (localvar.jjtGetChild(0).jjtGetChild(0) 
                                                       instanceof ASTName) {
                    ASTName localVarName = (ASTName) localvar.jjtGetChild(0)
                            .jjtGetChild(0);
                    if (localVarName.getImage().equals("String")) {
                        if (localvar.jjtGetChild(1) 
                                         instanceof ASTVariableDeclarator) {
                            ASTVariableDeclaratorId varId = 
                                (ASTVariableDeclaratorId) localvar
                                        .jjtGetChild(1).jjtGetChild(0);
                            checkForConcat(node, varId.getImage(), data);
                        }
                    }
                }
            }
        }
        return data;
    }
    public boolean checkForConcat(ASTForStatement node, String varName,
            Object data) {
        boolean closed = false;
        List blocks = new ArrayList();
        int oldLineNo = 0;
        node.findChildrenOfType(ASTBlockStatement.class, blocks, true);
        for (Iterator it2 = blocks.iterator(); it2.hasNext();) {
            ASTBlockStatement block = (ASTBlockStatement) it2.next();
            List exps = new ArrayList();
            block.findChildrenOfType(ASTExpression.class, exps, true);
            for (Iterator it = exps.iterator(); it.hasNext();) {
               ASTExpression exp = (ASTExpression) it.next();
                if (exp.jjtGetChild(0) instanceof ASTAdditiveExpression) {
                    List names = new ArrayList();
                    exp.findChildrenOfType(ASTName.class, names, true);
                    for (Iterator it1 = names.iterator(); it1.hasNext();) {
                        ASTName name = (ASTName) it1.next();
                        if (name.getImage().equals(varName)) {
                            lineNo = block.getBeginLine();
                            if (oldLineNo != lineNo) {
                                System.out.println("String Concatenation in 
                                        loop at line no. " + lineNo +
                                                  " use Stringbuffer");
                            }
                            closed = true;
                            oldLineNo = lineNo;
                        }

                    }
                }
            }
        }
        return closed;
    }
}
Back to article


Listing Three
Compiled from CreateObjects.java
public class CreateObjects extends SimpleObjects {
    public CreateObjects();
    public java.lang.String create();
}

Method CreateObjects()
   0 aload_0
   1 invokespecial #1 <Method SimpleObjects()>
   4 return

Method java.lang.String create()
   0 new #2 <Class java.lang.String>
   3 dup
   4 invokespecial #3 <Method java.lang.String()>
   7 areturn
Back to article


Listing Four
// FindClassInheritanceDepth.java

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import org.apache.bcel.classfile.ClassParser;
import org.apache.bcel.classfile.JavaClass;

public class FindClassInheritanceDepth {
    private static ZipInputStream zipInputStream;
    private static ClassParser classParser;
    private static FileInputStream fileInputStream;
    private static final int MAX_INHERETANCE_DEPTH = 5;
    
    static String getFileExtension(String fileName) {
        int lastDot = fileName.lastIndexOf('.');
        return (lastDot >= 0) ? fileName.substring(lastDot) : null;
    }

    static int CheckClassDepth(JavaClass javaClass) {
        int inheritanceDepth = 0;
        try {
            JavaClass[] aJavaClass = javaClass.getSuperClasses();
            JavaClass[] aJavaInterfaces = javaClass.getAllInterfaces();
            inheritanceDepth = aJavaClass.length + aJavaInterfaces.length;
        } catch (ClassNotFoundException cnfe) {
            System.out.println("Base Class: " + javaClass.getClassName());
            cnfe.printStackTrace();
        }
        return inheritanceDepth;
    }
    public static void main(String args[]) {
        
        if (args.length == 0) {
          System.out.println("Usage: java FindClassInheritanceDepth jarFile");
          return;
        }
        try {
            fileInputStream = new FileInputStream(args[0]);

        } catch (FileNotFoundException fe) {
            fe.printStackTrace();
        }
        zipInputStream = new ZipInputStream(fileInputStream);
        for (;;) {
            try {
                ZipEntry zipEntry = zipInputStream.getNextEntry();
                if (zipEntry == null) {
                    return;
                }
                String fileExtension = getFileExtension(zipEntry.getName());
                if (fileExtension != null) {
                    if (fileExtension.equals(".class")) {
                        classParser = new ClassParser(args[0], zipEntry
                                .getName());
                        JavaClass jClass = classParser.parse();
                        if(CheckClassDepth(jClass) > MAX_INHERETANCE_DEPTH)
                            System.out.println("Class: "+ 
                                 jClass.getClassName() + 
                                   " have excedded inheretance hierarchy 
                                         limit(max 5 is allowed)" + "\n");
                    }
                }
            } catch (IOException ie) {
               ie.printStackTrace();
            }
        }
    }
}
Back to article


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.