
Soot: Analyze, Transform, and Optimize Java Bytecodes

Code Analysis Using Soot

Now, let's look at a more complex example: using Soot to perform semantic, rather than syntactic, analysis of Java code. Developers often use simple textual searches, such as grep, to navigate source code. However, textual searches only see syntax: they can be foiled by whitespace, and their output can be difficult to interpret. An alternative is to write simple code analysis tools using Soot.
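As a small illustration of the whitespace problem, here is a hypothetical Java sketch (it does not use Soot) of a grep-style regular expression for static fields initialized from System.out. The class name GrepLimits and the pattern are assumptions made for this example. The point is only that four declarations with the same meaning produce a single textual match.

```java
import java.util.List;
import java.util.regex.Pattern;

public class GrepLimits {
    // Hypothetical grep-style pattern: "static <Type> <name> = System.out;" with
    // single spaces and a simple (unqualified) type name, all on one line.
    static final Pattern NAIVE = Pattern.compile("static \\w+ \\w+ = System\\.out;");

    static boolean naiveMatch(String source) {
        return NAIVE.matcher(source).find();
    }

    public static void main(String[] args) {
        // Four declarations with the same meaning; only the first matches the pattern.
        List<String> variants = List.of(
            "static PrintStream f = System.out;",
            "static PrintStream f =\n        System.out;",   // line break
            "static PrintStream f = System . out;",          // legal spacing around '.'
            "static java.io.PrintStream f = System.out;");   // qualified type name
        for (String src : variants) {
            System.out.println(naiveMatch(src) + "\t" + src.replace("\n", "\\n"));
        }
    }
}
```

A tool that works on compiled code does not have this problem: once the source is compiled, layout and the spelling of type names no longer matter.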

I'll start by presenting a simple tool that inspects class initializers. On the Soot project, we long ago decided that classes should not contain static fields that are initialized to non-constant values, so one contributor developed a tool called BadFields to check this constraint. The tool is included in the Soot distribution, and you can run it as follows:

$ java -cp ~/soot-svn/classes Test -soot-class-path /usr/lib/j2sdk1.5-sun/jre/lib/rt.jar:/usr/lib/j2sdk1.5-sun/jre/lib/jsse.jar:/usr/lib/j2sdk1.5-sun/jre/lib/jce.jar:. -W -f j

The -W option tells Soot to perform a whole-program analysis. In this mode, jsse.jar and jce.jar must also be on the Soot class path, which is why they appear in the -soot-class-path argument.

The input file was:

class Test {
    static java.io.PrintStream f = System.out; // static field with a non-constant initializer
}

Let's look at various parts of the BadFields implementation. First, I'll examine how it connects to Soot; then, the analysis itself.

Hooking up to Soot

The BadFields class extends SceneTransformer. Soot uses two types of transformers: BodyTransformer and SceneTransformer. Both define an internalTransform() method, but Soot invokes a BodyTransformer once per method. (Soot stores a method's IR in a so-called "method body"; a method may have a Jimple body, a Baf body, and so on.) A SceneTransformer, by contrast, is invoked just once for the program as a whole, and its implementation is responsible for doing whatever it chooses with the whole program, which is stored in Soot's Scene singleton.

To connect to Soot, the BadFields class includes a main method that calls Soot's own main method. This is the best way to extend Soot, because it means you don't need to parse command-line options yourself or call Soot's internal classes in the right order. The main method contains the following code:

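    public static void main(String[] args) {
        PackManager.v().getPack("cg").add(new Transform("cg.badfields", new BadFields())); // register in the cg pack
        soot.Main.main(args); // let Soot parse the options and run the packs
    }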

The code adds the BadFields transformer to the "cg" (or call graph) pack. This tells Soot to run BadFields after it has computed a call graph for the input program.
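The ordering idea behind packs can be modeled in a few lines. The sketch below is a toy written for this explanation, not Soot's PackManager: it assumes that packs run in a fixed order, that a phase name such as "cg.badfields" selects its pack by the prefix before the dot, and that a newly added phase runs after the phases already in that pack. The phase "cg.builder" and the pack "later" are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PackModel {
    // Toy model (NOT Soot's PackManager): packs run in constructor order, and
    // phases inside a pack run in the order they were added.
    private final Map<String, List<String>> packs = new LinkedHashMap<>();

    PackModel(String... packNamesInRunOrder) {
        for (String p : packNamesInRunOrder) {
            packs.put(p, new ArrayList<>());
        }
    }

    // A phase name has the form "pack.phase"; the prefix picks the pack.
    void add(String phaseName) {
        int dot = phaseName.indexOf('.');
        if (dot < 0) throw new IllegalArgumentException("expected pack.phase: " + phaseName);
        List<String> phases = packs.get(phaseName.substring(0, dot));
        if (phases == null) throw new IllegalArgumentException("unknown pack in " + phaseName);
        phases.add(phaseName);    // appended, so it runs after the pack's existing phases
    }

    // The order in which all registered phases would execute.
    List<String> runOrder() {
        List<String> order = new ArrayList<>();
        for (List<String> phases : packs.values()) {
            order.addAll(phases);
        }
        return order;
    }

    public static void main(String[] args) {
        PackModel pm = new PackModel("cg", "later");
        pm.add("later.report");   // registered first, but its pack runs after "cg"
        pm.add("cg.builder");     // hypothetical phase that builds the call graph
        pm.add("cg.badfields");   // added after it, so it sees the finished call graph
        System.out.println(pm.runOrder());
    }
}
```

Running it prints [cg.builder, cg.badfields, later.report]: registration order matters only within a pack.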

Implementing a SceneTransformer

Now that we've hooked up the transformer, we'd better implement it. Let's investigate key parts of the BadFields implementation. The first part is the internalTransform() method, of which I've included parts below:

    protected void internalTransform(String phaseName, Map options) {
        for( Iterator clIt =
             Scene.v().getApplicationClasses().iterator(); clIt.hasNext(); ) {
            final SootClass cl = (SootClass) clIt.next();
            // (call to handleClass( cl ) omitted)
            for( Iterator it = cl.methodIterator(); it.hasNext(); ) {
                handleMethod( (SootMethod) it.next() );
            }
        }
    }

We can see that this method requests all of the program's application classes. An analysis of the program itself is interested in the application classes. (An analysis of the program together with its libraries would instead request all classes via Scene.v().getClasses(), which includes both application and library classes.) I've omitted a call to handleClass(), but I've left in the call to handleMethod(). We iterate over the methods of class cl using its methodIterator() method. I've reproduced parts of handleMethod() below:

    private void handleMethod( SootMethod m ) {
        if( !m.isConcrete() ) return;             // abstract/interface methods have no body
        if( m.getName().equals( "<clinit>" ) ) {  // only class initializers are checked
            for( Iterator sIt =
                 m.getActiveBody().getUnits().iterator(); sIt.hasNext(); ) {
                final Stmt s = (Stmt) sIt.next();
                for( Iterator bIt = s.getUseBoxes().iterator(); bIt.hasNext(); ) {
                    final ValueBox b = (ValueBox) bIt.next();
                    Value v = b.getValue();
                    if( v instanceof FieldRef ) {
                        // warn() is a BadFields helper (not shown)
                        warn( m.getName()+" reads field "+v );
                    }
                }
            }
        }
        // (remainder of handleMethod() omitted)
    }

Soot also represents non-concrete methods (for example, abstract and interface methods), which have no body, so handleMethod() first checks that m is a concrete method. Next, because the analysis only verifies properties of class initializers, it skips every method that is not a class initializer. The Java Virtual Machine Specification names class initialization methods <clinit>; you can confirm this by looking at Jimple output.

Once we have a class initializer, our analysis needs to examine its statements. We therefore request its Jimple body and iterate over its units. (Unit is the common supertype of Jimple statements; the other kind of unit is a Baf instruction.)
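To make the control flow of handleMethod() concrete without depending on Soot, here is a hedged sketch over a hypothetical mini-IR. The classes Value, Local, FieldRef, Stmt, and Method below are invented stand-ins that only echo Soot's terminology, and to keep the sketch short each statement lists its used values directly rather than through boxes. The two statements in main() roughly mimic what the initializer of the Test class might look like in Jimple.

```java
import java.util.ArrayList;
import java.util.List;

public class ClinitScan {
    // Hypothetical mini-IR; names echo Soot's concepts but these are NOT Soot classes.
    interface Value { }

    static final class Local implements Value {
        final String name;
        Local(String name) { this.name = name; }
        public String toString() { return name; }
    }

    static final class FieldRef implements Value {
        final String field;
        FieldRef(String field) { this.field = field; }
        public String toString() { return "<" + field + ">"; }
    }

    // A statement ("unit") exposes the values it uses.
    static final class Stmt {
        final List<? extends Value> uses;
        Stmt(List<? extends Value> uses) { this.uses = uses; }
    }

    static final class Method {
        final String name;
        final List<Stmt> body;               // null models an abstract/interface method
        Method(String name, List<Stmt> body) { this.name = name; this.body = body; }
        boolean isConcrete() { return body != null; }
    }

    // Same shape as handleMethod(): skip non-concrete methods, keep only <clinit>,
    // and report every field read among each statement's used values.
    static List<String> handleMethod(Method m) {
        List<String> warnings = new ArrayList<>();
        if (!m.isConcrete() || !m.name.equals("<clinit>")) return warnings;
        for (Stmt s : m.body) {
            for (Value v : s.uses) {
                if (v instanceof FieldRef) warnings.add(m.name + " reads field " + v);
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        // Roughly: "$r0 = <System.out>;" then "<Test.f> = $r0;". The write to f is a
        // definition, so the second statement only uses the local $r0.
        Method clinit = new Method("<clinit>", List.of(
            new Stmt(List.of(new FieldRef("java.lang.System: java.io.PrintStream out"))),
            new Stmt(List.of(new Local("$r0")))));
        System.out.println(handleMethod(clinit));
        System.out.println(handleMethod(new Method("run", null)));
    }
}
```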

Each unit can use and define values. For each statement s, we request the boxes holding the values that s uses by calling s.getUseBoxes(). Think of a box as a pointer: boxes add a level of indirection that lets transformations modify statements uniformly, by calling setValue() on a box rather than on its containing statement. If any used value is a FieldRef, the analysis prints a warning that includes the method (and class) name as well as the field being read.
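The benefit of boxes is easiest to see in a transformation. The following sketch is again a toy: ValueBox, Stmt, AssignStmt, ReturnStmt, and propagate() are hypothetical, values are reduced to strings, and the local x is assumed to be assigned exactly once. A constant-propagation pass rewrites every use of x through its box, without ever checking which kind of statement it is editing.

```java
import java.util.List;

public class BoxDemo {
    // Hypothetical mini-IR; it echoes Soot's ValueBox idea but is NOT Soot's API.
    static final class ValueBox {
        private String value;                    // values are plain strings in this toy
        ValueBox(String value) { this.value = value; }
        String getValue() { return value; }
        void setValue(String value) { this.value = value; }
    }

    interface Stmt { List<ValueBox> getUseBoxes(); }

    static final class AssignStmt implements Stmt {
        final ValueBox left, right;
        AssignStmt(String lhs, String rhs) { left = new ValueBox(lhs); right = new ValueBox(rhs); }
        public List<ValueBox> getUseBoxes() { return List.of(right); } // the left side is a def
        public String toString() { return left.getValue() + " = " + right.getValue(); }
    }

    static final class ReturnStmt implements Stmt {
        final ValueBox op;
        ReturnStmt(String op) { this.op = new ValueBox(op); }
        public List<ValueBox> getUseBoxes() { return List.of(op); }
        public String toString() { return "return " + op.getValue(); }
    }

    // Replaces every use of `local` with `constant` (valid here only because the
    // local is assumed to be assigned once). Works purely through use boxes.
    static void propagate(List<Stmt> body, String local, String constant) {
        for (Stmt s : body) {
            for (ValueBox b : s.getUseBoxes()) {
                if (b.getValue().equals(local)) b.setValue(constant);
            }
        }
    }

    public static void main(String[] args) {
        List<Stmt> body = List.of(new AssignStmt("x", "1"), new AssignStmt("y", "x"), new ReturnStmt("x"));
        propagate(body, "x", "1");
        body.forEach(System.out::println);
    }
}
```

Running it prints x = 1, y = 1, and return 1 on separate lines. The definition x = 1 is untouched because the left-hand side of an assignment is a definition, not a use.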

Because almost any condition can be checked against the Jimple IR, Soot is a powerful toolkit for finding code constructs that are difficult to specify at a purely syntactic level.

Additional Resources

The Soot homepage links to many resources where you can find out more about Soot. In particular, the Soot Survivor's Guide by Arni Einarsson and Janus Dam Nielsen is quite helpful. You can also consult the Soot tutorials. Finally, Soot has a mailing list where you can ask questions and participate in the Soot community. Enjoy!

Patrick Lam is an Assistant Professor in the Department of Electrical and Computer Engineering at the University of Waterloo.

Modifying a Java App Without Modifying Its Source Code


Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.
