Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

JVM Languages

Inside Java Class Files


Dr. Dobb's Journal January 1998: Inside Java Class Files

The key to Java's binary portability

Matt is the founder of Laserstars Technologies, where he develops and consults on system software. He can be contacted at [email protected].


One of the key features of the Java programming paradigm is cross-platform binary compatibility -- compiled Java code can execute on any machine and operating system, given the appropriate software. This is made possible by Java's virtual machine approach, where processor- and platform-neutral compiled code (bytecode) is run by a Java Virtual Machine (JVM). The JVM software forms an abstraction layer between Java programs and the physical machine on which the JVM itself runs.

The Java compiler, like any compiler, takes your program's source code and translates it into machine code and binary symbolic information. In a traditional system, this data would be stored in an object file for later use or execution. In Java's case, it is placed into a separate .class file for each Java class or interface in your source code.

In this article, I'll explore the key to Java's binary portability -- its unique class file format. In the process, I'll present two programs: JavaDump, for documenting the structures in class files, and StripDebug, which removes extra debug information from classes. Both use the JCF class file manipulation library, which you can use with your own Java programs for reading and modifying class files.

Class File Organization

Class files are structured around a linear, record-based organization, somewhat similar to 80x86 OMF object files (see Figure 1). They follow a rigid five-part format, yet allow significant flexibility in each section. Each class file begins with a magic number and version information, followed by a constant pool, a class descriptor header, fields, methods, and finally an extension area. All multibyte data types are in Big-endian byte order (most-significant byte first).

Because of Java's dynamically linked nature, each class file must contain a large amount of symbolic and typing information (as opposed to containing mainly code). This data informs the JVM about how to resolve internal and external class references, and also allows it to verify the security and integrity of classes.

The Constant Pool and Class Descriptor

The constant pool of a class file is similar to the symbol table in a traditional object file; see Figures 1(a) and 1(f). However, the constant pool's data is referenced primarily by other structures and code within the class file, and thus contains a wealth of additional information beyond the usual symbol names. The pool is treated as a one-based array of slots, each containing one variable-length data type called a "tag."

The most common constant-pool tags are strings. Strings are stored in the UTF8 format in which Unicode characters are packed into bytes to save space. UTF strings have several uses:

  • Internal and external field, method, and class names. In the case of class names, packages are delimited with "/" characters instead of the "." characters used in Java source code.
  • Field and method signatures, which are similar in format to C++ mangled names.
  • Actual text strings used within the class.

The constant pool also integrates aspects of traditional import/export and relocation tables. There are a number of special linking tags that simply contain the indices of other pool slots. These tags (such as CLASS, METHOD, and NAMETYPE) are used to dynamically link Java classes. For instance, a METHOD tag points to a CLASS tag (to specify an imported class), as well as a NAMEANDTYPE tag (to identify a specific method in that class). Linking tags are also directly referenced by the class's bytecode as a dynamic relocation table.

Finally, the constant pool contains primitive data types (integers, floats, longs, and doubles) larger than the bytecode's 16-bit literal limit. An interesting caveat of the 64-bit types is that they each take two constant-pool slots (which seems to be a JVM-related optimization).

After the constant pool, the class descriptor consists of several miscellaneous fields related to the entire class; see Figure 1(b). These include the access flags of the class (public, private, and so on), as well as constant-pool indexes to the class and its superclass. An array of constant-pool indexes to any interfaces implemented by the class also appear here.

Fields, Methods, and Attributes

Following the secondary header, there are two arrays that describe fields (Figure 1(c)) and methods (Figure 1(d)). Both arrays have an identical structure, but they describe different types of class members. Each variable-length entry identifies the access flags, name, and signature of the member, as well as a list of associated "attributes."

An attribute is a basic component of the class format, and is merely a special type of record that provides additional information in a more flexible format. For instance, each method descriptor contains a nested Code attribute that fully describes the actual bytecode for that method. Similarly, field descriptors may contain a ConstantValue attribute, which points to a constant-pool entry that describes a "static final" constant in a class. In addition, several attributes are optional, and are related solely to debugging. You can include your own attributes in class files to extend the class format without breaking existing code or JVMs.

The Code attribute is especially important because it contains the actual Java bytecode (along with stack and local variable information). It can also contain nested attributes, which makes it interesting. For example, it can nest an Exceptions attribute (to list any exceptions thrown by the method owning the Code attribute) as well as several debug attributes, such as LineNumberTable, LocalVariables, and SourceFile.

At the end of the class file (Figure 1(e)) is a separate section for other attributes that apply to the class as a whole. The SourceFile attribute is placed here by the Java compiler, and vendors are free to put additional attributes in this section as well. For instance, the attributes section is a good place to put class authentication or security information, or perhaps revision-control system data.

Exploring Class Files

The best way to understand a new file format is to use a "dump" utility to display its internal structure and organization. JavaDump, a complete dump utility for class files, reads in a class file and generates a human-readable report on the file. (The complete JavaDump system is available electronically; see "Resource Center," page 3.) Unlike most dump utilities, JavaDump follows Java's web heritage by formatting its output in HTML; see Figure 2. For more information on JavaDump's command-line options, run the program without parameters. Its class name is lti.java.javadump.javadump (assuming that you have placed the appropriate class files in your system's class path).

JavaDump is not a stand-alone program; it uses the Java Class File (JCF) package (lti.java.jcf, also available electronically) to parse class files and maintain information about their structures. After opening a stream to read the desired class file (see Listing One), any program using JCF must create a specialized JcfClassInputStream object (or a class that implements the JcfClassInput interface) layered on top of the actual stream. This approach is used for flexibility in reading and writing class files, since JcfClassInput declares special methods for reading constant-pool entries and other class-file-specific data types. For instance, it would be fairly easy, by overriding JcfClassInput.readCPRef(), to obtain a reference count for each constant-pool slot. Without this layered stream approach, overriding the way a constant-pool reference is read would be very difficult.

After setting up the appropriate parsing stream, a new JcfClassFile object is instantiated with that stream as its constructor's parameter. The code in JcfClassFile will then create various objects representing different class structures as it reads them from the stream. For example, by default JcfClassFile creates a JcfAttributeCollection for each group of attributes (that is, for a method or Code subattribute) that it reads. The JcfAttributeCollection then creates a generic JcfAttribute or JcfCodeAttribute (depending on the type of attribute), which reads itself in from the stream. Thus, a JcfClassFile "owns" a JcfMemberCollection (used for storing the fields or methods of a class), which in turn owns a JcfAttributeCollection, which, finally, owns a JcfAttribute (with potential nesting). In this way, JCF forms a self-loading tree of structures rooted at the initial JcfClassFile.

After parsing the class file into a JcfClassFile object, JavaDump passes this object to DumpClassToHtml, a class that essentially iterates through all of the objects owned by the initial JcfClassFile, creating HTML formatting combined with the actual information derived from the class file. Many of the classes that comprise the JCF package are subclasses of Vector, so they support the Enumeration interface for easy iteration through the objects that they may own (see Listing Two). This applies to most classes with names ending in "Collection."

The JDK includes a program called javap, which is a simple disassembler for class files. While javap is excellent for displaying the bytecode behind methods, it is fairly limited in terms of divulging the internal structures (especially nonstandard attributes) within classes. This is where JavaDump steps in. While JavaDump will not disassemble bytecode as javap will, it excels in producing complete reports of a class's file structure. When combined with a disassembler like javap, JavaDump is a powerful tool for anyone developing class file processors.

Trimming Class Files

As mentioned earlier, the Java compiler stores debug information in special attributes within the class file. These attributes include LineNumberTable, LocalVariables (a list of locals and parameters for each method), and SourceFile (the name of the .java source file). Debug attributes obviously are of little use outside of a debugging situation; they simply bloat class files. Furthermore, their presence can aid reverse engineering and decompiling. The only downside to removing these attributes is that the JVM cannot display line numbers in an exception stack backtrace. After being stripped, classes can still be fully used by the Java compiler.

StripDebug, which also uses the JCF package, is a utility that removes unused debug information from class files. (The complete StripDebug system also is available electronically.) StripDebug (class lti.java.stripdebug.stripdebug) takes the name of a directory as a command-line parameter, and strips the debug information from all class files in that directory. Because StripDebug actually modifies class files (unlike JavaDump, which simply reads them), it uses some additional functionality within JCF.

It was previously mentioned that some objects in the JCF package can own other objects, forming a tree-like structure. The actual child classes that owner objects instantiate while parsing class files are not hardcoded; you can easily substitute compatible subclasses that JCF will instantiate instead of the default. This makes some exciting extensions to JCF possible; for instance, a subclass of JcfConstantPool could be hooked into a JcfClassFile, and would automatically sort constant-pool entries as they are read. Many classes in the JCF package support this factory pattern; see Listing Three.

By overriding the readAttributeCollection() method, you can force JCF to think that it is using its own JcfAttributeCollection, but the expected functionality is actually implemented by StripDebugAttributeCollection's methods (Listing Four). To strip a class's debug attributes, the write() method, which overrides JcfAttributeCollection's write(), simply scans through the attribute collection and removes any debug attributes based on their names. It then invokes the superclass's write() to perform the actual Save operation. This is basically all you need to do to strip the debug information from a class file and, with the help of JCF, it can be accomplished in less than 50 lines of code. The examples presented here are only the tip of the iceberg; you may find the JCF package very useful for more advanced work with class files in your own projects.

Summary

Java class files are one of the keys to Java's ability to deliver portable executable code across platforms, and understanding the structure of class files can be a valuable asset to anyone who needs to get at the information that they contain. The JavaDump utility, built with the JCF class file manipulation package, produces an HTML report from the structures in a class file, and StripDebug takes JCF a step further by actually modifying class files to remove bulky debug information. The ability to read and manipulate class files is important if you work in low-level Java development.

References

Lindholm, Tim and Frank Yellin. The Java Virtual Machine Specification, Addison-Wesley, 1997; or http://www.javasoft.com /docs/language_vm_specification.html.

The software mentioned in this article is also available online at http://www .laserstars.com/articles/ddj/insidejcf.

DDJ

Listing One

[lti\java\javadump\javadump.java]package lti.java.javadump;
import java.io.*;
import lti.java.jcf.*;


</p>
public class javadump
{
  public static void main(String args[]) throws Exception
  {
    // Open the class file.
    FileInputStream fist = new FileInputStream(args[0]);
    // Create a new specialized input stream to parse the class file
    JcfClassInputStream ist = new JcfClassInputStream(
      new BufferedInputStream(fist, Math.min(fist.available(), 32767)));
    // Read into JcfClassFile object
    JcfClassFile jcf = null;
    jcf = new JcfClassFile(ist);
    fist.close();
    // Perform complete dump
    int flags = 0;
    if ((args.length >= 2) && (args[1].equals("-noconstpool")))
      flags |= DumpClassToHtml.OMIT_CONSTPOOL;
    new DumpClassToHtml(jcf, System.out, flags).dump();
  }
}

Back to Article

Listing Two

public void enumerateAttributes (JcfMember member){
  JcfAttributeCollection attrs = member.getAttributes();
  JcfConstantPool cp = member.getConstPool();
  System.out.println("Listing " + attrs.size() + " attributes:");


</p>
  Enumeration enum = attrs.elements();
  while (enum.hasMoreElements())
  {
    JcfAttribute attribute = (JcfAttribute)enum.nextElement();
    System.out.println("  " + cp.utfAt(attribute.getNameIndex()));
  }
}

Back to Article

Listing Three

[lti\java\stripdebug\StripDebugClassFile.java]package lti.java.stripdebug;
import java.io.*;
import lti.java.jcf.*;


</p>
public class StripDebugClassFile extends JcfClassFile
{
  public StripDebugClassFile()
    { super(); }
  public StripDebugClassFile (JcfClassInput ist)
    throws IOException, ClassFormatError
    { super(ist); }
  // Override this factory method to return a JcfAttributeCollection subclass.
  public JcfAttributeCollection readAttributes(JcfClassInput ist)
    throws IOException
  { 
    return new StripDebugAttributeCollection(ist, getConstantPool());
  }
}

Back to Article

Listing Four

[lti\java\stripdebug\StripDebugAttributeCollection.java]package lti.java.stripdebug;
import java.io.*;
import lti.java.jcf.*;


</p>
public class StripDebugAttributeCollection   extends JcfAttributeCollection
{
  public void write (JcfClassOutput ost)     throws IOException
  {
    String attribName;
    int total = size();
    for (int i = 0; i < total; i++) {
      attribName = constPool.utfAt(attributeAt(i).getNameIndex());
      if (attribName.equals("LineNumberTable") ||
        attribName.equals("LocalVariables") ||
        attribName.equals("SourceFile")) 
      { removeElementAt(i); i--; total--; } 
    }
    super.write(ost);
  }
}

Back to Article


Copyright © 1998, Dr. Dobb's Journal


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.