GJ A Generic Java

Generic Java (GJ) adds generic types to the Java language. GJ is compatible with Java, the Java Virtual Machine, and existing libraries. It is also efficient, in that information about generic types is maintained only at compile time, not run time.


February 01, 2000
URL:http://www.drdobbs.com/windows/gj-a-generic-java/184403997

Feb00: GJ A Generic Java

Philip is a researcher at Bell Labs, Lucent Technologies. He can be reached at [email protected].


The Java programming language may be about to change. While the Java system has added and revised libraries at a furious pace, the Java language proper has remained frozen since the introduction of inner classes a few years ago. Now, Sun has invoked the Java Community Process to consider adding generics to the Java language.

Here is the problem that generics solve. Say you wish to process lists. Some may be lists of bytes, others lists of strings, and yet others lists of lists of strings. In Java, this is simple. You can represent all three by the same class, which has elements of class Object; see Table 1(a).

To keep the language simple, you are forced to do some of the work yourself: You must keep track of the fact that you have a list of bytes (or strings or lists), and when you extract an element from the list, you must cast it from class Object back to class Byte (or String or List) before further processing. For instance, the collection class framework in Java 2 treats collections (including lists) in just this way.

As Einstein said, everything should be as simple as possible, but no simpler. And some might say the above is too simple. If we extend the Java language with generic types, then it is possible to represent information about lists in a more direct way; see Table 1(b). The compiler could now keep track of whether you have a list of bytes (or strings or lists), and no explicit cast back to class Byte (or String or List<String>) would be required. In some ways, this is similar to generics in Ada or templates in C++. It is also similar to parametric polymorphism in so-called functional languages such as ML and Haskell.

Sun's work is influenced by a number of previous proposals, notably Generic Java (GJ). GJ was designed by Gilad Bracha and David Stoutamire at Sun, Martin Odersky at the Swiss Federal Institute of Technology, and myself. Bracha is now spearheading Sun's process for adding generics to Java, with Odersky and myself sitting on the associated experts committee. In this article, I'll describe GJ as a way of looking ahead at what may emerge from Sun. Further, GJ details are available at http://www.cs.bell-labs.com/~wadler/gj/. The GJ compiler, written by Odersky, is available for download from the site. Odersky's GJ compiler is also the basis of Sun's current Java compiler, although that compiler does not support generics (yet). GJ and the GJ compiler are not themselves Sun products.

Some of GJ's key features include:

The GJ compiler works by translating GJ back to ordinary Java. The translation simply erases type parameters and adds casts. For instance, it takes the GJ class List<Byte> back into the Java class List, and adds casts from Object to Byte where needed. The result is much the same Java you would have written if generics weren't available. This is why it is simple to interface GJ with existing Java libraries, and why GJ is as efficient as Java. Furthermore, GJ comes with a "cast-iron" guarantee: No cast inserted by the compiler will ever fail. In effect, the type system is a simple logic that lets the compiler prove that the cast is safe. On top of this, since GJ translates into JVM byte codes, all of the usual safety and security properties of the Java platform are preserved.

How GJ Works

Let's look at two examples of how GJ works. The first covers the most basic features of GJ, as used in building and examining generic lists. The second covers some more advanced features and shows how to write a generic method to find the largest element in a list. In both cases, I'll first consider the Java code for a task, then show how it is rewritten in GJ. I'll also describe how to retrofit legacy Java libraries to GJ and consider how C++ templates compare with GJ generics.

Example 1 shows list and iterator interfaces (both highly simplified from the Java collections library), a linked list class that implements the list interface, and a test class. The list interface provides a method to add an element to a list, add, and a method to return an iterator for the list, iterator. In turn, the iterator interface provides a method to determine if the iteration is done, hasNext, and (if it is not) a method to return the next element and advance the iterator, next. The linked list class implements the list interface, but I'll skip the details of how this is done. The add method expects an object, and the next method returns an object. Every class is a subclass of Object, so you can form lists with elements of type Byte, String, List itself, or any other class.

The test class builds some lists, then extracts elements from them. Users must recall what type of element is stored in a list, and cast to the appropriate type when extracting an element from a list. Extracting requires two casts. If the user accidentally attempts to extract a byte from a list of strings, it raises an exception at run time.

Example 2 shows lists, iterators, linked lists, and the test class rewritten in GJ. The interfaces and class take a type parameter A, written in angle brackets, representing the element type. Each place where Object appeared in the previous code is now replaced by A. Each place where List, Iterator, or LinkedList appeared in the previous code is now replaced by List<A>, Iterator<A>, or LinkedList<A>, respectively. Instead of relying on the user's memory, parameters document the type of each list's elements, and no casts are required. The code to extract an element from a list of lists is now more perspicuous. An attempt to extract a byte from a list of strings will now indicate an error at compile time.

In Java, lists are heterogenous -- they may have elements of any type, and there is no way to enforce that they all have the same type. In GJ, lists are homogenous -- they must have all elements of the same type, and the compiler enforces this. If you really need a list with elements of any type, you use List<Object>.

To translate from GJ into the Java programming language, you replace each type by its erasure. The erasure of a parametric type is obtained by deleting the parameter (so List<A> erases to List), the erasure of a nonparametric type is the type itself (so Byte erases to Byte), and the erasure of a type parameter is Object (so A erases to Object). A suitable cast is inserted around each method call where the return type is a type parameter. Translating the GJ code for lists yields the Java code you started with (except for the line that was in error). Thus, a new program compiled against the GJ code could be used with an old library compiled against the Java code.

Angle brackets were chosen for type parameters because they are familiar to C++ users, and because the other forms of brackets may lead to confusion. If round brackets are used, it is difficult to distinguish type and value parameters. If square brackets are used, it is difficult to distinguish type parameters and array dimensions. If curly brackets are used, it is difficult to distinguish type parameters from class bodies.

Phrases like LinkedList<LinkedList<String>> pose a problem to the parser, since >> is treated as a single lexeme. (Similarly for >>>.) In C++, users are required to add extra spaces to avoid this problem. In GJ, there is no worry for users, the problem is instead solved by a slight complication to the grammar.

Bridges, Generic Methods, and Bounds

The next example demonstrates more advanced features of generics, including bridges, generic methods, and bounds. I'll consider Java code to find the maximum element in a list, then show how this is rewritten in GJ.

In Java, objects that can be compared should be declared as implementing the Comparable interface. Each base type (such as byte or boolean) has a corresponding object type (such as Byte or Boolean).

Example 3 shows declarations for the Comparable interface and the Byte class that implements it (both simplified from the Java library). The method compareTo takes a parameter object and returns an integer that is negative, zero, or positive if the receiver object is less than, equal to, or greater than the parameter object. The Byte class defines two methods, one with signature compareTo(Byte) (which overloads the name compareTo) and one with signature compareTo(Object) (which overrides the corresponding method in the Comparable interface). The first method simply subtracts the 2 bytes to determine how they compare. The second method casts the object to a byte and then calls the method; if the method is passed an object other than a byte, then a run-time error will occur. The second method is required because overloading occurs only when types match exactly. It is called a "bridge" because it connects the first method (which is specific for bytes) to the interface method (which is generally all objects).

Example 3 also shows a utility class, Lists, with a static method to find the maximum element in a list, and a test class. Method max takes a nonempty list of comparable elements and returns the maximum element in the list. The test class shows two sample calls to the method. As before, users must keep track of the result type and cast the results as appropriate. Booleans do not implement the comparable interface, so an attempt to find the maximum of a collection of Booleans raises an exception at run time.

Example 4 is the code rewritten in GJ. The Comparable interface now takes a type parameter A, indicating the type compared to. For instance, the class Byte implements the interface Comparable<Byte>, indicating that a byte can be compared with another byte. The interface requires a method with signature compareTo(A), so you would expect this to be implemented in the class by a method with signature compareTo(Byte). Users do not need to write the bridge method, since it is automatically generated by the GJ compiler.

The signature of the method max illustrates two interesting features of GJ -- generic methods and bounds. Here is the signature:

public static <A implements Comparable<A>> A max (List<A> xs)

The signature says the method takes a list with elements of type A and returns an element of type A. The phrase in angle brackets at the beginning declares the type parameter A and indicates that the method can be instantiated for any type A that implements the interface Comparable<A>. A method with its own type parameter is called a "generic method," and a type parameter that must implement a given interface (or be a subclass of a given class) is called "bounded."

The test class shows two sample calls to the method. In the first call, the compiler automatically infers that the type parameter A in the method signature must be instantiated to Byte in the method call, and it checks that class Byte implements the bound Comparable<Byte>. In the second call, the inferred type parameter is Boolean, and it does not implement the bound Comparable<Boolean>. So, where Java raises an exception at run time, GJ indicates an error at compile time.

In general, a bound is introduced by following the parameter with "implements" and an interface, or "extends" and a class. Bounds are allowed wherever type parameters are introduced in either the head of a class or the signature of a generic method. The bounding interface or class may itself be parameterized, and may even be recursive, as in the example where the bound Comparable<A> contains the bounded type parameter A.

The definition of erasure is extended so that the erasure of a type variable is the erasure of its bound (so A erases to Comparable in max). As before, the translation introduces suitable casts, and it also introduces bridge methods where required. And as before, translating the GJ code yields the Java code you started with (except for the line that was in error).

As you have seen, GJ code compiles into Java that looks much like what you would write if generics had not been available. As a result, you can often take legacy code and add type information, even if only binary class files are available, using a special retrofitting mode of the GJ compiler.

For instance, say you have a class file for the Java version of the LinkedList class described earlier, but you wish to use it as if it has GJ types. Example 5 shows the necessary retrofitting file.

To support independent compilation, the GJ compiler stores extra type-signature information in JVM class files (the class file format is designed to permit such additions, which are ignored by the JVM at run time). The retrofitter takes the existing Java class file, checks that its type signatures are the erasures of the GJ signatures, and produces a new class file with the GJ signatures added.

The entire collection class library for Java 2 has been retrofitted in this way. Every public interface, class, and method in the library -- without a single exception -- has an appropriate corresponding GJ type. Because retrofitted class files differ only in the addition of GJ type signatures (which are ignored by the JVM at run time), you can run the resulting code in a browser that is Java 2 compliant without reloading the collection class library.

In most cases, you would anticipate eventually rewriting the source library with parameterized types. The advantage of retrofitting is that you may schedule this rewriting at a convenient time -- it is not necessary to rewrite all legacy code before new code can exploit generic types.

Conclusion

Java programmers often use a generic idiom where elements of a data structure are given the type Object. This is simple, but forces you to keep track of the actual type of the elements and add many casts. In contrast, adding generic types to the Java language adds a little in complexity, but now it is the compiler rather than you that keeps track of the type of the elements and adds the casts.

In particular, generic types have the advantage of turning run-time exceptions into compile-time errors. Here's what Bill Joy, one of the authors of the Java language specification, had to say about adding generics to Java (Java One Conference, June 1999):

We're continuing to work on the idea of catching more of the errors during development, putting a parameterized type system in the language. For me, it's not so much to make the language more expressive, but to get rid of casts so there are less errors found after you ship the code.

After you ship, it costs you about 10,000 times as much to fix a software bug, and as a programmer, it's also really annoying. If the bug is caught at compile time, you're sitting near what's going on. The code concepts are in your mind. If the bug report comes in from the field, it gets assigned to bug tracking and eventually makes it back to your desk or to somebody else's desk, and the thinking that went into the original design isn't there anymore. You have to reload your "cache" memory in your brain. So this whole idea of catching errors up front is a real advantage.

If you see the class Object mentioned in a Java program, it is usually a sign that the program would benefit from the use of generic types. You might say that the class is well named -- when you see it you should "object" and use generic types instead.

GJ is a language design that extends Java with generics. It demonstrates that it is possible to add generics to Java in a simple and usable way. Several large programs have been implemented in GJ, including the GJ compiler itself. As mentioned earlier, other languages support generic types in other ways. In particular, C++ supports generic types with templates. Although they have similar syntax and similar uses, C++ templates and GJ generics are implemented in quite different ways.

C++ templates are implemented by expansion, making one copy of the code for each type where it is used (for instance, in our first example, there would be three copies, one for bytes, one for strings, and one for lists of strings) -- this frequently leads to code bloat. Furthermore, because the template might be defined in one file and used in another file, errors caused by the expansion are often not reported until link time and can often be hard to track down. My colleagues Brian Kernighan and Rob Pike report on a small C++ program where templates generated a variable name 1594 characters long. (See The Practice of Programming, Addison-Wesley, 1999.)

In contrast, GJ generics are implemented by erasure, so there is no code bloat. All of the constraints that a type variable must satisfy are specified by the bounds on that variable, so all errors are reported at compile time, not link time. (This is just as well because Java has dynamic linking, so link time is the same as run time.) On the other hand, to make this work smoothly, type parameters must always be a reference type like Byte rather than a primitive type like byte. So, until Java compilation technology improves, GJ generic types may be less efficient than C++ templates. (Perhaps this is no big deal, since until compilation technology improves, Java is less efficient than C++ anyway.)

While GJ is designed to be eminently practical, it has its roots in some esoteric theory. Ideas that contributed to the design of GJ include Church's lambda calculus from the 1930s, Curry and Hindley's type inference system from the 1950s, and Girard and Reynold's polymorphic calculus from the 1970s. GJ programmers don't need to understand these concepts, but they helped the GJ designers to do a better job. Mathematics from the last century is relevant to designing languages for the next millennium.

DDJ

Feb00: GJ A Generic Java


interface List {
  public void add (Object x);
  public Iterator iterator ();
}
interface Iterator {
  public Object next ();
  public boolean hasNext ();
}
class LinkedList implements List { ... }
class Test {
  public static void main (String[] args) {

    // byte list
    List xs = new LinkedList();
    xs.add(new Byte(0)); xs.add(new Byte(1));
    Byte x = (Byte)xs.iterator().next();

    // string list
    List ys = new LinkedList();
    ys.add("zero"); ys.add("one");
    String y = (String)ys.iterator().next();

    // string list list
    List zss = new LinkedList();
    zss.add(ys);
    String z = (String)((List)zss.iterator().next()).iterator().next();

    // string list treated as byte list
    Byte w = (Byte)ys.iterator().next();  // run-time exception
  }
}

Example 1: List and iterator interfaces.

Feb00: GJ A Generic Java


interface List<A> {
  public void add(A x);
  public Iterator<A> iterator();
}
interface Iterator<A> {
  public A next();
  public boolean hasNext();
}
class LinkedList<A> implements List<A> { ... }
class Test {
  public static void main (String[] args) {

    // byte list
    List<Byte> xs = new LinkedList<Byte>();
    xs.add(new Byte(0)); xs.add(new Byte(1));
    Byte x = xs.iterator().next();

    // string list
    List<String> ys = new LinkedList<String>();
    ys.add("zero"); ys.add("one");
    String y = ys.iterator().next();

    // string list list
    List<List<String>> zss = new LinkedList<List<String>>();
    zss.add(ys);
    String z = zss.iterator().next().iterator().next();

    // string list treated as byte list
    Byte w = ys.iterator().next();  // compile-time error
  }
}

Example 2: Lists, iterators, linked lists, and the test class rewritten in GJ.

Feb00: GJ A Generic Java


interface Comparable {
  public int compareTo (Object that);
}
class Byte implements Comparable {
  private byte value;
  public Byte (byte value) { this.value = value; }
  public byte byteValue () { return value; }
  public int compareTo (Byte that) {
    return this.value - that.value;
  }
  // bridge
  public int compareTo (Object that) {
    return this.compareTo((Byte)that);
  }
}
class Lists {
  public static Comparable max (List xs) {
    Iterator xi = xs.iterator();
    Comparable w = (Comparable)xi.next();
    while (xi.hasNext()) {
      Comparable x = (Comparable)xi.next();
      if (w.compareTo(x) < 0) w = x;
    }
    return w;
  }
}
class Test {
  public static void main (String[] args) {

    // byte list
    List xs = new LinkedList();
    xs.add(new Byte(0)); xs.add(new Byte(1));
    Byte x = (Byte)Lists.max(xs);

    // boolean list
    List ys = new LinkedList();
    ys.add(new Boolean(false)); ys.add(new Boolean(true));
    Boolean y = (Boolean)Lists.max(ys);  // run-time exception
  }
}

Example 3: Declarations for the Comparable interface and the Byte class that implements this interface.

Feb00: GJ A Generic Java


interface Comparable<A> {
  public int compareTo (A that);
}
class Byte implements Comparable<Byte> {
  private byte value;
  public Byte (byte value) { this.value = value; }
  public byte byteValue () { return value; }
  public int compareTo (Byte that) {
    return this.value - that.value;
  }
}
class Lists {
  public static <A implements Comparable<A>> A max (List<A> xs) {
    Iterator<A> xi = xs.iterator();
    A w = xi.next();
    while (xi.hasNext()) {
      A x = xi.next();
      if (w.compareTo(x) < 0) w = x;
    }
    return w;
  }
}
class Test {
  public static void main (String[] args) {

    // byte collection
    LinkedList<Byte> xs = new LinkedList<Byte>();
    xs.add(new Byte(0)); xs.add(new Byte(1));
    Byte x = Collections.max(xs);

    // boolean collection
    LinkedList<Boolean> ys = new LinkedList<Boolean>();
    ys.add(new Boolean(false)); ys.add(new Boolean(true));
    Boolean y = Collections.max(ys);  // compile-time error
  }
}

Example 4: Code rewritten in GJ.

Feb00: GJ A Generic Java


class LinkedList<A> implements Collection<A> {
  public LinkedList ();
  public void add (A elt);
  public Iterator<A> iterator ();
}

Example 5: The retrofitting file.

Feb00: GJ A Generic Java

Table 1: Processing lists. (a) Java; (b) with generic types.

Terms of Service | Privacy Statement | Copyright © 2024 UBM Tech, All rights reserved.