Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

.NET

GJ A Generic Java


Feb00: GJ A Generic Java

Philip is a researcher at Bell Labs, Lucent Technologies. He can be reached at [email protected].


The Java programming language may be about to change. While the Java system has added and revised libraries at a furious pace, the Java language proper has remained frozen since the introduction of inner classes a few years ago. Now, Sun has invoked the Java Community Process to consider adding generics to the Java language.

Here is the problem that generics solve. Say you wish to process lists. Some may be lists of bytes, others lists of strings, and yet others lists of lists of strings. In Java, this is simple. You can represent all three by the same class, which has elements of class Object; see Table 1(a).

To keep the language simple, you are forced to do some of the work yourself: You must keep track of the fact that you have a list of bytes (or strings or lists), and when you extract an element from the list, you must cast it from class Object back to class Byte (or String or List) before further processing. For instance, the collection class framework in Java 2 treats collections (including lists) in just this way.

As Einstein said, everything should be as simple as possible, but no simpler. And some might say the above is too simple. If we extend the Java language with generic types, then it is possible to represent information about lists in a more direct way; see Table 1(b). The compiler could now keep track of whether you have a list of bytes (or strings or lists), and no explicit cast back to class Byte (or String or List<String>) would be required. In some ways, this is similar to generics in Ada or templates in C++. It is also similar to parametric polymorphism in so-called functional languages such as ML and Haskell.

Sun's work is influenced by a number of previous proposals, notably Generic Java (GJ). GJ was designed by Gilad Bracha and David Stoutamire at Sun, Martin Odersky at the Swiss Federal Institute of Technology, and myself. Bracha is now spearheading Sun's process for adding generics to Java, with Odersky and myself sitting on the associated experts committee. In this article, I'll describe GJ as a way of looking ahead at what may emerge from Sun. Further, GJ details are available at http://www.cs.bell-labs.com/~wadler/gj/. The GJ compiler, written by Odersky, is available for download from the site. Odersky's GJ compiler is also the basis of Sun's current Java compiler, although that compiler does not support generics (yet). GJ and the GJ compiler are not themselves Sun products.

Some of GJ's key features include:

  • Compatibility with Java. GJ is a superset of Java. Every Java program is still legal and retains the same meaning in GJ.

  • Compatibility with the Java Virtual Machine (JVM). GJ compiles into JVM code. No change to the JVM is required. Thus, GJ runs everywhere Java runs, including in your browser.

  • Compatibility with existing libraries. Existing libraries work with GJ, even in compiled class binary form. Sometimes it is possible to retrofit an old library to have new types, without access to the source. For instance, the Java collection class library was retrofitted to add generics.

  • Efficiency. Information about generic types is maintained only at compile time, not at run time. This means that compiled GJ code is pretty much identical to Java code for the same purpose, and equally efficient.

The GJ compiler works by translating GJ back to ordinary Java. The translation simply erases type parameters and adds casts. For instance, it takes the GJ class List<Byte> back into the Java class List, and adds casts from Object to Byte where needed. The result is much the same Java you would have written if generics weren't available. This is why it is simple to interface GJ with existing Java libraries, and why GJ is as efficient as Java. Furthermore, GJ comes with a "cast-iron" guarantee: No cast inserted by the compiler will ever fail. In effect, the type system is a simple logic that lets the compiler prove that the cast is safe. On top of this, since GJ translates into JVM byte codes, all of the usual safety and security properties of the Java platform are preserved.

How GJ Works

Let's look at two examples of how GJ works. The first covers the most basic features of GJ, as used in building and examining generic lists. The second covers some more advanced features and shows how to write a generic method to find the largest element in a list. In both cases, I'll first consider the Java code for a task, then show how it is rewritten in GJ. I'll also describe how to retrofit legacy Java libraries to GJ and consider how C++ templates compare with GJ generics.

Example 1 shows list and iterator interfaces (both highly simplified from the Java collections library), a linked list class that implements the list interface, and a test class. The list interface provides a method to add an element to a list, add, and a method to return an iterator for the list, iterator. In turn, the iterator interface provides a method to determine if the iteration is done, hasNext, and (if it is not) a method to return the next element and advance the iterator, next. The linked list class implements the list interface, but I'll skip the details of how this is done. The add method expects an object, and the next method returns an object. Every class is a subclass of Object, so you can form lists with elements of type Byte, String, List itself, or any other class.

The test class builds some lists, then extracts elements from them. Users must recall what type of element is stored in a list, and cast to the appropriate type when extracting an element from a list. Extracting requires two casts. If the user accidentally attempts to extract a byte from a list of strings, it raises an exception at run time.

Example 2 shows lists, iterators, linked lists, and the test class rewritten in GJ. The interfaces and class take a type parameter A, written in angle brackets, representing the element type. Each place where Object appeared in the previous code is now replaced by A. Each place where List, Iterator, or LinkedList appeared in the previous code is now replaced by List<A>, Iterator<A>, or LinkedList<A>, respectively. Instead of relying on the user's memory, parameters document the type of each list's elements, and no casts are required. The code to extract an element from a list of lists is now more perspicuous. An attempt to extract a byte from a list of strings will now indicate an error at compile time.

In Java, lists are heterogenous -- they may have elements of any type, and there is no way to enforce that they all have the same type. In GJ, lists are homogenous -- they must have all elements of the same type, and the compiler enforces this. If you really need a list with elements of any type, you use List<Object>.

To translate from GJ into the Java programming language, you replace each type by its erasure. The erasure of a parametric type is obtained by deleting the parameter (so List<A> erases to List), the erasure of a nonparametric type is the type itself (so Byte erases to Byte), and the erasure of a type parameter is Object (so A erases to Object). A suitable cast is inserted around each method call where the return type is a type parameter. Translating the GJ code for lists yields the Java code you started with (except for the line that was in error). Thus, a new program compiled against the GJ code could be used with an old library compiled against the Java code.

Angle brackets were chosen for type parameters because they are familiar to C++ users, and because the other forms of brackets may lead to confusion. If round brackets are used, it is difficult to distinguish type and value parameters. If square brackets are used, it is difficult to distinguish type parameters and array dimensions. If curly brackets are used, it is difficult to distinguish type parameters from class bodies.

Phrases like LinkedList<LinkedList<String>> pose a problem to the parser, since >> is treated as a single lexeme. (Similarly for >>>.) In C++, users are required to add extra spaces to avoid this problem. In GJ, there is no worry for users, the problem is instead solved by a slight complication to the grammar.

Bridges, Generic Methods, and Bounds

The next example demonstrates more advanced features of generics, including bridges, generic methods, and bounds. I'll consider Java code to find the maximum element in a list, then show how this is rewritten in GJ.

In Java, objects that can be compared should be declared as implementing the Comparable interface. Each base type (such as byte or boolean) has a corresponding object type (such as Byte or Boolean).

Example 3 shows declarations for the Comparable interface and the Byte class that implements it (both simplified from the Java library). The method compareTo takes a parameter object and returns an integer that is negative, zero, or positive if the receiver object is less than, equal to, or greater than the parameter object. The Byte class defines two methods, one with signature compareTo(Byte) (which overloads the name compareTo) and one with signature compareTo(Object) (which overrides the corresponding method in the Comparable interface). The first method simply subtracts the 2 bytes to determine how they compare. The second method casts the object to a byte and then calls the method; if the method is passed an object other than a byte, then a run-time error will occur. The second method is required because overloading occurs only when types match exactly. It is called a "bridge" because it connects the first method (which is specific for bytes) to the interface method (which is generally all objects).

Example 3 also shows a utility class, Lists, with a static method to find the maximum element in a list, and a test class. Method max takes a nonempty list of comparable elements and returns the maximum element in the list. The test class shows two sample calls to the method. As before, users must keep track of the result type and cast the results as appropriate. Booleans do not implement the comparable interface, so an attempt to find the maximum of a collection of Booleans raises an exception at run time.

Example 4 is the code rewritten in GJ. The Comparable interface now takes a type parameter A, indicating the type compared to. For instance, the class Byte implements the interface Comparable<Byte>, indicating that a byte can be compared with another byte. The interface requires a method with signature compareTo(A), so you would expect this to be implemented in the class by a method with signature compareTo(Byte). Users do not need to write the bridge method, since it is automatically generated by the GJ compiler.

The signature of the method max illustrates two interesting features of GJ -- generic methods and bounds. Here is the signature:

public static <A implements Comparable<A>> A max (List<A> xs)

The signature says the method takes a list with elements of type A and returns an element of type A. The phrase in angle brackets at the beginning declares the type parameter A and indicates that the method can be instantiated for any type A that implements the interface Comparable<A>. A method with its own type parameter is called a "generic method," and a type parameter that must implement a given interface (or be a subclass of a given class) is called "bounded."

The test class shows two sample calls to the method. In the first call, the compiler automatically infers that the type parameter A in the method signature must be instantiated to Byte in the method call, and it checks that class Byte implements the bound Comparable<Byte>. In the second call, the inferred type parameter is Boolean, and it does not implement the bound Comparable<Boolean>. So, where Java raises an exception at run time, GJ indicates an error at compile time.

In general, a bound is introduced by following the parameter with "implements" and an interface, or "extends" and a class. Bounds are allowed wherever type parameters are introduced in either the head of a class or the signature of a generic method. The bounding interface or class may itself be parameterized, and may even be recursive, as in the example where the bound Comparable<A> contains the bounded type parameter A.

The definition of erasure is extended so that the erasure of a type variable is the erasure of its bound (so A erases to Comparable in max). As before, the translation introduces suitable casts, and it also introduces bridge methods where required. And as before, translating the GJ code yields the Java code you started with (except for the line that was in error).

As you have seen, GJ code compiles into Java that looks much like what you would write if generics had not been available. As a result, you can often take legacy code and add type information, even if only binary class files are available, using a special retrofitting mode of the GJ compiler.

For instance, say you have a class file for the Java version of the LinkedList class described earlier, but you wish to use it as if it has GJ types. Example 5 shows the necessary retrofitting file.

To support independent compilation, the GJ compiler stores extra type-signature information in JVM class files (the class file format is designed to permit such additions, which are ignored by the JVM at run time). The retrofitter takes the existing Java class file, checks that its type signatures are the erasures of the GJ signatures, and produces a new class file with the GJ signatures added.

The entire collection class library for Java 2 has been retrofitted in this way. Every public interface, class, and method in the library -- without a single exception -- has an appropriate corresponding GJ type. Because retrofitted class files differ only in the addition of GJ type signatures (which are ignored by the JVM at run time), you can run the resulting code in a browser that is Java 2 compliant without reloading the collection class library.

In most cases, you would anticipate eventually rewriting the source library with parameterized types. The advantage of retrofitting is that you may schedule this rewriting at a convenient time -- it is not necessary to rewrite all legacy code before new code can exploit generic types.

Conclusion

Java programmers often use a generic idiom where elements of a data structure are given the type Object. This is simple, but forces you to keep track of the actual type of the elements and add many casts. In contrast, adding generic types to the Java language adds a little in complexity, but now it is the compiler rather than you that keeps track of the type of the elements and adds the casts.

In particular, generic types have the advantage of turning run-time exceptions into compile-time errors. Here's what Bill Joy, one of the authors of the Java language specification, had to say about adding generics to Java (Java One Conference, June 1999):

We're continuing to work on the idea of catching more of the errors during development, putting a parameterized type system in the language. For me, it's not so much to make the language more expressive, but to get rid of casts so there are less errors found after you ship the code.

After you ship, it costs you about 10,000 times as much to fix a software bug, and as a programmer, it's also really annoying. If the bug is caught at compile time, you're sitting near what's going on. The code concepts are in your mind. If the bug report comes in from the field, it gets assigned to bug tracking and eventually makes it back to your desk or to somebody else's desk, and the thinking that went into the original design isn't there anymore. You have to reload your "cache" memory in your brain. So this whole idea of catching errors up front is a real advantage.

If you see the class Object mentioned in a Java program, it is usually a sign that the program would benefit from the use of generic types. You might say that the class is well named -- when you see it you should "object" and use generic types instead.

GJ is a language design that extends Java with generics. It demonstrates that it is possible to add generics to Java in a simple and usable way. Several large programs have been implemented in GJ, including the GJ compiler itself. As mentioned earlier, other languages support generic types in other ways. In particular, C++ supports generic types with templates. Although they have similar syntax and similar uses, C++ templates and GJ generics are implemented in quite different ways.

C++ templates are implemented by expansion, making one copy of the code for each type where it is used (for instance, in our first example, there would be three copies, one for bytes, one for strings, and one for lists of strings) -- this frequently leads to code bloat. Furthermore, because the template might be defined in one file and used in another file, errors caused by the expansion are often not reported until link time and can often be hard to track down. My colleagues Brian Kernighan and Rob Pike report on a small C++ program where templates generated a variable name 1594 characters long. (See The Practice of Programming, Addison-Wesley, 1999.)

In contrast, GJ generics are implemented by erasure, so there is no code bloat. All of the constraints that a type variable must satisfy are specified by the bounds on that variable, so all errors are reported at compile time, not link time. (This is just as well because Java has dynamic linking, so link time is the same as run time.) On the other hand, to make this work smoothly, type parameters must always be a reference type like Byte rather than a primitive type like byte. So, until Java compilation technology improves, GJ generic types may be less efficient than C++ templates. (Perhaps this is no big deal, since until compilation technology improves, Java is less efficient than C++ anyway.)

While GJ is designed to be eminently practical, it has its roots in some esoteric theory. Ideas that contributed to the design of GJ include Church's lambda calculus from the 1930s, Curry and Hindley's type inference system from the 1950s, and Girard and Reynold's polymorphic calculus from the 1970s. GJ programmers don't need to understand these concepts, but they helped the GJ designers to do a better job. Mathematics from the last century is relevant to designing languages for the next millennium.

DDJ


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.