Java References


Feb00: Java References

Jonathan is an Adjunct Associate Professor of Computer Science at New York University and president of Astrel. He can be reached at [email protected].


Of all the features in Java 1.2, references are the most accessible and the most mysterious. Accessible because they are closely tied to Java's garbage collector, mysterious because it is not entirely clear what they are.

The idea behind Java references is easy to understand: They let a program refer to objects without preventing those objects from being garbage collected. There is also a way to obtain control just before an object is collected, so that you can perform cleanup actions. That's the story in a nutshell. But references are a low-level feature, difficult to reason about and to use correctly. In addition to explaining how references work, I'll present some useful abstractions that make working with references easier.

Why References?

While a program is running, the garbage collector occasionally seeks out all the objects that the program can access. These reachable objects consist of those pointed to by class variables, those pointed to by local variables in the currently active methods of all threads, and any other objects reachable from the aforementioned objects by following pointers. All other objects are unreachable -- the program will never be able to access them again. If they can't be accessed, they can't possibly affect the computation. So, these unreachable objects are garbage and their storage can be reclaimed.

The principle that an object is garbage if and only if it is unreachable is obviously correct. Unfortunately, it is a little too correct sometimes. To take one example, if you have a way of reconstructing an object, either by performing some computation or reloading it from a file, then you may be willing to let the garbage collector reclaim it if memory is tight. A "soft" reference can handle this case. Another situation occurs if you're keeping a table of information around, keyed by object, the only purpose of which is to serve other parts of the program. When one of the key objects becomes garbage, then you'd like to remove it and its associated information from the table. "Weak" references are used in this situation.

A third kind of reference -- the "phantom" reference -- is really just another way to be notified when an object is garbage collected, much like the finalize method.

Each kind of reference is represented by a subclass of Reference in the java.lang.ref package. By passing an object to the constructor of one of these classes, you obtain a reference to the object.

A clarification before delving into the details: In Java parlance, the term "reference" is used for the normal relationship between a variable and an object:

String s = "On Sense and Reference";

You might say that s holds a reference to the string. But I prefer to reserve references for one of the three special relationships just mentioned. Instead, I'll designate s as a standard pointer to the string.

Soft References

Consider a special case of the first situation I described. Say you have a large object that is stored in a file, perhaps in serialized form. You must load it into memory to work with it, and you'd like to keep it around, space permitting, but you also want to give the garbage collector the option of freeing the object when necessary.

By using a soft reference to your object (instead of a standard pointer), you can still access the object while allowing the garbage collector to reclaim it. More precisely, if an object can be reached only via soft references, then the object can be reclaimed. The garbage collector would never reclaim an object that is reachable through a standard pointer, no matter how many soft references to it existed.

To use soft references, first get your object:

Object obj = readObjectFromFile(...);

The variable obj holds the object in the normal Java way -- it's a standard pointer. Now pass the object to the SoftReference constructor:

SoftReference ref = new SoftReference(obj);

Then make sure that there are no standard pointers to your object:

obj = null;

Now, you may be able to retrieve your object with the get method:

obj = ref.get();

On the other hand, get may return null, indicating that the garbage collector has reclaimed your object and cleared the reference. In this case, if you really want the object back, you will have to recreate it.
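The steps above can be put together in one small program. This is only a sketch: loadExpensiveObject is a hypothetical stand-in for readObjectFromFile, and the code assumes a Java version with generics and the diamond operator (Java 7 or later), which the Java 1.2 snippets in this article predate.

```java
import java.lang.ref.SoftReference;

class SoftReferenceDemo {
    // Hypothetical stand-in for readObjectFromFile(...): builds a large object.
    static byte[] loadExpensiveObject() {
        return new byte[1024 * 1024];
    }

    // Returns the referenced object, recreating it if the reference was cleared.
    static byte[] getOrReload(SoftReference<byte[]> ref) {
        byte[] obj = ref.get();          // a standard pointer again, if still present
        if (obj == null) {
            obj = loadExpensiveObject(); // the reference was cleared; recreate the object
        }
        return obj;
    }

    public static void main(String[] args) {
        byte[] obj = loadExpensiveObject();
        SoftReference<byte[]> ref = new SoftReference<>(obj);
        obj = null;                      // drop the standard pointer

        byte[] again = getOrReload(ref); // works whether or not the object was reclaimed
        System.out.println(again.length);
    }
}
```

Note that getOrReload does not create a new soft reference after reloading; a class that keeps the reference in a field would do so.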

Listing One is a class called SoftObject that embodies this pattern. It maintains a soft reference to an object. Its get method is guaranteed to return the object (if it doesn't throw an exception). If the reference has been cleared, the retrieve method is called and a new soft reference is created. (You can't reuse the old one -- except for being cleared, references are immutable.)

Subclasses of SoftObject may implement the retrieve method as desired. You might perform a computation or download the object over the network. (SoftObjects would be particularly useful for image and sound file downloading.) A third possibility, reading the object from a serialized file, is shown in the FileObject class in Listing Two. If the file sense.ser contained a serialized representation of a String, you could use FileObject to access the String like so:

SoftObject fo = new FileObject("sense.ser");
...
String s = (String) fo.get();
display(s);

The call fo.get() will return the String immediately if it's available, or read it from the file. As long as there is a standard pointer to the object, such as s or the parameter of the display method, the object will not be garbage collected.
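The other option mentioned above, recomputing the object instead of reading it from a file, might look like the following sketch. SoftObjectSketch is a compact copy of the SoftObject pattern, included only so the example compiles on its own. SquaresTable and its retrievals counter are hypothetical names, and the code assumes Java 7 or later for generics and the diamond operator.

```java
import java.lang.ref.SoftReference;

// Compact copy of the SoftObject pattern so that this sketch compiles on its own.
abstract class SoftObjectSketch {
    private SoftReference<Object> ref = new SoftReference<>(null);

    public Object get() throws Exception {
        Object result = ref.get();
        if (result == null) {
            result = retrieve();                 // hold a standard pointer first
            ref = new SoftReference<>(result);   // then wrap it in a new reference
        }
        return result;
    }

    protected abstract Object retrieve() throws Exception;
}

// Hypothetical subclass: rebuilds a table of squares on demand.
class SquaresTable extends SoftObjectSketch {
    private final int size;
    int retrievals = 0;   // counts how often the table had to be rebuilt

    SquaresTable(int size) {
        this.size = size;
    }

    @Override
    protected Object retrieve() {
        retrievals++;
        long[] squares = new long[size];
        for (int i = 0; i < size; i++) {
            squares[i] = (long) i * i;
        }
        return squares;
    }
}
```

A caller would write long[] table = (long[]) new SquaresTable(1000).get(); and keep table in a local variable for as long as the data is needed.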

There are dangers in working with references akin to those involving multiple threads, because the garbage collector behaves much like a separate thread. Consider these two lines from the get method of SoftObject:

result = retrieve();
ref = new SoftReference(result);

A seemingly equivalent formulation is:

ref = new SoftReference(retrieve());
result = ref.get();

But there is a problem with this code. If the garbage collector runs after the first line but before the second, it may reclaim the object and ref.get() will return null. The first version doesn't have this problem because it puts the newly retrieved object into a standard pointer before creating the reference.

Java's only guarantee about a soft reference is that it will be cleared before the system runs out of memory. But the hope and the intent is that implementations will choose carefully which soft references to clear when memory is low, to provide the best possible performance. For instance, an implementation might prefer to clear soft references that haven't been accessed in a while.

Weak References

Weak references share with soft references the property that the garbage collector is welcome to release the contained object if no standard pointers to it exist. The most important difference between them is that no clever algorithms will be applied to clearing weak references. A weak reference is used simply to allow an otherwise unreachable object to be reclaimed. The difference is subtle and is best illustrated with an example.

As you may know, any Java string literals in the same program that are spelled the same are represented by the same String object in memory. For instance, "Frege" == "Frege" is true, even though (in general) s.equals(t) should be used to compare two strings s and t. You can get the same effect by calling the intern method of the String class -- if s.equals(t), then s.intern() == t.intern(). In other words, intern returns the same object for all strings that are equal to one another.
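A short program makes the distinction between identity and equality concrete. The class name InternDemo and the helper sameObjectAfterIntern are illustrative only; the sketch uses nothing beyond java.lang.String.

```java
class InternDemo {
    // True when both strings map to the same canonical object.
    static boolean sameObjectAfterIntern(String s, String t) {
        return s.intern() == t.intern();
    }

    public static void main(String[] args) {
        String literal = "Frege";
        String built = new String("Frege");   // a distinct object with equal contents

        System.out.println(literal == built);            // false: two different objects
        System.out.println(literal.equals(built));       // true: same characters
        System.out.println(literal == built.intern());   // true: intern returns the canonical object
    }
}
```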

When a single object is used to represent a potentially large group of equal objects, that object is called the "canonical object" for the group. (A related idea is the Flyweight pattern described in Design Patterns, by Erich Gamma et al.) Using canonical objects saves space, because fewer objects need actually be in memory. And it saves time, because the object identity (the == operator, which can be done in one machine instruction) can be used in place of object equality (the equals method).

Lisp symbols are another example of canonical objects. Symbols represent variables in Lisp and consist of a name and a value; the Java version would be:

class Symbol {
    String name;
    Object value;
}

In a running Lisp program, there is only one symbol with a given name. Symbols are interned, just like Java literal strings.

It is easy to implement canonical objects using a hashtable or other mapping data structure, such as Java 1.2's HashMap. You can keep the canonical objects in the table, and canonicalize (intern) new objects by looking them up in the table and adding them if not present.

A first version of a Lisp symbol class in Java appears in Listing Three. If the Symbol.intern method finds a symbol in the table corresponding to the string, it is the canonical symbol, and is returned. If it doesn't find a symbol, it creates a new one, which becomes the canonical symbol for that name. Since the constructor is private, the only way to create a symbol is via the intern method. Thus you can guarantee that there is only one Symbol with a given name -- if s.name.equals(t.name), then s == t.

There is just one problem: As more and more names are interned, the size of the table grows without limit. Even if a symbol is no longer reachable by the program, and should be garbage, the pointer to it in the symbol table will prevent it from being garbage collected. (This is actually the correct behavior for an interactive Lisp interpreter, where users can type in a symbol name at any time, but it is not right for a standalone program.)

The solution, of course, is to use Java references. If the symbol table holds weak references to Symbols instead of the Symbols themselves, then the table will not prevent the garbage collector from reclaiming an unreachable Symbol. To adapt the Symbol class to use weak references, change the intern method to that of Listing Four.

Weak references are more appropriate in this case than soft references, for two reasons. First, there is no complicated memory juggling going on here as there was in our first example. Of course, if space were infinite, we wouldn't need to bother with references. So, in that sense, memory is still our concern. But we don't want the system to waste its time applying clever algorithms to determine which weak references should be cleared first. Once a Symbol is garbage, it's garbage, and any weak reference to it should be cleared pronto.

There is a second reason for preferring weak references to soft references where canonical objects are involved. A soft reference may be cleared even if weak references still exist, but the opposite will not happen. In other words, consider a situation in which an object is reachable by one soft reference and one weak reference, and nothing else. Then the soft reference will be cleared before the weak one. This can violate the correctness of canonical objects if the table is implemented with soft references.

Here is a scenario that demonstrates the problem. Assume your symbol table were to use soft references. First, you intern a new symbol and keep a weak reference to it:

WeakReference r =
    new WeakReference(Symbol.intern("Gottlob"));

There are now only two references to the symbol: the weak reference r and the soft reference inside the table. The soft reference will be cleared first. After it is cleared, you intern the same string, causing a new canonical symbol to be created:

Symbol s1 = Symbol.intern("Gottlob");

Now you retrieve the original canonical symbol from the weak reference:

Symbol s2 = (Symbol) r.get();

If the weak reference has not yet been cleared, then s1 and s2 are two different Symbols with the same name, violating the rule that governs canonical objects.

This problem occurs because the clearing of a soft reference does not imply that the object is completely unreachable. When a weak reference is cleared, there is truly no way to reach the object (not even through another weak reference -- the specification requires that all weak references to an object are cleared atomically).

To summarize, you use weak references instead of soft references to implement canonical objects when you don't need sophisticated memory management, but you do require that once removed from the table, the canonical object can never be retrieved.

Reference Queues

There is still a problem with the table of symbols. Although weak references allow the symbols to be garbage collected, the weak reference objects themselves -- and the space they use in the table -- are reachable through standard pointers and will not be reclaimed. The table entry for a symbol should be removed when the symbol is collected.

You will quickly appreciate that the solution is not to have another level of weak references to the existing weak references. Where would you store these new reference objects? Taking another tack, you could set up a thread that periodically scans the table and removes cleared references and their keys, but much of that thread's effort would be wasted examining uncleared references. Ideally, you would like to be notified whenever a reference is cleared.

Reference queues do just that. If a reference is created with a reference queue, then it will be placed on that queue after it is cleared. The program can periodically check the queue and perform any cleanup operations associated with the queued references. It can do that in a separate thread, or as part of another activity. A simple and natural choice for the symbol table is to clean up each time intern is called.
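The polling pattern just described can be sketched as follows. Garbage collector timing is unpredictable, so this sketch calls the reference's own enqueue method to simulate what the collector would normally do; QueueDemo and drain are hypothetical names, and the code assumes Java 7 or later.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

class QueueDemo {
    // Drains the queue without blocking and reports how many references it held.
    static int drain(ReferenceQueue<Object> queue) {
        int count = 0;
        Reference<?> r;
        while ((r = queue.poll()) != null) {   // poll returns null when the queue is empty
            count++;                           // a real program would clean up for r here
        }
        return count;
    }

    public static void main(String[] args) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object referent = new Object();
        WeakReference<Object> ref = new WeakReference<>(referent, queue);

        // Normally the collector enqueues the reference after clearing it.
        // Calling enqueue() by hand simulates that step, so the sketch
        // does not depend on when the collector happens to run.
        ref.enqueue();
        System.out.println(drain(queue));
    }
}
```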

Rather than modify our symbol table, let me present a generalization of it that incorporates reference queues. Called CanonicalTable, it resides in Listing Five.

You typically create a CanonicalTable with a factory object, which is used to create new canonical objects when one is not found in the table. The factory for the Symbol table would call the Symbol constructor. Besides a factory, an instance of CanonicalTable contains a HashMap and ReferenceQueue. (The ReferenceQueue class is also in java.lang.ref.)

Calling the canonicalize method with a key has the same effect as calling Symbol's intern method: The canonical object is returned if present, otherwise a new one is created (using the factory) and returned. A second version of canonicalize takes an object as well as a key, with the understanding that this object is to become the canonical object if none is found in the table.

In both cases, canonicalize begins by doing a cleanup, the details of which I'll examine shortly. It then proceeds much like Symbol.intern, looking up the key in the map and creating a new canonical object (or using the supplied one) if necessary. The only difference is that when the WeakReference is created, its constructor is given the reference queue.

The cleanup method, called at each invocation of canonicalize, dequeues references from the reference queue by calling the queue's poll method, which returns null when the queue is empty. If a reference is dequeued, that means the canonical object to which it refers is about to become garbage, so the key-value pair for that object should be removed from the map. Because references are cleared before being queued, there is no way to retrieve the canonical object. So, if WeakReferences were used directly, cleanup wouldn't be able to determine which key to remove. The solution is to write a subclass of WeakReference with an instance variable to hold the key. This subclass, WeakValue, is a private inner class of CanonicalTable. When a WeakValue is dequeued, its key can be extracted and used to remove the key-value pair from the map:

map.remove(((WeakValue) r).key);

The end result is a CanonicalTable that cleans up after itself, removing canonical objects that are eligible for reclamation.
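For readers on a newer Java, the same idea can be sketched with generics. This is not the CanonicalTable of Listing Five but a hypothetical variant: InternTable, KeyedRef, and the use of java.util.function.Function as the factory are my assumptions, and the code needs Java 8 or later. One detail is a deliberate precaution: cleanup removes a map entry only if it still holds the dequeued reference. The documentation allows a cleared reference to be enqueued some time after it is cleared, so the key may already have been given a fresh canonical object by the time the old reference is dequeued.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class InternTable<K, V> {
    private final Map<K, KeyedRef<K, V>> map = new HashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();
    private final Function<K, V> factory;

    InternTable(Function<K, V> factory) {
        this.factory = factory;
    }

    // Returns the canonical object for key, creating it if necessary.
    synchronized V canonicalize(K key) {
        cleanup();
        KeyedRef<K, V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = factory.apply(key);
            map.put(key, new KeyedRef<>(key, value, queue));
        }
        return value;
    }

    synchronized int size() {
        cleanup();
        return map.size();
    }

    private void cleanup() {
        Reference<? extends V> r;
        while ((r = queue.poll()) != null) {
            KeyedRef<?, ?> dead = (KeyedRef<?, ?>) r;
            // Two-argument remove: drop the entry only if it still maps
            // to this exact reference, not to a newer replacement.
            map.remove(dead.key, dead);
        }
    }

    // A weak reference that remembers its key, like WeakValue in Listing Five.
    private static final class KeyedRef<K, V> extends WeakReference<V> {
        final K key;

        KeyedRef(K key, V value, ReferenceQueue<V> queue) {
            super(value, queue);
            this.key = key;
        }
    }
}
```

A table of objects keyed by name could then be created with new InternTable<String, Object>(name -> new Object()), substituting whatever factory suits the canonical class.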

It's worth mentioning another application of weak references. Say you wished to associate additional data with some objects. One approach would be to write a subclass with additional instance variables, but that wouldn't be viable if you didn't have control over the creation of the objects. For example, you might want to associate additional information with each thread of your program, even the threads created and used internally by the Java Virtual Machine. Java supports these thread-local variables with two java.lang classes, ThreadLocal and InheritableThreadLocal.

These classes could work by using a HashMap from threads to variables, except for the problem that a thread and its associated variables will never be garbage collected as long as the thread is present in the table. As you know by now, the fix is to use weak references to hold the threads. Unlike CanonicalTable, weak references here must hold the keys of the map instead of the values. The JDK supplies such a data structure as java.util.WeakHashMap. Its source code is required reading for students of references.
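As a small illustration of weakly held keys, the sketch below attaches a note to a thread using java.util.WeakHashMap. ThreadNotes, annotate, and noteFor are hypothetical names, and this is not how ThreadLocal is implemented; the code assumes Java 8 or later because of the lambda.

```java
import java.util.Map;
import java.util.WeakHashMap;

class ThreadNotes {
    // Keys are held weakly: an entry can disappear some time after its
    // thread object becomes unreachable from the rest of the program.
    private static final Map<Thread, String> notes = new WeakHashMap<>();

    static synchronized void annotate(Thread t, String note) {
        notes.put(t, note);
    }

    static synchronized String noteFor(Thread t) {
        return notes.get(t);
    }

    public static void main(String[] args) {
        Thread worker = new Thread(() -> { });
        annotate(worker, "created by main");
        System.out.println(noteFor(worker));   // the entry is present while worker is reachable
    }
}
```

The values in a WeakHashMap are held by standard pointers, so a value must not point back to its own key, or the entry will never be removed.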

Phantom References

Like soft and weak references, phantom references have a get method, but it always returns null -- you can never retrieve the contained object. (That explains the ghoulish name.) So phantom references are useful only in conjunction with reference queues. When you dequeue a phantom reference, you know that an object is effectively garbage, so you can clean up after it. Specifically, no soft or weak references to the object exist -- a phantom reference is enqueued only after all other references have been cleared -- and the object's finalize method, if any, has been called.
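The behavior of get is easy to confirm. In this sketch, PhantomDemo and canRetrieve are hypothetical names, and the code assumes Java 7 or later.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;

class PhantomDemo {
    // Reports whether the referent can be retrieved through a phantom reference.
    static boolean canRetrieve(Object o, ReferenceQueue<Object> q) {
        PhantomReference<Object> ref = new PhantomReference<>(o, q);
        return ref.get() != null;   // get always returns null for a phantom reference
    }

    public static void main(String[] args) {
        Object o = new Object();
        // Prints false even though o is still reachable through a standard pointer.
        System.out.println(canRetrieve(o, new ReferenceQueue<>()));
    }
}
```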

Java's finalization mechanism might seem to render phantom references useless. An object's finalize method is called just before the object is garbage collected, to provide a chance for cleaning up. Moreover, the finalize method has access to the entire object, while a phantom reference does not.

Phantom references solve two problems with finalization. The first is that the finalize method is called by a thread you know nothing about at a time you cannot predict. finalize methods have to be written very carefully to avoid unwanted interactions with your program. And exceptions thrown by the finalize method are simply ignored, which, as you can imagine, makes debugging finalizers a delight. The safest thing to do in a finalize method is to place the object on a queue for later processing at the program's convenience. This is just the functionality phantom references provide.

The second problem with the finalize method is that there might not be one. If a class's objects need to be finalized but the class writer has neglected to write a finalize method, phantom references can help. For example, say a class acquires an external resource -- something outside the program, like a file descriptor or network connection -- but neglects to provide a finalize method to release it:

class Leaker {
    int erToken = ExternalResource.acquire();
    // no finalize method to release the resource
}

Here, I'm imagining that the class for the resource returns an integer token representing the resource. If your code creates Leakers, then you can subclass Leaker and write a finalize method. But if you don't have control over object creation, this solution isn't available. However, if you can access the external resource token inside a Leaker, you can create a phantom reference to each Leaker object and do the release yourself.

A PhantomReference itself can tell you nothing about the moribund object or what to do about it, so you must always create a subclass of PhantomReference that contains cleanup information. Here we hold the resource to be released:

class Releaser extends PhantomReference {
    int token;
    Releaser(Leaker lkr, ReferenceQueue q) {
        super(lkr, q);
        this.token = lkr.erToken;
    }
}

A crucial subtlety lurks in this code: The object of the phantom reference -- the first argument to the superclass constructor -- must not be stored in an instance variable of the PhantomReference class. If it were, then there would be a standard pointer to the object -- the one in the instance variable -- and the object would never be eligible for garbage collection.

Now each time you are given a Leaker, you create a phantom reference to it, associated with a particular reference queue. The referencing object must itself be accessible by a standard pointer -- you don't want it to get garbage collected before it can do its job -- so you'll add it to a list. (I'm using Java 1.2 collections, but you can just as well use a Vector.)

ReferenceQueue leakerQueue = new ReferenceQueue();
List releasers = new ArrayList();
...
Leaker lkr = ...;
releasers.add(new Releaser(lkr, leakerQueue));

Now, whenever you want, you can do some cleaning up:

Releaser r = (Releaser) leakerQueue.poll();
if (r != null) {
    ExternalResource.release(r.token);
    r.clear();
    releasers.remove(r);
}

Here, the queue is polled to obtain the next reference whose object is ready to be reclaimed. Then the data in that reference is used to clean up. The reference is removed from the list so it, too, can be garbage collected.

The call to clear is the final nail in the coffin of the Leaker object contained in the reference; after that call, it will be reclaimed. Calling clear is not strictly necessary in this case, because removing the Releaser object from the list renders it unreachable, and when the garbage collector runs again, it will reclaim both the Releaser and the Leaker that it contains. But calling clear explicitly can't hurt, and it may hasten the demise of the Leaker.
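Pulling the Leaker fragments together gives the following self-contained sketch. ExternalResource here is a hypothetical stub that only counts outstanding tokens, LeakerCleanup, track, and releasePending are names I made up, and the loop drains the whole queue instead of polling once. It assumes Java 7 or later and makes no attempt at thread safety.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the external resource; it only counts live tokens.
class ExternalResource {
    static int live = 0;
    static int nextToken = 0;

    static int acquire() { live++; return nextToken++; }
    static void release(int token) { live--; }
}

class Leaker {
    int erToken = ExternalResource.acquire();
    // no finalize method to release the resource
}

class Releaser extends PhantomReference<Leaker> {
    final int token;   // a copy of the token only, never the Leaker itself

    Releaser(Leaker lkr, ReferenceQueue<Leaker> q) {
        super(lkr, q);
        this.token = lkr.erToken;
    }
}

class LeakerCleanup {
    static final ReferenceQueue<Leaker> leakerQueue = new ReferenceQueue<>();
    static final List<Releaser> releasers = new ArrayList<>();

    // Keeps the Releaser reachable so it is not collected before it does its job.
    static void track(Leaker lkr) {
        releasers.add(new Releaser(lkr, leakerQueue));
    }

    // Releases the resource of every Leaker whose phantom reference has been queued.
    static int releasePending() {
        int released = 0;
        Reference<? extends Leaker> r;
        while ((r = leakerQueue.poll()) != null) {
            Releaser rel = (Releaser) r;
            ExternalResource.release(rel.token);
            rel.clear();
            releasers.remove(rel);
            released++;
        }
        return released;
    }
}
```

A program would call LeakerCleanup.track(lkr) whenever it is handed a Leaker, and LeakerCleanup.releasePending() at any convenient point afterward.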

If this seems like a lot to go through for one call of a cleanup method, then you might be interested in my Cleanup class; see Listing Six. All you have to do is register a Cleanup.Handler with an object, and it takes care of the rest. Registration involves creating an instance of Cleanup.Handler and calling the register method:

Leaker lkr = ...;
final int token = lkr.erToken;
Cleanup.register(lkr, new Cleanup.Handler() {
    public void cleanup() {
        ExternalResource.release(token);
    }
});

It's important that you don't refer to lkr inside the cleanup method, for the same reason I discussed previously. If you do, lkr will never be garbage collected.

The actual cleaning up can be done directly, whenever you wish, by calling the doPending method of Cleanup:

try {
    Cleanup.doPending();
} catch (Exception e) {...}

The doPending method propagates any exceptions thrown by Cleanup.Handlers.

Or you can start a thread to clean up continuously in the background, using the startBackground method. This thread uses the remove method of ReferenceQueue, which makes its calling thread wait until a reference is enqueued.
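ReferenceQueue also has a remove method that takes a timeout in milliseconds and returns null if nothing arrives in time, which can be handy when a thread should not wait indefinitely. The sketch below is only an illustration: TimedRemoveDemo and waitForReference are hypothetical names, and the code assumes Java 7 or later.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

class TimedRemoveDemo {
    // Waits up to timeoutMillis for a reference; returns true if one arrived.
    // The timeout should be positive, because remove(0) waits indefinitely.
    static boolean waitForReference(ReferenceQueue<Object> queue, long timeoutMillis)
            throws InterruptedException {
        Reference<?> r = queue.remove(timeoutMillis);
        return r != null;
    }

    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        // Nothing was ever registered with this queue, so the call times out.
        System.out.println(waitForReference(queue, 50));
    }
}
```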

What happens to exceptions thrown by Cleanup.Handlers called from the background thread? In "Multithreaded Exception Handling in Java" (Java Report, August 1998), Joe De Russo III and Peter Haggar suggest using an event listener-like mechanism for communicating exceptions between threads. Here, I adopt a simpler, if less flexible, solution. Exceptions are accumulated into a list, which may be obtained at any time by calling Cleanup.getExceptions.

Conclusion

References are obviously not for the casual programmer. Leave consideration of references for late in the implementation phase of your project, and give precedence to abstractions like SoftObject, CanonicalTable, WeakHashMap, and Cleanup over naked references. When used correctly, references are a powerful tool for communicating with the garbage collector.

DDJ

Listing One

import java.lang.ref.*;

public abstract class SoftObject {
    private SoftReference ref = new SoftReference(null);
    public Object get() throws Exception {
        Object result = ref.get();
        if (result == null) {
            result = retrieve();
            ref = new SoftReference(result);
        }
        return result;
    }
    protected abstract Object retrieve() throws Exception;
}
                


Listing Two

import java.io.*;

public class FileObject extends SoftObject {
    private String filename;
    FileObject(String fn) {
        filename = fn;
    }
    protected Object retrieve() 
            throws IOException, ClassNotFoundException {
        ObjectInputStream in = 
            new ObjectInputStream(
                new FileInputStream(filename));
        try {
            return in.readObject();
        } finally {
            in.close();
        }
    }
}


Listing Three

import java.util.*;
import java.lang.ref.*;

class Symbol {
    private String name;
    Object value;

    private static Map table = new HashMap();
    private Symbol(String nm) { name = nm; }

    String getName() { return name; }

    static Symbol intern(String name) {
        Symbol s = (Symbol) table.get(name); 
        if (s == null) {
            s = new Symbol(name);
            table.put(name, s);
        }
        return s;
    }
}
            


Listing Four

static Symbol intern(String name) {
    Reference r = (Reference) table.get(name);
    Symbol s = null;
    if (r != null)
        s = (Symbol) r.get();
    if (r == null || s == null) {
        s = new Symbol(name);
        table.put(name, new WeakReference(s));
    }
    return s;
}       


Listing Five

import java.util.*;
import java.lang.ref.*;

/** This class is for maintaining canonical objects. */
public class CanonicalTable {
    private Map map = new HashMap();
    private ReferenceQueue queue = new ReferenceQueue();
    private Factory factory;

    public interface Factory {
        public Object create(Object key);
    }
    public CanonicalTable() {}
    public CanonicalTable(Factory f) {
        factory = f;
    }
    public synchronized Object canonicalize(Object key) {
        return canonicalize(key, null);
    }
    public synchronized Object canonicalize(Object key, Object o) {
        cleanup();
        Object value = map.get(key);
        if (value != null)
            value = ((WeakReference) value).get();
        if (value != null)
            return value;
        else {
            if (o == null)
                o = factory.create(key);
            map.put(key, new WeakValue(key, o, queue));
            return o;
        } 
    }
    public synchronized Object get(Object key) {
        cleanup();
        Object value = map.get(key);
        if (value != null)
            return ((WeakReference) value).get();
        else
            return null;
    }
    private void cleanup() {
        Reference r;
        while ((r = queue.poll()) != null)
            map.remove(((WeakValue) r).key);
    }
    ////////
    private static class WeakValue extends WeakReference {
        Object key;
        WeakValue(Object k, Object o, ReferenceQueue q) {
            super(o, q);
            key = k;
        }
    }
}


Listing Six

import java.util.*;
import java.lang.ref.*;
/** A class for simplifying the use of phantom references. */
public class Cleanup {
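    // The queue is declared first: static initializers run in textual order,
    // and the list header's constructor passes the queue to its superclass.
    private static ReferenceQueue queue = new ReferenceQueue();
    // Doubly linked list of CleanupReferences, with an empty header.
    private static CleanupReference list = new CleanupReference();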
    private static Thread backgroundThread = null;
    private static ArrayList exceptions = null;

    public interface Handler {
        public void cleanup() throws Exception;
    }
    /** Register a cleanup handler with an object. */
    public static void register(Object o, Handler h) {
        synchronized (list) {
            CleanupReference r = new CleanupReference(o, h);
            r.linkAfter(list);
        }
    }
    /** Perform all pending cleanup operations. */
    public static void doPending() throws Exception {
        Reference r;
        while ((r = queue.poll()) != null)
            ((CleanupReference) r).cleanup();
    }
    /** Start a thread to do cleanup in the background. */
    public static synchronized void startBackground() {
        if (backgroundThread != null)
            return; // already running
        backgroundThread = new Thread(new Runnable() {
            public void run() {
                while (!Thread.interrupted()) {
                    try {
                        CleanupReference r = (CleanupReference) queue.remove();
                        r.cleanup();
                    } catch (InterruptedException e) {
                        // remove() clears the interrupt status when it throws,
                        // so the loop test would not see it; exit explicitly
                        return;
                    } catch (Exception e) {
                        addException(e);
                    }
                }
            }
        });
        backgroundThread.setPriority(Thread.MIN_PRIORITY);
        backgroundThread.start();
    }
    /** Stop the background cleanup thread. */
    public static synchronized void stopBackground() {
        if (backgroundThread != null) {
            backgroundThread.interrupt();
            backgroundThread = null;
        }
    }
    /** Get a list of all exceptions generated by cleanup 
        calls in the background thread. */
    public static synchronized List getExceptions() {
        ArrayList result = exceptions;
        exceptions = null;
        return result;
    }
    private static synchronized 
    void addException(Exception e) {
        if (exceptions == null)
            exceptions = new ArrayList();
        exceptions.add(e);
    }
    ////////////////////////////////////////////
    private static class CleanupReference 
                extends PhantomReference {
        private Handler handler;
        private CleanupReference next, prev;

        CleanupReference() {   // Used only for head of linked list.
            // Queue is never garbage; ensures 
            // no enqueuing.
            super(queue, queue);
            next = prev = this;
        }
        CleanupReference(Object o, Handler h) {
            super(o, queue);
            handler = h;
        }
        void linkAfter(CleanupReference c) {
            this.prev = c;
            this.next = c.next;
            c.next.prev = this;
            c.next = this;
        }
        void cleanup() throws Exception {
            try {
                handler.cleanup();
            } finally {
                this.clear();
                synchronized (list) {  // unlink
                    this.prev.next = this.next;
                    this.next.prev = this.prev;
                }
            }
        }
    }
} 
