Creating and Destroying Java Objects: Part 2

Anticipate problems before they occur.


September 17, 2008
URL:http://www.drdobbs.com/jvm/creating-and-destroying-java-objects-par/210602264

Editor's Note: This article is based on Effective Java, Second Edition, by Joshua Bloch, which presents proven rules ("items") for improving your programs and designs (a format, by the way, borrowed from Scott Meyers's Effective C++. All in all, Effective Java, Second Edition consists of 78 items.


Item 4: Enforce Noninstantiability With a Private Constructor

Occasionally you'll want to write a class that is just a grouping of static methods and static fields. Such classes have acquired a bad reputation because some people abuse them to avoid thinking in terms of objects, but they do have valid uses. They can be used to group related methods on primitive values or arrays, in the manner of java.lang.Math or java.util.Arrays. They can also be used to group static methods, including factory methods (Item 1), for objects that implement a particular interface, in the manner of java.util.Collections. Lastly, they can be used to group methods on a final class, instead of extending the class.

Such utility classes were not designed to be instantiated: an instance would be nonsensical. In the absence of explicit constructors, however, the compiler provides a public, parameterless default constructor. To a user, this constructor is indistinguishable from any other. It is not uncommon to see unintentionally instantiable classes in published APIs.

Attempting to enforce noninstantiability by making a class abstract does not work. The class can be subclassed and the subclass instantiated. Furthermore, it misleads the user into thinking the class was designed for inheritance (Item 17). There is, however, a simple idiom to ensure noninstantiability. A default constructor is generated only if a class contains no explicit constructors, so a class can be made noninstantiable by including a private constructor:

// Noninstantiable utility class
public class UtilityClass {
   // Suppress default constructor for noninstantiability
   private UtilityClass() {
      throw new AssertionError();
   }
   ... // Remainder omitted
}

Because the explicit constructor is private, it is inaccessible outside of the class. The AssertionError isn't strictly required, but it provides insurance in case the constructor is accidentally invoked from within the class. It guarantees that the class will never be instantiated under any circumstances. This idiom is mildly counterintuitive, as the constructor is provided expressly so that it cannot be invoked. It is therefore wise to include a comment, as shown above.

As a side effect, this idiom also prevents the class from being subclassed. All constructors must invoke a superclass constructor, explicitly or implicitly, and a subclass would have no accessible superclass constructor to invoke.

Item 5: Avoid Creating Unnecessary Objects

It is often appropriate to reuse a single object instead of creating a new functionally equivalent object each time it is needed. Reuse can be both faster and more stylish. An object can always be reused if it is immutable (Item 15).

As an extreme example of what not to do, consider this statement:


String s = new String("stringette"); // DON'T DO THIS!

The statement creates a new String instance each time it is executed, and none of those object creations is necessary. The argument to the String constructor ("stringette") is itself a String instance, functionally identical to all of the objects created by the constructor. If this usage occurs in a loop or in a frequently invoked method, millions of String instances can be created needlessly.

The improved version is simply the following:


String s = "stringette";

This version uses a single String instance, rather than creating a new one each time it is executed. Furthermore, it is guaranteed that the object will be reused by any other code running in the same virtual machine that happens to contain the same string literal [JLS, 3.10.5].

You can often avoid creating unnecessary objects by using static factory methods (Item 1) in preference to constructors on immutable classes that provide both. For example, the static factory method Boolean.valueOf(String) is almost always preferable to the constructor Boolean(String). The constructor creates a new object each time it's called, while the static factory method is never required to do so and won't in practice.

In addition to reusing immutable objects, you can also reuse mutable objects if you know they won't be modified. Here is a slightly more subtle, and much more common, example of what not to do. It involves mutable Date objects that are never modified once their values have been computed. This class models a person and has an isBabyBoomer method that tells whether the person is a "baby boomer," in other words, whether the person was born between 1946 and 1964:


public class Person {
   private final Date birthDate;
   // Other fields, methods, and constructor omitted

   // DON'T DO THIS!
   public boolean isBabyBoomer() {
      // Unnecessary allocation of expensive object
      Calendar gmtCal =
        Calendar.getInstance(TimeZone.getTimeZone("GMT"));
      gmtCal.set(1946, Calendar.JANUARY, 1, 0, 0, 0);
      Date boomStart = gmtCal.getTime();
      gmtCal.set(1965, Calendar.JANUARY, 1, 0, 0, 0);
      Date boomEnd = gmtCal.getTime();
      return birthDate.compareTo(boomStart) >= 0 &&
          birthDate.compareTo(boomEnd) < 0;
   }
}

The isBabyBoomer method unnecessarily creates a new Calendar, TimeZone, and two Date instances each time it is invoked. The version that follows avoids this inefficiency with a static initializer:


class Person {
   private final Date birthDate;
   // Other fields, methods, and constructor omitted
   /**
   * The starting and ending dates of the baby boom.
   */
   private static final Date BOOM_START;
   private static final Date BOOM_END;
   static {
      Calendar gmtCal =
      Calendar.getInstance(TimeZone.getTimeZone("GMT"));
      gmtCal.set(1946, Calendar.JANUARY, 1, 0, 0, 0);
      BOOM_START = gmtCal.getTime();
      gmtCal.set(1965, Calendar.JANUARY, 1, 0, 0, 0);
      BOOM_END = gmtCal.getTime();
   }
   public boolean isBabyBoomer() {
      return birthDate.compareTo(BOOM_START) >= 0 &&
         birthDate.compareTo(BOOM_END) < 0;
   }
}

The improved version of the Person class creates Calendar, TimeZone, and Date instances only once, when it is initialized, instead of creating them every time isBabyBoomer is invoked. This results in significant performance gains if the method is invoked frequently. On my machine, the original version takes 32,000ms for 10 million invocations, while the improved version takes 130 ms, which is about 250 times faster. Not only is performance improved, but so is clarity. Changing boomStart and boomEnd from local variables to static final fields makes it clear that these dates are treated as constants, making the code more understandable. In the interest of full disclosure, the savings from this sort of optimization will not always be this dramatic, as Calendar instances are particularly expensive to create.

If the improved version of the Person class is initialized but its isBabyBoomer method is never invoked, the BOOM_START and BOOM_END fields will be initialized unnecessarily. It would be possible to eliminate the unnecessary initializations by lazily initializing these fields (Item 71) the first time the isBabyBoomer method is invoked, but it is not recommended. As is often the case with lazy initialization, it would complicate the implementation and would be unlikely to result in a noticeable performance improvement beyond what we've already achieved (Item 55).

In the previous examples in this item, it was obvious that the objects in question could be reused because they were not modified after initialization. There are other situations where it is less obvious. Consider the case of adapters [Gamma95,p. 139], also known as views. An adapter is an object that delegates to a backing object, providing an alternative interface to the backing object. Because an adapter has no state beyond that of its backing object, there's no need to create more than one instance of a given adapter to a given object.

For example, the keySet method of the Map interface returns a Set view of the Map object, consisting of all the keys in the map. Naively, it would seem that every call to keySet would have to create a new Set instance, but every call to keySet on a given Map object may return the same Set instance. Although the returned Set instance is typically mutable, all of the returned objects are functionally identical: When one of the returned objects changes, so do all the others because they're all backed by the same Map instance. While it is harmless to create multiple instances of the keySet view object, it is also unnecessary.

There's a new way to create unnecessary objects in release 1.5. It is called autoboxing, and it allows the programmer to mix primitive and boxed primitive types, boxing and unboxing automatically as needed. Autoboxing blurs but does not erase the distinction between primitive and boxed primitive types. There are subtle semantic distinctions, and not-so-subtle performance differences (Item 49).

Consider the following program, which calculates the sum of all the positive int values. To do this, the program has to use long arithmetic, because an int is not big enough to hold the sum of all the positive int values:


// Hideously slow program! Can you spot the object creation?
public static void main(String[] args) {
   Long sum = 0L;
   for (long i = 0; i < Integer.MAX_VALUE; i++) {
      sum += i;
   }
   System.out.println(sum);
}

This program gets the right answer, but it is much slower than it should be, due to a one-character typographical error. The variable sum is declared as a Long instead of a long, which means that the program constructs about 2^31 unnecessary Long instances (roughly one for each time the long i is added to the Long sum).

Changing the declaration of sum from Long to long reduces the runtime from 43 seconds to 6.8 seconds on my machine. The lesson is clear: prefer primitives to boxed primitives, and watch out for unintentional autoboxing.

This item should not be misconstrued to imply that object creation is expensive and should be avoided. On the contrary, the creation and reclamation of small objects whose constructors do little explicit work is cheap, especially on modern JVM implementations. Creating additional objects to enhance the clarity, simplicity, or power of a program is generally a good thing.

Conversely, avoiding object creation by maintaining your own object pool is a bad idea unless the objects in the pool are extremely heavyweight. The classic example of an object that does justify an object pool is a database connection. The cost of establishing the connection is sufficiently high that it makes sense to reuse these objects. Also, your database license may limit you to a fixed number of connections. Generally speaking, however, maintaining your own object pools clutters your code, increases memory footprint, and harms performance. Modern JVM implementations have highly optimized garbage collectors that easily outperform such object pools on lightweight objects.

The counterpoint to this item is Item 39 on defensive copying. Item 5 says, "Don't create a new object when you should reuse an existing one," while Item 39 says, "Don't reuse an existing object when you should create a new one." Note that the penalty for reusing an object when defensive copying is called for is far greater than the penalty for needlessly creating a duplicate object. Failing to make defensive copies where required can lead to insidious bugs and security holes; creating objects unnecessarily merely affects style and performance.

Item 6: Eliminate Obsolete Object References

When you switch from a language with manual memory management, such as C or C++, to a garbage-collected language, your job as a programmer is made much easier by the fact that your objects are automatically reclaimed when you're through with them. It seems almost like magic when you first experience it. It can easily lead to the impression that you don't have to think about memory management, but this isn't quite true.

Consider the following simple stack implementation:


// Can you spot the "memory leak"?
public class Stack {
  private Object[] elements;
  private int size = 0;
  private static final int DEFAULT_INITIAL_CAPACITY = 16;
  public Stack() {
    elements = new Object[DEFAULT_INITIAL_CAPACITY];
  }
  public void push(Object e) {
    ensureCapacity();
    elements[size++] = e;
  }
  public Object pop() {
    if (size == 0)
      throw new EmptyStackException();
    return elements[--size];
  }
  /**
  * Ensure space for at least one more element, roughly
  * doubling the capacity each time the array needs to grow.
  */
  private void ensureCapacity() {
    if (elements.length == size)
    elements = Arrays.copyOf(elements, 2 * size + 1);
  }
}

There's nothing obviously wrong with this program (but see Item 26 for a generic version). You could test it exhaustively, and it would pass every test with flying colors, but there's a problem lurking. Loosely speaking, the program has a "memory leak," which can silently manifest itself as reduced performance due to increased garbage collector activity or increased memory footprint. In extreme cases, such memory leaks can cause disk paging and even program failure with an OutOfMemoryError, but such failures are relatively rare.

So where is the memory leak? If a stack grows and then shrinks, the objects that were popped off the stack will not be garbage collected, even if the program using the stack has no more references to them. This is because the stack maintains obsolete references to these objects. An obsolete reference is simply a reference that will never be dereferenced again. In this case, any references outside of the "active portion" of the element array are obsolete. The active portion consists of the elements whose index is less than size.

Memory leaks in garbage-collected languages (more properly known as unintentional object retentions) are insidious. If an object reference is unintentionally retained, not only is that object excluded from garbage collection, but so too are any objects referenced by that object, and so on. Even if only a few object references are unintentionally retained, many, many objects may be prevented from being garbage collected, with potentially large effects on performance.

The fix for this sort of problem is simple: null out references once they become obsolete. In the case of our Stack class, the reference to an item becomes obsolete as soon as it's popped off the stack. The corrected version of the pop method looks like this:


public Object pop() {
  if (size == 0)
     throw new EmptyStackException();
  Object result = elements[--size];
  elements[size] = null; // Eliminate obsolete reference
  return result;
}

An added benefit of nulling out obsolete references is that, if they are subsequently dereferenced by mistake, the program will immediately fail with a NullPointerException, rather than quietly doing the wrong thing. It is always beneficial to detect programming errors as quickly as possible.

When programmers are first stung by this problem, they may overcompensate by nulling out every object reference as soon as the program is finished using it. This is neither necessary nor desirable, as it clutters up the program unnecessarily. Nulling out object references should be the exception rather than the norm. The best way to eliminate an obsolete reference is to let the variable that contained the reference fall out of scope. This occurs naturally if you define each variable in the narrowest possible scope (Item 45).

So when should you null out a reference? What aspect of the Stack class makes it susceptible to memory leaks? Simply put, it manages its own memory. The storage pool consists of the elements of the elements array (the object reference cells, not the objects themselves). The elements in the active portion of the array (as defined earlier) are allocated, and those in the remainder of the array are free. The garbage collector has no way of knowing this; to the garbage collector, all of the object references in the elements array are equally valid. Only the programmer knows that the inactive portion of the array is unimportant. The programmer effectively communicates this fact to the garbage collector by manually nulling out array elements as soon as they become part of the inactive portion.

Generally speaking, whenever a class manages its own memory, the programmer should be alert for memory leaks. Whenever an element is freed, any object references contained in the element should be nulled out.

Another common source of memory leaks is caches. Once you put an object reference into a cache, it's easy to forget that it's there and leave it in the cache long after it becomes irrelevant. There are several solutions to this problem. If you're lucky enough to implement a cache for which an entry is relevant exactly so long as there are references to its key outside of the cache, represent the cache as a WeakHashMap; entries will be removed automatically after they become obsolete. Remember that WeakHashMap is useful only if the desired lifetime of cache entries is determined by external references to the key, not the value.

More commonly, the useful lifetime of a cache entry is less well defined, with entries becoming less valuable over time. Under these circumstances, the cache should occasionally be cleansed of entries that have fallen into disuse. This can be done by a background thread (perhaps a Timer or ScheduledThreadPoolExecutor) or as a side effect of adding new entries to the cache. The LinkedHashMap class facilitates the latter approach with its removeEldestEntry method. For more sophisticated caches, you may need to use java.lang.ref directly.

A third common source of memory leaks is listeners and other callbacks. If you implement an API where clients register callbacks but don't deregister them explicitly, they will accumulate unless you take some action. The best way to ensure that callbacks are garbage collected promptly is to store only weak references to them, for instance, by storing them only as keys in a WeakHashMap.

Because memory leaks typically do not manifest themselves as obvious failures, they may remain present in a system for years. They are typically discovered only as a result of careful code inspection or with the aid of a debugging tool known as a heap profiler. Therefore, it is very desirable to learn to anticipate problems like this before they occur and prevent them from happening.


Joshua Bloch is Chief Java Architect at Google, and previously a Distinguished Engineer at Sun Microsystems.

Related Article

Creating and Destroying Java Objects: Part 1

Terms of Service | Privacy Statement | Copyright © 2024 UBM Tech, All rights reserved.