Optimized Java

Virtual machine features in Java 6 can provide performance improvements--if you can avoid the pitfalls.


June 01, 2006
URL:http://www.drdobbs.com/jvm/optimized-java/188700760

Matt is a software development manager with Parasoft. He can be contacted at [email protected].


The performance dynamics of Java programs are rooted in fundamental design decisions in the language itself. Java source code compiles to an intermediate language that a virtual machine (VM) can run on any platform for which the VM has been implemented; Microsoft's .NET follows a similar intermediate-language and VM design. Most engineers believe that programming languages offering more abstraction from the hardware and operating system suffer slower performance. Interpreted languages executed directly from source code without compilation (Perl, for example) are generally used only for scripting, because full-blown applications run faster without the overhead of interpreting source code on every run.

At the other end of the spectrum, compiled languages such as C++ are typically used when performance is a priority because they produce executables optimized for specific hardware and operating systems. Portability and maintainability are also deciding factors in selecting a programming language. Compiled languages require a dedicated build process for each supported platform, while intermediate languages and interpreted languages execute similarly on multiple platforms as-is (although interpreted languages are not used for commercial software when the original source must be kept hidden). Thus intermediate languages, such as Java and .NET, emerge as a good trade-off between performance optimizations and portability.

Memory management is a major factor affecting software application performance. Typically, more time is spent allocating/deallocating memory than performing actual data computation. While C++ offers direct control over when memory is allocated and freed, Java attempts to abstract memory management by using garbage collection to reclaim memory that the program no longer needs. However, the "pause" associated with garbage collection has been the central argument against using Java when real-time performance is required. Typically, garbage collection is a periodic process that pauses normal program execution to analyze object references and reclaim memory that was allocated but can no longer be accessed by reference. In large Java applications, the pause for garbage collection can last several seconds, which is enough to disrupt any type of real-time communication or control system. Consequently, the memory abstraction provided by garbage collection requires performance-oriented developers to think more carefully about memory management. Even though Java does not provide the same level of control over memory deallocations as C++, programming patterns can still make a huge difference in the memory performance of Java applications.

Overall, the performance improvements that Sun has made to Java over the past decade, combined with proper coding patterns, lets Java compete with other interpreted and compiled languages for the performance crown. Changes to the language grammar in Java 5.0 increase ease of development by including celebrated elements of other languages. However, some language changes contain hidden pitfalls and sacrifice performance in favor of ease-of-use. In this article, I explore the performance implications of implementing some common algorithms using old and new language features to help you decide which patterns should be encouraged or avoided, relative to performance.

String Building

A straightforward example for programming with strings of text shows how memory management, standard API selection, and Java runtime version affect performance.

Whenever characters are read one at a time from an input stream, a string of text may be constructed from individual characters. If the total number of characters is not known, then using the constructor for java.lang.String that takes an array of characters as an input parameter is not an option. In this example, I only examine the time spent in string construction, and a constant 'a' character is used in place of reading from an input stream.

The first implementation leverages Java's ease of development to create a string of 100,000 characters without writing much code:

String string = "";
for (int i = 0; i < 100000; i++)
   string += 'a';

The Java "+" operator makes string concatenation trivial. Most Java developers use "+" or "+=" to concatenate strings when developing in a hurry. One or two characters for the operator is not much to type and it gets the job done. However, under the hood Java defines strings as immutable objects in memory. This means that any time a string value is changed (as with "+=" in this example), a new String object is allocated. The reference to the old String object is replaced, so it is available for garbage collection. Over the course of building this string, 100,000 String objects are allocated and 99,999 are available for garbage collection when execution finishes. This process ran in 325 seconds with a Java 5.0 runtime. The same code ran in 125 seconds with a Java SE 6 runtime on the same machine. Java 5.0 spent 160 percent more time than Java SE 6. Garbage-collection speed for this example is much improved in the latest version of the Java VM.

Experienced Java developers immediately recognize that significantly improved performance can be achieved by changing the implementation. The java.lang.StringBuffer class can be used to reduce the number of object allocations. This class uses an internal array of characters as a buffer that may be larger than the number of characters in the buffer at any given time. The advantage to this approach is that the internal array can be resized in chunks to accommodate several new characters before being resized again. Consequently, the example code can be rewritten to use java.lang.StringBuffer:

StringBuffer buffer = new StringBuffer ();
for (int i = 0; i < 100000; i++)
     buffer.append ('a');

The new implementation surprisingly executed in 0 milliseconds. In reality, some time was spent filling the StringBuffer, but java.lang.System.currentTimeMillis(), which was used to measure the time difference, has an approximate 10-millisecond resolution on Windows. Increasing the number of characters from 100,000 to 10,000,000 yielded measurable times. The Java 5.0 runtime executed the loop with 10,000,000 characters in 1310 milliseconds, and the Java SE 6 runtime did the same in 1230 milliseconds. Java SE 6 still provides the best performance, but the difference between the two versions is much narrower than with the original implementation.
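For finer-grained measurements than java.lang.System.currentTimeMillis() allows, System.nanoTime() (added in Java 5.0) can be used instead; a minimal sketch of timing the StringBuffer loop this way:

```java
// Time the append loop with nanosecond-resolution clock instead of
// the ~10 ms resolution of currentTimeMillis() on Windows.
long start = System.nanoTime();
StringBuffer buffer = new StringBuffer();
for (int i = 0; i < 100000; i++)
    buffer.append('a');
long elapsed = System.nanoTime() - start;
System.out.println("append loop: " + (elapsed / 1000000) + " ms");
```

Note that System.nanoTime() reports elapsed time only; unlike currentTimeMillis(), its absolute value has no relation to wall-clock time.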

Changing the implementation again to leverage the new Java 5.0 API class java.lang.StringBuilder leads to even better performance. java.lang.StringBuilder works as a drop-in replacement for java.lang.StringBuffer with an important difference—StringBuffer is thread safe, so that methods designed to access or modify the contents synchronize with a monitor to ensure that multithreaded interactions never see the buffer in an intermediate state. StringBuilder does not have those protections. It is suitable for single-threaded access, or for multithreaded access when explicit protection for simultaneous access has been implemented in surrounding code. If java.lang.StringBuffer is changed to java.lang.StringBuilder, this example executes in 810 milliseconds in Java 5.0 and 640 milliseconds in Java SE 6. Even with code optimizations, Java 5.0 still requires 27 percent more time than Java SE 6.
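The StringBuilder version differs from the StringBuffer loop only in the class name:

```java
// Same algorithm as before, but append() acquires no monitor,
// since StringBuilder performs no synchronization.
StringBuilder builder = new StringBuilder();
for (int i = 0; i < 100000; i++)
    builder.append('a');
String string = builder.toString();
```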

The new version of the Java runtime should provide faster execution for most string operations that involve memory allocations. Even though executing in the new runtime produces noticeable improvements, it is no substitute for optimized programming. Awareness of how Java manages memory, combined with the new API, improved string-building performance in this example beyond what the timer could measure: with 100,000 characters, the improved implementations returned in effectively zero time, while with 10,000,000 characters, the original implementation would not finish in any reasonable time. Unfortunately, the slow implementation is the simplest to program, and its performance implications are the easiest to overlook. Poor implementations like it have given Java a reputation for being slower than most programming languages. That reputation would be much improved if compilers could detect—or even replace—similarly slow code.

Autoboxing

Changes to the language specification for Java 5.0 added implicit conversion of primitive values to their corresponding wrapper objects and vice versa. Previous Java versions required that primitive values be wrapped in objects before they could be used with collections or other methods that operate generically on any java.lang.Object. That requirement remains in Java 5.0 and later, but the compiler now wraps and unwraps primitive values automatically; Sun calls this "Autoboxing." Like the "+" operator on Strings, Autoboxing's ease of development comes at the cost of extra memory allocations under the hood. Although Autoboxing is typically used with collections or the reflection API, its performance is best tested in an isolated example.

Example 1 executes a method call and field assignment 100,000,000 times using only primitive values and using Autoboxing and unboxing. The compiler creates new java.lang.Integer objects for each primitive int value passed to the autobox method. The primitive value is then extracted from the java.lang.Integer object and assigned to the value field. Java 5.0 executes this example in 840 milliseconds and 11,670 milliseconds for primitive values and autoboxed values, respectively. Java SE 6 executes in 530 milliseconds and 10,900 milliseconds, respectively. The overhead of Autoboxing is more than a full order of magnitude.

// public field to prevent compiler optimizations
public static int value;
public static void autobox (Integer i) { value = i; }
public static void primitive (int i) { value = i; }
public static void main(String[] args) {
    for (int i = 0; i < 100000000; i++)
        primitive (i);
    for (int i = 0; i < 100000000; i++)
        autobox (i);
}

Example 1: Autoboxing.

Sun's solution to the performance overhead of Autoboxing relies on the old trade-off between memory and speed. Autoboxed values are frequently cached and reused. This is safe because wrapper objects, such as java.lang.Integer, are immutable and can be reused without fear of the wrapped value changing. The performance improvement from caching is seen when the example is changed to pass a constant 7 to the autobox method instead of variable i. Executing with a constant value consumes approximately 6 seconds for both Java 5.0 and Java SE 6.
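The cache can be observed directly. The compiler translates autoboxing into calls to Integer.valueOf(int), which the Java 5.0 specification requires to return the same object for values in at least the range -128 to 127:

```java
// Boxing the same small constant twice yields one cached object.
Integer a = 7;              // compiled as Integer.valueOf(7)
Integer b = 7;
boolean cached = (a == b);  // true: both references come from the cache
// Values outside -128..127 are not guaranteed to be cached and are
// typically allocated fresh on each boxing conversion.
```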

An implementation that avoids Autoboxing is far more efficient, even with the caching optimization in place. Nevertheless, Autoboxing is still a tempting addition to Java. It eliminates extra coding and, if executed only a few times, does not affect performance. Autoboxing should be avoided, however, when processing large volumes of data. For example, it is beneficial to re-implement a hash map using primitive int values as keys instead of using java.util.HashMap&lt;Integer,Object&gt; when processing millions of int values. Autoboxing is one more nice addition to the Java language in Version 5.0 that eases development by letting you write less code—as long as you are aware of its hidden performance overhead.
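As an illustration, a map keyed on primitive ints avoids boxing entirely. The following is a hypothetical sketch, not a standard API; it uses open addressing with linear probing and omits removal and other niceties:

```java
// Minimal hash map with primitive int keys: no Integer objects are
// allocated on put() or get(), unlike HashMap<Integer, Object>.
class IntHashMap {
    private int[] keys;
    private Object[] values;
    private boolean[] used;
    private int size;

    IntHashMap (int capacity) {
        keys = new int[capacity];
        values = new Object[capacity];
        used = new boolean[capacity];
    }

    // Linear probing; the table is kept at most half full, so an
    // unused slot is always found and the loop terminates.
    private int indexOf (int key) {
        int i = (key & 0x7fffffff) % keys.length;
        while (used[i] && keys[i] != key)
            i = (i + 1) % keys.length;
        return i;
    }

    void put (int key, Object value) {
        if ((size + 1) * 2 > keys.length)
            resize();
        int i = indexOf(key);
        if (!used[i]) {
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    Object get (int key) {
        int i = indexOf(key);
        return used[i] ? values[i] : null;
    }

    private void resize () {
        IntHashMap bigger = new IntHashMap(keys.length * 2);
        for (int i = 0; i < keys.length; i++)
            if (used[i])
                bigger.put(keys[i], values[i]);
        keys = bigger.keys;
        values = bigger.values;
        used = bigger.used;
    }
}
```

Specialized primitive collections along these lines trade generality for the elimination of one object allocation per operation.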

Escape Analysis

"Escape analysis" is a memory optimization added in Java SE 6. Java VMs operate on memory in a stack, which is private to each running thread, and a heap, which is shared by all threads. Allocations on the stack are faster because they do not require the synchronization that heap allocations do, and deallocations are essentially free because stack memory is cleared when an execution block returns to its caller. However, there is no way to explicitly code an object allocation to go on the stack instead of the heap. Java SE 6 adds the ability to identify objects that will not escape the execution block in which they are allocated, and to allocate them directly on the stack. The result is faster allocation and no accumulation of objects awaiting garbage collection.

Example 2(a) allocates a new Rectangle object 10,000,000,000 times to calculate the area of each size combination. Java 5.0 executes this loop in 212 seconds, while Java SE 6 executes it in 189 seconds. Unnecessary synchronization for heap allocations is the likely reason Java 5.0 consumed 12 percent more time.

(a)
for (int x = 0; x < 100000; x++) {
  for (int y = 0; y < 100000; y++) {
    Rectangle rectangle = new Rectangle (x, y);
    rectangle.area ();
  }
}

(b)
Rectangle rectangle = new Rectangle ();
for (int x = 0; x < 100000; x++) {
  for (int y = 0; y < 100000; y++) {
    rectangle.setHeight (y);
    rectangle.setWidth (x);
    rectangle.area ();
  }
}

Example 2: (a) Allocating a new object; (b) Escape analysis.
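The Rectangle class itself is not shown in the listing; a minimal version consistent with the calls in Example 2 (the field names and constructor parameter order are assumptions) might be:

```java
// Hypothetical Rectangle matching the usage in Examples 2(a) and 2(b).
class Rectangle {
    private int width, height;

    Rectangle () {}                       // used by Example 2(b)
    Rectangle (int width, int height) {   // used by Example 2(a)
        this.width = width;
        this.height = height;
    }
    void setWidth (int width)   { this.width = width; }
    void setHeight (int height) { this.height = height; }
    int area () { return width * height; }
}
```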

Escape analysis can be applied to many common objects in a typical Java application. An additional improvement would be to reuse the allocated memory for each subsequent pass, but the Java SE 6 runtime does not seem to be doing that for this example. Careful programming can again produce benefits beyond compiler and VM optimizations.

Moving the Rectangle object allocation outside the loop (see Example 2(b)) causes the same object to be reused, rather than reallocated, on each pass. This change optimizes the example to run in approximately 20 seconds—one order of magnitude faster. Typical programs do not allocate inside a loop like this, but they often allocate a temporary object or array in a method body. If the method is called repeatedly in a thread-safe manner, it is worthwhile to allocate the object only once and store it in a class field.

Although optimizations in new versions of the Java VM are nice, they are still no replacement for careful coding. The object-reuse enhancements of escape analysis certainly help with this example, but in general, think seriously about memory allocations whenever performance is important.

Heap and Garbage-Collector Tuning

Garbage collection in Java operates incrementally on separate generations of objects rather than on all objects every time. The key observation, as with escape analysis, is that most objects are needed only briefly and quickly become eligible for garbage collection. Since J2SE 1.2, the heap has been divided into a young and an old generation. The young generation quickly fills with fresh objects and is collected efficiently without analyzing objects in the old generation; the entire heap is collected only when the old generation reaches a certain capacity. Java also lets you customize the sizes of specific generations and set ratios governing when garbage collections are performed.

The Java VM also maintains a permanent generation for VM data that is never garbage collected; information about classes and methods is stored there. This separation makes garbage collections faster because permanent data need not be analyzed. Unfortunately, the permanent generation can run out of space independently of the traditional heap. If a large number of classes will be loaded into the VM, it is recommended to increase the maximum permanent generation size using a VM argument; otherwise, the VM throws a "java.lang.OutOfMemoryError: PermGen space". This argument raises the maximum space for permanent memory to 128 MB:

-XX:MaxPermSize=128m

The permanent generation is particularly useful with class data sharing in Java 5.0. The information about classes and methods can be dumped from the permanent generation to disk and reloaded straight into memory upon the next VM startup. This saves time by not recalculating that information from class files. The end result is that the VM startup time is much faster in Java 5.0 when the permanent class information has been calculated once already.

Other memory characteristics for garbage collection are determined automatically from the machine hardware when no arguments are explicitly provided. Every Java application has its own unique memory pattern, so garbage-collection and memory settings should be tuned for each application to find its optimal configuration. Typical option sets are optimized for the memory and performance profiles of small-application startup or of server throughput. Sun's web site provides more information on the options and strategies for tuning garbage collection in recent Java versions.
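Commonly used HotSpot flags for this kind of tuning include the following (the sizes and ratios are illustrative only, and MyApplication is a placeholder for your main class):

```
java -Xms256m -Xmx512m     # initial and maximum heap size
     -XX:NewRatio=2        # old generation twice the size of the young
     -XX:+UseParallelGC    # throughput collector; use
                           #   -XX:+UseConcMarkSweepGC for low pauses
     -verbose:gc           # log each collection
     MyApplication
```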

Conclusion

Sun has Java evolution on the right track. Each new version of the Java Runtime Environment (JRE) contains optimizations that execute existing Java code faster than was previously possible, and each new version of the JDK contains API additions that future programs can use for even better performance. Widespread exposure to the performance improvements in Java 5.0 and later should dispel the urban legend that Java, like other VM-based languages, is inevitably slow.
