.NET

Java 2Graphics Rendering

By Torpum Jannak, September 01, 1999

The optimized graphics-rendering pipeline Torpum presents here addresses performance concerns you may encounter when developing scientific visualization, action-based games, or other resource-demanding Java 2 applications.

Sep99: Java 2Graphics Rendering

Torpum is the president of ViaTech Solutions. He can be contacted at [email protected].

The Object Pool

With all the features available through the Java Media APIs, writing graphically based applications ranging from static presentations to rendering 3D scenes can become nothing more than an exercise in proper API usage. However, applications such as scientific visualizations and action-based games, which demand the highest in performance from the operating system and virtual machine, push the performance edge of the rendering pipeline. Conceptually, the rendering pipeline (or rendering engine) is the path that a graphical element takes from object instantiation to the final display.

The optimized graphics-rendering pipeline I present here demonstrates an architectural approach to addressing performance concerns. My approach relies on proven optimization strategies to reduce garbage collection interruptions, while utilizing a lean application-rendering pipeline for processing and manipulating output. By making this approach the foundation of your graphically intensive applications, the rendering-pipeline portion of your application can also become as straightforward as just about any other Java API.

Issues Facing Developers

When designing a rendering-engine architecture in Java, several issues immediately come to mind. One of the most severe issues is the garbage collection process of the Java virtual machine. Animation applications often create and destroy tens to hundreds of graphical object instances per frame, while demanding at least 24 frames per second (fps) to achieve visually pleasing output. This unregulated resource usage frequently invokes the garbage collection process at unpredictable times within the animation's rendering process. Object instantiation in itself is also a very time-consuming operation. The timing of the garbage collector is unpredictable, as it varies from one virtual machine implementation to another. Garbage collecting also elongates the response time to user-input events, causes hiccups or stalls in the animation, and reduces user satisfaction and/or confidence in the application.

Another concern involves the unacceptable increase in size and cost of downloading applications when you are optimizing for speed. A robust pipeline solution that can accommodate many applications tends to be unnecessarily complex and significantly larger than a simplified solution. You should balance rendering-engine features with just the requirements of your current application. A good example is the difference in complexity between supporting 2D and 3D applications.

Optimization Approaches

Central to my rendering-pipeline architecture are two optimization techniques -- object pooling and asynchronous message queuing. Object pooling minimizes the overhead of the garbage collection and object instantiation processes. Object pooling preserves memory allocations by providing and maintaining a set number of recyclable object instances for a given object class, thus reducing the overall garbage collecting workload and duration. See the accompanying text box entitled "The Object Pool" for more information.

For better CPU load balancing, the pipeline employs a multithreaded architecture with asynchronous message queuing between threads. With synchronous communications, thread A has to wait for thread B to finish processing the request from A before A can continue with other tasks. The asynchronous approach lets threads communicate with each other without having to wait for the other thread to finish processing the service request. Also, benefiting from a multithreaded rendering pipeline is the overall response time to user-input events because you have finer control over the various thread priorities.

Object pooling and asynchronous message queuing can apply to any Java application, not just multimedia intensive applications. Other rendering-pipeline-specific optimizations include content-driven rendering cycles, display segmentation, and an abridgement of various rendering options in Java 2.

Basic Rendering Process Flow

As Figure 1 illustrates, the process flow for a graphics-rendering engine contains four phases. The first phase is the application logic that determines what objects, visual or not, are active for the next frame of animation. Graphical primitives (polygons, points, and curves), textual elements, and images (sprites and background art) are examples of visual objects. Examples of nonvisual elements include rendering commands, rendering hints, filters, transformation parameters, audio commands, or other event triggers. The exact nature and design of the elements are specific to each application.

The first phase produces the elements to render into the next frame and passes these elements to the second phase. The second phase immediately processes all nonvisual elements and sorts all the visual elements into a discrete rendering order, also known as the "z-depth buffer" or "z-buffer." Visual objects that have a smaller z-depth value are rendered first, allowing visual objects with greater z-depth values to overlay the previous object. One common use for z-depth sorting is ensuring that text messages display on top of other visual elements for every frame.

The z-depth class presented here operates like a hash table, where the z-depth value is the index into the hash table. Because the z-depth indices form a small, discrete range of usually less than 256 levels, the z-depth class can be an array of vectors without the need of a hash function; see Listing One. Each vector in the array stores the visual elements to display a specific z-depth index.

The initial depth of the z-buffer is the smallest amount of simultaneous layers that you will need to properly display all the possible application content. Over-allocation of this depth resource can cause significant performance issues. For example, when viewing live radar feed, you need one layer for the radar images and another for any textual content (such as a legend of colors). An optional third layer can be an animation loop of the radar sweep that applies filters to highlight the image layer beneath the sweep; see Figure 2. None of these elements would normally traverse across z-depth layers from frame to frame, allowing the z-depth data representation to be a member variable of a visual object class.

When all visual objects for the frame are in the z-buffer, the first phase sends a render command to the second phase. This lets the second phase finish preparing the data, then pass the render command over to the third phase. A rendering thread representing this third phase enumerates over all the objects in the z-depth buffer and applies them onto the image buffer associated with the z-buffer.

By supporting enumeration, the z-buffer class guarantees the sort order during this rendering phase. When the rendering of the image buffer is complete, the rendering thread signals the fourth and final phase, the paint thread. It also signals the application logic that a z-buffer instance is now free to begin receiving objects for the next frame. The final phase is the rendering of the image buffer into the current graphics context of the application's display area.

The painting of the display is not a function of a constant time interval, but actually triggers from the availability of the image buffer content. This is an example of a content-driven rendering engine. Content-driven engines can achieve high frame rates for smoother animations because they do not have a timeout restriction such as those of the constant-interval rendering engines.

Constant-interval rendering engines, on the other hand, systematically paint the screen based on a set timer interval, thereby capping the maximum achievable frame rate of the application. For example, if the timeout value is 41.6 milliseconds (ms), the application can achieve a maximum of 24 fps. Even if input event processing takes 10 ms, the initial visual response will still be at least 31.6 ms or worse case 51.6 ms behind if caught on the back edge of the clock cycle. Constant-interval rendering engines are also usually single threaded, meaning that the pipeline will not be able to process any visual objects until the clock triggers, allowing for wasted CPU cycles.

With either engine type, complex frames with a large number of objects may need an expiration time parameter to dictate when to stop frame content processing and simply drop the frame in its entirety. This expiration of the frame can occur in any of the phases, allowing for better load balancing of complex animations. If you choose to disable the frame expiration parameter, the overall frame rate will degrade, but will also provide a way to see which frames may require simplification.

Incorporating Asynchronous Threads

Fundamental to asynchronous threads is the notion that as long as there is something to do, go ahead and do it, otherwise, block and wait. By using the message queuing technique in Figure 3, threads waiting for data to process can wake up only as needed, otherwise freeing the CPU to handle other application logic. Once awake, these threads process all messages on their message queue before going back to sleep.

By dividing the tasks into the four separate phases previously described, each of these phases can become a separate thread that communicates asynchronously. The first thread, the application logic, is the source thread. This thread pulls visual and nonvisual objects from a pool of available objects, formats them, and sends them to the second thread's message queue.

At first, it does not look as if the second thread for z-depth sorting would be necessary, because the first thread can simply take each item and enter them into the z-buffer sequentially. This direct approach is best if you are working with a single display region. However, in many cases, the application may want to divide the display into multiple, nonoverlapping segments, and possibly even have different application source threads driving each of the display segments. An example of display segmentation can be the separation of navigational status, scoring, and/or chat sections. By not allowing overlapped display segments, each segment can simply employ its own independent z-buffer. More importantly, each display segment will have its own rendering thread, and optionally, its own paint thread.

The third thread is the rendering thread, which, for simple applications, is normally part of the paint method of an application's main window component. Here again, if you have multiple nonoverlapping display segments, you can obtain better performance by not combining the rendering thread with the paint thread. By utilizing independent, content-driven rendering cycles in each display segment, you will refresh only the portion of the screen that needs rendering, and perform pixel-pushing operations over a smaller display area. Both facilitate faster performance overall. For example, an action game will render its game-play area many times per second, while updating the player's score area only if there is a change in value.

The rendering thread enumerates over the elements from the z-buffer, renders them into the image buffer, and then sends two asynchronous messages. The first message tells the paint thread that a final image buffer is ready for display. The second message tells the application logic thread that the z-buffer area just processed is available for the next set of elements.

The last phase of the paint thread waits for a fully rendered image buffer in its message queue; see Example 1. The object in the paint thread's message queue contains a rendered image, display positions, and other specific drawing options. Listing Two defines a sample class for this object. The paint thread pulls rendered image objects off the message queue, calls the drawImage() method of the graphics context, and recycles the object back into the object pool; see Listing Three.

Incorporating Object Pools

The simplest model of object pooling involves creating a set-sized vector of the specific class of objects to pool; see Listing Four. Rather than creating a new instance of the object each time, obtain an instance from the front of this vector. Once the object's role is complete, simply place the object back into the object pool. You need to take the extra step to push the object instance back onto the end of the vector for later reuse. Simply letting the object expire at the end of a method, as in normal Java programming protocol, will lead to the eventual garbage collection of that object.

One vital aspect of the object pool is the handling of more demand than supply. If, for example, the object pool only allocates 10 instances, but all the threads together are requesting more than 10, what should it do? It can make the thread wait until another thread releases an instance back into the pool. Perhaps supplying a time limit to the wait duration and throwing an exception if the time limit expires is more practical. Alternately, the pool can grow by allocating and creating more objects, but there must be some logical limit on the growth, otherwise physical memory limits will play a role.

The simplest aforementioned solution to implement is blocking the thread until another thread releases an object back into the pool. The problem is that this may cause a deadlock condition where all threads are blocking out each other waiting for objects to become free. An elementary and disturbing example of this deadlock condition comes when you underestimate the demand of graphical or command elements. The application logic wants to generate more items to render, but the object pool is empty. The rendering thread cannot start until all the objects are available and sorted, and thus cannot release any of its current objects back into the pool. The other two threads will not wake up until the rendering thread finishes a frame.

These deadlock situations will force you to either automatically increase your object pool size, or use a maximum wait duration that throws an exception if the wait was unsuccessful. A more aggressive approach may be to force a render of the lowest level z-depth elements when encountering a resource exception. This minimalist rendering will reduce the visual impact while freeing up some objects back into the pool.

Other Optimization Techniques

There are several other optimizations techniques that you can employ to enhance the rendering-pipeline architecture I've described here. These techniques include:

Copying Resources from a Previous Frame. In most animated applications -- graphically intensive or not -- the active visual elements in a frame will likely remain the same on subsequent frames. To reduce overhead, provide interfaces in both the visual elements and the z-buffer to permit delta operations. Delta operations include copying the entire z-depth buffer, changing an element's position, applying new filters and transformations, as well as removing elements for subsequent frames.

Additional Z-Depth Sorting Parameters. Additional z-depth sorting parameters that affect the position of elements when inserting into the z-buffer are beyond the scope of this article. One example includes ground-point offset sorting within the same z-depth. The ground-point offset requires a y-offset per object, but allows objects of the same z-depth to sort against each other. This additional sorting ability allows a sprite to traverse both behind and in front of another object on the same z-plane, without additional complex logic. A character sprite walking around a tree or other small obstacle is an example of this feature.

Using Rendering Hints. Java 1.2 incorporates the class java.awt.RenderingHints that epitomizes the struggle between rendering speed and rendering quality. Applicable within the java.awt.Graphics2D class, these RenderingHints give you some control over the final rendering algorithms. You should always explicitly declare the RenderingHints that you want to incorporate in your application, since system defaults may vary from platform to platform. Table 1 identifies the RenderingHints values in terms of speed versus quality. The preference of animation-intensive applications is normally speed over quality, with adequate atonement for the quality built into the original art content. Figures 4 and 5 show sample display output for both quality and speed renderings, respectively.

Using Transformations. All graphical primitives in the Graphics2D context use the current transform parameters. For better performance, you should minimize the number of primitives that require a transform operation due to the high computational overhead of these operations. In addition, larger images require larger allocations of CPU cycles to complete the transformation operations. Rotations are especially costly to render and without a doubt will decrease your overall frame rate throughput.

For example, if you want an image to move or translate across the screen, three interfaces exist that provide that capability:

1. Graphics2D.drawImage(BufferedImage, BufferedImageOp, int, int).

2. Graphics2D.drawImage(Image, AffineTransform, ImageObserver).

3. Graphics.drawImage(Image, int, int, ImageObserver).

The last method is a JDK 1.1 interface to directly place the image on an x- y-coordinate without a transformation operation. A simple performance benchmark, using a 52×88 image running on a 350-MHz PC, yields the results in Table 2. Although the time per draw is on the order of a millisecond, the accumulated time in displaying upwards of 30 or more objects or a few very large images can degrade the frame rate.

Conclusion

As with any resource-demanding application, you need to pay special attention to the architecture of the core processes. Many optimization techniques are available. Which ones apply to your application becomes a balancing act between quality, development cost, and time-to-market. Although the techniques I describe here provide the foundation for an efficient rendering pipeline, your application's requirements should dictate the final implementation and applicable optimization techniques.

DDJ

Listing One

import java.util.*;

public class ZBuffer extends Object {
  public static final int    MIN_Z_DEPTH     = 2;
  public static final int    MIN_GROWTH_SIZE = 10;
  public static final int    MAX_GROWTH_SIZE = 100;
  protected           int    m_maxZDepth;
  protected           Vector m_zDepthVectors[];
  
  public ZBuffer(int maxZDepth, int zGrowthSize)
      throws InstantiationException {
    // Validate and set internal variable from initialization param.
    m_maxZDepth   = Math.max(maxZDepth, MIN_Z_DEPTH);
    zGrowthSize   = Math.min(MAX_GROWTH_SIZE,
                      Math.max(zGrowthSize, MIN_GROWTH_SIZE));
    // Create the array of vectors
    m_zDepthVectors = new Vector[m_maxZDepth];
    for (int i = 0; i < m_maxZDepth; ++i) {
      m_zDepthVectors[i] = new Vector(zGrowthSize);
    }
  }
  public void addElement(int zDepth, Object obj) {
    m_zDepthVectors[zDepth].addElement(obj);
  }
  public int getMaxZDepth() { return m_maxZDepth; }

  public void removeElementAt(int zDepth, int index) {
    m_zDepthVectors[zDepth].removeElementAt(index);
  }
  public void removeAllElementsAtDepth(int zDepth) {
    m_zDepthVectors[zDepth].removeAllElements();
  }
  public void removeAllElements() {
    for (int i=0; i < m_maxZDepth; ++i)
      m_zDepthVectors[zDepth].removeAllElements();
  }
  /* Returns an enumeration of the components of this zBuffer
   * along the z axis from back to front.
   * @return  an enumeration of the components of this zBuffer.
   * @see     java.util.Enumeration
   */
  public final synchronized Enumeration elements() {    
    return new ZBufferEnumerator(this);
  }
  /* Supports the enumerator class
   * @return  the total number of elements in the ZBuffer
   */
  public int size() {
    int count = 0;
    for (int i=0; i < m_maxZDepth; --i)
      count += m_zDepthVectors[zDepth].size();
    return count;
  }
  /* Supports the enumerator class
   * @return Vector at the given zDepth
   */
  public Vector getVectorAtDepth(int depth) {
    return m_zDepthVectors[zDepth];
  }
}
final class ZBufferEnumerator implements Enumeration
{
  ZBuffer m_zb;
  int     m_count, m_depthCount, m_totalCount;
  int     m_zLevelCount, m_zLevelElementCount;
  ZBufferEnumerator(ZBuffer zb) {
      m_zb    = zb;
      m_count = m_zLevelCount = m_zLevelElementCount = 0;
    m_totalCount = zb.size();
  }
  public boolean hasMoreElements() { return m_count < m_totalCount; }
  public Object nextElement()
      throws NoSuchElementException {
    Object retObject = null;

      synchronized (m_zb) {
        // Check if we are done with the enumeration, otherwise
        // there is definitely something to return and we can
        // trust the while loop to exit at some point.
        if (m_count < m_totalCount) {
        Vector vCurrent;
        while (retObject == null) {
          vCurrent = m_zb.getVectorAtDepth(m_zLevelCount);
          if (m_zLevelElementCount < vCurrent.size()) {
            retObject = vCurrent.elementAt(m_zLevelElementCount++);
          } else {
            // Move to next zdepth level.
            ++m_zLevelCount;
            m_zLevelElementCount = 0;
          }
        }
        ++m_count;
          return retObject;
      }

    }
      throw new java.util.NoSuchElementException("ZBufferEnumerator");
  }
}

Back to Article

Listing Two

import java.awt.Point;
import java.awt.Image;

public class FinalRenderImage extends Object {
  public Point  screenOrigin;    // Upper left corner placement.
  public long   expirationTime;  // time when frame expires
  public Image  imageRef;        // Image for this segment.

  public FinalRenderImage(int id) {
    screenOrigin   = new Point(0,0);
    expirationTime = 0L;
    imageRef       = null;
  }
}

Back to Article

Listing Three

public void paint(Graphics g)
{
  // Pop the element off the paint queue
  RenderImage fri     = (RenderImage)(m_paintQueue.popElement());

  //Check if frame expired. If so, just drop it.
  long frameTimeStamp = System.currentTimeMillis();
  if (fri.expirationTime < frameTimeStamp) {
    if (fri.imageRef != null) {
      g.drawImage(
        fri.imageRef, fri.screenOrigin.x, fri.screenOrigin.y, null);
    }
  }
  // Push the used element back into the object pool's free queue.
  m_freePaintQueue.pushElement(fri);
}

Back to Article

Listing Four

public abstract class ObjectPool {
  
  // Make sure threads waiting on a resource does not block forever
  private static final int TIMEOUT_LIMIT = 100;
  
  // ObjPoolElement is the data structure to hold and monitor
  // the object's usage
  private class ObjPoolElement {
    public boolean isAvailable;
    public Object  theObject;
    public void setObject(Object o) { theObject = o; }
  }
  private int m_maxSize;          // Pool's capacity
  private int m_currentFreeIndex; // round-robin index for getObject
  private ObjPoolElement m_Pool[];// Array of objects in pool
  
  // It is the responsibility of the derived class to define
  // what object is being pooled.  So a pool of visual objects
  // might be declared as 'public VisualObjPool extends ObjectPool'
  // and defines the create() method as 'return new VisualObj();'
  public abstract Object create();

  // Construct and initiate the pool
  public ObjectPool(int maxSize) {
    m_maxSize = Math.max(1, maxSize);
    m_Pool = new ObjPoolElement[m_maxSize];
    for (int i=0; i < m_maxSize; ++i) {
      m_Pool[i] = new ObjPoolElement();
      m_Pool[i].isAvailable = true;
      m_Pool[i].theObject  = create();
    }
  }
  public synchronized int capacity() { return (m_maxSize); }

  // Provide a method to mark all elements as available.
  public synchronized void reset() {
    for (int i=0; i < m_maxSize; ++i)
      m_Pool[i].isAvailable = true;
  }
  // Object allocation method
  public synchronized Object getObject()
  {
    Object retObj = null;        
    int numTries = 0;
    int upperLimit;
    do {
      upperLimit = m_maxSize;
      for (int i=m_currentFreeIndex; i<upperLimit; ++i) {
        if (m_Pool[i].isAvailable) {
          retObj = m_Pool[i].theObject;
          m_Pool[i++].isAvailable = false;
          m_currentFreeIndex = (i==m_maxSize) ? 0 : i;
          break;
        }
        // Couldn't find a free object right of the m_currentFreeIndex
        // so now reset the loop counter and limit to check the left
        if ((i >= upperLimit-1) && (retObj == null)) {
upperLimit = m_currentFreeIndex;
          i = 0;
        }
      }
      // None available, wait once, and then try one more time.
      // Otherwise, return null.
      if ((retObj == null) && (numTries < 2)) {
        ++numTries;
        try { wait(TIMEOUT_LIMIT); }
        catch (java.lang.InterruptedException ie) { /* Do Nothing */ }
      }
    } while ((retObj == null) && (numTries < 2));
    return retObj;
  }
  public synchronized void releaseObject( Object anObj )
    throws ArrayStoreException
  {
    boolean cleared = false;
    for (int i=0; i < m_maxSize; ++i) {
      if (m_Pool[i].theObject == anObj) {
        m_Pool[i].isAvailable = true;
        cleared= true;
        break;
      }
    }
    if (! cleared) {
      throw new ArrayStoreException("Object does not belong to pool");
    }
  }
}

Back to Article

1 2 3 4 5 6 7 8 9 10 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

.NET

Java 2Graphics Rendering

Issues Facing Developers

Optimization Approaches

Basic Rendering Process Flow

Incorporating Asynchronous Threads

Incorporating Object Pools

Other Optimization Techniques

Conclusion

Listing One

Listing Two

Listing Three

Listing Four

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

.NET Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

.NET

Java 2Graphics Rendering

Issues Facing Developers

Optimization Approaches

Basic Rendering Process Flow

Incorporating Asynchronous Threads

Incorporating Object Pools

Other Optimization Techniques

Conclusion

Related Reading

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

.NET Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content