Dr. Dobb's | OpenGL & Mobile Devices

OpenGL & Mobile Devices

OpenGL is the de facto standard for cross-platform real-time 3D graphics. OpenGL ES extends these capabilities to mobile devices.

May 16, 2006
URL:http://www.drdobbs.com/embedded-systems/opengl-mobile-devices/187203532

When it comes to games, visualization, and even video-stream processing, OpenGL is the de facto standard for cross-platform real-time 3D graphics. So when I first heard that OpenGL was being adapted for things like cell phones and PDAs, I was, well, skeptical. After being spoiled by workstation-class 3D graphics, going back to software rendering on a display half the size of a note card was underwhelming, to say the least.

Two things changed my mind—Sony's PSP and Nintendo's GameBoy DS. When I saw their displays, I knew 3D hardware in handheld devices was a reality. For the first time, I had a real reason to dive into OpenGL ES, a royalty-free, cross-platform API for full-function 2D and 3D graphics for the "embedded space" (www.khronos.org/opengles). About the same time I was working to incorporate OpenGL ES into my OpenGL class at Full Sail (www.fullsail.com), I was also working with Software Bisque (www.bisque.com) on some OpenGL-based astronomy projects. NVidia heard what we were up to and contacted us about moving TheSky Pocket Edition to OpenGL ES-enabled platforms. NVidia supplied me with a development kit (developer.nvidia.com/page/handheld.html), which included the Gizmondo handheld gaming unit. Alas, the company behind the unit is gone, but the Gizmondo still proved to be a valuable prototyping and evaluation tool. Getting something running on the Gizmondo was straightforward because it runs on Windows CE, which lets me use Microsoft's Embedded Visual C++.

The Gizmondo is not a Pocket PC or PDA — it was solely intended as a handheld gaming device—so there are a few hurdles most PDA developers are spared. For instance, I had to use a special "Developers" version of the USB driver to form a regular partnership with the device and copy files back and forth to protected areas. There was also a special program that had to be integrated into Embedded Visual C++'s build steps to digitally sign the executables, or the Gizmondo would not execute them.

The Gizmondo unit contains a 400-MHz ARM9 CPU, and has 64 MB of RAM. The 320×240, 16-bit color (5-6-5 dither) display was driven by a 72-MHz NVidia GoForce 4500 3D GPU, which supports OpenGL ES 1.0 with some extensions. This is more power than was available when 3D games first came to PCs, and I was eager to see my first 3D program running on a handheld device.

Using the EGL Interface

The EGL API was designed to be a platform-independent way to interface OpenGL ES with the underlying window or display system. A single evening was all I needed to get an OpenGL ES buffer up and ready for rendering. Dave Astle and Dave Durnil's book OpenGL ES Game Development (PTR, 2004) was invaluable, covering many aspects of OpenGL ES game development, and NVidia provided the headers and libraries:

#include <gl/egl.h>
#include <gl/egltypes.h>
#include <gl/gl.h>

for bringing in the EGL interface and OpenGL ES functions.

The EGL interface works much like desktop binding interfaces, in that you need handles to the display contexts, arrays of attribute flags for your buffer, and so on. Here are the variable declarations needed for these settings:

EGLDisplay hDisplay = NULL;
EGLint  attrs[3] = { EGL_DEPTH_SIZE, 16, EGL_NONE };
EGLint  nConfigs;
EGLConfig *pConfigs;
EGLSurface hSurface;
EGLContext hContext;

Listing One is the skeleton code for initialization when the window is created, and the cleanup code for when the window is destroyed. The SetupRC() and ShutdownRC() functions are my own for regular one-time OpenGL setup and cleanup, such as loading and freeing textures, and the like. Also included are the WM_PAINT and WM_SIZE message handlers. These handle the standard tasks of setting up the projection and rendering. The paint handler renders, does the buffer swap, and then invalidates the window, creating a looping behavior useful for animation on Windows' event-driven architecture. (The complete code for this project is available at http://www.ddj.com/code/.)

Listing One

case WM_CREATE:
    {
    hDisplay = eglGetDisplay(EGL_NO_DISPLAY);
    eglInitialize(hDisplay, NULL, NULL);
    eglChooseConfig(
        hDisplay, attrs, NULL, 0, &nConfigs);
    pConfigs = (EGLConfig *)malloc(nConfigs * sizeof(EGLConfig));
    eglChooseConfig(hDisplay, attrs, pConfigs, nConfigs, &nConfigs);
    hSurface = e
        glCreateWindowSurface(hDisplay, *pConfigs, hWnd, NULL);hContext = 
            eglCreateContext(hDisplay, *pConfigs, 0, 0);
    eglMakeCurrent
        (hDisplay, hSurface, hSurface, hContext);
    SetupRC();
    }
case WM_DESTROY:
    ShutdownRC();
    eglMakeCurrent(EGL_NO_DISPLAY, EGL_NO_SURFACE, 
         EGL_NO_SURFACE, EGL_NO_CONTEXT);
    eglDestroyContext(hDisplay, hContext);
    eglDestroySurface(hDisplay, hSurface);
    eglTerminate(hDisplay);
    free(pConfigs);
    PostQuitMessage(0);
    break;
case WM_SIZE:
         {
    int iWidth = LOWORD(lParam);
    int iHeight = HIWORD(lParam);
    ChangeSize(iWidth, iHeight);
    }
    break;
case WM_PAINT:
    {
    Render();
    eglSwapBuffers(hDisplay, hSurface);
    InvalidateRect(hWnd, NULL, FALSE);
    }
    break;

OpenGL ES: A Subset of OpenGL

OpenGL ES 1.0 is defined as a subset of OpenGL 1.3. Because of the limits of embedded systems (small footprints, lower power consumption, and so on), many standard OpenGL features were not included in OpenGL ES. For instance, redundant primitives such as GL_QUADS and later are gone, along with the entire glBegin/glEnd immediate-mode paradigm. All geometry must be submitted via vertex arrays. Using immediate mode for geometry batches burns CPU cycles, which are at a premium on the devices for which OpenGL ES is intended.

Also gone from OpenGL ES are display lists, color index mode (good riddance), user-defined clipping planes, stippling, the imaging subset, feedback-and-selection, evaluators, edge flags, polygon modes, and saving-and-restoring attributes with the glPushAttrib family of functions. Although vertex arrays are now the standard way to submit geometry, interleaved arrays and glArrayElement have also been eliminated, and the entire GLU helper library is gone.

However, OpenGL ES does support fixed-point math via the data type GLfixed, and many standard OpenGL functions now have fixed-point variants, each appended with the letter x to indicate the arguments are fixed point. Functions that normally accept double-precision arguments (another feature dropped from OpenGL ES) are further updated to accept either floating-point or fixed-point values. For example, the glFrustum function, which accepted double precision (type GLdouble) parameters in OpenGL, is available in OpenGL ES as glFrustumx() and glFrustumf(), which accept GLfixed parameters and GLfloat parameters, respectively.

Porting to the Gizmondo

The best introduction to OpenGL ES is porting existing OpenGL programs to the Gizmondo. I selected from my book The OpenGL SuperBible (Pearson Education, 1999) a vertex-array example that rendered a 3D textured model on screen. The .TGA loader for textures compiled without any problems, and the main body of code that created and rendered the indexed array required only a few minor changes.

I also decided to load compressed textures via the .dds file format; I recommend this because texture memory is at a premium. According to my SDK notes, once the frame buffer is allocated on the Gizmondo, a mere 830 KB remain for textures. My source Targa file was 192 KB, but dropped to 33 KB when compressed and saved as a .dds file using an NVidia-provided Photoshop plug-in. It is also recommended (but not required) that you forgo mipmapping when possible to conserve even more texture space. I used code provided by NVidia for loading .dds files on Windows and found that it compiled and worked without any change on Windows CE.

The first problem with my existing code was that I used gluPerspective to set up my 3D projection. The GLU library (ee.1asphost.com/vmelkon/glu.html) is not included, and I was forced to use glFrustum. I find glFrustum a pain to use, and anyone spoiled by gluPerspective has to either learn to use glFrustum, or write and use their own equivalent function. Of course, glFrustum really isn't available; you must use either glFrustumf or glFrustumx.

The last hurdle to moving my sample was calculating and displaying the frame rate. Measuring the frame rate is necessary if you want to objectively measure any performance impacts to code changes and optimizations. On Windows, I created a C++ class that worked like a stopwatch. It uses the Win32 QueryPerformanceCounter functions because there is no such hardware timing available on a handheld device. I was pleasantly surprised to learn that this function was supported by Windows CE, although not with the same accuracy and resolution as was available on the desktop. Because I was going to average 100 frames to determine the running frame rate, a little slop was going to be lost as noise, and I would still have a reasonably accurate measure of performance.

The real headache was in my text rendering code. Most desktop OpenGL implementations have some extensions that let you make use of OpenGL native fonts (these extensions are also missing from OpenGL ES). Game developers avoid these as they are often not very fast and don't look that great. The standard game technique for text is to load a texture with your character set and render billboarded quads on the screen, using texture coordinates to access individual characters. This is fast and allows for a good deal of creative freedom (slimy-looking or rust-themed letters, for instance).

My code that implemented this technique used immediate mode. I'd call glBegin at the beginning of text rendering, send down any number of GL_QUADS in an orthographic projection, and then finish with glEnd. However, GL_QUADS or the glBegin/glEnd mechanism for submitting geometry are supported by OpenGL ES. Reengineering my text class to use small triangle strips and send down one letter at a time via vertex arrays was the only real nuisance to using OpenGL ES. Of course, this is the sort of thing you only have to do once and is easy to reuse from project to project.

Results

Finally, I had my first 3D model textured, lit, and spinning on the screen. Not content with the same old example, Full Sail's Ed Womack dug up a better-looking model and I placed a sky background behind his game model of a tri-plane. Figure 1 is the result, running live on the Gizmondo. The plane model is lightweight, only 977 triangles, and I was getting a frame rate of just over 67 fps.

[Click image to view at full size]

Figure 1: Sample image, running live on a typical mobile device.

On small screens, you don't need a lot of geometry for nice-looking models. Using low poly models, 3D games can easily get playable frame rates. Also, in my example, I was using floating-point math. There is no native floating-point support on the ARM9 CPU, and I knew that I was falling back to floating-point emulation, and that biting the bullet and going to fixed point could have dramatic performance benefits. I couldn't have been more wrong.

Float Versus Fixed

Switching to fixed-point math is probably the biggest single change in mindset that developers will face when moving to the embedded space. My SDK had a library for doing fast fixed-point math on the ARM9 CPU, and the documentation encouraged use of this library and fixed point.

In reality, I wasn't doing a lot of fixed-point math. The problem was that I wasn't writing a real game—I was just displaying and rotating a 3D model. So what I was really benchmarking was geometry submission and processing speed. Using floating-point emulation is certainly not going to be faster than fixed-point math for most game-time calculations, and there is quite a bit of math going on behind today's games.

When I converted my model loading code to store the model as arrays of fixed-point values, the frame rate dropped considerably. The conversion of my model data from float to fixed occurs only at startup, and the data sits in a vertex array and is simply dropped to the hardware every frame. When I did this, my frame rate dropped from 67 fps to 52 fps. Even higher resolution test models resulted in a drop of almost one half! What was going on?

According to the spec sheet, the NVidia 4500 processor supports both fixed-point and floating-point geometry batches. However, what is really going on is that fixed-point geometry is being converted to floating-point data before the hardware actually starts the transform and lighting process. My NVidia contact was a bit vague as to whether this was a hardware issue (really supporting only floats), or if the Gizmondo was to blame by insisting in its driver that all fixed-point values go to the hardware as floats. Regardless (at least on this platform), by sending down geometry as fixed point, I was incurring a fixed-to-float conversion cost with every single frame. The lesson learned? At least on the 4500 Go Force platforms, store your static geometry as floating-point vertex data—not fixed point!

My last performance observation was the cost of lighting. If possible, lighting should be avoided in your games. Turning off the lights in my plane sample (and eliminating the submission of normals in the vertex batch) resulted in a jump in frame rate to 106 fps. That's about a 50-percent boost in performance if you can forgo using the standard OpenGL lighting model. The same boost occurred with fixed-point vertex and normal data. There are a number of multitexture lighting tricks that old PC OpenGL games used to avoid lighting in the days of CPU transform—lighting that should prove beneficial to today's OpenGL ES games.

Mobile Phones Get Smaller, Software Builds Get Bigger

The feature and form-factor war in mobile handsets has generated a growing headache for developers of software for mobile devices: Longer and longer compile times have been met with shorter and shorter release cycles. At the same time, feature proliferation has made build times grow longer. This math obviously does not add up.

Case in point: One of the leading global developers of wireless telecommunication products found itself with a build matrix so complex that the build team at one business unit had more than 100 unique software builds to support at any given time. At the same time that complexity was increasing, build times had increased dramatically, such that a single build grew from 20 minutes to more than two hours in just two years. Multiply that by 100 unique builds and the usual overnight build cycle was just not feasible.

To address this bottleneck, the company turned to parallel builds enabled, in this case, by Electric Cloud (www.electric-cloud.com). The wireless company quickly achieved an 84 percent reduction in build speed while enabling engineers to make incremental code changes they avoided in the past due to long compile times. They're now running 30-40 simultaneous builds during peak times of the day with more than 500 developers building to a cluster of 288 CPUs (432 build agents) at one site alone. They've now rolled out parallel build clusters across four sites in three countries.

Conclusion

A frame rate of over 60 fps for my first attempt was encouraging. Although my model contained less than 1000 triangles, NVidia claims that this hardware can transform and draw 3000 to 4500 triangles every 1/60th of a second. These are most certainly unlit triangles. As in the old days of 3D game development, low poly models are a necessary prerequisite. Texture memory is also tight, and using compressed textures is paramount. While game-time math calculations should be fixed point, static geometry submissions may be significantly faster using floating-point values.

Richard writes science visualization and educational software at Starstone Software Systems and teaches OpenGL programming at Full Sail Real World Education. He can be reached at [email protected].