In CUDA, Supercomputing for the Masses: Part 16 and Part 17 of this article series, I discussed new features in the CUDA Toolkit 3.0 release that can make day-to-day development tasks easier, less error prone, and more consistent. Expanded, consistent coverage appears to have been the thinking behind the release, which includes memory checking, runtime and driver interoperability, C++ class and template inheritance, and Fermi and OpenCL enhancements.
This installment returns to the topic of mixing OpenGL and CUDA C within the same application, first introduced in Part 15 of this series. Part 15 demonstrated how to create 2D images with CUDA C on a pixel-by-pixel basis and display them with OpenGL through the use of PBOs (Pixel Buffer Objects). This article completes that discussion by demonstrating how to use VBOs (Vertex Buffer Objects) to create 3D images with CUDA C and render them using OpenGL as 3D collections of points, wireframe images, and surfaces.
The provided examples demonstrate how to achieve very high rendering and compute performance through the use of primitive restart, an OpenGL extension that CUDA programmers can exploit to bypass PCIe bottlenecks. On a GTX 285, primitive restart can render 60-90 frames per second faster than other optimized OpenGL routines such as multiDraw. Even highly experienced OpenGL programmers should find this article and its working examples both new and informative, because the standards-compliant OpenGL primitive restart capability can deliver high-performance, high-quality graphics even when the images require irregular meshes.
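To make the idea of primitive restart concrete, here is a minimal host-side sketch (plain C++ that also compiles in a .cu file) of how an index buffer for a regular mesh might be laid out. It is an illustration under stated assumptions, not the framework's actual code: the function name buildStripIndices and the sentinel value kRestartIndex are hypothetical, and the sentinel must match whatever value the application registers with OpenGL (for example through glPrimitiveRestartIndex).

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sentinel value. It must match the restart index registered
// with OpenGL and must never coincide with a real vertex index.
const std::uint32_t kRestartIndex = 0xFFFFFFFFu;

// Build triangle-strip indices for a width x height grid of vertices stored
// in row-major order. Each pair of adjacent rows becomes one strip, and a
// restart index follows every strip so the whole mesh can be submitted in a
// single indexed draw call instead of one call per strip.
std::vector<std::uint32_t> buildStripIndices(std::uint32_t width,
                                             std::uint32_t height)
{
    std::vector<std::uint32_t> indices;
    if (width < 2 || height < 2) return indices;   // no triangles possible

    indices.reserve((height - 1) * (2 * width + 1));
    for (std::uint32_t y = 0; y + 1 < height; ++y) {
        for (std::uint32_t x = 0; x < width; ++x) {
            indices.push_back(y * width + x);         // vertex on row y
            indices.push_back((y + 1) * width + x);   // vertex on row y + 1
        }
        indices.push_back(kRestartIndex);             // end of this strip
    }
    return indices;
}
```

Because the vertex data itself can stay in a buffer object on the GPU, only this index list has to be prepared once on the host; the strips are then drawn together without per-strip calls.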
Readers should note that care was taken in the design of the software framework to make it as easy as possible to adapt to new applications. With only minor refactoring, the same framework used for the PBO examples in Part 15 has been extended to create animated 3D point, wireframe, and surface images. Different CUDA kernels can then be used to create different visualization applications ranging from simple to complex. (Similarly, the framework was converted in Part 17 to a C++ class framework that can use different CUDA kernels and define new textures via inheritance.) Also, the examples in this article (and Part 17) support both the CUDA 3.0 and the deprecated pre-3.0 graphics interoperability APIs.
This article first creates an animated sinusoidal surface based on the NVIDIA simpleGL.cu CUDA kernel. A single image from this example is shown in Figure 1. Then the Perlin noise generator from the simple PBO article (Part 15) is slightly modified to create a virtual terrain model that the user can fly around in and dynamically alter with keyboard commands. Figure 2 shows a sample surface, Figure 3 a pilot's-eye view, Figure 4 a wireframe version, and Figure 5 an artificial terrain rendered with points.
Essentially, creating new, complicated applications can be as simple as compiling with a different CUDA kernel. For clarity, separate 3D vertex and color arrays are used within the source code for both data creation and display. This should make the code easier to understand and make data visualization as easy as writing a new kernel or loading data from disk to alter the 3D vertex array, the color array, or both. Readers who choose to create their own CUDA kernels should gain a strong practical sense of how easy and flexible visualization can be with a combined CUDA/OpenGL approach.
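The separation of vertex and color data can be sketched as follows. This is a CPU reference written in plain C++ for illustration only, not the article's kernel: the Vertex and Color layouts, the function name fillSurface, and the constants are assumptions, and the height function is loosely modeled on the sine/cosine product used in NVIDIA's simpleGL sample. In a CUDA kernel, the two loops would be replaced by each thread's (x, y) index.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z; };            // hypothetical: one 3D position per mesh point
struct Color  { std::uint8_t r, g, b, a; };  // hypothetical: one RGBA color per mesh point

// Fill separate vertex and color arrays for a width x height mesh.
void fillSurface(std::vector<Vertex>& verts, std::vector<Color>& colors,
                 unsigned width, unsigned height, float time)
{
    verts.resize(static_cast<std::size_t>(width) * height);
    colors.resize(verts.size());

    const float freq = 4.0f;  // assumed ripple frequency
    for (unsigned y = 0; y < height; ++y) {
        for (unsigned x = 0; x < width; ++x) {
            // Map grid coordinates to the range [-1, 1).
            float u = 2.0f * x / width - 1.0f;
            float v = 2.0f * y / height - 1.0f;

            // Animated height in [-0.5, 0.5].
            float w = 0.5f * std::sin(u * freq + time) * std::cos(v * freq + time);

            std::size_t i = static_cast<std::size_t>(y) * width + x;
            verts[i] = Vertex{u, w, v};

            // Color by height: low points are blue, high points are red.
            float t = w + 0.5f;                       // now in [0, 1]
            colors[i] = Color{static_cast<std::uint8_t>(255.0f * t), 0,
                              static_cast<std::uint8_t>(255.0f * (1.0f - t)), 255};
        }
    }
}
```

Keeping the two arrays independent means a new visualization can change only the heights, only the colors, or both, without touching the display code.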
The examples are known to compile and run on Linux and Windows, although this article discusses only how to build the code under Linux. Go to the GPUcomputing.net website for information about building the examples with Microsoft Visual Studio.