In CUDA, Supercomputing for the Masses: Part 14 of this article series, I focused on debugging techniques and the use of CUDA-GDB to effectively diagnose and debug CUDA code -- with an emphasis on how to speed the process when looking through large amounts of data and how to use the changes in thread syntax and semantics introduced in CUDA-GDB. In this article, I discuss mixing CUDA and OpenGL by utilizing a PBO (Pixel Buffer Object) to create images with CUDA on a pixel-by-pixel basis and display them using OpenGL. A subsequent article in this series will discuss using CUDA to generate 3D meshes and OpenGL VBOs (Vertex Buffer Objects) to efficiently render those meshes as a colored surface, a wireframe image, or a set of 3D points. All demonstration code compiles and runs under both Windows and Linux.
The articles in this discussion on mixing CUDA with OpenGL cannot do more than provide a cursory introduction to OpenGL. Interested readers should look to the plethora of excellent books and tutorials that are readily available in bookstores and on the Internet. Here are a few that I have found to be useful:
- GPU Gems and NVIDIA OpenGL whitepapers.
- The OpenGL tutorials by Song Ho Ahn.
- The gamedev.net tutorials.
To focus on CUDA rather than OpenGL, I use an OpenGL framework that can mix CUDA with both pixel and vertex buffer objects. It is anticipated that this framework will be used and adapted by many others as they investigate various aspects of mixing CUDA and OpenGL not covered in my articles.
In many cases, only the CUDA kernels that generate the data will need to be modified to create and view your own content -- as will be shown in a second example at the end of this article that generates and allows interactive movement over an artificial landscape. Finally, this same framework will be used in the next article with minor modifications to discuss and demonstrate vertex buffer objects.
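To give a sense of what such a data-generating kernel looks like, here is a minimal, hypothetical sketch. The kernel name, signature, and color formula are my own placeholders rather than the actual contents of kernelPBO.cu; the only assumption is that the PBO has already been mapped into CUDA memory as an array of uchar4 pixels.

```cuda
// Hypothetical pixel-generating kernel: the PBO has already been mapped into
// CUDA memory, so d_pixels points at width*height uchar4 values.
__global__ void fillPixels(uchar4 *d_pixels, int width, int height, float time)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    // Each thread colors exactly one pixel; swap in any formula you like here.
    unsigned char v = (unsigned char)(128.0f + 127.0f * __sinf(0.1f * x + 0.1f * y + time));
    d_pixels[y * width + x] = make_uchar4(v, v, 255 - v, 255);
}
```

Replacing the body of a kernel like this is all that is needed to display different content; the window creation, buffer handling, and callbacks stay the same.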
In a nutshell, creating a working OpenGL application requires the following steps, which are instantiated through the files in the framework as illustrated in the schematic below (a minimal sketch of this setup follows the list):
- simpleGLmain.cpp: Creates an OpenGL window and performs basic OpenGL/GLUT setup.
- simplePBO.cpp: Performs the CUDA-centric setup, in this case for a Pixel Buffer Object (PBO).
- callbacksPBO.cpp: Defines the keyboard, mouse, and other callbacks.
- kernelPBO.cu: The CUDA kernel that calculates the data to be displayed.
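The sketch below shows, in heavily abbreviated and hypothetical form, how simpleGLmain.cpp ties these pieces together. The extern declarations stand in for code that lives in simplePBO.cpp and callbacksPBO.cpp, and only standard GLUT/GLEW calls are used; error handling is omitted.

```cpp
#include <GL/glew.h>
#include <GL/glut.h>

// These live in the other framework files; the declarations here are
// placeholders so the sketch is self-contained.
extern void initCuda();                                   // simplePBO.cpp: create the PBO, register it with CUDA
extern void display();                                    // callbacksPBO.cpp: map PBO, run kernel, draw
extern void keyboard(unsigned char key, int x, int y);    // callbacksPBO.cpp
extern void mouse(int button, int state, int x, int y);   // callbacksPBO.cpp

static void idle() { glutPostRedisplay(); }               // keep the animation running

int main(int argc, char **argv)
{
    // 1. Create an OpenGL window and perform basic GLUT/GLEW setup.
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE);
    glutInitWindowSize(512, 512);
    glutCreateWindow("CUDA + OpenGL PBO example");
    glewInit();

    // 2. CUDA-centric setup (allocate the PBO and register it with CUDA).
    initCuda();

    // 3. Register the keyboard, mouse, and display callbacks.
    glutDisplayFunc(display);
    glutKeyboardFunc(keyboard);
    glutMouseFunc(mouse);
    glutIdleFunc(idle);

    // 4. Hand control to GLUT; display() launches the CUDA kernel each frame.
    glutMainLoop();
    return 0;
}
```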
I anticipate that many readers will just copy and paste these four files and build the example. This is fine. Similarly, many readers will also cut and paste the additional two files, perlinCallbacksPBO.cpp and perlinKernelPBO.cu, used in the second example to see the artificial landscape in action.
For many readers, working with the source code of these two examples (along with the comments) will be sufficient to provide the basic visualization functionality needed for their work, or to establish a known-working code base that can be adapted to create other, more advanced CUDA applications.
What follows is the source code, along with a discussion of the essential features needed to combine CUDA and OpenGL in the same application to create images.
Interested readers might also like to watch the video of Joe Stam's presentation given at the 2009 NVIDIA GTC (GPU Technology Conference) entitled What Every CUDA Programmer Needs to Know about OpenGL. Joe's presentation discusses many of the OpenGL concepts covered in my articles and provides a live demonstration of the simplest PBO and VBO examples from this and the follow-on article.
Framework and Rationale for Combining CUDA and OpenGL
Just as in the NVIDIA SDK samples, GLUT (a window-system-independent OpenGL toolkit) is used for Windows and Linux compatibility. Figure 1 illustrates the relationship between the four files used in the framework.

As we will see, CUDA and OpenGL interoperability is very fast!
The reason (aside from the speed of CUDA) is that CUDA maps the OpenGL buffer(s) into the CUDA memory space with a call to cudaGLMapBufferObject(). On a single-GPU system, no data movement is required! Once provided with a pointer, CUDA programmers are free to exploit their knowledge of CUDA to write fast and efficient kernels that operate on the mapped OpenGL buffers. However, the separation between OpenGL and CUDA remains distinct: OpenGL should not operate on any buffer while it is mapped into the CUDA memory space.
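To make the map/compute/unmap/draw cycle concrete, here is a condensed sketch of a display callback in the spirit of callbacksPBO.cpp. It uses the CUDA 2.x-era OpenGL interoperability calls the article relies on (cudaGLMapBufferObject() and cudaGLUnmapBufferObject()); the variable names and the launch_kernel() helper are illustrative placeholders, and the PBO is assumed to have been created and registered with CUDA during setup.

```cpp
#include <GL/glew.h>
#include <GL/glut.h>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

extern GLuint pbo;                              // created and registered in simplePBO.cpp
extern unsigned int image_width, image_height;
extern void launch_kernel(uchar4 *d_pixels, unsigned int w, unsigned int h);

void display()
{
    // Map the OpenGL PBO into the CUDA address space. On a single-GPU system
    // no copy is made; CUDA simply receives a device pointer to the buffer.
    uchar4 *d_pixels = 0;
    cudaGLMapBufferObject((void **)&d_pixels, pbo);

    // Run the CUDA kernel that fills the buffer pixel-by-pixel.
    launch_kernel(d_pixels, image_width, image_height);

    // Unmap before OpenGL touches the buffer again; OpenGL must not operate
    // on a buffer while it is mapped into the CUDA memory space.
    cudaGLUnmapBufferObject(pbo);

    // Draw the pixels straight from the PBO -- the data never leaves the GPU.
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, pbo);
    glDrawPixels(image_width, image_height, GL_RGBA, GL_UNSIGNED_BYTE, 0);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, 0);

    glutSwapBuffers();
}
```

The ordering matters: the kernel writes only while the buffer is mapped, and glDrawPixels() reads it only after it has been unmapped, which is exactly the separation described above.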
There are two very clear benefits of the separation (yet efficient interoperability) between CUDA and OpenGL:
- From a programming view: When the buffer is not mapped into the CUDA memory space, OpenGL gurus are free to exploit existing legacy code bases, their expertise, and the full power of all the tools available to them, such as GLSL (the OpenGL Shading Language) and Cg.
- From an investment view: Efficient exploitation of existing legacy OpenGL software investments is probably the most important benefit this mapped approach provides. Essentially, CUDA code can be gradually added into existing legacy libraries and applications just by mapping the buffer into the CUDA memory space. This allows organizations to test CUDA code without significant risk and then enjoy the benefits once they are confident in the performance and productivity rewards delivered by this programming model.