Channels ▼
RSS

Tools

CUDA, Supercomputing for the Masses: Part 15


Final Steps to Render an Image from a CUDA Application

To draw an image from a CUDA application requires the following steps:

  1. Allocate OpenGL buffer(s) that are the size of the image.
  2. Allocate OpenGL texture(s) that are the size of the image.
  3. Map OpenGL buffer(s) to CUDA memory.
  4. Write the image from CUDA to the mapped OpenGL buffer(s).
  5. Unmap the OpenGL buffer(s).
  6. Bind the texture to the OpenGL buffer.
  7. Draw a Quad that specifies the texture coordinates at each corner.
  8. Swap front and back buffers to draw to the display.

As previously discussed, createPBO() allocates the OpenGL PBO buffer(s) with glBufferData(), thereby fulfilling step 1. Similarly, createTexture() allocates the OpenGL texture(s) specified in step 2 that can be used for rendering the image to the display.

The routine display() calls the CUDA kernel that creates or modifies the data in the OpenGL buffer, then renders the new image to the screen as in Figure 3:

Figure 3

As you can see in Figure 3, the routine runCuda() performs steps 3-5 by mapping the OpenGL buffer to CUDA through the call to cudaGLMapBufferObject(), As can be seen below, runCuda() maps the OpenGL buffer, launches the CUDA kernel with launch_kernel() (discussed later in this article) and unmaps the buffer with cudaGLUnmapBufferObject().


// Run the Cuda part of the computation
void runCuda()
{
  uchar4 *dptr=NULL;

  // map OpenGL buffer object for writing from CUDA on a single GPU
  // no data is moved (Win & Linux). When mapped to CUDA, OpenGL
  // should not use this buffer
  cudaGLMapBufferObject((void**)&dptr, pbo);

  // execute the kernel
  launch_kernel(dptr, image_width, image_height, animTime);

  // unmap buffer object
  cudaGLUnmapBufferObject(pbo);
}

It is important to clarify the distinction between registering and mapping an OpenGL buffer. Registering sets up the buffer for CUDA/OpenGL interoperability, but doesn't actually make the buffer available to CUDA. Registering a buffer is expensive, so it is only done at creation. Mapping, in contrast, actually hands access of the memory over to CUDA through a pointer. Mapping/unmapping operations are very fast these calls can be used in frequently called sections of code.

Finally, steps 6-8 occur in the callback method display(), contained in the file callbacksPBO.cpp listed below:


//callbacksPBO.cpp (Rob Farber)

#include <GL/glew.h>
#include <cuda_runtime.h>
#include <cutil_inline.h>
#include <cutil_gl_inline.h>
#include <cuda_gl_interop.h>
//#include <cutil_gl_error.h>
#include <rendercheck_gl.h>

// variables for keyboard control
int animFlag=1;
float animTime=0.0f;
float animInc=0.1f;

//external variables
extern GLuint pbo;
extern GLuint textureID;
extern unsigned int image_width;
extern unsigned int image_height;

// The user must create the following routines:
void runCuda();

void display()
{
  // run CUDA kernel
  runCuda();
  
  // Create a texture from the buffer
  glBindBuffer( GL_PIXEL_UNPACK_BUFFER, pbo);

  // bind texture from PBO
  glBindTexture(GL_TEXTURE_2D, textureID);


  // Note: glTexSubImage2D will perform a format conversion if the
  // buffer is a different format from the texture. We created the
  // texture with format GL_RGBA8. In glTexSubImage2D we specified
  // GL_BGRA and GL_UNSIGNED_INT. This is a fast-path combination

  // Note: NULL indicates the data resides in device memory
  glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, image_width, image_height, 
		  GL_RGBA, GL_UNSIGNED_BYTE, NULL);


  // Draw a single Quad with texture coordinates for each vertex.

  glBegin(GL_QUADS);
  glTexCoord2f(0.0f,1.0f); glVertex3f(0.0f,0.0f,0.0f);
  glTexCoord2f(0.0f,0.0f); glVertex3f(0.0f,1.0f,0.0f);
  glTexCoord2f(1.0f,0.0f); glVertex3f(1.0f,1.0f,0.0f);
  glTexCoord2f(1.0f,1.0f); glVertex3f(1.0f,0.0f,0.0f);
  glEnd();

  // Don't forget to swap the buffers!
  glutSwapBuffers();

  // if animFlag is true, then indicate the display needs to be redrawn
  if(animFlag) {
    glutPostRedisplay();
    animTime += animInc;
  }
}

//! Keyboard events handler for GLUT
void keyboard(unsigned char key, int x, int y)
{
  switch(key) {
  case(27) :
    exit(0);
    break;
  case 'a': // toggle animation
  case 'A':
    animFlag = (animFlag)?0:1;
    break;
  case '-': // decrease the time increment for the CUDA kernel
    animInc -= 0.01;
    break;
  case '+': // increase the time increment for the CUDA kernel
    animInc += 0.01;
    break;
  case 'r': // reset the time increment 
    animInc = 0.01;
    break;
  }

  // indicate the display must be redrawn
  glutPostRedisplay();
}

// No mouse event handlers defined
void mouse(int button, int state, int x, int y)
{
}

void motion(int x, int y)
{
}

The routine display() calls runCuda() (which, as we discussed, completes all of steps 1-5). The following code segment after the call to runCuda() binds a texture to the buffer as is required for step 6.


// Create a texture from the buffer
  glBindBuffer( GL_PIXEL_UNPACK_BUFFER, pbo);

  // bind texture from PBO
  glBindTexture(GL_TEXTURE_2D, textureID);


  // Note: glTexSubImage2D will perform a format conversion if the
  // buffer is a different format from the texture. We created the
  // texture with format GL_RGBA8. In glTexSubImage2D we specified
  // GL_BGRA and GL_UNSIGNED_INT. This is a fast-path combination

  // Note: NULL indicates the data resides in device memory
  glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, image_width, image_height, 
		  GL_RGBA, GL_UNSIGNED_BYTE, NULL);

Finally, we can now draw the image!

The following code segment actually draws the image even though (at first glance) it looks like only a square is being drawn! It actually draws a single quad with the texture coordinates for each vertex. (In a sense, the 2D texture is "stretched" and "held" at each of the four corners of the 3D quad as if it were a piece of fabric suspended by four poles.) This is actually a very simple example of texture mapping.

To be more precise, the texture is not "created" by glTexSubImage2d(), rather the data is copied from the buffer to the texture, and then since texturing is enabled and our texture is bound the quad we draw has the texture "glued" on it.


// Draw a single Quad with texture coordinates for each vertex.

  glBegin(GL_QUADS);
  glTexCoord2f(0.0f,1.0f); glVertex3f(0.0f,0.0f,0.0f);
  glTexCoord2f(0.0f,0.0f); glVertex3f(0.0f,1.0f,0.0f);
  glTexCoord2f(1.0f,0.0f); glVertex3f(1.0f,1.0f,0.0f);
  glTexCoord2f(1.0f,1.0f); glVertex3f(1.0f,0.0f,0.0f);
  glEnd();

More complex mappings are clearly possible such as mapping a texture onto a sphere as in Figure 4.

Figure 4

All drawing is performed in an off-screen framebuffer. Once all drawing is completed, the front and back buffers are swapped to make the new image visible. This use of double-buffering prevents visual artifacts from marring the image while the image is being updated.

The remaining callbacks in the file callbackPBO.cpp are straightforward and will not be discussed. Please note that this example does not perform any mouse or motion related activities so the routines mouse() and motion() do nothing. The empty routines are placeholders that will be used in the VBO examples in the next article of this series and permit linking of the executable.

A few simple keyboard commands are defined in the keyboard() routine:

  • <ESC> terminates the application.
  • a or A toggles the animation. Stopping the animation will freeze the picture and stop the frames per second calculation.
  • + and - increases/decreases the animation time increment.
  • r resets the time increment.

While all this might seem overly complicated for our simple example, please remember that OpenGL is general purpose and quite powerful and was designed to enable graphics applications that are far more challenging than the simple code provided here. As mentioned at the beginning of this article, we have done little more than pay cursory attention to the capabilities and features within OpenGL. Interested readers are strongly encouraged to look at the references provided at the beginning of this article -- especially the GPU Gems series of books.

The source code for the CUDA test pattern kernel is extremely simple as can be seen in the method kernel() below:


//kernelPBO.cu (Rob Farber)

#include <stdio.h>

void checkCUDAError(const char *msg) {
  cudaError_t err = cudaGetLastError();
  if( cudaSuccess != err) {
    fprintf(stderr, "Cuda error: %s: %s.\n", msg, cudaGetErrorString( err) ); 
    exit(EXIT_FAILURE); 
  }
} 

//Simple kernel writes changing colors to a uchar4 array
__global__ void kernel(uchar4* pos, unsigned int width, unsigned int height, 
		       float time)
{
  int index = blockIdx.x * blockDim.x + threadIdx.x;
  unsigned int x = index%width;
  unsigned int y = index/width;

  if(index < width*height) {
    unsigned char r = (x + (int) time)&0xff;
    unsigned char g = (y + (int) time)&0xff;
    unsigned char b = ((x+y) + (int) time)&0xff;
    
    // Each thread writes one pixel location in the texture (textel)
    pos[index].w = 0;
    pos[index].x = r;
    pos[index].y = g;
    pos[index].z = b;
  }
}

// Wrapper for the __global__ call that sets up the kernel call
extern "C" void launch_kernel(uchar4* pos, unsigned int image_width, 
			      unsigned int image_height, float time)
{
  // execute the kernel
  int nThreads=256;
  int totalThreads = image_height * image_width;
  int nBlocks = totalThreads/nThreads; 
  nBlocks += ((totalThreads%nThreads)>0)?1:0;

  kernel<<< nBlocks, nThreads>>>(pos, image_width, image_height, time);
  
  // make certain the kernel has completed 
  cudaThreadSynchronize();

  checkCUDAError("kernel failed!");
}

The launch_kernel() routine performs any setup prior to starting the method kernel() on the GPU. For this example only the kernel execution configuration is specified and a few simple parameters are calculated so they can be passed to the GPU.

The actual kernel for this example, kernel(), is extremely simple as can be seen above. Each thread assigns an RGB value to each location within the PBO based on animTime and grid coordinates.

It is important to note that cudaThreadSynchronize() is called in launch_kernel() after the kernel execution to ensure all GPU work has completed before control is returned to the host.

Under Linux the executable testpattern can be created with the following command. (Please substitute the path to the CUDA SDK libraries for CUDA_LIBRARIES. Similarly, substitute the path to the CUDA SDK include files for CUDA_INCLUDES.)


nvcc –O3 -L CUDA_LIBRARIES -I CUDA_INCLUDES simpleGLmain.cpp simplePBO.cpp callbacksPBO.cpp kernelPBO.cu -lglut -lGLEW –lcutil –o testpattern

Running the testpattern executable displays a colorful animated test pattern that changes in color over time. Something like Figure 5 should appear on the screen.

Figure 5


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video