Using Primitive Restart for 3D Performance
As mentioned in the introduction, the examples in this article use an OpenGL extension called "primitive restart" to minimize communications across the PCIe bus and speed rendering. Simply put, primitive restart allows the programmer to specify a data value that the OpenGL state machine interprets as a token indicating that the current graphics primitive has completed. The next data item is assumed to be the start of the next graphics primitive. Valid graphics primitives include TRIANGLE_STRIP, TRIANGLE_FAN, and others.
The following illustrates this process. Assume the variable qIndices contains the indices of the data points that are to be used in drawing a triangle strip:
unsigned int qIndices[] = { 0, 1, 2, 3, 65535, 2, 3, 4, 5};
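With primitive restart disabled, the call to glDrawElements shown below treats all nine values (including 65535) as ordinary vertex indices in a single strip and will draw seven triangles. Note: size is the number of elements in qIndices.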
glDrawElements(GL_TRIANGLE_STRIP, size, GL_UNSIGNED_INT, qIndices);
The following code snippet calls glPrimitiveRestartIndexNV to specify that the value 65535 (passed via RestartIndex) is the primitive restart token. The routine glEnableClientState is then called to tell the OpenGL state machine to start using primitive restart:
glPrimitiveRestartIndexNV(RestartIndex);
glEnableClientState(GL_PRIMITIVE_RESTART_NV);
Now a single call to glDrawElements using qIndices will draw four triangles (two from each group of four indices) because the value 65535 tells OpenGL to act as if two separate glDrawElements calls were made.
glDrawElements(GL_TRIANGLE_STRIP, size, GL_UNSIGNED_INT, qIndices);
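The seven-versus-four triangle counts above can be checked on the CPU without any OpenGL context. The following is a minimal sketch, not part of the article's example code: the helper name count_strip_triangles is hypothetical, and it only models how a triangle strip consumes indices (every vertex after the second in a strip completes one triangle, and a restart token begins a new strip).

```c
#include <stddef.h>

/* Hypothetical helper: counts the triangles a GL_TRIANGLE_STRIP draw would
   assemble from `indices`. If use_restart is nonzero, any index equal to
   restart_index ends the current strip and is not treated as a vertex.
   Degenerate triangles are counted like any other triangle. */
size_t count_strip_triangles(const unsigned int *indices, size_t n,
                             int use_restart, unsigned int restart_index)
{
    size_t triangles = 0;
    size_t run = 0; /* vertices seen so far in the current strip */

    for (size_t i = 0; i < n; ++i) {
        if (use_restart && indices[i] == restart_index) {
            run = 0; /* restart token: begin a new strip */
            continue;
        }
        ++run;
        if (run >= 3) {
            ++triangles; /* each vertex after the second adds a triangle */
        }
    }
    return triangles;
}
```

Called on the nine-element qIndices array, this sketch reports seven triangles with restart disabled and four with 65535 as the restart token, matching the counts in the text.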
The primitive restart approach has several advantages:
- All control tokens and data for viewing can be generated and kept on the GPU.
- Variable numbers of items can be specified between the primitive restart tokens. This allows irregular grids and surfaces to be drawn, because arbitrary numbers of line segments, triangle strips, triangle fans, etc., can be specified depending on the drawing mode passed to glDrawElements.
- Rendering performance can be optimized by arranging the indices to achieve the highest reuse of the data cache in the texture units.
- Higher quality images can be created by alternating the direction of tessellation as noted in the primitive restart specification and illustrated in Figures 6 and 7.
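To make the idea of placing restart tokens between strips concrete, here is a minimal sketch that fills an index array for a regular grid of vertices, one triangle strip per pair of adjacent rows, with a restart token between strips. It is an illustration under stated assumptions rather than the article's actual code: the function name build_grid_strip_indices is hypothetical, vertices are assumed to be stored row-major, and restart_index is assumed to be larger than any real vertex index. It does not attempt the alternating tessellation direction mentioned in the last bullet.

```c
#include <stddef.h>

/* Hypothetical helper: writes triangle-strip indices for a rows x cols grid
   of row-major vertices into `out`, separating the per-row strips with
   restart_index. Returns the number of indices required. Nothing is written
   if `out` is NULL or `capacity` is too small, so the function can be called
   once to size the buffer and again to fill it. */
size_t build_grid_strip_indices(unsigned int rows, unsigned int cols,
                                unsigned int restart_index,
                                unsigned int *out, size_t capacity)
{
    if (rows < 2 || cols < 2) {
        return 0; /* not enough vertices to form a triangle strip */
    }

    /* 2*cols indices per strip, plus one token between adjacent strips */
    size_t needed = (size_t)(rows - 1) * 2u * cols + (rows - 2);
    if (out == NULL || capacity < needed) {
        return needed;
    }

    size_t k = 0;
    for (unsigned int r = 0; r + 1 < rows; ++r) {
        for (unsigned int c = 0; c < cols; ++c) {
            out[k++] = r * cols + c;       /* vertex on the current row */
            out[k++] = (r + 1) * cols + c; /* vertex on the next row */
        }
        if (r + 2 < rows) {
            out[k++] = restart_index;      /* end this strip, start the next */
        }
    }
    return k;
}
```

The resulting array could then be passed to a single glDrawElements call with GL_TRIANGLE_STRIP, provided primitive restart has been enabled with the same token value.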


More information on various OpenGL optimizations, including multiDraw (an alternative OpenGL method to draw multiple items with one call), can be found here on the OpenGL website. In particular, the primitive restart specification notes that multiDraw "still remain[s] more expensive than one would like".
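For comparison, a multi-draw style call needs the start and length of every primitive supplied as separate arrays, whereas primitive restart encodes the same boundaries inside the index stream itself. The sketch below shows that relationship by splitting a restart-delimited index array into (first, count) ranges. The helper name split_at_restart is hypothetical and is not taken from the article or from OpenGL; a real call such as glMultiDrawElements would additionally need the offsets converted to the pointer form it expects.

```c
#include <stddef.h>

/* Hypothetical helper: scans `indices` and records each run of values
   between restart tokens as a (first, count) pair. Empty runs are skipped.
   Returns the total number of runs found; only the first max_ranges of them
   are stored in `first` and `count`. */
size_t split_at_restart(const unsigned int *indices, size_t n,
                        unsigned int restart_index,
                        size_t *first, size_t *count, size_t max_ranges)
{
    size_t ranges = 0;
    size_t start = 0;

    for (size_t i = 0; i <= n; ++i) {
        /* a run ends at a restart token or at the end of the array */
        if (i == n || indices[i] == restart_index) {
            size_t len = i - start;
            if (len > 0) {
                if (ranges < max_ranges) {
                    first[ranges] = start;
                    count[ranges] = len;
                }
                ++ranges;
            }
            start = i + 1; /* next run begins after the token */
        }
    }
    return ranges;
}
```

For the qIndices array used earlier, this yields two ranges of four indices each, starting at offsets 0 and 5.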
A rough performance comparison using example code from this article on Linux demonstrates the speed of primitive restart compared to other techniques. Source code for the examples in this article, which can use different OpenGL rendering techniques (selectable using preprocessor #define statements), can be found here on the GPUcomputing.net website. For clarity, the #ifdef preprocessor statements were not included in the source code provided in this example. Of course, performance results can vary depending on the machine and GPU combination as well as driver version and settings. In addition, no attempt was made to fully optimize any of the drawing methods; see Table 1.

It is important to stress that these frame rates include the time required to recompute the 3D position and color of every vertex in the image. This represents a worst-case frame rate scenario that demonstrates the power and speed possible with hybrid CUDA/OpenGL applications. Real applications will likely deliver much higher performance by recalculating the data only when necessary.