CUDA, Supercomputing for the Masses: Part 20

OpenGL Rendering Methods Trace Analysis

The following Trace All trace was taken with SIMPLE_ONE_BY_ONE defined.

The Compute timeline shows that the k_perlin() CUDA kernel only takes 0.1% of the time, which indicates this rendering method is clearly not limited by the performance of the CUDA kernel. The thinness of the vertical line showing the time taken in k_perlin() relative to the other activities required to render this 3D artificial world visually illustrates the speed of the CUDA kernel. Also note that the Thread State timeline is solid red. A mouseover tells us that the thread is idle.

The Tools Extension Events report shows that the initialization of the mesh actually takes very little time, roughly 202 μs. We also clearly see the nesting of the methods as recorded by the NVTX calls. Follow-on calls to this rendering method show that the triangle fan mesh initialization is correctly called only once.
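
The nesting and the call-once behavior described above can be sketched as follows. This is a minimal illustration, not the article's source code: the ScopedRange helper, the range names, and the initMesh() and renderFrame() functions are hypothetical. The only NVTX calls used are nvtxRangePushA() and nvtxRangePop(), and they are compiled in only when USE_NVTX is defined, so the sketch also builds on a machine without the CUDA toolkit.

```cpp
#include <string>
#include <vector>

#ifdef USE_NVTX
#include <nvToolsExt.h>   // NVTX header from the CUDA toolkit
#endif

// Hypothetical RAII helper: opens a named range on construction and
// closes it on destruction, so nested scopes produce nested ranges.
class ScopedRange {
public:
    explicit ScopedRange(const char* name) {
        // Record the range name, indented by nesting depth, for inspection.
        log().push_back(std::string(depth() * 2, ' ') + name);
        ++depth();
#ifdef USE_NVTX
        nvtxRangePushA(name);   // start of the annotated region
#endif
    }
    ~ScopedRange() {
#ifdef USE_NVTX
        nvtxRangePop();         // end of the innermost open region
#endif
        --depth();
    }
    ScopedRange(const ScopedRange&) = delete;
    ScopedRange& operator=(const ScopedRange&) = delete;

    static std::vector<std::string>& log() {
        static std::vector<std::string> entries;
        return entries;
    }

private:
    static int& depth() {
        static int d = 0;
        return d;
    }
};

// Hypothetical stand-in for the triangle fan mesh setup.
void initMesh() {
    ScopedRange r("init mesh");
    // ... build the triangle fan indices here ...
}

// Hypothetical stand-in for one rendering pass.
void renderFrame() {
    ScopedRange r("render");
    static bool meshReady = false;
    if (!meshReady) {           // mesh setup runs on the first call only
        initMesh();
        meshReady = true;
    }
    // ... OpenGL draw calls would go here ...
}
```

Calling renderFrame() twice records "render", an indented "init mesh", and then a second "render" with no nested entry, which mirrors the pattern the report shows: the initialization range appears inside the first rendering range and never again.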

The OpenGL API Call Summary, as seen below, shows that glDrawElements() consumed the vast majority of the capture time in the SIMPLE_ONE_BY_ONE rendering code.

In comparison, the Compute timeline taken when using the PRIMITIVE_RESTART rendering code shows that the k_perlin() CUDA kernel takes 1.1% of the time. In addition, the Thread State timeline rapidly alternates between red and green, indicating GPU activity. This is also shown in the Device % timeline.

Still, the OpenGL API Call Summary shows that swapping buffers for rendering is easily the dominant runtime component.

Zooming in on the first rendering operation with primitive restart shows that the complex, computationally intensive GPU Perlin Noise generation k_perlin() kernel visually appears to take roughly twice the time of the simple triangle fan mesh generation on the host!

The primitive restart Tools Extension Events report shows that the mesh initialization only takes 126 μs.

In contrast, we see that the MULTI_DRAW rendering code again spends little time in the k_perlin() kernel (circled for clarity in the figure below), although the k_perlin() kernel appears to make good use of the device when active. Most of the rendering time is spent in the OpenGL glMultiDrawElements() call.
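
To make the difference between the three rendering methods concrete, the host-side sketch below builds the index data each one would consume. It is an illustration under stated assumptions rather than the article's code: FanBatch, packFans(), packWithRestart(), and the kRestart marker value are hypothetical names, and no OpenGL calls are made. Iterating fan by fan would issue one glDrawElements() call per (count, offset) pair, multidraw would hand the same count and offset arrays to a single glMultiDrawElements() call, and primitive restart would draw one index stream in which a marker index separates the fans.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical restart marker. With real OpenGL this value would be
// registered through the primitive restart index call before drawing.
constexpr std::uint32_t kRestart = 0xFFFFFFFFu;

// Index data for the one-by-one and multidraw methods.
struct FanBatch {
    std::vector<std::uint32_t> indices;    // all fans packed back to back
    std::vector<int> counts;               // number of indices in each fan
    std::vector<std::size_t> byteOffsets;  // byte offset of each fan in 'indices'
};

// Pack the fans contiguously and describe each with a (count, offset) pair.
FanBatch packFans(const std::vector<std::vector<std::uint32_t>>& fans) {
    FanBatch batch;
    for (const auto& fan : fans) {
        batch.byteOffsets.push_back(batch.indices.size() * sizeof(std::uint32_t));
        batch.counts.push_back(static_cast<int>(fan.size()));
        batch.indices.insert(batch.indices.end(), fan.begin(), fan.end());
    }
    return batch;
}

// Index data for primitive restart: a single stream in which the marker
// ends one fan and starts the next.
std::vector<std::uint32_t> packWithRestart(
        const std::vector<std::vector<std::uint32_t>>& fans) {
    std::vector<std::uint32_t> stream;
    for (std::size_t i = 0; i < fans.size(); ++i) {
        if (i > 0) {
            stream.push_back(kRestart);  // separator between consecutive fans
        }
        stream.insert(stream.end(), fans[i].begin(), fans[i].end());
    }
    return stream;
}
```

For N fans, the first layout implies either N draw calls or one call that still walks N (count, offset) entries, while the restart layout needs one call over one contiguous stream at the cost of N - 1 extra indices.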

This is confirmed with the Tools Extension Events report.

Examining the three Tools Extension Events reports, we gain a much better understanding of why primitive restart is so fast. Except for primitive restart, the k_perlin() kernel (even though it purposely has a couple of performance issues for readers to find) is overwhelmed by the time taken by the OpenGL rendering calls.

Among the three OpenGL rendering methods, primitive restart is clearly the fastest as the NVTX annotated rendering regions take the following approximate times:

  • Primitive restart: around 60 μs.
  • Multidraw: around 3,900 μs.
  • Iteratively drawing each triangle fan: approximately 1,100,000 μs.
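
Taking the three approximate region times above at face value, the relative speedups work out as shown here. The inputs are rounded readings from the trace reports, so the ratios are rough estimates only.

```latex
\[
\frac{t_{\text{multidraw}}}{t_{\text{restart}}} \approx \frac{3{,}900\ \mu\text{s}}{60\ \mu\text{s}} = 65,
\qquad
\frac{t_{\text{one-by-one}}}{t_{\text{restart}}} \approx \frac{1{,}100{,}000\ \mu\text{s}}{60\ \mu\text{s}} \approx 18{,}333,
\qquad
\frac{t_{\text{one-by-one}}}{t_{\text{multidraw}}} \approx \frac{1{,}100{,}000\ \mu\text{s}}{3{,}900\ \mu\text{s}} \approx 282
\]
```

In other words, primitive restart is roughly 65 times faster than multidraw in these traces, and more than four orders of magnitude faster than drawing each triangle fan individually.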
