

CUDA, Supercomputing for the Masses: Part 20

Analysis of Asynchronous I/O

Although asynchronous I/O streams have not yet been covered in this tutorial series, we can use the NVIDIA GPU Computing SDK version 3.1 sample simpleMultiCopy to show how Parallel Nsight handles codes with complex asynchronous behavior.

The following steps were used to build this SDK example:

  1. Download the Windows version of the CUDA 3.1 SDK. It can be found here.
  2. Run the executable, which will install the SDK examples in C:\ProgramData\NVIDIA Corporation.
  3. Copy the SDK folder NVIDIA GPU Computing SDK to one of your folders.
  4. Change to the folder NVIDIA GPU Computing SDK\C\src\simpleMultiCopy.
  5. Double-click on the simpleMultiCopy_vc90.sln icon. The Visual Studio Conversion Wizard will appear to create a version of this solution that can be used with Parallel Nsight.
  6. Build the project.
    • Don’t forget to set the Nsight User Properties | Connection Name to specify your remote machine! Note that the connection name may also be set on the Activity page itself.
    • Analysis does not necessarily require a remote connection. The name localhost can be used for Connection Name so long as the monitor is installed on the local machine.

  7. Be certain that the monitor is running. Depending on whether the SDK is installed on the target machine, it might be necessary to copy some DLLs so the executable can run.

Now run the executable under the analyzer. From the File menu on the top toolbar, select New | File… and choose any of the options in the NVIDIA category, as shown in the screen below:


Once an option is selected (in this case Trace All), the following screen appears:


Scrolling down, it is clear there is a wealth of options from which to choose.


Click the Launch button. When the program pauses at the end, press Enter on the target machine's keyboard (or click the Kill button) to terminate the simpleMultiCopy application. Once Parallel Nsight has retrieved the trace from the target machine, a summary report appears. During retrieval, the Capture Control icon changes from red to yellow (indicating data is being transferred) to green.

As the Trace timeline below shows, Parallel Nsight provides a tremendous amount of information that is easily accessible through mouseover, zooming, and various filtering operations. Given the volume of information in these traces, it is essential to know how to select a region of the timeline. Click the mouse at the desired starting time; a vertical line will appear. Then hold down the Shift key and drag the mouse (with the button pressed) to the end of the region of interest. This produces a grey overlay, as shown below. A nice feature is that the length of the selected time interval is calculated and displayed.


We see that it took a short while, about 0.02884 seconds (roughly 29 ms), for the asynchronous transfers to get started, and a somewhat longer interval for all four streams to really start moving data. Clicking within the grey region zooms the display to show just the selected time interval, which makes it very convenient to select and zoom into intervals in the timeline. Other useful controls (consistent with typical timeline interfaces in CAD and audio software) are:

  • Ctrl + Mousewheel: smoothly zoom into or out of the timeline.
  • Ctrl + Drag: pan around in the timeline.

General workflow tip when using Parallel Nsight: the Application or System Trace options can be used to determine whether the application is CPU bound, PCIe-transfer bound, or kernel bound by looking at the Timeline:

  1. CPU bound. There will be large areas of the timeline where no kernel or memory copy is occurring, but the Thread State row for the application threads is green.
  2. PCIe transfer limited. Kernel execution is blocked while waiting on memory transfers to or from the device. This shows up in the Memory row. If much of the time is spent on memory copies, consider using the Streams API to pipeline the application so that memory transfers overlap with kernels. Before changing code, compare the durations of the transfers and kernels to ensure a performance gain will be realized.
  3. Kernel bound. If the majority of the application time is spent waiting on kernels to complete, switch to the "Profile CUDA" activity and re-run the application to collect information from the hardware counters. This can help guide how to optimize kernel performance.
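The three cases above, and the advice to compare transfer and kernel durations before adopting streams, can be captured in a small back-of-the-envelope model. This is a minimal C++ sketch under stated assumptions, not part of Parallel Nsight: the TimelineTotals struct and all function names are hypothetical, the totals are assumed to be read off the timeline by hand, classification simply picks whichever of GPU-idle, kernel, or copy time dominates, and the pipeline estimate uses the textbook three-stage pipeline formula, which assumes the hardware can run an upload, a kernel, and a download concurrently and ignores launch and synchronization overhead.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical summary of a selected timeline interval, with values read off
// the Parallel Nsight trace by hand (this is not an Nsight API type).
struct TimelineTotals {
    double wall_ms;    // length of the selected interval
    double kernel_ms;  // time a kernel is executing
    double memcpy_ms;  // time a host<->device copy is in flight (Memory row)
};

// Labels the interval by whichever of GPU-idle, kernel, or copy time dominates.
// Assumes kernels and copies do not overlap yet (no streams), so the GPU-idle
// time is whatever remains. Idle time only indicates "CPU bound" if the
// Thread State row shows the host threads running (green) during it.
std::string classifyBottleneck(const TimelineTotals& t) {
    assert(t.wall_ms > 0.0 && t.kernel_ms >= 0.0 && t.memcpy_ms >= 0.0);
    double idle_ms = std::max(0.0, t.wall_ms - t.kernel_ms - t.memcpy_ms);
    if (idle_ms >= t.kernel_ms && idle_ms >= t.memcpy_ms) return "CPU bound";
    if (t.memcpy_ms >= t.kernel_ms) return "transfer bound";
    return "kernel bound";
}

// Work split into n equal chunks; each chunk does a host-to-device copy (h_ms),
// a kernel (k_ms), and a device-to-host copy (d_ms), all back to back.
double serialTimeMs(int n, double h_ms, double k_ms, double d_ms) {
    assert(n >= 1);
    return n * (h_ms + k_ms + d_ms);
}

// Idealized stream pipeline: the first chunk pays the full latency, and each
// later chunk adds only the duration of the slowest stage.
// Assumes upload, kernel, and download can all run at the same time.
double pipelinedTimeMs(int n, double h_ms, double k_ms, double d_ms) {
    assert(n >= 1);
    double slowest = std::max({h_ms, k_ms, d_ms});
    return (h_ms + k_ms + d_ms) + (n - 1) * slowest;
}
```

With four chunks and balanced stages (2 ms each), the model predicts 24 ms serially versus 12 ms pipelined, a 2x gain. If the kernel dominates (1 ms upload, 10 ms kernel, 1 ms download), it predicts 48 ms versus 42 ms, only about 1.14x. That is exactly the kind of comparison worth making before restructuring code around streams.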

Zooming into a region of the timeline lets Parallel Nsight display the names of the functions and methods once enough space becomes available in each region, which greatly improves the readability of the traces.
