Debugging 256 GPU Threads in Visual Studio 2013


As with the example in the previous article, the GPU Threads window provides valuable information that helps you understand the C++ AMP code being debugged. Click the Expand Thread Switcher button in the upper-left corner, and a new panel displays both the Tile and the Thread coordinates that are active in the debugger. The GPU Threads window also always displays the valid coordinate ranges for both the Tile and the Thread. In this case, the valid range for the Tile coordinates is [0..63, 0..63], and the valid range for the Thread coordinates is [0..15, 0..15]. Figure 1 shows Tile[0, 0] Thread[0, 0] as the active thread, along with the information about the coordinate ranges. You can also use the Parallel Watch window to freeze and thaw GPU threads, just as you do with CPU threads.
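
The relationship between the Tile and Thread ranges can be made concrete with a little arithmetic. The following is a minimal sketch in plain C++ (not C++ AMP), assuming a 1024 x 1024 global extent, which is what 64 tiles of 16 threads per dimension imply. The names `Coord2D`, `ThreadCoords`, `splitGlobalIndex`, and `joinCoords` are hypothetical helpers introduced only for illustration.

```cpp
#include <cassert>

// Tile edge length; 16 matches the Thread range [0..15, 0..15].
constexpr int kTileSize = 16;
// Tiles per dimension; 64 matches the Tile range [0..63, 0..63].
constexpr int kTilesPerDim = 64;

// Hypothetical helper type: a (row, col) pair.
struct Coord2D {
    int row;
    int col;
};

// Hypothetical helper type: the two coordinate pairs the debugger shows.
struct ThreadCoords {
    Coord2D tile;   // labeled "Tile" in the GPU Threads window
    Coord2D local;  // labeled "Thread" in the GPU Threads window
};

// Split a global index into its tile and its position inside that tile.
ThreadCoords splitGlobalIndex(Coord2D global) {
    assert(global.row >= 0 && global.row < kTileSize * kTilesPerDim);
    assert(global.col >= 0 && global.col < kTileSize * kTilesPerDim);
    return ThreadCoords{
        {global.row / kTileSize, global.col / kTileSize},
        {global.row % kTileSize, global.col % kTileSize}};
}

// Inverse mapping: global = tile * tile size + local.
Coord2D joinCoords(const ThreadCoords& c) {
    return Coord2D{c.tile.row * kTileSize + c.local.row,
                   c.tile.col * kTileSize + c.local.col};
}
```

Under these assumptions, the global index (17, 5) belongs to Tile[1, 0] Thread[1, 5], and the last valid global index (1023, 1023) belongs to Tile[63, 63] Thread[15, 15].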

Evaluating Expressions for Each GPU Thread in the Parallel Watch Window

The Parallel Watch window allows you to simultaneously display the values that one expression holds on multiple GPU threads. You just need to click on the <Add Watch> column and enter the expression. For example, you can add the following expressions as columns in the Parallel Watch window:

  • sum
  • tiled_idx.global
  • row
  • col
  • i
  • j
  • tile_static_a[row][col]
  • tile_static_b[row][col]

With these watches in place, set a breakpoint at the line "tiled_idx.barrier.wait()" and continue execution several times, letting the debugger stop at this breakpoint on each run. Each time it stops, the watches display the values of the expressions in the different GPU threads (Figure 2).

Figure 2: The Parallel Watch 1 window displaying the values that each expression holds on the different GPU threads.

With the row and col values evaluated for each thread, you can easily identify which piece of data each thread is working on. You will notice that the code takes some time to execute because the GPU software emulator works with only four threads at a time (the output below reports a warp size of 4), and evaluating so many variables for 256 threads consumes CPU resources. However, each Visual Studio update and GPU device driver might bring new features, so it is always wise to click the "Dump statistics to Output window" button at the top right of the GPU Threads window. In this case, the Output window displays the features of the GPU software emulator. Notice the information about the grid dimensions, group dimensions, active groups, completed groups, and not started groups.

GPU Device Created.
'cppamp.exe' (GPU Device): 
  Loaded 'C:\Users\gaston\Documents\Visual Studio 2013\Projects\cppamp\
    Debug\cppamp.exe'. Symbols loaded.
Information for 'cppamp.cpp_line_57' kernel on device 'DirectX Reference 
    Rasterizer' with warp size 4:
Grid Dimensions : 64x64x1
Group Dimensions : 16x16x1
Shared Memory Usage per Group : 2048 bytes
Register Usage per Thread : 2064 bytes
Active Groups : 1
Completed Groups : 0
Not Started Groups : 4095
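
The numbers in this output are consistent with one another, and checking them is a quick way to confirm that you are reading the statistics correctly. The sketch below is plain C++ with hypothetical helper names (`Dim3`, `GroupStats`, `volume`, `tileStaticBytes`, `accountsForAllGroups`). It assumes that the 2048 bytes of shared memory come from two 16 x 16 tile_static arrays of 4-byte elements, which matches the tile_static_a and tile_static_b watches but is not stated by the output itself.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type for the "AxBxC" dimensions in the output.
struct Dim3 {
    int x;
    int y;
    int z;
};

// Hypothetical helper type for the three group counters in the output.
struct GroupStats {
    long long active;
    long long completed;
    long long notStarted;
};

// Number of elements described by a set of dimensions.
long long volume(Dim3 d) {
    return static_cast<long long>(d.x) * d.y * d.z;
}

// Bytes needed by `arrays` square tiles of `tileEdge` x `tileEdge` elements.
// The element size is an assumption supplied by the caller.
std::size_t tileStaticBytes(int arrays, int tileEdge, std::size_t elemSize) {
    assert(arrays > 0 && tileEdge > 0);
    return static_cast<std::size_t>(arrays) * tileEdge * tileEdge * elemSize;
}

// Every group is active, completed, or not started, so the three
// counters should add up to the number of groups in the grid.
bool accountsForAllGroups(GroupStats s, Dim3 grid) {
    return s.active + s.completed + s.notStarted == volume(grid);
}
```

With the values above, `volume({64, 64, 1})` gives 4096 groups, `volume({16, 16, 1})` gives 256 threads per group, `tileStaticBytes(2, 16, 4)` gives 2048 bytes, and the counters 1 + 0 + 4095 account for all 4096 groups.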

Now, set a breakpoint at the line "int row = tiled_idx.local[0]" and run the application until the debugger stops at this breakpoint. Click the "Dump statistics to Output window" button at the top right of the GPU Threads window again. This time, the Output window indicates that one group has completed and that the number of not started groups has dropped to 4094 (4095 - 1):

Active Groups : 1
Completed Groups : 1
Not Started Groups : 4094

As you can see from this information, the execution has moved to Tile[0, 1], so Tile[0, 0] (also known as the first group) has completed its execution. If you continue the execution and take a look at the Parallel Watch 1 window (Figure 3), you will see 256 threads listed for Tile[0, 1]. Analyzing the information that the Parallel Watch 1 window provides for 256 GPU threads is cumbersome within Visual Studio. If you are a Microsoft Excel user, you can click the Open in Excel button at the top of the Parallel Watch 1 window and use Excel to analyze the snapshot of all the evaluated expressions in each thread. If you aren't a Microsoft Excel user, you can click the dropdown, select the Export to CSV option, and use your favorite application to analyze the exported data.

Figure 3: The Parallel Watch 1 window displaying the evaluated expressions for the 256 threads related to Tile[0, 1].

The code has two calls to the tiled_idx.barrier.wait() method that block the execution of all the threads in a tile until all the threads within that tile have reached the call. You can use the GPU Threads window to visualize the blocked and active threads. Figure 4 shows the GPU Threads window displaying 24 GPU threads that are blocked in the call to tiled_idx.barrier.wait(). You can click on the flag icon at the left-hand side of the Thread Count column for the blocked threads, and the Parallel Watch 1 window will flag the 24 blocked threads in the grid. This way, you can easily identify the 24 Tile and Thread coordinates that are blocked and switch to them by using the grid in the Parallel Watch 1 window.

Figure 4: The GPU Threads window displaying 24 GPU threads that are blocked in the call to tiled_idx.barrier.wait().
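
To see why some threads sit blocked at a barrier while others are still running, it can help to model the two phases that the barriers separate. The kernel itself is in the previous article and is not reproduced here, so the following is only a hypothetical CPU model in plain C++ of a tiled matrix multiplication. It reuses the watch names (row, col, i, j, sum) as assumptions about their roles, and it imitates each barrier by finishing one phase for every thread in the tile before starting the next phase. The functions `naiveMultiply` and `tiledMultiply` are illustrative names.

```cpp
#include <cassert>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Plain triple-loop product, used only as a reference result.
Matrix naiveMultiply(const Matrix& a, const Matrix& b) {
    const int n = static_cast<int>(a.size());
    Matrix c(n, std::vector<int>(n, 0));
    for (int r = 0; r < n; ++r)
        for (int k = 0; k < n; ++k)
            for (int col = 0; col < n; ++col)
                c[r][col] += a[r][k] * b[k][col];
    return c;
}

// Hypothetical model of a tiled kernel for square n x n matrices.
// Each (row, col) pair plays the role of one GPU thread in a tile.
Matrix tiledMultiply(const Matrix& a, const Matrix& b, int tileSize) {
    const int n = static_cast<int>(a.size());
    assert(tileSize > 0 && n % tileSize == 0);
    const int tiles = n / tileSize;
    Matrix c(n, std::vector<int>(n, 0));

    for (int tileRow = 0; tileRow < tiles; ++tileRow) {
        for (int tileCol = 0; tileCol < tiles; ++tileCol) {
            // Stand-ins for the per-tile shared arrays and per-thread sums.
            Matrix tileA(tileSize, std::vector<int>(tileSize, 0));
            Matrix tileB(tileSize, std::vector<int>(tileSize, 0));
            Matrix sum(tileSize, std::vector<int>(tileSize, 0));

            for (int i = 0; i < n; i += tileSize) {
                // Phase 1: every thread stores one element of each tile.
                for (int row = 0; row < tileSize; ++row)
                    for (int col = 0; col < tileSize; ++col) {
                        tileA[row][col] = a[tileRow * tileSize + row][i + col];
                        tileB[row][col] = b[i + row][tileCol * tileSize + col];
                    }
                // First barrier: no thread may read the tiles until
                // all threads have finished storing into them.

                // Phase 2: every thread reads a full row and column.
                for (int row = 0; row < tileSize; ++row)
                    for (int col = 0; col < tileSize; ++col)
                        for (int j = 0; j < tileSize; ++j)
                            sum[row][col] += tileA[row][j] * tileB[j][col];
                // Second barrier: no thread may overwrite the tiles for
                // the next i until all threads have finished reading.
            }

            for (int row = 0; row < tileSize; ++row)
                for (int col = 0; col < tileSize; ++col)
                    c[tileRow * tileSize + row][tileCol * tileSize + col] =
                        sum[row][col];
        }
    }
    return c;
}
```

In this model the phases never overlap because the loops run one after another. On the GPU the threads of a tile progress at different rates, which is why the debugger can show part of a tile waiting at a barrier while the remaining threads have not reached it yet.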

Conclusion

Visual Studio 2012 and 2013 added useful enhancements that make it possible to understand what happens in GPU kernels, even when they launch hundreds of GPU threads. If you take advantage of these debugging features, you will be able to optimize your algorithms and resolve many defects.


Gaston Hillar is a frequent contributor to Dr. Dobb's.

Related Article

Debugging GPU Code in Microsoft C++ AMP

