Channels ▼
RSS

Tools

CUDA, Supercomputing for the Masses: Part 14


In CUDA, Supercomputing for the Masses: Part 13 of this article series, I resumed the discussion of "texture memory" began in Part 11 of this series and included information such as the recently introduced CUDA Toolkit 2.2 that added ability to write to global memory on the GPU that has a 2D texture bound to it. This installment focuses on debugging techniques and how CUDA-GDB can be used to effectively diagnose and debug CUDA code -- with an emphasis on how to speed the process when looking through large amounts of data as well as the thread syntax and semantic differences needed to debug kernels while they are running on the GPU.

Upfront, youo need to be aware of two important points about the current state of CUDA-GDB:

  • It only runs on UNIX-based systems. Microsoft users in particular should look to the recently announced Visual Studio debugger, which I'll cover in a future column.
  • X11 cannot be running on the GPU that is used for debugging. Instead use either a multi-GPU system or kill X11 and remotely access the single-GPU system via ssh, VNC, or some other method.

This article also provides an admittedly contrived debugging example. I recommend everyone look though it -- regardless of their previous experience with GDB -- for the following reasons:

  • Everyone new to CUDA-GDB: Example commands demonstrate how to use the new CUDA thread syntax and highlight semantic changes that allow debugging and stepping through kernels running on CUDA-enabled graphics processors.
  • Beginners: This example shows how to set breakpoints (both symbolically and by line number) as well as performing other basic debugging operations including: running the program, stepping execution through the source code line-by-line, continuing program execution, and exiting the debugger.
  • Experts: GPUs are fast, which generally means that debugging a CUDA application requires digging through large amounts of data to find a problem. Putting a little thought into debugging strategies including the application of debugger helper functions and GDB artificial arrays can really save debugging time (and personal sanity).

Essentially CUDA-GDB is a port of the GNU GDB debugger version 6.6. Programmers who have used GDB in the past will find they are already familiar with CUDA-GDB and should look to this article for CUDA specific tips. Newcomers to GDB will be able to use this article to begin debugging their software right away, but should look to one of the many extensive tutorials and references on the Internet to find out more about the comprehensive set of features available within this powerful debugging tool. One obvious starting place is the GNU documentation for GDB. Regardless of skill level, all CUDA developers should at least glance over the latest version of CUDA-GDB: The NVIDIA CUDA Debugger.

Debugging Methods Prior to CUDA-GDB

Prior to the creation of CUDA-GDB, the easiest and least elegant method of debugging a CUDA program was to add print statements to the source code and compile to run in the emulator (initiated by passing the -deviceemu or -device-emulation flag to the nvcc compiler). Since the emulator runs on the host processor (and not on the GPU), the print statements can be compiled and linked so the programmer can examine whatever program values might be important. This awkward method was one of the most unobtrusive ways to see what was going on inside a CUDA program in the early days of CUDA. (This debugging method was discussed way back in Part 1 of this series in April 2008.) I mention this method again because it might -- as a method of last resort -- help someone find bugs in their code. Basically, if you don't trust what the GPU is doing -- try running on the emulator. If the code still fails, you know it is not the GPU. Keep in mind that the emulator does not precisely reproduce what happens on the GPU, which means that bugs and behavior that occur on the GPU (including race conditions) may not happen in the emulated environment.

An alternative method that actually utilizes the GPU and permits examination of GPU calculated results utilizes cudaMemcpy() to transfer any variables of interest from the GPU to a scratch location on the host. Host-based methods (including GDB and/or print statements) can then be used to examine the information in the scratch location to hopefully diagnose the problem. Later in this article, I will use this technique to demonstrate a simple helper debugging function for CUDA-GDB to identify an error when writing to a large CUDA vector.

Mapped Memory and the Importance of Regression Testing

It is worthwhile mentioning at this point that the new mapped memory capability, introduced in the CUDA 2.2 release, provides an important (and convenient) new capability to facilitate regression testing when porting legacy application code to CUDA. Without question, regression testing is an essential software practice. It cannot be emphasized too strongly how important this technique is in creating and verifying correctly working software!

In the case of legacy software, the developer already has a working code base that can be used for comparison with GPU generated results to help identify errors. Mapped memory (discussed in Part 12, "CUDA 2.2 Changes the Data Movement Paradigm") greatly facilitates this process by transparently maintaining a synchronized version of data between both the host and device memory spaces. With care, the programmer can exploit this transparent synchronization to keep the original software functional throughout the entire porting project. As a result, there will be a known working version that can be used to compare all GPU results and intermediate results.

Essentially new CUDA kernels can be incorporated into the legacy code without having to think about explicitly moving data off the host and onto the GPU, which allows easy switching between the new CUDA kernel(s) and corresponding original host code. The new GPU version can then be evaluated on one or more test cases to see if it produces correct results. If an error is identified, then the original host code for that phase of the calculation -- plus intermediate results -- can be used to quickly identify the first appearance of the error in the GPU code. Eventually, enough of the calculation will reside on the GPU so it no longer becomes necessary to maintain synchronization with the host and mapping can be disabled or removed -- thus creating a GPU only version of the legacy code that can run a full-speed without any PCI bottlenecks.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video