CUDA, Supercomputing for the Masses: Part 16


Using CUDA-GDB Enhancements

Let's put the following example code into a file called assign.cu. Please note that this code contains an out-of-bounds memory error because cudaMalloc() was called with the incorrect number of items: (N-1) rather than N.


#include <stdio.h>
#include <stdlib.h>

// Simple assignment test
#define N 256

__global__ void kernel(unsigned int *data, int n)
{
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if(idx < n) data[idx] = idx;
}

int main(void)
{
    int i;
    unsigned int *d = NULL;
    unsigned int odata[N];

    cudaMalloc((void**)&d, sizeof(int) * (N-1) );  // out-of-bounds bug: allocates only N-1 items

    kernel<<<1, N>>>(d,N);

    cudaMemcpy(odata, d, sizeof(int)*N, cudaMemcpyDeviceToHost);

    // Test to see if the array retrieved from the GPU is correct
    for (i = 0; i < N; i++) {
      if(odata[i] != i) {
	break;
      }
    }
    if(i == N) printf("PASSED\n");
    else printf("FAILED\n");

    cudaFree((void*)d);
    return 0;
}
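
As an aside, it is good practice to check the cudaError_t value returned by every CUDA runtime call; the example above ignores those return values for brevity. Below is a minimal sketch of a checking macro (the name CUDA_CHECK is my own, not part of the CUDA API). Note that such checks would not necessarily catch the silent out-of-bounds write made by the kernel in this example, which is exactly why the memory checking features discussed below are useful.


#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      fprintf(stderr, "CUDA error '%s' at %s:%d\n",                   \
              cudaGetErrorString(err_), __FILE__, __LINE__);          \
      exit(EXIT_FAILURE);                                             \
    }                                                                 \
  } while (0)

// Example usage in main():
//   CUDA_CHECK( cudaMalloc((void**)&d, sizeof(int) * N) );
//   CUDA_CHECK( cudaMemcpy(odata, d, sizeof(int) * N, cudaMemcpyDeviceToHost) );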

The following command builds the executable with host (-g) and device (-G) debugging information:


nvcc -g -G assign.cu -o assign

Now start the executable under CUDA-GDB and run it. (Note that the X Window System cannot be running on the GPU that will be used to debug the code.) As the output below shows, the program appears to run correctly: it reports that the golden test passed.


$ cuda-gdb assign
NVIDIA (R) CUDA Debugger
BETA release
Portions Copyright (C) 2008,2009 NVIDIA Corporation
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
Using host libthread_db library "/lib/libthread_db.so.1".
(cuda-gdb) r
Starting program: PATHNAME/assign 
[Thread debugging using libthread_db enabled]
[New process 3128]
[New Thread 140278242694928 (LWP 3128)]
PASSED

Program exited normally.
(cuda-gdb) 

Now tell cuda-gdb to check memory accesses by typing the command:


set cuda memcheck on

Rerunning the program under CUDA-GDB now generates a segmentation fault, which indicates there is a memory error in the program.


(cuda-gdb) run
Starting program: PATHNAME/assign 
[Thread debugging using libthread_db enabled]
[New process 3145]
[New Thread 140147342771984 (LWP 3145)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 140147342771984 (LWP 3145)]
[Current CUDA Thread <<<(0,0),(224,0,0)>>>]
0x0000000000feed18 in kernel <<<gridDim=(1,1), blockDim=(256,1,1)>>> (
    data=0x101000, n=256) at assign.cu:10
10	  if(idx < n) data[idx] = idx;
(cuda-gdb) 

The [Current CUDA Thread <<<(0,0),(224,0,0)>>>] line in the previous output tells us that the problem manifests in block (0,0), in the warp of threads that begins at thread 224, that is, threads 224 through 255.

The command info cuda device confirms that a warp contains 32 threads: the warp size is the LN (lanes per warp) value of 32 in the output below:


(cuda-gdb) info cuda device
DEV:  0/1    Device Type: gt200   SM Type: sm_13   SM/WP/LN: 30/32/32   Regs/LN: 128
 … more output
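
With a warp size of 32, a quick back-of-the-envelope check (a sketch based on the values reported above and the source listing) shows why the fault is reported from the warp that begins at thread 224:


warp index       = 224 / 32 = 7          (the last of 256/32 = 8 warps in the block)
warp covers      = threads 224 through 255
allocation       = N-1 = 255 unsigned ints, so valid indices are 0 through 254
faulting access  = thread 255 writing data[255], one element past the allocation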

Examining the source confirms that the problem is the allocation of one too few integers. Double-check this by setting a breakpoint at line 10 in the CUDA kernel and rerunning the program.


b 10
Breakpoint 1 at 0xfeea50: file assign.cu, line 10.
(cuda-gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y

CUDA-GDB stops at the breakpoint.


Starting program: PATHNAME/assign 
Breakpoint 1 at 0x410bfa: file assign.cu, line 10.
[Thread debugging using libthread_db enabled]
[New process 3597]
[New Thread 140121555752720 (LWP 3597)]
Breakpoint 1 at 0x1575a50: file assign.cu, line 10.
[Switching to Thread 140121555752720 (LWP 3597)]
[Current CUDA Thread <<<(0,0),(224,0,0)>>>]

Breakpoint 1, kernel <<<gridDim=(1,1), blockDim=(256,1,1)>>> (data=0x101000, 
    n=256) at assign.cu:10
10	  if(idx < n) data[idx] = idx;
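
While stopped at this breakpoint in device code, ordinary GDB commands such as print can be used to inspect variables in the current thread of focus. For example (commands only; the printed values depend on which thread currently has focus and so are omitted here):


(cuda-gdb) print idx
(cuda-gdb) print n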

Now repeatedly step through the program to find where the problem manifests itself. Recall that CUDA-GDB steps GPU code only at the granularity of a warp. This means that typing next or step at the CUDA-GDB command line (when the focus is in device code) advances all threads in the same warp as the current thread of focus. (Multiple steps can be issued at once with s #, where # is an integer count.)


(cuda-gdb) s
[Current CUDA Thread <<<(0,0),(192,0,0)>>>]
kernel <<<gridDim=(1,1), blockDim=(256,1,1)>>> (data=0x101000, n=256)
    at assign.cu:11
11	}
(cuda-gdb) s
[Current CUDA Thread <<<(0,0),(0,0,0)>>>]
kernel <<<gridDim=(1,1), blockDim=(256,1,1)>>> (data=0x101000, n=256)
    at assign.cu:11
11	}
(cuda-gdb) s
[Current CUDA Thread <<<(0,0),(32,0,0)>>>]
kernel <<<gridDim=(1,1), blockDim=(256,1,1)>>> (data=0x101000, n=256)
    at assign.cu:11
11	}
(cuda-gdb) s
[Current CUDA Thread <<<(0,0),(64,0,0)>>>]
kernel <<<gridDim=(1,1), blockDim=(256,1,1)>>> (data=0x101000, n=256)
    at assign.cu:11
11	}
(cuda-gdb) s
[Current CUDA Thread <<<(0,0),(96,0,0)>>>]
kernel <<<gridDim=(1,1), blockDim=(256,1,1)>>> (data=0x101000, n=256)
    at assign.cu:11
11	}
(cuda-gdb) s
[Current CUDA Thread <<<(0,0),(128,0,0)>>>]
kernel <<<gridDim=(1,1), blockDim=(256,1,1)>>> (data=0x101000, n=256)
    at assign.cu:11
11	}
(cuda-gdb) s
[Current CUDA Thread <<<(0,0),(160,0,0)>>>]
kernel <<<gridDim=(1,1), blockDim=(256,1,1)>>> (data=0x101000, n=256)
    at assign.cu:11
11	}
(cuda-gdb) s
[Current CUDA Thread <<<(0,0),(224,0,0)>>>]
kernel <<<gridDim=(1,1), blockDim=(256,1,1)>>> (data=0x101000, n=256)
    at assign.cu:11
11	}
(cuda-gdb) s
0x00007fc0dee29eb7 in sched_yield () from /lib/libc.so.6
(cuda-gdb) 

From the output, it is clear that a thread in the range 224 through 255 generates the memory fault.
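
The same class of error can also be flagged outside the debugger with the standalone cuda-memcheck utility that ships with this release (discussed further in the summary). A typical invocation is shown below; its report output is omitted here because the exact format may vary between releases.


cuda-memcheck ./assign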

If your Linux shell supports it, suspend CUDA-GDB in the current window. Then edit the source to replace the (N-1) with N as shown in the corrected listing below, and rebuild the executable. (The editor generally runs in another window, so it does not need to be stopped and restarted.)


__global__ void kernel(unsigned int *data, int n)
{
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if(idx < n) data[idx] = idx;
}

int main(void)
{
    int i;
    unsigned int *d = NULL;
    unsigned int odata[N];

    cudaMalloc((void**)&d, sizeof(int) * N );  // corrected: allocate all N items

    kernel<<<1, N>>>(d,N);

    cudaMemcpy(odata, d, sizeof(int)*N, cudaMemcpyDeviceToHost);

    // Test to see if the array retrieved from the GPU is correct
    for (i = 0; i < N; i++) {
      if(odata[i] != i) {
	break;
      }
    }
    if(i == N) printf("PASSED\n");
    else printf("FAILED\n");

    cudaFree((void*)d);
    return 0;
}

Now restart CUDA-GDB and tell it to run the program with the run command. CUDA-GDB notices that the executable has changed, as seen below:


run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
'PATHNAME/assign' has changed; re-reading symbols.
Breakpoint 1 at 0x410bfa: file assign.cu, line 10.
Starting program: PATHNAME/assign 
[Thread debugging using libthread_db enabled]
[New process 4276]
[New Thread 139863222687504 (LWP 4276)]
Breakpoint 1 at 0x14fc730: file assign.cu, line 10.
[Switching to Thread 139863222687504 (LWP 4276)]
[Current CUDA Thread <<<(0,0),(0,0,0)>>>]

Breakpoint 1, kernel <<<gridDim=(1,1), blockDim=(256,1,1)>>> (data=0x210000, n=256) at assign.cu:10
10	  if(idx < n) data[idx] = idx;
(cuda-gdb) 

Note that the breakpoint at line 10 in the source is still set, so type continue (abbreviated to c in the example below). CUDA-GDB indicates that the program exited normally, and the program itself reports that it passed the test.


c
Continuing.
PASSED

Program exited normally.
(cuda-gdb) 

Finally, note that cudaMalloc() does not allocate in character (single-byte) granularity. For this reason, a common typographical error, forgetting the parentheses and typing sizeof(int) * N - 1 rather than sizeof(int) * (N-1) as shown below, will not be caught even though an incorrect number of bytes was specified in the allocation of d. Both our test program and CUDA-GDB miss this error, even when memory checking is enabled.


__global__ void kernel(unsigned int *data, int n)
{
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if(idx < n) data[idx] = idx;
}

int main(void)
{
    int i;
    unsigned int *d = NULL;
    unsigned int odata[N];

    cudaMalloc((void**)&d, sizeof(int) * N -1 );  // typo: requests sizeof(int)*N - 1 bytes, one byte short

    kernel<<<1, N>>>(d,N);

    cudaMemcpy(odata, d, sizeof(int)*N, cudaMemcpyDeviceToHost);

    // Test to see if the array retrieved from the GPU is correct
    for (i = 0; i < N; i++) {
      if(odata[i] != i) {
	break;
      }
    }
    if(i == N) printf("PASSED\n");
    else printf("FAILED\n");

    cudaFree((void*)d);
    return 0;
}

Again, suspend CUDA-GDB, modify the cudaMalloc() line as shown above, rebuild the executable, and restart CUDA-GDB.

For convenience, delete breakpoint 1 and run the program. Note that the program indicates success and the debugging session completes without a segmentation fault even though memory checking is still enabled.


delete 1
(cuda-gdb) r
'PATHNAME/assign' has changed; re-reading symbols.
Starting program: PATHNAME/assign 
[Thread debugging using libthread_db enabled]
[New process 3988]
[New Thread 139924018698000 (LWP 3988)]
PASSED

Program exited normally.
(cuda-gdb) 
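
A defensive habit that helps guard against both variations of the allocation bug explored above is to compute the byte count once and reuse it for every call that must agree on the size, so that the allocation and the copy cannot silently disagree. Below is a minimal sketch of the relevant portion of main() rewritten this way (my own variation on the example, not part of the original listing); it also checks the return values of the runtime calls.


    // Compute the transfer size once and reuse it everywhere it matters.
    const size_t bytes = N * sizeof(*d);   // sizeof(*d) follows the element type of d

    if (cudaMalloc((void**)&d, bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    kernel<<<1, N>>>(d, N);

    if (cudaMemcpy(odata, d, bytes, cudaMemcpyDeviceToHost) != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy failed\n");
        return 1;
    }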

Summary

CUDA 3.0 is a major release that delivers important benefits for C++, OpenCL, CUDA driver API, and CUDA runtime API developers, which will likely make it a "must install" version. In addition, the removal of interoperability barriers between the driver and runtime APIs, as well as with DirectX and OpenGL, creates new opportunities for code development and for integrating existing software projects. The next article in this series will discuss how to utilize these expanded capabilities.

As noted, debugging has taken a big step forward in this release. CUDA-GDB in particular has been dramatically improved so that it provides a seamless and consistent debugging experience for both runtime and driver API developers. The addition of memory checking makes CUDA-GDB an even more powerful tool. The inclusion of the cuda-memcheck command-line utility can help automate testing and support field debugging.

Rob Farber is a senior scientist at Pacific Northwest National Laboratory. He has worked in massively parallel computing at several national laboratories and as co-founder of several startups. He can be reached at rmfarber@gmail.com.

