Concerns and Known Issues
Please be aware of the following concerns and known issues as of the CUDA 2.3 release:
- Word size of the host system is no longer a concern as CUDA-GDB (as of the CUDA 2.2 Beta release) now supports both 32- and 64-bit systems
- There are reports in the forums that CUDA-GDB will sometimes hang. Be aware that this is a complex port of GDB and that NVIDIA appears to be doing a good job in fixing problems
- Any of the following might affect program behavior or performance when using the debugger:
- X11 cannot be running on the GPU that is used for debugging. Suggested workarounds include:
- Remote access to a single GPU (VNC, ssh, etc.)
- Use two GPUs, where X11 is running on only one of the graphics processors
- As of CUDA 2.2, the CUDA driver will automatically exclude the device running X11 from being picked by the application being debugged
- Compiling with the -G option causes variables to be spilled to local memory, which can significantly reduce program performance. (As noted in Part 5, local memory can be up to 150x slower than register or shared memory.)
- Kernel launches are no longer asynchronous as the debugger enforces blocking kernel launches
- Scope shadowing is not supported. This means that if a variable is introduced in an inner scope that has the same name as a variable in the outer scope, only the outer scope's value can be seen. The AssignArray.cu example demonstrates this restriction
- The debugger must be stopped in the kernel to examine device memory (allocated via cudaMalloc()) as device memory is not visible outside of the kernel function
- Host memory that was allocated with cudaMallocHost() is not visible in CUDA-GDB
- Multi-GPU applications are not supported
- Not all illegal program behavior can be caught in the debugger, such as out-of-bounds memory accesses or divide-by-zero situations.
- It is not currently possible to step over a subroutine in device code
- These columns focus on using the runtime interface. Any programs using the device driver API cannot be debugged with CUDA-GDB because the device driver API is not supported
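To make the scope-shadowing restriction in the list above concrete, here is a minimal sketch of the pattern and one way around it. This is not the AssignArray.cu example; the function and variable names are hypothetical, and the code is plain host-side C++ so it can be tried without a GPU. The same pattern inside a kernel body is what triggers the restriction.

```cpp
#include <cassert>

// Hypothetical example: the inner "tmp" shadows the outer "tmp".
// Per the restriction described above, a debugger stopped inside the
// inner block of equivalent device code would show only the outer value.
int scaled_sum_shadowed(const int *data, int n, int scale)
{
    assert(n >= 0);
    int tmp = 0;                      // outer tmp: running sum
    for (int i = 0; i < n; ++i) {
        int elem = data[i];
        {
            int tmp = elem * scale;   // inner tmp hides the outer one
            elem = tmp;
        }
        tmp += elem;                  // outer tmp again
    }
    return tmp;
}

// Same computation with the inner variable renamed, so that both values
// have distinct names and neither hides the other.
int scaled_sum_renamed(const int *data, int n, int scale)
{
    assert(n >= 0);
    int sum = 0;                      // running sum
    for (int i = 0; i < n; ++i) {
        int scaled = data[i] * scale; // distinct name, nothing is shadowed
        sum += scaled;
    }
    return sum;
}
```

Both functions compute the same result. The renamed version simply avoids relying on the debugger to resolve two variables with the same name.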
Summary
Proven software development strategies such as assertions and regression testing will greatly assist in developing software that is correct and bug free. When porting legacy software, look into the useful mapped memory features that are now available in CUDA. When bugs manifest themselves, CUDA-GDB can be used to track them down. Since GPU problems generally manipulate large amounts of data, artificial arrays and a few simple routines in your code that can be called interactively from CUDA-GDB can really speed the debugging process.
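As an illustration of the kind of simple routine meant here, the following is a minimal sketch of a host-side helper that condenses a large array into one line of output. The name summarize_array and its return convention are assumptions made for this example, not part of CUDA or CUDA-GDB. It assumes the data lives in ordinary host memory (for example, after a cudaMemcpy() back from the device), since memory from cudaMallocHost() is not visible in CUDA-GDB.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Hypothetical debugging helper: prints min, max, and sum of the finite
// entries in a[0..n-1] and returns the number of NaN or infinite entries.
// extern "C" keeps the symbol name unmangled, which makes it easier to
// type at a debugger prompt.
extern "C" int summarize_array(const float *a, int n)
{
    assert(a != nullptr && n > 0);   // fail fast on obviously bad arguments

    int bad = 0;                     // count of non-finite entries
    bool seen = false;               // true once a finite entry was found
    float lo = 0.0f, hi = 0.0f;
    double sum = 0.0;

    for (int i = 0; i < n; ++i) {
        if (!std::isfinite(a[i])) { ++bad; continue; }
        if (!seen) { lo = hi = a[i]; seen = true; }
        if (a[i] < lo) lo = a[i];
        if (a[i] > hi) hi = a[i];
        sum += a[i];
    }

    std::printf("n=%d min=%g max=%g sum=%g nonfinite=%d\n",
                n, lo, hi, sum, bad);
    return bad;
}
```

From the debugger prompt, such a routine could be invoked with the standard GDB call command, for example call summarize_array(h_a, n), where h_a and n are hypothetical host variables in the stopped program. GDB's artificial array syntax, such as print h_a[0]@8, complements this by showing a few raw values.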
For More Information
- CUDA, Supercomputing for the Masses: Part 13
- CUDA, Supercomputing for the Masses: Part 12
- CUDA, Supercomputing for the Masses: Part 11
- CUDA, Supercomputing for the Masses: Part 10
- CUDA, Supercomputing for the Masses: Part 9
- CUDA, Supercomputing for the Masses: Part 8
- CUDA, Supercomputing for the Masses: Part 7
- CUDA, Supercomputing for the Masses: Part 6
- CUDA, Supercomputing for the Masses: Part 5
- CUDA, Supercomputing for the Masses: Part 4
- CUDA, Supercomputing for the Masses: Part 3
- CUDA, Supercomputing for the Masses: Part 2
- CUDA, Supercomputing for the Masses: Part 1
Rob Farber is a senior scientist at Pacific Northwest National Laboratory. He has worked in massively parallel computing at several national laboratories and as co-founder of several startups. He can be reached at rmfarber@gmail.com.


