Summary
As this article has shown, the analysis capabilities of NVIDIA's Parallel Nsight v1.0 are powerful indeed! Still, this tutorial has barely scratched the surface, and a written article cannot easily convey how interactive Parallel Nsight makes the analysis process. To get a better sense of the capabilities of this software, please view some of the online video demonstrations or, better yet, download the trial version and try it for yourself.
This article also provides an example hybrid CPU/GPU code that can concurrently share a computation between both a host system processing core and a GPU. Through the Parallel Nsight traces and reports, it should also be clear why primitive restart is such a fast rendering technique for applications that mix graphics and CUDA.
It is also worth noting that Parallel Nsight is an evolving project: Parallel Nsight 1.5 is coming soon with additional improvements to Analysis and NVTX. You should be aware of the following when using version 1.0 of Parallel Nsight for analysis:
- Asynchronous data transfers when using mapped pinned memory do not appear in the application traces.
- It is unclear if OpenMP will be supported.
- Call cudaThreadExit() at the end of main() to ensure that Parallel Nsight receives all the trace information from short-lived applications such as test programs. If the application's exit behavior is complicated, a fallback solution is to call cudaThreadSynchronize() or cudaThreadExit() in a function registered with atexit().
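The last item above can be sketched in a few lines of host code. This is a minimal illustration rather than code from this article: the names flush_gpu_traces(), install_trace_flush(), run_simple_app(), and g_flush_calls are hypothetical, and the stub definitions in the #else branch are an assumption added only so the sketch also builds without the CUDA toolkit. Note that later CUDA toolkits deprecate cudaThreadExit() and cudaThreadSynchronize() in favor of cudaDeviceReset() and cudaDeviceSynchronize().

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

#ifdef __CUDACC__
#include <cuda_runtime.h>   // real CUDA runtime when built with nvcc
#else
// Assumption: stand-ins so the sketch also builds with a plain C++ compiler.
typedef int cudaError_t;
const cudaError_t cudaSuccess = 0;
inline cudaError_t cudaThreadSynchronize() { return cudaSuccess; }
inline cudaError_t cudaThreadExit() { return cudaSuccess; }
#endif

// Hypothetical counter, for illustration only: times the handler has run.
int g_flush_calls = 0;

// Exit handler: wait for outstanding GPU work, then tear down the context
// so that the trace data is handed over before the process disappears.
void flush_gpu_traces() {
    cudaThreadSynchronize();
    cudaThreadExit();
    ++g_flush_calls;
}

// Fallback for complicated exit paths: register the handler once with
// atexit(), so it runs on a normal return from main() or a call to exit().
void install_trace_flush() {
    static bool installed = false;
    if (installed) return;
    const int rc = std::atexit(flush_gpu_traces);
    assert(rc == 0 && "atexit registration failed");
    (void)rc;  // silence unused-variable warnings when NDEBUG is defined
    installed = true;
}

// Stand-in for the body of a short-lived test program's main():
// the simple case, with cudaThreadExit() as the last CUDA call.
int run_simple_app() {
    // ... allocate memory, launch kernels, copy results back ...
    const cudaError_t err = cudaThreadExit();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaThreadExit() failed\n");
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```

In a real program, main() would either end the way run_simple_app() does, or call install_trace_flush() as its first statement when there are many exit paths. Handlers registered with atexit() do not run if the process terminates through abort() or _exit(), so traces can still be lost on those paths.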
- CUDA, Supercomputing for the Masses: Part 20
- CUDA, Supercomputing for the Masses: Part 19
- CUDA, Supercomputing for the Masses: Part 18
- CUDA, Supercomputing for the Masses: Part 17
- CUDA, Supercomputing for the Masses: Part 16
- CUDA, Supercomputing for the Masses: Part 15
- CUDA, Supercomputing for the Masses: Part 14
- CUDA, Supercomputing for the Masses: Part 13
- CUDA, Supercomputing for the Masses: Part 12
- CUDA, Supercomputing for the Masses: Part 11
- CUDA, Supercomputing for the Masses: Part 10
- CUDA, Supercomputing for the Masses: Part 9
- CUDA, Supercomputing for the Masses: Part 8
- CUDA, Supercomputing for the Masses: Part 7
- CUDA, Supercomputing for the Masses: Part 6
- CUDA, Supercomputing for the Masses: Part 5
- CUDA, Supercomputing for the Masses: Part 4
- CUDA, Supercomputing for the Masses: Part 3
- CUDA, Supercomputing for the Masses: Part 2
- CUDA, Supercomputing for the Masses: Part 1
Rob Farber is a senior scientist at Pacific Northwest National Laboratory. He has worked in massively parallel computing at several national laboratories and as co-founder of several startups. He can be reached at rmfarber@gmail.com


