In CUDA, Supercomputing for the Masses Part 18, I demonstrated how to achieve very high rendering and compute performance by mixing CUDA and OpenGL in the same program and by using primitive restart, an OpenGL extension that CUDA programmers can exploit to bypass PCIe bottlenecks and increase rendering performance by nearly 100 frames per second. This is the first of a two-part article that focuses on how to use NVIDIA's highly anticipated Visual Studio-based Parallel Nsight debugging and profiling environment for Microsoft Windows to create and profile applications. Specifically, this article discusses the thinking behind Parallel Nsight, explains how to install and configure the software, and walks through the steps to create a CUDA project from scratch and debug it. The example code from Part 14 that was used to demonstrate cuda-gdb will be built with Visual Studio and debugged with Parallel Nsight. The next article uses the Parallel Nsight 1.0 analysis capabilities to compare the Part 18 primitive restart OpenGL example with more conventional OpenGL rendering methods.
Regular readers of this series will note that the use of Visual Studio represents a departure from the previous articles in this tutorial series, which utilized the Linux tool chain to edit and create CUDA applications. With the release of Parallel Nsight, NVIDIA has made a commitment to the debugging and profiling needs of a huge base of Microsoft Windows developers ranging from game developers to commercial High Performance Computing (HPC) users. In addition to supporting CUDA, Parallel Nsight lets developers analyze and debug HLSL shaders, and it also supports OpenCL application tracing. All the features discussed in this article are part of the standard version, which is available without charge. The analysis features require the professional version, which must be purchased.
The Thought Behind Parallel Nsight
Parallel Nsight was designed from the very beginning to fully support remote debugging. Instead of building an application and running it under a debugger on the same machine, Parallel Nsight utilizes a host machine to build the executable with Visual Studio 2008 Service Pack 1 or higher. (Visual Studio 2010 support is coming soon.) The executable is then exported to a target machine via the Parallel Nsight monitor process, which runs the application and handles debugging and tracing operations. Some form of network connection is assumed between the host and target machine(s), whether the network links are virtual or real. Of course, a remote execution model fits naturally into debugging applications for both cloud and cluster environments. Secure connections are supported, which opens up interesting possibilities for remote application debugging and profiling at customer sites.
Rather than relying on specific hardware-provided debugging capabilities, Parallel Nsight takes the different approach of generating code that is patched into the executable. The benefit is that only those debugging and performance tracing capabilities requested by the user are introduced, when and where desired. This is an extremely flexible model that keeps the impact on the application(s) running on the target machine to a minimum. Future capabilities can be added by the Parallel Nsight team as needed, and arbitrary hardware capabilities (e.g., new counters and signals) can be exploited in newer generations of hardware to reduce overhead and provide greater insight into application behavior. Basically, Parallel Nsight is designed to give developers and hardware designers the flexibility needed to best meet future needs, however unanticipated they might be at the current time.
As is to be expected, there is a fair amount of integration with the Visual Studio GUI. This is nice as the mouse can be used to click sections of code to set or remove breakpoints. Mouse-overs work and the scroll wheel can be used to zoom in on application traces to see finer details.
Parallel Nsight provides three basic capabilities:
- Debugging CUDA kernels. Similar to cuda-gdb, Parallel Nsight lets you set breakpoints, examine variables on the GPU and check for memory errors. This video demonstrates the debugging capabilities in action.
- Tracing CUDA applications so the programmer can understand where and how the application is spending time, whether in the operating system, calculating with CUDA, transferring data, or working on the host processors. Check out this video to get a sense of the tracing and analysis capabilities.
- Shader debugging. Parallel Nsight 1.0 supports debugging Direct3D HLSL shaders, as shown in this video. (Note that debugging OpenGL shaders is not supported at this time.) To debug an HLSL shader, the user clicks on Start Graphics Debugging and then either uses the shaders toolwindow to open a shader or uses Pixel History to work backward from a render target to a particular shader. In other words, it is possible to see the geometry for every draw call, watch the frame being built up, and review the pixel history by clicking on a render target's pixel to show all the draw calls that touched that particular pixel. However, debugging shaders will not be discussed further in this article.
Of course, Visual Studio provides C/C++ debugging as well. Please note that CUDA debugging, shader debugging, and trace analysis are treated as separate operations. This means that Parallel Nsight 1.0 cannot hit host C/C++ breakpoints and CUDA C/C++ breakpoints in the same debugging session. Instead, the debugging session must be rerun using either Parallel Nsight or Visual Studio. (A helpful post in the NVIDIA forums can provide some of this mixed breakpoint functionality.)
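To make the CUDA debugging capability listed above concrete, here is a minimal practice kernel of the kind one might step through in a GPU debugger. It is only a sketch, assuming a working CUDA toolkit and an NVIDIA GPU; it is not the Part 14 example, and the names fillIndex, countMismatches, and check are hypothetical ones chosen for illustration.

```cuda
// Hypothetical practice program for GPU debugging; not the Part 14 example.
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Each thread writes its global index into out[]. The bounds check matters:
// without it, threads with tid >= n would write past the end of the buffer,
// which is the kind of mistake a GPU memory checker is meant to catch.
__global__ void fillIndex(int *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        out[tid] = tid;  // a natural breakpoint: inspect tid, blockIdx, threadIdx
    }
}

// Abort with a readable message if a CUDA runtime call fails.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Run the kernel on n elements and count the entries with an unexpected value.
int countMismatches(int n)
{
    assert(n > 0);

    int *d_out = NULL;
    check(cudaMalloc((void **)&d_out, n * sizeof(int)), "cudaMalloc");

    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
    fillIndex<<<blocks, threadsPerBlock>>>(d_out, n);
    check(cudaGetLastError(), "kernel launch");

    // cudaMemcpy waits for the kernel to finish before copying.
    std::vector<int> h_out(n);
    check(cudaMemcpy(&h_out[0], d_out, n * sizeof(int),
                     cudaMemcpyDeviceToHost), "cudaMemcpy");
    check(cudaFree(d_out), "cudaFree");

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != i) ++mismatches;
    }
    return mismatches;
}

int main()
{
    const int n = 1000;  // deliberately not a multiple of the block size
    int mismatches = countMismatches(n);
    printf("%d mismatches out of %d elements\n", mismatches, n);
    return mismatches == 0 ? 0 : 1;
}
```

Because n is not a multiple of the block size, the last block contains threads whose index is at or beyond n. That makes the bounds check a convenient place to set a breakpoint and examine per-thread variables on the GPU.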
Information Sources about Parallel Nsight
The following are good starting points for information about Parallel Nsight:
- The NVIDIA Parallel Nsight forum is an excellent place to look for information and post questions.
- The Parallel Nsight Developer Zone is the main entry point for Parallel Nsight on the NVIDIA site. It provides links to documentation, videos and webinars.
In addition to Internet resources, Parallel Nsight installs a wealth of documentation into Visual Studio. (An online version of the installed user guide can be found here.) The release notes in particular are important, as the Parallel Nsight team is rapidly improving the software and adding capabilities. First-time users will find the example projects included in the host install to be the quickest way to start working with Parallel Nsight, because they require no project configuration. Please note that under Windows 7, the C:\ProgramData directory that contains the walkthrough projects is a hidden directory. The Visual Studio document "Walkthrough: Debugging A CUDA Application" contains more information about using the samples installed with Parallel Nsight. It is accessed via the Start Page | Getting Started window | How Do I…? | NVIDIA GPU Development | Parallel Nsight User Guide | Existing Projects and CUDA C Debugging | Walkthroughs.
Starting Microsoft Visual Studio 2008 after installing Parallel Nsight will show a start page similar to the one in Figure 1. Highlighted in Figure 1 are three key areas that first-time Parallel Nsight users need to be aware of: the How Do I…? link, the Solution Explorer window, and the Nsight tab on the toolbar.
Clicking on the How Do I…? link and expanding the Contents window lets users browse the documents provided by the Parallel Nsight team, as can be seen in Figure 2. Documents include walkthroughs based on some projects that were installed along with the documentation. Be aware that the name or address of the target machine must be specified before these walkthroughs can be used. This article, along with the Parallel Nsight documents, describes how to set the remote machine name. The How to Set Build Options link is an extremely important document, as Parallel Nsight v1.0 requires the user to manually configure many of the options needed to build, package, and run the application on the remote machine.