Nvidia has moved CUDA 5, its pervasive parallel computing platform and programming model, into a production release. Targeted at scientific and engineering applications driven by GPU acceleration, the platform has now passed 1.5 million free downloads through the company's developer zone.
CUDA has been popular up until now, so Nvidia almost certainly knew that it would need to bring something new to the table to ignite developer interest in the technology. As such, new support for dynamic parallelism has been included along with GPU-callable libraries, the firm's own GPUDirect technology support for RDMA (remote direct memory access), and the Nvidia Nsight Eclipse Edition integrated development environment (IDE).
The firm cites approval from an unnamed pool of developers who it says have witnessed "dramatic application acceleration and improved programmability" with the pre-release version of CUDA 5.
With GPU-accelerated applications in use in the defense and aerospace industries, this type of programming model is capable of processing images, video, and "sensor data" such as radar. One customer reports success with streaming sensor data directly into the GPU with low latency using the GPUDirect support for RDMA on new Kepler GPUs.
CUDA 5 has been designed to take advantage of Nvidia's Kepler compute architecture. A technical PDF on the company's website describes Kepler as a GPU comprising 7.1 billion transistors, a "computational workhorse" delivering teraflops of integer, single-precision, and double-precision performance along with what the company calls the highest memory bandwidth.
"GPU threads can dynamically spawn new threads, allowing the GPU to adapt to the data. By minimizing the back and forth with the CPU, dynamic parallelism greatly simplifies parallel programming. [It also] enables GPU acceleration of a broader set of popular algorithms, such as those used in adaptive mesh refinement and computational fluid dynamics applications," said the company.
Expanded Feature Set
Other new features include a GPU-callable CUDA BLAS library, and developers can now use dynamic parallelism in their own GPU-callable libraries. They can design plug-in APIs that let other developers extend the functionality of their kernels, and implement callbacks on the GPU to customize the behavior of third-party GPU-callable libraries.
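One way such a callback mechanism can be structured is with device function pointers, which separate compilation makes practical. The sketch below is a hypothetical illustration of the plug-in pattern, not an API from CUDA 5 itself; all names are invented, and it must be built with relocatable device code (`-rdc=true`).

```cuda
// Callback signature the "library" accepts from plug-in authors.
typedef float (*transform_cb)(float);

// Library side: a kernel that applies whatever callback the caller
// supplies to each element of the data.
__global__ void lib_transform(float *data, int n, transform_cb cb)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = cb(data[i]);
}

// Plug-in side, typically compiled in a separate object file:
// a user-defined callback that customizes the library's behavior.
__device__ float square(float x) { return x * x; }

// Device-side pointer to the callback; the host retrieves its value
// with cudaMemcpyFromSymbol before passing it to lib_transform.
__device__ transform_cb square_cb_ptr = square;
```

The key point is that the library kernel never needs to be recompiled: a third party links in a new callback object file and the GPU-resident code picks it up through the function pointer.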
The "object linking" capability provides an efficient and familiar process for developing large GPU applications by enabling developers to compile multiple CUDA source files into separate object files, and link them into larger applications and libraries.
GPUDirect technology enables direct communication between GPUs and other PCI Express devices, and supports direct memory access between network interface cards and the GPU. It also significantly reduces MPI_Sendrecv latency between GPU nodes in a cluster, improving overall application performance.
NVIDIA Nsight Eclipse Edition enables programmers to develop, debug, and profile GPU applications within the Eclipse-based IDE on Linux and Mac OS X platforms. An integrated CUDA editor and CUDA samples speed the generation of CUDA code, and automatic code refactoring enables easy porting of CPU loops to CUDA kernels.
An integrated expert analysis system provides automated performance analysis and step-by-step guidance to fix performance bottlenecks in the code, while syntax highlighting makes it easy to differentiate GPU code from CPU code.