Graphics processing unit pioneer NVIDIA has announced version 4.0 of its CUDA Toolkit for developing parallel applications on NVIDIA GPUs. The release aims to make it easier for more developers to port their applications to GPUs. Central to it is NVIDIA GPUDirect 2.0, a technology that supports peer-to-peer communication among GPUs within a single server or workstation, a feature the company says will enable easier and faster multi-GPU programming and better application performance.
CUDA 4.0 also introduces two other major features aimed at easing parallel programming. The first is Unified Virtual Addressing (UVA), which provides a single merged address space for main system memory and GPU memories, making parallel programming quicker and easier. The second is the Thrust library of C++ template performance primitives, a collection of open source C++ parallel algorithms and data structures intended to ease programming for C++ developers. NVIDIA says that with Thrust, routines such as parallel sorting run five to 100 times faster than with the Standard Template Library (STL) and Threading Building Blocks (TBB).
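As an illustration of the kind of Thrust routine mentioned above, here is a minimal sketch of a parallel sort on the GPU. It is not taken from NVIDIA's announcement: the helper names `sort_on_gpu` and `is_ascending`, the problem size, and the file name in the build comment are assumptions chosen for the example, and it makes no attempt to reproduce the quoted speedups.

```cuda
// Minimal Thrust sketch: sort integers on the GPU.
// Build (assumed file name): nvcc -O2 thrust_sort.cu -o thrust_sort
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: sorts the contents of h in ascending order on the GPU.
void sort_on_gpu(thrust::host_vector<int>& h) {
    // Constructing a device_vector from a host_vector copies the data to the GPU.
    thrust::device_vector<int> d = h;

    // Iterators into a device_vector make Thrust run the sort on the device.
    thrust::sort(d.begin(), d.end());

    // Copy the sorted values back into the host vector.
    thrust::copy(d.begin(), d.end(), h.begin());
}

// Hypothetical helper: checks on the host that the values are in ascending order.
bool is_ascending(const thrust::host_vector<int>& h) {
    for (size_t i = 1; i < h.size(); ++i) {
        if (h[i - 1] > h[i]) return false;
    }
    return true;
}

int main() {
    const int n = 1 << 20;  // illustrative problem size (an assumption)

    // Fill a host vector with pseudo-random integers.
    thrust::host_vector<int> h(n);
    for (int i = 0; i < n; ++i) h[i] = std::rand();

    sort_on_gpu(h);

    std::printf("sorted: %s\n", is_ascending(h) ? "yes" : "no");
    return is_ascending(h) ? 0 : 1;
}
```

The container and iterator style deliberately mirrors the STL, which is the sense in which Thrust is meant to feel familiar to C++ developers.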
The CUDA 4.0 architecture release also brings a number of other key features and capabilities, including:
- Multi-thread sharing of GPUs -- Multiple CPU host threads can share contexts on a single GPU, making it easier for multi-threaded applications to share a single GPU.
- MPI integration with CUDA applications -- Modified MPI implementations such as Open MPI automatically move data to and from GPU memory over InfiniBand when an application makes an MPI send or receive call.
- Multi-GPU sharing by single CPU thread -- A single CPU host thread can access all GPUs in a system. Developers can then coordinate work across multiple GPUs for tasks such as "halo" exchange in applications.
- New NPP image and computer vision library -- A set of image transformation operations that enable rapid development of imaging and computer vision applications.
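To make the multi-GPU items above more concrete, the following sketch shows a single host thread driving two GPUs and copying a small "halo" region from one to the other with the CUDA runtime's peer-to-peer calls. It is an illustration under stated assumptions rather than NVIDIA's sample code: the helper name `run_halo_demo`, the `CHECK` macro, the buffer sizes, and the choice of devices 0 and 1 are all hypothetical, and the program simply skips the exchange on machines with fewer than two GPUs.

```cuda
// Sketch: one host thread, two GPUs, one halo copy between them.
// Build (assumed file name): nvcc -O2 halo_demo.cu -o halo_demo
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Hypothetical error-check macro: report the CUDA error and return 1 from the caller.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                         cudaGetErrorString(err_));                     \
            return 1;                                                   \
        }                                                               \
    } while (0)

// Hypothetical helper: returns 0 on success (or when skipped), 1 on failure.
int run_halo_demo() {
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count < 2) {
        std::printf("fewer than two GPUs found; skipping the exchange\n");
        return 0;
    }

    const int n = 1024;    // elements per GPU (assumption)
    const int halo = 16;   // halo width in elements (assumption)
    float* buf[2] = {nullptr, nullptr};

    // The same host thread allocates on both GPUs by switching the current device.
    for (int dev = 0; dev < 2; ++dev) {
        CHECK(cudaSetDevice(dev));
        CHECK(cudaMalloc((void**)&buf[dev], n * sizeof(float)));
    }

    // Fill GPU 0's buffer with 0, 1, 2, ... from the host.
    std::vector<float> host(n);
    for (int i = 0; i < n; ++i) host[i] = static_cast<float>(i);
    CHECK(cudaSetDevice(0));
    CHECK(cudaMemcpy(buf[0], host.data(), n * sizeof(float), cudaMemcpyHostToDevice));

    // Enable direct peer access in both directions where the hardware allows it.
    int can01 = 0, can10 = 0;
    CHECK(cudaDeviceCanAccessPeer(&can01, 0, 1));
    CHECK(cudaDeviceCanAccessPeer(&can10, 1, 0));
    if (can01 && can10) {
        CHECK(cudaSetDevice(0));
        CHECK(cudaDeviceEnablePeerAccess(1, 0));
        CHECK(cudaSetDevice(1));
        CHECK(cudaDeviceEnablePeerAccess(0, 0));
    }

    // Halo exchange: the last `halo` elements on GPU 0 become the first `halo` on GPU 1.
    CHECK(cudaMemcpyPeer(buf[1], 1, buf[0] + (n - halo), 0, halo * sizeof(float)));

    // Read the halo back from GPU 1 and check its first and last values on the host.
    std::vector<float> ghost(halo);
    CHECK(cudaSetDevice(1));
    CHECK(cudaMemcpy(ghost.data(), buf[1], halo * sizeof(float), cudaMemcpyDeviceToHost));
    bool ok = ghost[0] == static_cast<float>(n - halo) &&
              ghost[halo - 1] == static_cast<float>(n - 1);

    // Release the buffers on their owning devices.
    for (int dev = 0; dev < 2; ++dev) {
        CHECK(cudaSetDevice(dev));
        CHECK(cudaFree(buf[dev]));
    }
    return ok ? 0 : 1;
}

int main() {
    int rc = run_halo_demo();
    std::printf("halo demo %s\n", rc == 0 ? "finished" : "failed");
    return rc;
}
```

A real solver would typically exchange halos in both directions on every time step and overlap the copies with computation; the sketch keeps to a single one-way copy so the device switching and the peer copy stay easy to follow.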
A release candidate of CUDA Toolkit 4.0 will be available free of charge beginning March 4, 2011, to developers who enroll in the CUDA Registered Developer Program at www.nvidia.com/paralleldeveloper.


