Nvidia has reached the production release stage of its CUDA 6.5 GPU-accelerated parallel computing platform and programming model.
Available as a free download, version 6.5 of the CUDA Toolkit supports 64-bit ARM platforms as well as x86 CPU-based systems, and targets compute-intensive high-performance computing (HPC), advanced scientific, and enterprise datacenter workloads.
Additional performance and productivity features of the CUDA 6.5 platform include expanded host compiler support, which now covers Microsoft Visual Studio 2013 for Windows.
Also in this release, the cuFFT callbacks capability is intended to deliver higher-performance custom processing on either input or output data. It lets programmers specify callback functions that manipulate data in GPU memory as cuFFT loads it before the transform or stores it afterwards, without launching a separate kernel.
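As a rough illustration of the idea, here is a minimal sketch of a load callback that scales each input element as cuFFT reads it. The helper name `scaledDcBin` and the scaling use case are assumptions made for this example, not something Nvidia describes. The sketch also assumes the program is built with relocatable device code and linked against the static cuFFT library (for example `nvcc -rdc=true example.cu -lcufft_static -lculibos`), which callbacks have traditionally required.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cufft.h>
#include <cufftXt.h>

// Load callback: cuFFT invokes this for every input element it reads, so the
// scaling is fused into the transform instead of needing its own kernel.
// callerInfo is a device pointer supplied at registration (here: one float).
__device__ cufftComplex scaleOnLoad(void *dataIn, size_t offset,
                                    void *callerInfo, void *sharedPtr) {
    float s = *static_cast<float *>(callerInfo);
    cufftComplex v = static_cast<cufftComplex *>(dataIn)[offset];
    v.x *= s;
    v.y *= s;
    return v;
}

// Device-side function pointer; the host copies its value to register it.
__device__ cufftCallbackLoadC d_scaleOnLoad = scaleOnLoad;

// Hypothetical helper: forward FFT of n ones, each scaled by `scale` on load.
// Returns the real part of the DC bin, or -1 if any call fails.
float scaledDcBin(int n, float scale) {
    std::vector<cufftComplex> host(n);
    for (int i = 0; i < n; ++i) { host[i].x = 1.0f; host[i].y = 0.0f; }

    cufftComplex *d_in = NULL, *d_out = NULL;
    float *d_scale = NULL;
    bool ok = cudaMalloc(&d_in, n * sizeof(cufftComplex)) == cudaSuccess
           && cudaMalloc(&d_out, n * sizeof(cufftComplex)) == cudaSuccess
           && cudaMalloc(&d_scale, sizeof(float)) == cudaSuccess
           && cudaMemcpy(d_in, host.data(), n * sizeof(cufftComplex),
                         cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(d_scale, &scale, sizeof(float),
                         cudaMemcpyHostToDevice) == cudaSuccess;

    cufftHandle plan;
    bool havePlan = ok && cufftPlan1d(&plan, n, CUFFT_C2C, 1) == CUFFT_SUCCESS;
    ok = havePlan;

    // Fetch the device function pointer and attach it as a complex load callback.
    cufftCallbackLoadC h_cb;
    ok = ok && cudaMemcpyFromSymbol(&h_cb, d_scaleOnLoad, sizeof(h_cb)) == cudaSuccess;
    ok = ok && cufftXtSetCallback(plan, (void **)&h_cb, CUFFT_CB_LD_COMPLEX,
                                  (void **)&d_scale) == CUFFT_SUCCESS;
    ok = ok && cufftExecC2C(plan, d_in, d_out, CUFFT_FORWARD) == CUFFT_SUCCESS;

    cufftComplex dc = {0.0f, 0.0f};
    ok = ok && cudaMemcpy(&dc, d_out, sizeof(dc), cudaMemcpyDeviceToHost) == cudaSuccess;

    if (havePlan) cufftDestroy(plan);
    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_scale);
    return ok ? dc.x : -1.0f;
}

int main() {
    // The DC bin is the sum of the inputs, so it should equal n * scale.
    printf("DC bin: %f\n", scaledDcBin(8, 2.0f));
    return 0;
}
```

A store callback follows the same pattern with the `cufftCallbackStoreC` type and `CUFFT_CB_ST_COMPLEX`, and is called as each output element is written.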
The release also improves debugging for CUDA Fortran applications (preview), with new debugging support for Fortran arrays (Linux only), improved source-to-assembly code correlation, and improved documentation.
Nvidia also highlights Application Replay mode, which is said to enable faster analysis of complex scenarios using multiple hardware counters. The firm also calls out the updated CUDA Occupancy Calculator API, intended to free programmers from having to manually configure kernel launches for each GPU architecture.
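To give a sense of what the occupancy API replaces, the following minimal sketch asks the runtime for a block size instead of hard-coding one per architecture. It uses `cudaOccupancyMaxPotentialBlockSize` from the CUDA runtime. The kernel `scale` and the helper `suggestedBlockSize` are hypothetical names invented for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial example kernel: multiply each element by a factor.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: ask the runtime which block size maximizes occupancy
// for `scale` on the current device. Returns -1 if the query fails.
int suggestedBlockSize() {
    int minGridSize = 0;  // smallest grid that can reach full occupancy
    int blockSize = 0;    // suggested threads per block
    cudaError_t err = cudaOccupancyMaxPotentialBlockSize(
        &minGridSize, &blockSize, scale,
        0,   // no dynamic shared memory
        0);  // no upper limit on block size
    return err == cudaSuccess ? blockSize : -1;
}

int main() {
    const int n = 1 << 20;
    int blockSize = suggestedBlockSize();
    if (blockSize <= 0) {
        fprintf(stderr, "occupancy query failed\n");
        return 1;
    }

    // Round the grid up so every element is covered.
    int gridSize = (n + blockSize - 1) / blockSize;

    float *d_data = NULL;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d_data, 0, n * sizeof(float));

    scale<<<gridSize, blockSize>>>(d_data, 2.0f, n);
    cudaError_t err = cudaDeviceSynchronize();
    cudaFree(d_data);

    printf("launched %d blocks of %d threads: %s\n",
           gridSize, blockSize, cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}
```

Because the suggestion is computed at run time for whichever GPU is present, the same binary can choose different launch dimensions on different architectures without any per-device tuning tables.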
A new "nvprune" utility prunes object files and libraries so that they contain only the device code needed for the specified target architectures, reducing application size and improving load-time performance.