CUBLAS and CUFFT Enhancements
CUBLAS users will be happy that complex numbers are now supported. The release notes indicate the following routines have been added:
- BLAS1
  - cublasZaxpy()
  - cublasZcopy()
  - cublasZswap()
- BLAS2
  - cublasDtrmv()
  - cublasCtrmv()
  - cublasCgemv()
  - cublasCgeru()
  - cublasCgerc()
  - cublasZtrmv()
  - cublasZgemv()
  - cublasZgeru()
  - cublasZgerc()
- BLAS3
  - cublasCtrsm()
  - cublasCtrmm()
  - cublasCsyrk()
  - cublasCsymm()
  - cublasCherk()
  - cublasZtrsm()
  - cublasZtrmm()
  - cublasZsyrk()
  - cublasZsymm()
  - cublasZherk()
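For readers unfamiliar with the BLAS naming scheme, the Z prefix denotes double-precision complex data, and AXPY is the level-1 operation y = alpha * x + y. The following is a minimal CPU reference sketch of that operation in C99. The name zaxpy_reference is hypothetical and not part of CUBLAS, and the sketch assumes unit stride for both vectors; it only illustrates what the new cublasZaxpy() routine is expected to compute on the GPU.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* CPU reference for the ZAXPY operation: y[i] = alpha * x[i] + y[i].
 * Hypothetical helper for illustration only; it is not a CUBLAS call.
 * Assumes both vectors are stored with unit stride. */
void zaxpy_reference(size_t n, double complex alpha,
                     const double complex *x, double complex *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i) {
        y[i] = alpha * x[i] + y[i];
    }
}
```

A reference like this can be handy for checking GPU results on small inputs before moving to large vectors.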
The site oscarbg.blogspot notes that batched 2D and 3D transforms are supported in CUFFT with the new cufftPlanMany() API. This is defined in cufft.h, as follows:
cufftResult CUFFTAPI cufftPlanMany(cufftHandle *plan,
                                   int rank,
                                   int *n,
                                   int *inembed, // Unused: pass NULL
                                   int istride,  // Unused: pass 1
                                   int idist,    // Unused: pass 0
                                   int *onembed, // Unused: pass NULL
                                   int ostride,  // Unused: pass 1
                                   int odist,    // Unused: pass 0
                                   cufftType type,
                                   int batch);
The arguments are:
- *plan -- The plan is returned here, as for other CUFFT calls
- rank -- The dimensionality of the transform (1, 2, or 3)
- *n -- An array of size [rank], describing the size of each dimension
- type -- Transform type (e.g., CUFFT_C2C), as per other CUFFT calls
- batch -- Batch size for this transform
Return values are as for all other cufftPlan…() functions. Creating a plan for 1,000 2D double-precision, complex-to-complex transforms of size (128, 256) will look something like the following:
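cufftHandle myplan;    // the handle itself; its address is passed so the plan can be returned
int n[2] = {128, 256}; // size of each dimension
cufftPlanMany(&myplan, 2, n, NULL, 1, 0, NULL, 1, 0, CUFFT_Z2Z, 1000);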
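Before executing a batched plan of this size, it helps to know how much device memory one buffer needs. The following is a rough sketch in plain C of that arithmetic. The helper names batched_element_count and z2z_buffer_bytes are hypothetical and not part of CUFFT, and the sketch assumes each double-precision complex element occupies two doubles (real and imaginary parts).

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements covered by a batched plan:
 * batch * n[0] * ... * n[rank-1].
 * Hypothetical helper for illustration only; not a CUFFT function. */
size_t batched_element_count(int rank, const int *n, int batch)
{
    assert(rank >= 1 && rank <= 3 && n != NULL && batch >= 1);
    size_t count = (size_t)batch;
    for (int d = 0; d < rank; ++d) {
        assert(n[d] > 0);
        count *= (size_t)n[d];
    }
    return count;
}

/* Bytes needed for one buffer of double-precision complex values,
 * assuming each element is stored as two doubles (real, imaginary). */
size_t z2z_buffer_bytes(int rank, const int *n, int batch)
{
    return batched_element_count(rank, n, batch) * 2 * sizeof(double);
}
```

For the plan above (1,000 transforms of size 128 x 256), this works out to 32,768,000 elements, or 524,288,000 bytes (500 MB) per buffer. An out-of-place transform would need that much again for the output.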
Note that for CUFFT 3.0, the layout of batched data must be side-by-side and not interleaved. The inembed, istride, idist, onembed, ostride, and odist parameters are for enabling data windowing and interleaving in a future version.
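To make the side-by-side requirement concrete, here is a small sketch in plain C of how an element's position in the buffer might be computed under each layout for a batch of 2D transforms. Both helper names are hypothetical and not part of CUFFT, and the sketch assumes row-major ordering within each transform. Only the first layout matches what CUFFT 3.0 accepts; the second is shown purely for contrast.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j) of transform b when the batch is stored
 * side by side: each nx-by-ny transform occupies one contiguous run.
 * Assumes row-major order within a transform. Hypothetical helper. */
size_t side_by_side_offset(int nx, int ny, int b, int i, int j)
{
    assert(nx > 0 && ny > 0 && b >= 0);
    assert(i >= 0 && i < nx && j >= 0 && j < ny);
    return (size_t)b * (size_t)nx * (size_t)ny
         + (size_t)i * (size_t)ny
         + (size_t)j;
}

/* For contrast: the offset the same element would have if the batch
 * were interleaved, with element (i, j) of every transform stored
 * together. Per the text, CUFFT 3.0 does not accept this layout. */
size_t interleaved_offset(int nx, int ny, int batch, int b, int i, int j)
{
    assert(nx > 0 && ny > 0 && batch > 0 && b >= 0 && b < batch);
    assert(i >= 0 && i < nx && j >= 0 && j < ny);
    return ((size_t)i * (size_t)ny + (size_t)j) * (size_t)batch
         + (size_t)b;
}
```

With the (128, 256) example, the second transform in a side-by-side batch starts at offset 32,768, whereas in an interleaved batch its first element would sit at offset 1. Data that arrives interleaved would need to be reordered before being handed to the batched plan.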