CUDA, Supercomputing for the Masses: Part 9


Introducing SWIG

SWIG is an excellent software development tool that connects modules written in C and C++ to a wide variety of high-level programming languages. It supports Perl, PHP, Python, Tcl, Java, C#, Common Lisp, Octave, R, and many more (see www.swig.org/compat.html#SupportedLanguages for the full list).

Here are some links to get you started for three common languages. Check out the web for your favorite if not listed below:

The following is a simple Python example, contributed by a colleague at NVIDIA, which demonstrates the simplicity and speed of calling a CUDA kernel from Python. This example implements a method that is useful in financial applications -- namely matrix exponentiation. Why such a method is useful is beyond the scope of this article. See the discussion starting on page 19 of the paper at http://arxiv.org/pdf/0710.1606 for more information. Be forewarned, this paper is quite dense.

In the spirit of this article, this example module makes efficient use of the GPU. It performs so well because it lets Python programmers call SGEMM, a level-3 BLAS routine in the NVIDIA CUBLAS library that performs many floating-point operations per data item. It also demonstrates that it is possible to map variables -- in this case an array -- very efficiently between Python and CUDA.
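To make the "flops per data item" point concrete, here is a rough back-of-the-envelope sketch. It assumes the usual operation counts (about 2n^3 floating-point operations for an n x n matrix multiply that touches 3n^2 matrix elements) and compares them with a level-1 routine such as SAXPY. The function names are made up for this illustration and are not part of CUBLAS.

```python
def sgemm_flops_per_item(n):
    """Flops per matrix element touched for an n x n SGEMM (C = A * B)."""
    flops = 2.0 * n ** 3          # n^3 multiplies plus n^3 adds
    items = 3.0 * n ** 2          # read A and B, write C
    return flops / items

def saxpy_flops_per_item(n):
    """Same ratio for a level-1 routine, y = a*x + y, on length-n vectors."""
    flops = 2.0 * n               # one multiply and one add per element
    items = 3.0 * n               # read x, read y, write y
    return flops / items

for n in (64, 1024, 4096):
    print("n=%5d  SGEMM: %8.1f flops/item   SAXPY: %.2f flops/item"
          % (n, sgemm_flops_per_item(n), saxpy_flops_per_item(n)))
```

The SGEMM ratio grows linearly with n while the SAXPY ratio stays constant, which is why a level-3 routine can keep the GPU busy with arithmetic instead of waiting on memory.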

The full listing for the Python code exponentiationTest.py is:

#! /usr/bin/env python

import copy
import numpy
import FastMatrixExp

# Read input matrix using a user defined function
a = myInputReader()
b = copy.copy(a)

steps = 100

# Matrix exponentiation using CPU SGEMM
for i in range(steps):
  a = numpy.dot(a,a)

# Matrix exponentiation using CUBLAS SGEMM
FastMatrixExp.matrixMulLoop([steps,b])

numpy.testing.assert_array_almost_equal(a, b, decimal = 6)
print 'Error = %f' % numpy.linalg.norm(a-b)

Within exponentiationTest.py, a custom module is imported with the line:


import FastMatrixExp
 

Readers must define their own Python function to read a matrix into variable a, which is then copied into variable b so that the speed and accuracy of the CPU and GPU can be compared:


# Read input matrix using a user defined function
a = myInputReader()
b = copy.copy(a)
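The article leaves myInputReader to the reader. One possible stand-in is sketched below; it is purely hypothetical and not the original author's function. It assumes a square, single-precision, contiguous array, because the SWIG wrapper reads the raw buffer as float * and takes n from the first dimension. The parameters n and seed, and the choice of a row-stochastic matrix (each row sums to 1, so repeated squaring should stay bounded), are assumptions made for this illustration.

```python
import numpy

def myInputReader(n=64, seed=0):
    """Hypothetical stand-in: return a random n x n row-stochastic float32 matrix."""
    rng = numpy.random.RandomState(seed)
    m = rng.rand(n, n) + 1e-3            # strictly positive entries
    m /= m.sum(axis=1, keepdims=True)    # each row now sums to 1
    # float32 and contiguous: the wrapper reads the raw buffer as float*
    return numpy.ascontiguousarray(m, dtype=numpy.float32)

a = myInputReader()
print("shape=%s dtype=%s" % (a.shape, a.dtype))
```

Passing a float64 array (the NumPy default) would not raise an error in the wrapper shown in this article, but the C code would misinterpret the bytes, so the explicit dtype matters.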

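Matrix a is then squared repeatedly on the host processor, once per iteration, for the number of iterations given by the variable steps (100 here). Because each pass squares the previous result, the loop computes a raised to the power 2^steps rather than a raised to the power steps: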


steps = 100

# Matrix exponentiation using CPU SGEMM
for i in range(steps):
  a = numpy.dot(a,a)
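The loop above squares the running result on every pass, so steps iterations produce the matrix raised to the power 2^steps. The following small check illustrates this with a 2 x 2 integer matrix chosen so the arithmetic is exact. The helper name square_repeatedly and the test matrix are made up for this sketch.

```python
import numpy

def square_repeatedly(a, steps):
    """Apply a = a.a 'steps' times, mirroring the CPU loop in the test script."""
    for _ in range(steps):
        a = numpy.dot(a, a)
    return a

# Small integer matrix so every product is exact.
m = numpy.array([[1, 1], [0, 1]], dtype=numpy.int64)
steps = 5
squared = square_repeatedly(m, steps)
direct = numpy.linalg.matrix_power(m, 2 ** steps)

# Five squarings give the 32nd power, not the 5th.
assert (squared == direct).all()
print(squared)
```

For this particular matrix the k-th power has k in its top-right entry, so five squarings leave 32 there, which makes the doubling of the exponent easy to see.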

The SGEMM routine from the CUBLAS library is then called from Python to perform the same repeated squaring on the GPU. The result is written back into b in place:


# Matrix exponentiation using CUBLAS SGEMM
FastMatrixExp.matrixMulLoop([steps,b])

The GPU and CPU results are then checked for equality within a reasonable tolerance via a numpy comparison, as seen below. (Numpy is an excellent numerical Python package that provides matrix operations.)


numpy.testing.assert_array_almost_equal(a, b, decimal = 6)
print 'Error = %f' % numpy.linalg.norm(a-b)
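The check above is an elementwise absolute comparison, and the printed norm is an absolute error as well. When the matrix entries are very large or very small, a relative error can be easier to interpret. The sketch below shows one way to compute it; the helper relative_error and the sample data are invented for this illustration and do not come from the original test script.

```python
import numpy

def relative_error(reference, result):
    """Frobenius-norm error of result, scaled by the size of reference."""
    return numpy.linalg.norm(reference - result) / numpy.linalg.norm(reference)

reference = numpy.array([[1.0, 2.0], [3.0, 4.0]])
result = reference + 1e-8          # simulated small discrepancy (made-up data)

# Elementwise absolute check, as in the article's test script.
numpy.testing.assert_array_almost_equal(reference, result, decimal=6)
# Scale-independent summary of the same discrepancy.
print("Relative error = %e" % relative_error(reference, result))
```

With single-precision SGEMM on both sides, some difference between CPU and GPU results is expected, since the two implementations need not sum the products in the same order.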

The following is the SWIG interface code:

%module FastMatrixExp

%header
%{
#include <oldnumeric.h>
#include <cublas.h>
%}

%include exception.i

/* Matrix multiplication loop for fast matrix exponentiation. */

%typemap(python,in) (int steps, float *u, int n)
{
  $1 = PyInt_AsLong(PyList_GetItem($input,0));
  $2 = (float *)(((PyArrayObject *)PyList_GetItem($input,1))->data);
  $3 = ((PyArrayObject *)PyList_GetItem($input,1))->dimensions[0];
}

extern void matrixMulLoop(int steps, float *u, int n);

%{
void matrixMulLoop(int steps, float *u, int n)
{
  int i;
  float *ud;
  cublasStatus status;

  /* Allocate memory and copy u to the device. */
  status = cublasAlloc(n*n, sizeof(float), (void **)&ud); 
  status = cublasSetMatrix(n, n, sizeof(float), (void *)u,n, (void *)ud, n);

  /* Do "steps" updates. */
  for(i=0; i<steps; i++)
    cublasSgemm('n','n',n,n,n,1.0f,ud,n,ud,n,0.0f,ud,n);

  /* Copy u back to the host and free device memory. */
  status = cublasGetMatrix(n, n, sizeof(float), (void *)ud,n, (void *)u, n);
  status = cublasFree((void *)ud);
}
%}

%init
%{
  import_array(); 
  cublasStatus status;
  status = cublasInit();
%}

The module name, FastMatrixExp, is defined in the first line of CUBLAS.i:

%module FastMatrixExp

The iterated calls to cublasSgemm occur in the following C subroutine. It is defined between the %{ and %} delimiters, which tell SWIG to copy the enclosed code verbatim into the generated wrapper:

%{
void matrixMulLoop(int steps, float *u, int n)
{
  int i;
  float *ud;
  cublasStatus status;

  /* Allocate memory and copy u to the device. */
  status = cublasAlloc(n*n, sizeof(float), (void **)&ud); 
  status = cublasSetMatrix(n, n, sizeof(float), (void *)u,n, (void *)ud, n);

  /* Do "steps" updates. */
  for(i=0; i<steps; i++)
    cublasSgemm('n','n',n,n,n,1.0f,ud,n,ud,n,0.0f,ud,n);

  /* Copy u back to the host and free device memory. */
  status = cublasGetMatrix(n, n, sizeof(float), (void *)ud,n, (void *)u, n);
  status = cublasFree((void *)ud);
}
%}
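For readers without a CUDA-capable machine, the calling convention of the wrapped routine can be mimicked on the CPU. The function below is a NumPy stand-in written for this illustration, not the SWIG-generated module: it accepts the same [steps, array] list that the typemap unpacks and, like the C routine, overwrites the caller's array rather than returning a new one.

```python
import numpy

def matrixMulLoop(args):
    """CPU stand-in (an assumption, not the SWIG module): square u in place 'steps' times."""
    steps, u = args                      # same [steps, array] list the typemap unpacks
    if u.dtype != numpy.float32 or u.ndim != 2 or u.shape[0] != u.shape[1]:
        raise ValueError("expected a square float32 matrix")
    for _ in range(steps):
        u[...] = numpy.dot(u, u)         # write the product back into the caller's buffer

b = numpy.eye(3, dtype=numpy.float32) * 0.5
matrixMulLoop([2, b])                    # two squarings: 0.5 -> 0.25 -> 0.0625
print(b)
```

Because the array is modified in place, the caller must keep a separate copy of the input if the original values are needed afterwards, which is what the copy.copy call in the test script provides.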

To gain a greater understanding of the remaining parts of the SWIG file, I recommend consulting the SWIG documentation. You can also find out more about SWIG in David Beazley's article SWIG and Automated C/C++ Scripting Extensions, and Daniel Blezek's article Rapid Prototyping with SWIG.

For more advanced numerical packages that combine Python and CUDA, check out pystream or GPUlib (which can be downloaded after submitting an email request).


Rob Farber is a senior scientist at Pacific Northwest National Laboratory. He has worked in massively parallel computing at several national laboratories and as co-founder of several startups. He can be reached at rmfarber@gmail.com.

