Easy GPU Parallelism with OpenACC


How To Try Out OpenACC

An extended 30-day trial license for the PGI software can be obtained by registering with NVIDIA. The Portland Group also provides a free 15-day OpenACC trial license, which can be obtained in three steps:

1. Download any of the available software packages for your operating system.
2. Review the PGI Installation Guide [PDF] or the PGI Visual Fortran Installation Guide [PDF] and configure your environment.
3. Generate the trial license keys. Note that the trial keys, and all executable files compiled using them, will cease operating at the end of the trial period.

The following examples multiply two matrices, a and b, and store the result in matrix c. They use a basic set of OpenACC data transfer, parallelization, and memory creation/access clauses. A C-language OpenMP matrix multiply is also provided to show the similarity between OpenACC and OpenMP and to allow CPU and GPU performance comparisons. While the PGI matrix multiplication performance is good, look to highly optimized BLAS (Basic Linear Algebra Subprograms) packages such as CUBLAS and phiGEMM for production GPU and hybrid CPU + GPU implementations.

Following is our first OpenACC program, matrix-acc-check.c. This simple code creates a static set of square matrices (a, b, c, seq), initializes them, and then performs a matrix multiplication on the OpenACC device. The test code then repeats the matrix multiplication sequentially on the host processor and double-checks the OpenACC result.

/* matrix-acc-check.c */
#include <stdio.h>   /* printf */
#include <stdlib.h>  /* exit */

#define SIZE 1000
float a[SIZE][SIZE];
float b[SIZE][SIZE];
float c[SIZE][SIZE];
float seq[SIZE][SIZE];

int main()
{
  int i,j,k;
  
  // Initialize matrices.
  for (i = 0; i < SIZE; ++i) {
    for (j = 0; j < SIZE; ++j) {
      a[i][j] = (float)i + j;
      b[i][j] = (float)i - j;
      c[i][j] = 0.0f;
    }
  }
  
  // Compute matrix multiplication.
#pragma acc kernels copyin(a,b) copy(c)
  for (i = 0; i < SIZE; ++i) {
    for (j = 0; j < SIZE; ++j) {
      for (k = 0; k < SIZE; ++k) {
        c[i][j] += a[i][k] * b[k][j];
      }
    }
  }

  // ****************
  // double-check the OpenACC result sequentially on the host
  // ****************
  // Initialize the seq matrix
  for(i = 0; i < SIZE; ++i) 
    for(j = 0; j < SIZE; ++j) 
      seq[i][j] = 0.f;
  
  // Perform the multiplication
  for (i = 0; i < SIZE; ++i) 
    for (j = 0; j < SIZE; ++j) 
      for (k = 0; k < SIZE; ++k) 
        seq[i][j] += a[i][k] * b[k][j];
  
  // check all the OpenACC matrices
  for (i = 0; i < SIZE; ++i)
    for (j = 0; j < SIZE; ++j)
      if(c[i][j] != seq[i][j]) {
        printf("Error %d %d\n", i,j);
        exit(1);
      }
  printf("OpenACC matrix multiplication test was successful!\n");
  
  return 0;
}

Example 1: matrix-acc-check.c source code.

The OpenACC pragma tells the compiler the following:

  • #pragma acc: This is an OpenACC pragma.
  • kernels: A kernels region.
    No jumps are allowed into/out of the kernels region.
    Loops will be sent to the OpenACC device.
    The scope of the kernels region code block is denoted by the curly brackets in a C program.

  • copyin(): copy the contiguous region of memory from the host to the device.
    The variables, arrays or subarrays in the list have values in the host memory that need to be copied to the device memory.
    If a subarray is specified, then only that subarray of the array needs to be copied.
  • copy(): copy the contiguous memory region from the host to the device and back again.
    The variables, arrays or subarrays in the list have values in the host memory that need to be copied to the device memory.
    If a subarray is specified, then only that subarray of the array needs to be copied.
    The data is copied to the device memory before entry to the kernels region and copied back to the host memory when the code block is complete.
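
The subarray form mentioned in the clause descriptions can be sketched as follows. This is a minimal illustration rather than code from the example programs: the function name saxpy_prefix and its parameters are hypothetical. In C, an OpenACC subarray is written as name[start:length], so x[0:n] means "n elements starting at index 0".

```c
/* Hypothetical helper, not part of the article's examples:
   adds s * x[i] to y[i] for the first n elements only. */
void saxpy_prefix(int n, float s, const float *restrict x, float *restrict y)
{
  int i;

  /* x is only read on the device, so copyin is enough.
     y is read and written, so copy moves it in and back out.
     Only the first n elements of each array are transferred. */
#pragma acc kernels copyin(x[0:n]) copy(y[0:n])
  for (i = 0; i < n; ++i) {
    y[i] += s * x[i];
  }
}
```

A compiler without OpenACC support ignores the pragma and runs the loop sequentially on the host, so the function gives the same result either way. Elements beyond the first n are left untouched.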

The source code is compiled with the pgcc compiler; when the application runs, it reports a successful test as shown below:

pgcc -acc -fast -Minfo matrix-acc-check.c -o matrix-acc-check
./matrix-acc-check
OpenACC matrix multiplication test was successful!
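
The check in matrix-acc-check.c compares floats for exact equality. That may not hold on every device or compiler setting, because a device compiler may fuse the multiply and add or round intermediate results differently from the host. A minimal sketch of a tolerance-based comparison follows. The helper name nearly_equal and the choice of tolerance are illustrative assumptions, not part of the original program.

```c
/* Hypothetical helper (not in the original program): returns 1 when x and y
   agree to within rel_tol relative to the larger magnitude. The scale is
   floored at 1.0f so values near zero do not cause spurious failures. */
static int nearly_equal(float x, float y, float rel_tol)
{
  float diff  = (x > y) ? (x - y) : (y - x);
  float ax    = (x < 0.0f) ? -x : x;
  float ay    = (y < 0.0f) ? -y : y;
  float scale = (ax > ay) ? ax : ay;

  if (scale < 1.0f)
    scale = 1.0f;
  return diff <= rel_tol * scale;
}
```

With this helper, the test `c[i][j] != seq[i][j]` would become `!nearly_equal(c[i][j], seq[i][j], 1e-5f)`. A suitable tolerance depends on the matrix values and size.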

The source code for matrix-acc.c was created by removing the host-side verification code (the seq matrix and the sequential check) from matrix-acc-check.c to simplify the following discussion.

/* matrix-acc.c */
#define SIZE 1000
float a[SIZE][SIZE];
float b[SIZE][SIZE];
float c[SIZE][SIZE];

int main()
{
  int i,j,k;
  
  // Initialize matrices.
  for (i = 0; i < SIZE; ++i) {
    for (j = 0; j < SIZE; ++j) {
      a[i][j] = (float)i + j;
      b[i][j] = (float)i - j;
      c[i][j] = 0.0f;
    }
  }
  
  // Compute matrix multiplication.
#pragma acc kernels copyin(a,b) copy(c)
  for (i = 0; i < SIZE; ++i) {
    for (j = 0; j < SIZE; ++j) {
      for (k = 0; k < SIZE; ++k) {
        c[i][j] += a[i][k] * b[k][j];
      }
    }
  }
  return 0;
}

Example 2: matrix-acc.c source code.

Note the similarity between matrix-acc.c and the following OpenMP implementation, matrix-omp.c. Only the pragmas differ: the OpenACC pragma includes copy operations that the OpenMP implementation does not require.

/* matrix-omp.c */
#define SIZE 1000
float a[SIZE][SIZE];
float b[SIZE][SIZE];
float c[SIZE][SIZE];

int main()
{
  int i,j,k;
  
  // Initialize matrices.
  for (i = 0; i < SIZE; ++i) {
    for (j = 0; j < SIZE; ++j) {
      a[i][j] = (float)i + j;
      b[i][j] = (float)i - j;
      c[i][j] = 0.0f;
    }
  }
  
  // Compute matrix multiplication.
#pragma omp parallel for default(none) shared(a,b,c) private(i,j,k)
  for (i = 0; i < SIZE; ++i) {
    for (j = 0; j < SIZE; ++j) {
      for (k = 0; k < SIZE; ++k) {
        c[i][j] += a[i][k] * b[k][j];
      }
    }
  }
  return 0;
}

Example 3: matrix-omp.c source code.
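
Because Examples 2 and 3 differ only in their pragma line, one way to keep a single source file is to select the pragma with the preprocessor. The sketch below assumes the standard macros _OPENACC and _OPENMP, which the respective specifications require a compiler to define when that directive set is enabled. The function name matmul and the reduced size N are illustrative choices, not taken from the original programs.

```c
#define N 64  /* small illustrative size, not the SIZE used in the examples */

static float a[N][N];
static float b[N][N];
static float c[N][N];

/* Hypothetical helper: accumulates a * b into c using whichever
   directive set the compiler was asked to enable. */
void matmul(void)
{
  int i, j, k;

#if defined(_OPENACC)
#pragma acc kernels copyin(a,b) copy(c)
#elif defined(_OPENMP)
#pragma omp parallel for default(none) shared(a,b,c) private(i,j,k)
#endif
  for (i = 0; i < N; ++i) {
    for (j = 0; j < N; ++j) {
      for (k = 0; k < N; ++k) {
        c[i][j] += a[i][k] * b[k][j];
      }
    }
  }
}
```

Built with neither directive set enabled, neither pragma is emitted and the loop nest runs sequentially, which gives a convenient baseline for the CPU and GPU comparisons.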

Fortran programmers will find the corresponding source code in Example 4. Again, the OpenACC directives annotate data movement with the copy() and copyin() clauses. Note that a C pragma takes its extent from the curly brackets (or the single statement) that follow it, while the Fortran version must explicitly end the scope of a directive with "!$acc end …".


!     matrix-acc.f
      program example1 
      parameter ( n_size=1000 )
      real*4, dimension(:,:) :: a(n_size,n_size) 
      real*4, dimension(:,:) :: b(n_size,n_size) 
      real*4, dimension(:,:) :: c(n_size,n_size) 

!     Initialize matrices (values differ from C version)
      do i=1, n_size 
         do j=1, n_size 
            a(i,j) = i + j;
            b(i,j) = i - j;
            c(i,j) = 0.;
         enddo 
      enddo 

!$acc data copyin(a,b) copy(c) 
!$acc kernels loop 
!     Compute matrix multiplication.
      do i=1, n_size 
         do j=1, n_size 
            do k = 1, n_size
               c(i,j) = c(i,j) + a(i,k) * b(k,j)
            enddo 
         enddo 
      enddo 
!$acc end data
      end program example1

Example 4: matrix-acc.f source code.
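
For comparison, the same structure as Example 4 (a data region wrapped around a kernels loop) can be sketched in C, where the curly brackets take the place of "!$acc end data". This is an illustrative counterpart rather than one of the original listings: the function name multiply is hypothetical, and SIZE is reduced so a host-only build runs quickly.

```c
#define SIZE 100  /* reduced from 1000 for a quick host-only run */

float a[SIZE][SIZE];
float b[SIZE][SIZE];
float c[SIZE][SIZE];

/* Hypothetical helper mirroring the Fortran program's compute section. */
void multiply(void)
{
  int i, j, k;

  /* The braces delimit the data region: a and b are copied to the device
     on entry, and c is copied in on entry and back out at the closing brace. */
#pragma acc data copyin(a,b) copy(c)
  {
#pragma acc kernels loop
    for (i = 0; i < SIZE; ++i) {
      for (j = 0; j < SIZE; ++j) {
        for (k = 0; k < SIZE; ++k) {
          c[i][j] += a[i][k] * b[k][j];
        }
      }
    }
  }
}
```

Separating the data region from the compute region makes no difference for a single loop nest, but it allows several kernels regions to share one set of transfers.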

