Easy GPU Parallelism with OpenACC

This is the first in a series of articles by Rob Farber on OpenACC directives, which enable existing C/C++ and Fortran code to run with high performance on massively parallel devices such as GPUs. The magic in OpenACC lies in how it extends the familiar face of OpenMP pragma programming to encompass coprocessors. As a result, OpenACC opens the door to scalable, massively parallel GPU acceleration of millions of lines of legacy application code without requiring a rewrite in a new language such as CUDA or OpenCL, or a fork of the application source tree to support multiple languages.

OpenACC is a set of standardized, high-level pragmas that enables C/C++ and Fortran programmers to utilize massively parallel coprocessors with much of the convenience of OpenMP. A pragma is a form of code annotation that informs the compiler of something about the code. In this case, it identifies the succeeding block of code or structured loop as a good candidate for parallelization. OpenMP is a well-known and widely supported standard that defines pragmas programmers have used since 1997 to parallelize applications on shared memory multicore processors. The OpenACC standard has generated excitement because it preserves the familiarity of OpenMP code annotation while extending the execution model to encompass devices that reside in separate memory spaces. To support coprocessors, OpenACC pragmas annotate data placement and transfer as well as loop and block parallelism.
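As a rough illustration of that last point, the sketch below annotates a SAXPY-style loop with one OpenACC pragma whose clauses describe both the parallel region and the data movement. The function name saxpy and its arguments are made up for this example and do not come from the article; a compiler invoked without OpenACC support ignores the pragma and runs the loop sequentially on the host.

```c
#include <assert.h>

/* Hypothetical example: y = a*x + y.
 * copyin(x[0:n]) : x is only read, so it is copied host -> device.
 * copy(y[0:n])   : y is read and written, so it is copied both ways.
 * Without OpenACC support the pragma is ignored and the loop
 * runs as ordinary sequential C. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    assert(n >= 0);
    #pragma acc kernels copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The restrict qualifiers are a design choice rather than a requirement: they tell the compiler that x and y do not overlap, which removes one common reason for a compiler to decline to parallelize a loop.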

The success of GPU computing in recent years has motivated compiler vendors to extend the OpenMP shared memory pragma programming approach to coprocessors. Approved by the OpenACC standards committee in November 2011, the OpenACC version 1.0 standard creates a unified syntax and prevents a "Tower of Babel" proliferation of incompatible pragmas. Adoption by companies such as NVIDIA, PGI (The Portland Group), CAPS Enterprise, and Cray has been rapid.

Make Your Life Simple

Pragmas and high-level APIs are designed to provide software functionality while hiding many details of the underlying implementation, which frees the programmer's attention for other tasks. A colleague humorously refers to pragma-based programming as a negotiation between the developer and the compiler. Note that pragmas are informational statements provided by the programmer to assist the compiler. This means that pragmas are not subject to the same level of syntax, type, and sanity checking as the rest of the source code. The compiler is free to ignore any pragma for any reason, including: it does not support the pragma, syntax errors, code complexity, unresolved (or potentially unresolved) dependencies, edge cases where the compiler cannot guarantee that vectors or matrices do not overlap, use of pointers, and many others. Profiling tools and informational messages from the compiler about parallelization, or an inability to parallelize, are therefore essential to achieving high performance.

An OpenACC pragma for C/C++ can be identified by the string "#pragma acc", just as an OpenMP pragma can be identified by "#pragma omp". Similarly, Fortran pragmas can be identified by "!$acc". Always ensure that these strings begin all OpenACC (or OpenMP) pragmas. Moreover, it is legal to mix OpenMP, OpenACC, and other pragmas in a single source file.
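To make the mixing of pragma families concrete, here is a minimal sketch of one C source file that carries both an OpenMP and an OpenACC annotation. The function names sum_host and scale_device are hypothetical. Each pragma takes effect only when the compiler is invoked with support for that standard; otherwise the loop runs sequentially.

```c
#include <assert.h>

/* Hypothetical example: OpenMP and OpenACC pragmas in one file. */

/* Host-side sum, annotated for OpenMP threads on a multicore CPU. */
double sum_host(int n, const double *x)
{
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += x[i];
    }
    return total;
}

/* In-place scaling, annotated for an OpenACC device.
 * copy(x[0:n]) moves x to the device and back again. */
void scale_device(int n, double s, double *restrict x)
{
    assert(n >= 0);
    #pragma acc kernels copy(x[0:n])
    for (int i = 0; i < n; ++i) {
        x[i] *= s;
    }
}
```

Calling scale_device(4, 2.0, x) and then sum_host(4, x) on the same array exercises both annotated loops from one translation unit.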

OpenACC Syntax

OpenACC provides a fairly rich pragma language to annotate data location, data transfer, and loop or code block parallelism. The syntax of OpenACC pragmas (sometimes referred to as OpenACC directives) is:

  • C/C++: "#pragma acc directive-name [clause [[,] clause]…] new-line"
  • Fortran: "!$acc directive-name [clause [[,] clause]…] new-line"

OpenACC pragmas in C/C++ are somewhat more concise than their Fortran counterparts as the compiler can determine a code block from the curly bracket "{}" notation. The OpenACC specification also requires that the _OPENACC preprocessor macro be defined when compiling OpenACC applications. This macro can be used for the conditional compilation of OpenACC code. The _OPENACC macro name will have a value yyyymm where yyyy is the year and mm is the month designation of the version of the OpenACC directives supported by the implementation.
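The conditional compilation described above might look like the following sketch. The helper names openacc_version, split_yyyymm, and report_openacc are invented for illustration; only the _OPENACC macro itself comes from the specification.

```c
#include <assert.h>
#include <stdio.h>

/* Returns the yyyymm value of _OPENACC, or 0 when the macro is
 * undefined (the compiler was not invoked with OpenACC support). */
long openacc_version(void)
{
#ifdef _OPENACC
    return (long)_OPENACC;
#else
    return 0L;
#endif
}

/* Splits a yyyymm value into its year and month parts. */
void split_yyyymm(long v, long *year, long *month)
{
    assert(v >= 0);
    *year = v / 100;
    *month = v % 100;
}

/* Hypothetical helper: report which kind of build this is. */
void report_openacc(void)
{
    long v = openacc_version();
    if (v > 0) {
        long year, month;
        split_yyyymm(v, &year, &month);
        printf("OpenACC enabled: spec dated %ld-%02ld\n", year, month);
    } else {
        printf("OpenACC not enabled: acc pragmas are ignored\n");
    }
}
```

The same #ifdef guard can wrap any code that should exist only in accelerated builds, such as calls into the OpenACC runtime library.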

Table 1 shows a list of OpenACC version 1.0 pragmas and clauses.

!$acc kernels        | !$acc parallel       | !$acc data        | !$acc loop            | !$acc wait
#pragma acc kernels  | #pragma acc parallel | #pragma acc data  | #pragma acc loop      | #pragma acc wait
Clauses              | Clauses              | Clauses           | Clauses               |
if()                 | if()                 | if()              | collapse()            |
async()              | async()              | async()           | within kernels region |
copy()               | num_gangs()          |                   | gang()                |
copyin()             | num_workers()        |                   | worker()              |
copyout()            | vector_length()      |                   | vector()              |
create()             | reduction()          |                   | seq()                 |
present()            | copyin()             | copyin()          | private()             |
present_or_copy()    | copyout()            | copyout()         | reduction()           |
present_or_copyin()  | create()             | create()          |                       |
present_or_copyout() | present()            | present()         |                       |
present_or_create()  | present_or_copy()    | deviceptr() in .c |                       |
deviceptr()          | present_or_copyin()  | deviceptr() in .f |                       |

Table 1. Currently supported OpenACC pragmas.

Two OpenACC environment variables, ACC_DEVICE_TYPE and ACC_DEVICE_NUM, can be set by the user:

  • ACC_DEVICE_TYPE: Controls the default device type to use when executing accelerator parallel and kernels regions, when the program has been compiled to use more than one type of device.
    The allowed values of this environment variable are implementation-defined.
    Examples include ACC_DEVICE_TYPE=NVIDIA.
  • ACC_DEVICE_NUM: Specifies the default device number to use when executing accelerator regions. The value of this environment variable must be a nonnegative integer between zero and the number of devices of the desired type attached to the host.
    If the value is zero, the implementation-defined default is used.
    If the value is greater than the number of devices attached, the behavior is implementation-defined.
    On multi-GPU systems, this variable can be used to avoid the TDR (Timeout Detection and Recovery) watchdog reset for long-running GPU applications by running on a GPU that is not used for the display. (Consult the vendor driver information to see how to modify the TDR time for your operating system.)

In addition, OpenACC provides several runtime routines: acc_get_num_devices(), acc_set_device_type(), acc_get_device_type(), acc_set_device_num(), acc_get_device_num(), acc_async_test(), acc_async_test_all(), acc_async_wait(), acc_async_wait_all(), acc_init(), acc_shutdown(), acc_on_device(), acc_malloc(), acc_free().
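As a small sketch of how a few of these routines might be called, the code below queries the number of attached accelerators and the current device. It assumes the standard openacc.h header and the acc_device_not_host device type from the version 1.0 runtime API; the wrapper names count_accelerators and report_devices are hypothetical. The calls are guarded by _OPENACC so the file still builds with a compiler that lacks OpenACC support.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _OPENACC
#include <openacc.h>   /* runtime API, present only in OpenACC builds */
#endif

/* Returns the number of attached non-host accelerator devices,
 * or 0 when the program was built without OpenACC. */
int count_accelerators(void)
{
#ifdef _OPENACC
    return acc_get_num_devices(acc_device_not_host);
#else
    return 0;
#endif
}

/* Hypothetical helper: print what the runtime reports. */
void report_devices(void)
{
    int n = count_accelerators();
    assert(n >= 0);   /* a device count cannot be negative */
    printf("accelerator devices found: %d\n", n);
#ifdef _OPENACC
    if (n > 0) {
        /* Device numbering is implementation-defined, so the
         * value is printed as reported, without interpretation. */
        printf("current device type (enum value): %d\n",
               (int)acc_get_device_type());
        printf("current device number: %d\n",
               acc_get_device_num(acc_device_not_host));
    }
#endif
}
```

Checking the device count before launching accelerator regions gives a program the chance to fall back to a host code path on machines without a supported device.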

Vendor-specific information can be found on the NVIDIA, PGI, CAPS, and Cray websites.

Building, Running and Profiling a First Program

This tutorial uses The Portland Group (PGI) Accelerator C and Fortran compilers release 12.5 with OpenACC support. PGI has been deeply involved in developing pragma-based programming for coprocessors since 2008 and is a founding member of the OpenACC standards body. The PGI OpenACC compilers currently target NVIDIA GPUs, but it is important to note that OpenACC can support other coprocessors (such as AMD GPUs and Intel MIC) as well. More information about the PGI compilers is available on the company's website.
