
OpenMP: A Portable Solution for Threading


OpenMP Library Functions

As you may remember, in addition to pragmas, OpenMP provides a set of function calls and environment variables. So far, only the pragmas have been described. The pragmas are the key to OpenMP because they provide the highest degree of simplicity and portability, and they can be easily switched off to generate a non-threaded version of the code.

In contrast, the OpenMP function calls require you to add conditional compilation to your program, as shown below, if you also want to be able to generate a serial version.


#include <omp.h>

#ifdef _OPENMP
    omp_set_num_threads(4);
#endif

When in doubt, always try to use the pragmas and keep the function calls for the times when they are absolutely necessary. To use the function calls, include the <omp.h> header file. The compiler automatically links to the correct libraries.

The four most heavily used OpenMP library functions are shown in Table 5. They retrieve the total number of threads (omp_get_num_threads), set the number of threads (omp_set_num_threads), return the current thread number (omp_get_thread_num), and return the number of available cores, logical processors, or physical processors (omp_get_num_procs), respectively. To view the complete list of OpenMP library functions, see the OpenMP Specification Version 2.5.

Table 5: The Most Heavily Used OpenMP Library Functions.
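As a minimal sketch of how the four functions described above fit together, the following routine asks for one thread per available processor and reports the team size seen inside a parallel region. The name report_team and the serial fallback stubs are assumptions made for illustration. The stubs only exist so that the same file still builds when the OpenMP switch is off.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Hypothetical serial fallbacks (not part of OpenMP) so the file still
   builds when the compiler's OpenMP switch is off. */
static int  omp_get_num_threads(void)  { return 1; }
static void omp_set_num_threads(int n) { (void)n; }
static int  omp_get_thread_num(void)   { return 0; }
static int  omp_get_num_procs(void)    { return 1; }
#endif

/* Requests one thread per available processor, prints one line per
   thread, and returns the team size observed inside the region. */
int report_team(void)
{
    int team_size = 0;

    omp_set_num_threads(omp_get_num_procs());

    #pragma omp parallel
    {
        int tid = omp_get_thread_num();

        if (tid == 0) {
            /* Only thread 0 writes the shared result, so no race. */
            team_size = omp_get_num_threads();
        }
        printf("thread %d of %d\n", tid, omp_get_num_threads());
    }

    /* The runtime may grant fewer threads than requested, never zero. */
    assert(team_size >= 1);
    return team_size;
}
```

Calling report_team() from main prints one line per thread. The order of the lines can differ from run to run.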

Figure 2 uses these functions to perform data processing for each element in array x. This example illustrates a few important concepts when using the function calls instead of pragmas. First, your code must be rewritten, and with any rewrite comes extra documentation, debugging, testing, and maintenance work. Second, it becomes difficult or impossible to compile without OpenMP support. Finally, because thread values have been hard coded, you lose the ability to have loop scheduling adjusted for you, and this threaded code is not scalable beyond four cores or processors, even if you have more than four cores or processors in the system.


float x[8000];
int k;  // loop index, made private to each thread below

omp_set_num_threads(4);
#pragma omp parallel private(k)
{ // This code has a shortcoming. Can you find it?
  int num_thds = omp_get_num_threads();
  int ElementsPerThread = 8000 / num_thds;
  int Tid = omp_get_thread_num();
  int LowBound = Tid*ElementsPerThread;
  int UpperBound = LowBound + ElementsPerThread;

  for ( k = LowBound; k < UpperBound; k++ )
      DataProcess(x[k]);
}

Figure 2: Loop that Uses OpenMP Functions and Illustrates the Drawbacks.
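One shortcoming of Figure 2 can be confirmed from the code itself: if the runtime provides a team whose size does not divide 8000 evenly (three threads, for example), the integer division truncates and the trailing elements are never processed. A minimal sketch of the pragma-based alternative follows. It leaves the partitioning to the OpenMP runtime, so every element is handled whatever the team size. DataProcess is not defined in the article, so the version here is a hypothetical stand-in that doubles its argument, and process_all is likewise an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the article's DataProcess(). */
static float DataProcess(float v) { return 2.0f * v; }

/* Processes every element of x exactly once. The runtime, not the
   program, decides how iterations are divided among the threads. */
void process_all(float *x, int n)
{
    int k;

    assert(x != NULL && n >= 0);

    #pragma omp parallel for
    for (k = 0; k < n; k++)
        x[k] = DataProcess(x[k]);
}
```

Because no thread count is hard coded and no OpenMP function is called, this version also compiles unchanged as serial code when the OpenMP switch is omitted.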

OpenMP Environment Variables

The OpenMP specification defines a few environment variables. Occasionally the two shown in Table 6 may be useful during development.

Table 6: Most Commonly Used Environment Variables for OpenMP.
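Table 6 is not reproduced above. The sketch below assumes it covers OMP_SCHEDULE and OMP_NUM_THREADS, two of the variables the OpenMP specification defines. It shows a loop written so that both variables take effect without recompiling. The function name sum_to is illustrative.

```c
#include <assert.h>

/* Sums 1..n. schedule(runtime) defers the scheduling choice to the
   OMP_SCHEDULE environment variable, and the team size follows
   OMP_NUM_THREADS because the code never calls omp_set_num_threads(). */
long sum_to(int n)
{
    long total = 0;
    int k;

    assert(n >= 0);

    #pragma omp parallel for schedule(runtime) reduction(+:total)
    for (k = 1; k <= n; k++)
        total += k;

    return total;
}
```

Setting, for example, OMP_NUM_THREADS=2 and OMP_SCHEDULE="dynamic,4" before running changes how the iterations are distributed, but not the result.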

Additional compiler-specific environment variables are usually available. Be sure to review your compiler's documentation to become familiar with additional variables.

Compilation

Using the OpenMP pragmas requires an OpenMP-compatible compiler and thread-safe runtime libraries. The Intel C++ Compiler version 7.0 or later and the Intel Fortran compiler both support OpenMP on Linux and Windows. Several other choices are available as well. For instance, Microsoft supports OpenMP in Visual C++ 2005 for Windows and the Xbox 360 platform, and has also made OpenMP work with managed C++ code. In addition, OpenMP compilers for C/C++ and Fortran on Linux and Windows are available from the Portland Group.

The /Qopenmp command-line option given to the Intel C++ Compiler instructs it to pay attention to the OpenMP pragmas and to create multi-threaded code. If you omit this switch from the command line, the compiler will ignore the OpenMP pragmas. This action provides a very simple way to generate a single-threaded version without changing any source code. Table 7 provides a summary of invocation options for using OpenMP.

Table 7: Compiler Switches for OpenMP (C/C++ and Fortran).

For conditional compilation, the compiler defines _OPENMP. If needed, this definition can be tested in this manner:


#ifdef _OPENMP
    printf ( "Hello World, I'm using OpenMP!\n" );
#endif

The thread-safe runtime libraries are selected and linked automatically when the OpenMP-related compilation switch is used.

The Intel compilers support the OpenMP Specification Version 2.5 except the workshare construct. Be sure to browse the release notes and compatibility information supplied with the compiler for the latest information. The complete OpenMP specification is available from the OpenMP Web site, listed in References.

Debugging

Debugging multi-threaded applications has always been a challenge due to the nondeterministic execution of multiple instruction streams caused by runtime thread-scheduling and context switching. Also, debuggers may change the runtime performance and thread scheduling behaviors, which can mask race conditions and other forms of thread interaction. Even print statements can mask issues because they use synchronization and operating system functions to guarantee thread-safety.

Debugging an OpenMP program adds some difficulty because OpenMP compilers must communicate all the necessary information about private variables, shared variables, threadprivate variables, and all kinds of constructs to debuggers after threaded code generation. That generated code includes additional code that is impossible to examine and step through without a specialized OpenMP-aware debugger. Therefore, the key is narrowing down the problem to a small code section that causes the same problem. It would be even better if you could come up with a very small test case that can reproduce the problem. The following list provides guidelines for debugging OpenMP programs.

  1. Use the binary search method to identify the parallel construct causing the failure by enabling and disabling the OpenMP pragmas in the program.
  2. Compile the routine causing the problem without the /Qopenmp switch and with the /Qopenmp_stubs switch, then check whether the code fails in a serial run. If it does, you are debugging serial code. If not, go to Step 3.
  3. Compile the routine causing the problem with the /Qopenmp switch and set the environment variable OMP_NUM_THREADS=1, then check whether the threaded code fails when run on a single thread. If it does, you are debugging threaded code running on a single thread. If not, go to Step 4.
  4. Identify the failing scenario at the lowest compiler optimization level by compiling it with /Qopenmp and one of the switches such as /Od, /O1, /O2, /O3, and/or /Qipo.
  5. Examine the code section causing the failure and look for problems such as violation of data dependence after parallelization, race conditions, deadlock, missing barriers, and uninitialized variables. If you cannot spot any problem, go to Step 6.
  6. Compile the code using /Qtcheck to perform the OpenMP code instrumentation and run the instrumented code inside the Intel Thread Checker.

Problems are often due to race conditions. Most race conditions are caused by shared variables that really should have been declared private, reduction, or threadprivate. Sometimes, race conditions are also caused by missing synchronization, such as critical or atomic protection for updates to shared variables. Start by looking at the variables inside the parallel regions and make sure that the variables are declared private when necessary. Also, check functions called within parallel constructs. By default, variables declared on the stack are private, but the C/C++ keyword static places the variable in static storage instead, so the variable is shared for OpenMP loops. The default(none) clause, shown in the following code sample, can be used to help find those hard-to-spot variables. If you specify default(none), then every variable must be declared with a data-sharing attribute clause.


#pragma omp parallel for default(none) private(x,y) shared(a,b)
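To place the pragma in context, here is a minimal sketch of a complete loop that uses default(none). The function scale and its parameters are hypothetical names chosen for illustration. If any variable used in the loop were left out of the clauses, an OpenMP compiler would reject the code instead of silently sharing the variable.

```c
#include <assert.h>

/* Scales a[] into b[]. default(none) forces every variable referenced in
   the region to appear in a data-sharing clause. The loop index k needs
   no clause because it is private automatically in a parallel for. */
void scale(const float *a, float *b, int n, float factor)
{
    int k;
    float t;

    assert(n >= 0);

    #pragma omp parallel for default(none) private(t) shared(a, b, n, factor)
    for (k = 0; k < n; k++) {
        t = a[k] * factor;   /* t is scratch space, one copy per thread */
        b[k] = t;
    }
}
```

Removing private(t) from the pragma turns a potential race on t into a compile-time error, which is the point of the clause.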

Another common mistake is uninitialized variables. Remember that private variables have no initial value on entry to a parallel construct, and their values are not carried back out when the construct ends. Use the firstprivate or lastprivate clauses discussed previously to initialize them or copy them out. But do so only when necessary, because this copying adds overhead.
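A minimal sketch of both clauses working together, using hypothetical names (last_value, offset, last): firstprivate gives each thread's copy of offset the value it had before the loop, and lastprivate copies out the value that last received in the sequentially final iteration.

```c
#include <assert.h>

/* Returns offset + (n - 1), the value assigned by the final iteration. */
int last_value(int n, int offset)
{
    int k;
    int last = -1;

    assert(n > 0);

    /* firstprivate: each private copy of offset starts with the value
       offset had before the region.
       lastprivate: after the loop, last holds the value assigned in the
       sequentially last iteration (k == n - 1). */
    #pragma omp parallel for firstprivate(offset) lastprivate(last)
    for (k = 0; k < n; k++)
        last = offset + k;

    return last;
}
```

With a plain private(offset, last) clause instead, offset would be uninitialized inside the loop and the returned value would not be reliable.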

If you still can't find the bug, perhaps you are working with just too much parallel code. It may be useful to make some sections execute serially, by disabling the parallel code. This will at least identify the location of the bug. An easy way to make a parallel region execute in serial is to use the if clause, which can be added to any parallel construct as shown in the following two examples.


#pragma omp parallel if(0)
printf("Executed by thread %d\n", omp_get_thread_num());

#pragma omp parallel for if(0)
for ( x = 0; x < 15; x++ ) fn1(x);

In the general form, the if clause can take any scalar expression, like the one shown in the following example, which causes serial execution when the number of iterations is less than 16.


#pragma omp parallel for if(n>=16)
for ( k = 0; k < n; k++ ) fn2(k);

Another method is to pick the region of the code that contains the bug and place it within a critical section, a single construct, or a master construct. Try to find the section of code that suddenly works when it is placed within a critical section, or executed by a single thread, and fails otherwise.
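As a sketch of this technique, suppose the suspect statement is an update to a shared counter. Wrapping just that statement in a critical section lets only one thread at a time execute it. The function count_even and its data are hypothetical and stand in for whatever code is under suspicion.

```c
#include <assert.h>

/* Counts the even values in data[]. */
int count_even(const int *data, int n)
{
    int k;
    int count = 0;

    assert(n >= 0);

    #pragma omp parallel for
    for (k = 0; k < n; k++) {
        if (data[k] % 2 == 0) {
            /* Suspect update wrapped in a critical section for
               diagnosis: one thread at a time executes this statement. */
            #pragma omp critical
            count++;
        }
    }
    return count;
}
```

If the result is correct with the critical section and wrong without it, the unprotected update is the race. For a simple increment like this one, an atomic construct or a reduction(+:count) clause would usually be the cheaper permanent fix.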

The goal is to use the abilities of OpenMP to quickly shift code back and forth between parallel and serial states so that you can identify the locale of the bug. This approach only works if the program does in fact function correctly when run completely in serial mode. Notice that only OpenMP gives you the possibility of testing code this way without rewriting it substantially. Standard programming techniques used in the Windows API or Pthreads irretrievably commit the code to a threaded model and so make this debugging approach more difficult.

