
Optimization Techniques for Intel Multicore Processors Revisited


Compiler Support for Parallelism in the Intel Parallel Composer

Intel Parallel Composer includes a C++ compiler that provides access to multicore programming features. The optimizing compiler generates code targeting the IA-32 and Intel 64 architectures, conforms to the C and C++ language standards, and offers binary compatibility with Microsoft Visual C++. The strongest advantages of the Intel Compiler are its optimization technology and its performance-feature support, which includes OpenMP and automatic parallelization.

OpenMP is a portable, shared-memory multiprocessing application program interface supported by multiple vendors on several operating systems for Fortran 77, Fortran 90, C, and C++. OpenMP simplifies parallel application development by hiding many of the details of thread management and thread communication behind a simplified programming interface: developers mark parallel regions of code by adding pragmas to the source, and these pragmas also convey other information such as the properties of variables and simple synchronization.

Listing 1 is a sample OpenMP program that approximates the value of pi by summing the areas of rectangles under the curve 4/(1+x*x) on the interval [0,1].

#include <stdio.h>
#include <omp.h>

static int num_steps = 100000;  /* number of rectangles under the curve */
double step;                    /* width of each rectangle */
#define NUM_THREADS 2

int main ()
{
  int i;
  double x, pi, sum = 0.0;
  step = 1.0/(double) num_steps;
  omp_set_num_threads(NUM_THREADS);
  /* Run the loop on a team of threads: each thread gets a private x
     and a private partial sum; the partial sums are added together
     at the end of the parallel region. */
  #pragma omp parallel for reduction(+:sum) private(x)
  for (i = 0; i < num_steps; i++){
    x = (i+0.5)*step;             /* midpoint of rectangle i */
    sum = sum + 4.0/(1.0+x*x);    /* height of the curve 4/(1+x^2) */
  }
  pi = step * sum;
  printf("%lf\n", pi);
  return 0;
}

Listing 1: Sample OpenMP code showing the use of an OpenMP pragma and a library function.

The key line in the parallel implementation is:

#pragma omp parallel for reduction(+:sum) private(x)

which specifies that the following for loop should be executed by a team of threads, that the temporary partial results represented by the sum variable should be combined by addition at the end of the parallel region, and that the variable x is private, meaning each thread gets its own copy. The program strives to be serially consistent, which means the parallel version of the code is very similar to the original serial version. In fact, transforming this code back to a serial version requires only the removal of the OpenMP function call and the OpenMP pragma (a serial version is sketched after the list below). The keys to parallelizing the code can be summed up as follows:

  • Identify the concurrent work. The concurrent work is the area calculation encompassing different parts of the curve.
  • Divide the work evenly. The 100000 rectangle areas to compute are divided equally among the threads.
  • Create private copies of commonly used resources. The variable x needs to be private as each thread's copy will be different.
  • Synchronize access to unique shared resources. The only shared resource, step, does not require synchronization in this example because it is only read by the threads, not written.
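
To make the serial-consistency point concrete, this is the serial version you get by stripping the OpenMP constructs from Listing 1; only the pragma, the omp_set_num_threads() call, and the omp.h include are removed:

#include <stdio.h>

static int num_steps = 100000;
double step;

int main ()
{
  int i;
  double x, pi, sum = 0.0;
  step = 1.0/(double) num_steps;
  for (i = 0; i < num_steps; i++){
    x = (i+0.5)*step;
    sum = sum + 4.0/(1.0+x*x);
  }
  pi = step * sum;
  printf("%lf\n", pi);
  return 0;
}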

Automatic parallelization (enabled with the /Qparallel option), also called "auto-parallelization," analyzes loops and creates threaded code for those it determines are beneficial to parallelize. Automatic parallelization is a good first technique to try when parallelizing your code because the effort required is fairly low. The compiler only parallelizes loops it can prove safe to parallelize. The following tips may improve the likelihood of successful parallelization:

  • Use the optimization reporting option. The parallelization optimization report (/Qpar-report) provides a summary of the compiler's analysis of every loop and, where a loop cannot be parallelized, the reason why. Even when the compiler cannot parallelize a loop, the developer can use the information in the report to identify regions for manual threading.
  • Expose the trip count of loops whenever possible. The compiler has a greater chance of parallelizing loops whose trip counts are statically determinable.
  • Avoid placing function calls inside loop bodies. Function calls may have effects on the loop that cannot be determined at compile time and may prevent parallelization (a short sketch illustrating this appears after the list).
  • Adjust the threshold needed for auto-parallelization. The compiler estimates how much computation occurs inside the loop; if it decides the amount is too small, parallelization may not occur. The estimate can be overridden with the threshold option (/Qpar-threshold[n]).
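
The function-call point lends itself to a short sketch. In the following fragment (the external function transform() and the wrapper scale() are hypothetical names, not from the original article), the first loop is a good auto-parallelization candidate, while the second is likely to be rejected because the compiler cannot see what the external call does:

extern double transform(double v);  /* hypothetical: defined in another file */

void scale(double *a, const double *b, int n)
{
  int i;
  /* Good candidate: independent iterations, no calls inside the body. */
  for (i = 0; i < n; i++)
    a[i] = 2.0*b[i];
  /* Likely rejected: transform() may have side effects the compiler
     cannot analyze at compile time. */
  for (i = 0; i < n; i++)
    a[i] = transform(b[i]);
}

If transform() is small and its definition is visible to the compiler, inlining or interprocedural optimization may still allow the second loop to be parallelized; the /Qpar-report output shows which outcome occurred.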

Listing 2 shows the result of compiling the code in Listing 1 with the auto-parallelization option. Compilation and execution of the OpenMP-enabled code succeed using only auto-parallelization, which shows that auto-parallelization in the Intel Compiler uses the same underlying libraries as the OpenMP implementation. For example, the call to omp_set_num_threads resolves correctly with auto-parallelization even though this function is defined by the OpenMP API.

icl /Qparallel pi.cpp
pi.cpp(11): warning #161: unrecognized #pragma
#pragma omp parallel for reduction(+:sum) private(x)
        ^
pi.cpp(12) : (col. 4) remark: LOOP WAS AUTO-PARALLELIZED.

Listing 2: Compiler log from compiling with auto-parallelization.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.