Dr. Dobb's is part of the Informa Tech Division of Informa PLC

Debugging and Optimizing Multithreaded OpenMP Programs


To find such errors, we need tools like Intel Thread Checker. This tool is provided as a module for the Intel VTune Performance Analyzer profiler, extending its existing facilities for working with multithreaded code. Intel Thread Checker detects both the errors described above and many other defects, such as deadlocks (points where computational threads mutually block each other) and memory leaks.

When Intel Thread Checker is installed, a new project category, Threading Wizards (wizards for working with threads), appears in the New Project dialog of Intel VTune Performance Analyzer; among these wizards is the Intel Thread Checker Wizard. To analyze an example, select the wizard and specify the path to the executable in the next wizard window. The program is then run, and the profiler collects data about the application's operation. An example of the information Intel Thread Checker provides is shown in Figure 1.

Figure 1: Lots of critical errors were detected during Thread Checker's operation.

As you see, the number of errors is large even for such a small program. Thread Checker groups the detected errors, estimates how critical each is to program operation, and provides descriptions, all of which improves programmer productivity. In addition, the Source View tab shows the application's source code with the erroneous sections marked (Figure 2).

Figure 2: Analysis of multithreaded code by Intel Thread Checker

We should mention that Intel Thread Checker cannot detect errors in some cases; for example, when the code rarely gets control or is executed on a system with a different architecture. Errors can also be missed when the test input data differ greatly from the data the program processes in the hands of end users. All this means we cannot be sure a multithreaded program is free of errors when it is tested only by dynamic analysis tools, whose results depend on the environment and timing of execution.

But there is good news for OpenMP developers: another tool, VivaMP, offers an alternative approach to verifying parallel programs. VivaMP is built as a static code analyzer and lets you test application code without running it.

VivaMP's areas of use:

  • Verifying the correctness of application code built on OpenMP technology.
  • Helping developers master OpenMP technology and integrate it into existing projects.
  • Creating parallel applications that use resources more effectively.
  • Finding errors in existing OpenMP applications.

The VivaMP analyzer integrates into the Visual Studio 2005/2008 environment and provides a simple interface for testing applications (Figure 3).

Figure 3: Launch of VivaMP tool, integrated into Visual Studio 2005

If we run VivaMP on our example, we get messages about errors on four different lines where variables are modified incorrectly (Figure 4).

Figure 4: The result of VivaMP static analyzer's operation.

Of course, static analysis has its disadvantages, just as dynamic analysis does. But used together, these methodologies (the Intel Thread Checker and VivaMP tools) complement each other very well, and together they constitute a fairly reliable way of detecting errors in multithreaded applications.

The error of writing into the x and y variables described above, detected by both Intel Thread Checker and VivaMP, is easily corrected: just add one more clause to the #pragma omp parallel for construct: private(x, y). These two variables are then defined as private, and each computational thread gets its own copies of x and y. But keep in mind that all the threads save their calculated results by adding them to the shared variable s. Errors of this kind occur when one computational thread writes a value into shared memory while a second thread is simultaneously reading it. In this example, that can lead to an incorrect result.

Let's consider the s += j*y instruction. The intent is that each thread add its calculated result to the current value of the s variable, with the other threads then doing the same in turn. But in some cases two threads begin to execute the s += j*y instruction simultaneously: each of them first reads the current value of s, then adds the result of the j*y multiplication to that value, and writes the final result back into s.

Unlike a read, which can proceed in parallel and is quite fast, the write is always sequential. Consequently, if the first thread writes its new value first, the second thread, performing its write afterwards, erases the result of the first thread's calculation, because both computational threads read one and the same value of s and then write their own data into it. In other words, the value of s that the second thread writes into shared memory does not account for the first thread's calculation at all. You can avoid this situation by ensuring that at any moment only one thread is allowed to execute the s += j*y operation. Such operations are called indivisible, or atomic. To tell the compiler that an instruction is atomic, we use the #pragma omp atomic construct. The program code with these corrections is shown in Listing Three.

double FixedFunctionOpenMP(int N)
{
  double x, y, s=0;
  #pragma omp parallel for \
          private(x,y) num_threads(2)
  for (int i=1; i<=N; i++) {
    x = (double)i/N;
    y = x;
    for (int j=1; j<=N; j++) {
      #pragma omp atomic
      s += j * y;
      y = y * x;
    }
  }
  return s;
}

Listing Three

Having recompiled the program and analyzed it once again in Thread Checker, we see that it contains no critical errors. Only two messages remain, stating that the parallel threads end when they reach the return operator in the FixedFunctionOpenMP function. In this example that is fine, because only the code inside this function is parallelized. The VivaMP static analyzer shows no messages on this code at all, as from its viewpoint it is entirely correct.

But some work still remains. Let's see whether our code has really become more efficient after parallelization by measuring the execution time of three functions: (1) sequential, (2) parallel incorrect, (3) parallel correct. The results of this measurement for N = 15000 are given in Table 1.

Table 1: Results of functions operation.

What do we see in the table? The parallel variant of the incorrect function is several times slower. But we are not interested in that function. The problem is that the correct variant is more than 60 times slower than the sequential one. Do we need such parallelism? Of course not.

The reason is that we chose a very inefficient way of accumulating the result in the s variable: the atomic directive causes the threads to wait for each other very often. To avoid the threads constantly blocking each other on the atomic summing operation, we can use the special reduction clause. The reduction option specifies that the variable receives the combined value at the exit from the parallel block. The following operations are permissible: +, *, -, &, |, ^, &&, ||. The modified variant of the function is shown in Listing Four.

double OptimizedFunction(int N)
{
  double x, y, s=0;
  #pragma omp parallel for private(x,y) \
          num_threads(2) reduction(+: s)
  for (int i=1; i<=N; i++) {
    x = (double)i/N;
    y = x;
    for (int j=1; j<=N; j++) {
      s += j * y;
      y = y * x;
    }
  }
  return s;
}

Listing Four

In this case we get a variant of the function that is not only correct but also higher performing (Table 2). The speed of calculation has almost doubled (more precisely, it has increased 1.85 times), which is a good result for such functions.

Table 2: Results of functions operation.

In conclusion, we would like to point out once again that a working parallel program is not always an efficient one. And although parallel programming provides many ways to increase code efficiency, it demands attention and a good knowledge of the technologies used. Fortunately, tools such as Intel Thread Checker and VivaMP greatly simplify the creation and testing of multithreaded applications.

#include "stdafx.h"
#include <omp.h>
#include <stdlib.h>
#include <windows.h>
class VivaMeteringTimeStruct {
public:
  VivaMeteringTimeStruct()
    { m_userTime = GetCurrentUserTime(); }
  ~VivaMeteringTimeStruct()
    { printf("Time = %.4f seconds\n", GetUserSeconds()); }
  double GetUserSeconds();
private:
  __int64 GetCurrentUserTime() const;
  __int64 m_userTime;
};
__int64 VivaMeteringTimeStruct::GetCurrentUserTime() const
{
  FILETIME creationTime, exitTime, kernelTime, userTime;
  GetThreadTimes(GetCurrentThread(), &creationTime,
                 &exitTime, &kernelTime, &userTime);
  __int64 curTime;
  curTime = userTime.dwHighDateTime;
  curTime <<= 32;
  curTime += userTime.dwLowDateTime;
  return curTime;
}
double VivaMeteringTimeStruct::GetUserSeconds()
{
  __int64 delta = GetCurrentUserTime() - m_userTime;
  return double(delta) / 10000000.0;
}
double Function(int N)
{
  double x, y, s=0;
  for (int i=1; i<=N; i++) {
    x = (double)i/N;
    y = x;
    for (int j=1; j<=N; j++) {
      s += j * y;
      y = y * x;
    }
  }
  return s;
}
double FunctionOpenMP(int N)
{
  double x, y, s=0;
  #pragma omp parallel for num_threads(2)
  for (int i=1; i<=N; i++) {
    x = (double)i/N;
    y = x;
    for (int j=1; j<=N; j++) {
      s += j * y;
      y = y * x;
    }
  }
  return s;
}
double FixedFunctionOpenMP(int N)
{
  double x, y, s=0;
  #pragma omp parallel for private(x,y) num_threads(2)
  for (int i=1; i<=N; i++) {
    x = (double)i/N;
    y = x;
    for (int j=1; j<=N; j++) {
      #pragma omp atomic
      s += j * y;
      y = y * x;
    }
  }
  return s;
}
double OptimizedFunction(int N)
{
  double x, y, s=0;
  #pragma omp parallel for private(x,y) \
          num_threads(2) reduction(+: s)
  for (int i=1; i<=N; i++) {
    x = (double)i/N;
    y = x;
    for (int j=1; j<=N; j++) {
      s += j * y;
      y = y * x;
    }
  }
  return s;
}
int _tmain(int , _TCHAR* [])
{
  int N = 15000;
  {
    VivaMeteringTimeStruct Timer;
    printf("Result = %.3f  ", Function(N));
  }
  {
    VivaMeteringTimeStruct Timer;
    printf("Result = %.3f  ", FunctionOpenMP(N));
  }
  {
    VivaMeteringTimeStruct Timer;
    printf("Result = %.3f  ", FixedFunctionOpenMP(N));
  }
  {
    VivaMeteringTimeStruct Timer;
    printf("Result = %.3f  ", OptimizedFunction(N));
  }
  return 0;
}

Listing Five

