
Stephen Blair-chappell

Dr. Dobb's Bloggers

Going Parallel: Part 5 -- Checking for Parallel Errors

July 16, 2009

In the previous blog, I created an application that calculated the value of Pi. In this blog I show how to detect and correct the errors that I inadvertently introduced in my last posting.

When changing code from serial to parallel, it's easy to introduce new types of errors: data races and deadlocks. Fortunately there are a number of tools that can be used to detect these. I'm going to use static and dynamic analysis to help spot the problems.
In the static analysis we do a sort of 'automatic' code inspection using the Intel Parallel Composer. For the dynamic analysis I use the Intel Parallel Inspector. Both Composer and Inspector are part of Intel Parallel Studio.

 

Links to previous sections of blog

Going Parallel: Part 1: Doing Two Things at Once - Impossible!
Going Parallel: Part 2: So Who's Really Writing Parallel Applications?
Going Parallel - Part 3: Let's get started!
Going Parallel - Part 4: Enter Parallel Studio

The application so far

 

If you remember, I created a Dhrystone Visual Studio application, added some OpenMP and then ran the application. This resulted in a runtime error:

 

Static Analysis

Intel Parallel Studio comes with Parallel Lint. Parallel Lint can be used to help detect some 'parallel type' problems at the compilation stage. When making serial code parallel, any access to global or shared data is likely to cause problems. In the static analysis I am interested in spotting where global variables are accessed.

Filtering out the noise

Because the code is old legacy code, it is missing a number of function prototypes, giving warnings such as:

warning #266: function "strcpy" declared implicitly
warning #1786: function "scanf" ... was declared "deprecated

Before using Parallel Lint, I disabled these warnings so that the output window was less cluttered.  I did this by adding a list to the Disabled Specific Diagnostics in the project properties dialogue box:

 
Enabling Parallel Lint

Parallel Lint is enabled from the project properties dialogue box:

I've set the analysis depth to All Errors and Warnings.

Parallel Lint reports 33 warnings, similar to the ones below.

1>dhry_1.c(160): warning #12251: possible data dependence from (file:dhry_1.c line:160) to (file:dhry_2.c line:80), due to "Int_3_Loc" may lead to incorrect program execution in parallel mode

1>dhry_1.c(164): warning #12246: flow data dependence from (file:dhry_1.c line:164) to (file:dhry_1.c line:158), due to "Int_1_Loc" may lead to incorrect program execution in parallel mode

All the threading warnings relate to the following symbols:

        Int_3_Loc, Int_1_Loc, Int_2_Loc, Int_Glob, Ch_Index

Three of the symbols are local, two are global.

For the moment I'm not going to fix the errors. Instead, I'm going to run the dynamic analysis and see how the results compare.

 

Important Note:
Don't forget to disable Parallel Lint afterwards! No executable is generated while Parallel Lint is enabled.

 

 

Dynamic Analysis

For the dynamic analysis we use Parallel Inspector. First of all I made sure the Inspector was set up for detecting Threading errors via the drop-down menu on the tool bar.

 

The second thing I did was to reduce the number of loops the Dhrystone program runs. This can be done in the Command Arguments:

 

Finally I launched the Inspector, choosing the second level of analysis, "Does my target have deadlocks or data races?".

 

 

Once the analysis has completed, a new pane opens in Visual Studio:

 

By pressing the Interpret Result button, the following view is displayed:

 

As suggested by the pop-up hint, I double-click on the first problem, 'P1'.

 

 

Right Problem - Wrong Solution

Warning: I never completed this task. I include it just so you can see the hole I was digging ... For the next useful steps, see the next section, Strategies for fixing my legacy problem.

 

For the first 'naive' attempt at fixing the problem, I simply placed a #pragma omp critical in front of line 365 and line 375 of dhry_1.c

e.g:

    #pragma omp critical
    Bool_Loc = Ch_1_Glob == 'A';

Pragma, pragma everywhere

 

After making about 14 corrections, I reran the inspection and still had 25 data races. I also knew that adding the pragma was not the best solution, but I was more interested in getting something correct than something optimal. Having lost the will to live, I ditched this attempt and started to look at other options.

 

Strategies for fixing my legacy problem

The fundamental problems with the code are twofold. Firstly, there are a lot of globals, and secondly, some of the local variables in main() are being shared across the threads. Sharing data across threads, whether global or local, will produce data races whenever one thread can write it while another accesses it, unless there is a suitable synchronisation mechanism in place. In the previous section I tried to enforce some protection by adding #pragma omp critical statements, which serialise access to the statement or block that follows the pragma. As you already know, I didn't pursue this strategy for very long.
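To make the effect of the critical pragma concrete, here is a minimal sketch that is separate from the Dhrystone source. The counter hits and the function count_even are names I have made up for illustration. Without the pragma, the increment of the shared counter would be a data race; with it, the result is right but every increment is serialised, which is why this approach does not scale.

```c
/* Hypothetical shared counter, standing in for a Dhrystone global. */
static long hits = 0;

/* Counts the even numbers in [0, n) using a shared counter. */
long count_even(int n)
{
    int i;

    hits = 0;
    #pragma omp parallel for
    for (i = 0; i < n; ++i) {
        if (i % 2 == 0) {
            /* Only one thread at a time may execute the next statement. */
            #pragma omp critical
            hits++;
        }
    }
    return hits;
}
```

The pragmas only take effect when the compiler's OpenMP switch is on (for example /Qopenmp with the Intel compiler on Windows, or -fopenmp with GCC). Without it they are ignored and the function runs serially with the same result.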

The style of the code reflects the programming standards of the day. It's not uncommon for legacy code to need some refactoring before it can be parallelised.

Strategies

Strategies to correct the data-races could include

  • Use synchronisation to protect the objects
  • Move the globals so they are local variables on the stack
  • Use OpenMP data sharing constructs
  • Place globals / locals within the parallel region
  • Use OpenMP private / etc. to localise variables
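As a rough illustration of the data sharing option, the sketch below uses the OpenMP private and reduction clauses. It is not taken from the Dhrystone code: sum_squares, tmp and total are hypothetical names chosen for the example. The variable tmp is declared outside the loop, so it would be shared by default; private(tmp) gives each thread its own copy, and reduction(+:total) combines the per-thread partial sums safely.

```c
/* Sum of the squares 1..n; tmp is a hypothetical scratch variable. */
long sum_squares(int n)
{
    long total = 0;
    long tmp = 0;   /* declared outside the loop, so shared by default */
    int i;

    #pragma omp parallel for private(tmp) reduction(+:total)
    for (i = 1; i <= n; ++i) {
        tmp = (long)i * i;   /* each thread writes its own copy of tmp */
        total += tmp;        /* per-thread partial sums, combined at the end */
    }
    return total;
}
```

Because the clauses live on the pragma, the same source still builds and runs serially when OpenMP is switched off.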

Goals

My overall goal is to

    1. make minimum changes to the code
    2. make sure code can still run in serial mode
    3. make code so it is scalable
    4. make the code run on all cores.

Any changes I make to the code should not break these goals.

Code Changes in a nutshell

Copies of the modified files are listed at the end of this blog.

Globals

To work around the globals issue, I

1. Created a structure to hold the globals:

    typedef struct globs
    {
        Rec_Pointer Ptr,
                    Next_Ptr;
        int         Int;
        Boolean     Bool;
        char        Ch_1,
                    Ch_2;
        int         Arr_1 [50];
        int         Arr_2 [50] [50];
    } GLOBALS;

    extern GLOBALS *pGlob_Loc;
 

2. Malloced  memory for structures based on max threads

    pGlob_Loc = (GLOBALS *)malloc(sizeof(GLOBALS) * omp_get_max_threads());

3. Added a define that uses the ThreadID
 

    #define Ptr_Glob pGlob_Loc[ThreadID].Ptr
    #define Next_Ptr_Glob pGlob_Loc[ThreadID].Next_Ptr
    #define Int_Glob pGlob_Loc[ThreadID].Int
    #define Bool_Glob pGlob_Loc[ThreadID].Bool
    #define Ch_1_Glob pGlob_Loc[ThreadID].Ch_1
    #define Ch_2_Glob pGlob_Loc[ThreadID].Ch_2
    #define Arr_1_Glob pGlob_Loc[ThreadID].Arr_1
    #define Arr_2_Glob pGlob_Loc[ThreadID].Arr_2

   

5. Passed ThreadID into sub functions
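The steps above can be put together in a small, self-contained sketch. It uses a cut-down stand-in for the GLOBALS structure, and Proc_Demo and run_demo are hypothetical names of my own, not functions from the benchmark. I have also used calloc rather than malloc so the slots start at zero, and added a serial fallback for the two OpenMP calls so the code still builds when OpenMP is switched off.

```c
#include <stdlib.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback so the sketch still builds without OpenMP support. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_max_threads(void) { return 1; }
#endif

/* Cut-down stand-in for the GLOBALS structure. */
typedef struct globs {
    int  Int;
    char Ch_1;
} GLOBALS;

static GLOBALS *pGlob_Loc;

/* The macros expect a variable called ThreadID to be in scope. */
#define Int_Glob  pGlob_Loc[ThreadID].Int
#define Ch_1_Glob pGlob_Loc[ThreadID].Ch_1

/* Hypothetical sub-function: ThreadID is passed in so the macros resolve. */
static void Proc_Demo(int ThreadID, int value)
{
    Int_Glob += value;   /* touches only this thread's slot */
    Ch_1_Glob = 'A';
}

/* Returns the sum of every thread's Int slot, or -1 if allocation fails. */
int run_demo(int iterations)
{
    int nslots = omp_get_max_threads();
    int total = 0;
    int i;

    pGlob_Loc = (GLOBALS *)calloc((size_t)nslots, sizeof(GLOBALS));
    if (pGlob_Loc == NULL)
        return -1;

    #pragma omp parallel
    {
        int ThreadID = omp_get_thread_num();   /* private to each thread */
        int k;

        #pragma omp for
        for (k = 0; k < iterations; ++k)
            Proc_Demo(ThreadID, 1);
    }

    for (i = 0; i < nslots; ++i)   /* serial: gather the per-thread results */
        total += pGlob_Loc[i].Int;

    free(pGlob_Loc);
    pGlob_Loc = NULL;
    return total;
}
```

One thing to bear in mind with this layout is that small per-thread slots sit next to each other in memory, so neighbouring threads may end up contending for the same cache line even though they never touch the same data.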

Locals

To fix the shared local symbols I split the #pragma omp parallel for, so the parallel region started before the locals were declared:

    main (int argc, char * argv[])
    /*****/
    /* main program, corresponds to procedures */
    /* Main and Proc_0 in the Ada version */
    {
        pGlob_Loc = (GLOBALS *)malloc(sizeof(GLOBALS) * omp_get_max_threads());
        #pragma omp parallel
        {
            One_Fifty Int_1_Loc;
            REG One_Fifty Int_2_Loc;
            One_Fifty Int_3_Loc;

            ...

            #pragma omp for
            for (Run_Index = 1; Run_Index <= Number_Of_Runs; ++Run_Index)
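The same pattern in a complete, compilable form looks something like the sketch below. The function count_consistent and the check it performs are my own invention for illustration; only the variable names echo the excerpt. Anything declared inside the parallel block is private to each thread, so the writes to Int_1_Loc and Int_2_Loc can no longer collide.

```c
/* Returns how many iterations saw their own, untouched scratch values. */
int count_consistent(int runs)
{
    int ok = 0;

    #pragma omp parallel
    {
        /* Declared inside the region: every thread gets its own copies. */
        int Int_1_Loc;
        int Int_2_Loc;
        int Run_Index;

        #pragma omp for reduction(+:ok)
        for (Run_Index = 1; Run_Index <= runs; ++Run_Index) {
            Int_1_Loc = Run_Index;
            Int_2_Loc = Int_1_Loc * 3;
            if (Int_2_Loc == Run_Index * 3)   /* always true when the locals are private */
                ok += 1;
        }
    }
    return ok;
}
```

If the three declarations were moved above the pragma and a combined #pragma omp parallel for were used instead, the locals would be shared again, which is the situation the Inspector complained about.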

 

Having made the changes to the code, yippee, I end up with only 3 data races. All of these are associated with the calculation of time.

Time Measurements

What I'd like to do is make sure time is measured on just one thread. To do this I enclosed all the time-related code in a #pragma omp master block.

e.g.

    #pragma omp master
    {
        #ifdef TIMES
        times (&time_info);
        Begin_Time = (long) time_info.tms_utime;
        #endif
        #ifdef TIME
        Begin_Time = time ( (long *) 0);
        #endif
    }

I did the same with two other blocks of code that are concerned with time.

Once I did this, problems P1 to P3 disappeared.
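Here is a minimal sketch of the master pattern in isolation, assuming the standard time() call rather than the benchmark's TIMES / TIME switches. The function timed_sum and its summing loop are hypothetical stand-ins for the real workload. Only the master thread writes the two shared timestamps, so there is nothing left for the other threads to race on.

```c
#include <time.h>

/* Shared timestamps, written only by the master thread. */
static time_t Begin_Time;
static time_t End_Time;

/* Sums 0..runs-1 in parallel and reports the elapsed wall time in seconds. */
long timed_sum(int runs, double *elapsed)
{
    long total = 0;

    #pragma omp parallel
    {
        int i;

        #pragma omp master
        {
            Begin_Time = time(NULL);
        }

        #pragma omp for reduction(+:total)
        for (i = 0; i < runs; ++i)
            total += i;
        /* The implicit barrier at the end of the omp for means all the
           work is finished before the master takes the end time. */

        #pragma omp master
        {
            End_Time = time(NULL);
        }
    }

    *elapsed = difftime(End_Time, Begin_Time);
    return total;
}
```

Note that master has no barrier of its own, so the other threads may start on the loop slightly before Begin_Time is recorded. For a coarse benchmark timer that seems an acceptable trade, but it is worth knowing about.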

 

Final Check with Parallel Lint

I re-ran Parallel Lint and got 7 warnings about possible problems with access to pGlob_Loc, and 67 errors about an I/O problem.

1>dhry_1.c(88): warning #12251: possible data dependence from (file:dhry_1.c line:88) to (file:dhry_1.c line:91), due to "pGlob_Loc" may lead to incorrect program execution in parallel mode

1>dhry_1.c(105): error #12158: unsynchronized use of I/O statements by multiple threads

I'm not unduly worried by the pGlob_Loc warnings. However, the I/O issue is due to the fact that printf may be non-re-entrant. I'll continue my explorations in the next blog.
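For reference, one common way of dealing with unsynchronised I/O in general is to put the calls inside a named critical section so that only one thread prints at a time. The sketch below shows the idea. It is not a change I have made to the Dhrystone files, report_all is a hypothetical name, and I have not checked whether it is enough to satisfy Parallel Lint.

```c
#include <stdio.h>

/* Prints one line per run and returns how many lines were printed. */
int report_all(int runs)
{
    int printed = 0;
    int i;

    #pragma omp parallel for
    for (i = 0; i < runs; ++i) {
        /* The name "io" is arbitrary; it keeps this section separate
           from any unnamed critical sections elsewhere in the program. */
        #pragma omp critical(io)
        {
            printf("run %d done\n", i);
            printed++;
        }
    }
    return printed;
}
```

The lines may still come out in any order, because the critical section only guarantees that two threads are never inside printf at the same moment.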

 

Conclusions (so far)

I hope I've shown you that doing both static and dynamic analysis of a parallel application is worthwhile. Using tools such as Intel Parallel Inspector and Intel Parallel Lint can help boost the confidence you have in your parallel code.

There's still more to do on this code, and I have purposely ignored some errors that will need correcting. At least now I have an application that is going in the right direction.

 

Modified Files

A copy of the modified files can be found here: 
    dhry_correctness.h
    dhry_1_correctness.c
    dhry_2_correctness.c

 
