Going Parallel: Part 5 -- Checking for Parallel Errors
In the previous blog, I created an application that calculated the value of Pi. In this blog I show how to detect and correct the errors that I inadvertently introduced in my last posting.
When changing code from serial to parallel, it's easy to introduce new types of errors: data races and deadlocks. Fortunately there are a number of tools that can be used to detect these. I'm going to use static and dynamic analysis to help spot the problems.
In Static analysis we do a sort of 'automatic' code inspection using the Intel Parallel Composer. For the dynamic analysis I use the Intel Parallel Inspector. Both Composer and Inspector are part of the Intel Parallel Studio.
Links to previous sections of blog
Going Parallel: Part 1: Doing Two Things at Once - Impossible!
Going Parallel: Part 2: So Who's Really Writing Parallel Applications?
Going Parallel - Part 3: Let's get started!
Going Parallel - Part 4: Enter Parallel Studio
The application so far
If you remember, I created a Dhrystone Visual Studio application, added some OpenMP and then ran the application. This resulted in a runtime error:
Static Analysis
Intel Parallel Studio comes with Parallel Lint. Parallel Lint can be used to help detect some 'parallel type' problems at the compilation stage. When making serial code parallel, any accesses to global or shared data are likely to cause problems. In the static analysis I am interested in spotting where global variables are accessed.
Filtering out the noise
Because the code is old legacy code, it is missing a number of function prototypes, giving warnings such as:
warning #266: function "strcpy" declared implicitly
warning #1786: function "scanf" ... was declared "deprecated
Before using Parallel Lint, I disabled these warnings so that the output window was less cluttered. I did this by adding a list to the Disabled Specific Diagnostics in the project properties dialogue box:
Enabling Parallel Lint
Parallel Lint is enabled from the project properties dialogue box:
I've set the analysis depth to All Errors and Warnings.
Parallel Lint reports 33 warnings, similar to the ones below.
1>dhry_1.c(160): warning #12251: possible data dependence from (file:dhry_1.c line:160) to (file:dhry_2.c line:80), due to "Int_3_Loc" may lead to incorrect program execution in parallel mode
1>dhry_1.c(164): warning #12246: flow data dependence from (file:dhry_1.c line:164) to (file:dhry_1.c line:158), due to "Int_1_Loc" may lead to incorrect program execution in parallel mode
All the threading warnings related to the following symbols:
Int_3_Loc, Int_1_Loc, Int_2_Loc, Int_Glob, Ch_Index
Three of the symbols are local, two are global.
For the moment I'm not going to fix the errors. Instead, I'm going to run the dynamic analysis and see how the results compare.
Important Note:
Don't forget to disable Parallel Lint! No executable is generated when Parallel Lint is enabled.
Dynamic Analysis
For the dynamic analysis we use Parallel Inspector. First of all I made sure the Inspector was set up for detecting Threading errors via the drop-down menu on the tool bar.
The second thing I did was to reduce the number of loops the Dhrystone program runs. This can be done in the Command Arguments:
Finally I launched the Inspector, choosing the second level of analysis, "Does my target have deadlocks or data races?".
Once the analysis has completed, a new pane opens in Visual Studio:
By pressing the Interpret Result button, the following view is displayed:
As suggested by the pop-up hint, I double-clicked on the first problem, 'P1'.
Right Problem - Wrong Solution
Warning: I never completed this task. I just include it so you can see the hole I was digging ... For the next useful steps, see the next section, Strategies for fixing legacy problem.
For the first 'naive' attempt at fixing the problem, I simply placed a #pragma omp critical in front of line 365 and line 375 of dhry_1.c
e.g:
#pragma omp critical
Bool_Loc = Ch_1_Glob == 'A';
Pragma, pragma everywhere
After doing about 14 corrections, I reran the inspection and still had 25 data races. I also knew that adding the pragma was not the best solution, but at this stage I was interested in getting something correct rather than optimal. Having lost the will to live, I ditched this attempt and started to look at other options.
Strategies for fixing legacy problem
The fundamental problems with the code are twofold. Firstly there are a lot of globals, and secondly some of the local variables in main() are being shared across the threads. Sharing data across threads, whether global or local, will produce data races whenever at least one thread writes to it, unless there is a suitable synchronisation mechanism in place. In the previous section I tried to enforce some protection by adding #pragma omp critical statements, which have the effect of serialising access to the statement or block that follows the pragma. As you already know, I didn't pursue this strategy for very long.
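To make the effect of the pragma concrete, here is a minimal sketch of a critical section. It is not taken from dhry_1.c: the function count_with_critical and the variable shared_count are hypothetical names chosen for illustration.

```c
/* Hypothetical example, not from dhry_1.c. Every iteration updates one
   shared counter; without the critical section the update is a data race. */
int count_with_critical(int n)
{
    int shared_count = 0;   /* shared by all threads in the parallel loop */
    int i;                  /* the loop variable is made private by OpenMP */

    #pragma omp parallel for
    for (i = 0; i < n; ++i)
    {
        /* Only one thread at a time may execute the next statement,
           so the updates are serialised and none are lost. */
        #pragma omp critical
        shared_count += 1;
    }
    return shared_count;
}
```

Calling count_with_critical(1000) should return 1000 however many threads run the loop. The price is that the protected statement runs serially, which is why scattering critical sections through hot code tends to scale badly.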
The style of the code reflects the programming standards of the day. It's not uncommon when parallelising legacy code that there needs to be some refactoring of the code.
Strategies
Strategies to correct the data races could include:
- Use synchronisation to protect the objects
- Move the globals so they are local variables on the stack
- Use OpenMP data sharing constructs.
- Place globals \ locals within the parallel region
- Use OpenMP private\etc to localise variables
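As an illustration of the data sharing option, the sketch below uses the private and reduction clauses. The function sum_of_squares and its variables are hypothetical and do not appear in the Dhrystone sources.

```c
/* Hypothetical example. 'scratch' is declared outside the loop, so it would
   be shared by default; private(scratch) gives each thread its own copy.
   reduction(+:total) gives each thread a private partial sum and adds the
   partial sums together when the loop ends. */
long sum_of_squares(int n)
{
    long total = 0;
    long scratch;
    int i;

    #pragma omp parallel for private(scratch) reduction(+:total)
    for (i = 1; i <= n; ++i)
    {
        scratch = (long) i * i;   /* each thread writes only its own copy */
        total += scratch;
    }
    return total;
}
```

Because the pragma is ignored when OpenMP is switched off, the same function still gives the right answer in a serial build.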
Goals
My overall goals are to:
1. make minimum changes to the code
2. make sure code can still run in serial mode
3. make code so it is scalable
4. make the code run on all cores.
Any changes I make to the code should not break these goals.
Code Changes in a nutshell
Copies of the modified files are listed at the end of this blog.
Globals
To work around the globals issue, I
1. Created a structure to hold the globals:
typedef struct globs
{
  Rec_Pointer Ptr,
              Next_Ptr;
  int         Int;
  Boolean     Bool;
  char        Ch_1,
              Ch_2;
  int         Arr_1 [50];
  int         Arr_2 [50] [50];
} GLOBALS;
extern GLOBALS *pGlob_Loc;
2. Malloced memory for the structures based on max threads:
pGlob_Loc = (GLOBALS *)malloc(sizeof(GLOBALS) * omp_get_max_threads());
3. Added defines that use the ThreadID:
#define Ptr_Glob      pGlob_Loc[ThreadID].Ptr
#define Next_Ptr_Glob pGlob_Loc[ThreadID].Next_Ptr
#define Int_Glob      pGlob_Loc[ThreadID].Int
#define Bool_Glob     pGlob_Loc[ThreadID].Bool
#define Ch_1_Glob     pGlob_Loc[ThreadID].Ch_1
#define Ch_2_Glob     pGlob_Loc[ThreadID].Ch_2
#define Arr_1_Glob    pGlob_Loc[ThreadID].Arr_1
#define Arr_2_Glob    pGlob_Loc[ThreadID].Arr_2
5. Passed ThreadID into sub functions
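The steps above can be sketched in a cut-down, self-contained form. This is an illustration only: the two-field structure, Set_Int and per_thread_globals_ok are hypothetical, and obtaining ThreadID from omp_get_thread_num() is an assumption, since the post does not show where ThreadID comes from.

```c
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch still builds without OpenMP. */
static int omp_get_max_threads(void) { return 1; }
static int omp_get_thread_num(void)  { return 0; }
#endif

typedef struct { int Int; char Ch_1; } GLOBALS;   /* cut-down, hypothetical */
static GLOBALS *pGlob_Loc;

/* The old global name now resolves to the calling thread's own slot. */
#define Int_Glob pGlob_Loc[ThreadID].Int

/* A sub function takes ThreadID so that the macro can expand inside it. */
static void Set_Int(int ThreadID, int value)
{
    Int_Glob = value;
}

/* Returns 1 if every thread read back the value it wrote to its own slot. */
int per_thread_globals_ok(void)
{
    int ok = 1;

    pGlob_Loc = (GLOBALS *) malloc(sizeof(GLOBALS) * omp_get_max_threads());
    if (pGlob_Loc == NULL)
        return 0;

    #pragma omp parallel reduction(&&:ok)
    {
        int ThreadID = omp_get_thread_num();   /* assumption, see lead-in */
        Set_Int(ThreadID, 100 + ThreadID);
        ok = ok && (Int_Glob == 100 + ThreadID);
    }

    free(pGlob_Loc);
    pGlob_Loc = NULL;
    return ok;
}
```

The attraction of the macro trick is that the body of the legacy code keeps using the old global names, which fits the goal of making minimum changes.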
Locals
To fix the shared local symbols I split the #pragma omp parallel for, so the parallel region started before the locals were declared:
main (int argc, char * argv[])
/*****/
  /* main program, corresponds to procedures */
  /* Main and Proc_0 in the Ada version      */
{
  pGlob_Loc = (GLOBALS *)malloc(sizeof(GLOBALS) * omp_get_max_threads());
  #pragma omp parallel
  {
    One_Fifty Int_1_Loc;
    REG One_Fifty Int_2_Loc;
    One_Fifty Int_3_Loc;
    ...
    #pragma omp for
    for (Run_Index = 1; Run_Index <= Number_Of_Runs; ++Run_Index)
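The same pattern can be shown in a small standalone sketch. The function sum_of_doubles is hypothetical; only the idea of declaring the locals inside the parallel region and then using a separate omp for comes from the change described above.

```c
/* Hypothetical example of splitting 'parallel for' into 'parallel' + 'for'. */
long sum_of_doubles(int n)
{
    long total = 0;

    #pragma omp parallel
    {
        /* Declared inside the region, so each thread has its own copy. */
        int Int_1_Loc;
        int i;

        /* 'omp for' shares the iterations among the threads of the
           enclosing region instead of starting a new team. */
        #pragma omp for reduction(+:total)
        for (i = 0; i < n; ++i)
        {
            Int_1_Loc = i * 2;
            total += Int_1_Loc;
        }
    }
    return total;
}
```

Variables declared inside the region need no private clause, and the code reads the same whether or not OpenMP is enabled.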
Having made the changes to the code, yippee, I end up with only 3 data races. All of these are associated with the calculation of time.
Time Measurements
What I'd like to do is make sure time is measured on just one thread. To do this I enclosed all the time-related code in a #pragma omp master block.
e.g.
#pragma omp master
{
#ifdef TIMES
times (&time_info);
Begin_Time = (long) time_info.tms_utime;
#endif
#ifdef TIME
Begin_Time = time ( (long *) 0);
#endif
}
I did the same with two other blocks of code that are concerned with time.
Once I did this, the Problems P1 to P3 disappeared.
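Here is a minimal standalone sketch of timing on the master thread only. The function timed_sum and its loop are hypothetical; Begin_Time and End_Time simply mirror the shared timing variables in the excerpt above.

```c
#include <time.h>

long Begin_Time, End_Time;   /* shared by all threads */

/* Hypothetical example: sums 0..n-1 in parallel and records the start and
   end times on the master thread only. */
long timed_sum(int n)
{
    long total = 0;

    #pragma omp parallel
    {
        int i;

        /* Only the master thread writes the shared start time. */
        #pragma omp master
        {
            Begin_Time = (long) time(NULL);
        }

        #pragma omp for reduction(+:total)
        for (i = 0; i < n; ++i)
            total += i;
        /* The implicit barrier at the end of 'omp for' means every
           iteration has finished before the end time is taken. */

        #pragma omp master
        {
            End_Time = (long) time(NULL);
        }
    }
    return total;
}
```

One thing to be aware of is that the master construct has no barrier of its own, so other threads may start on the loop slightly before Begin_Time is written. For a coarse benchmark timer that is probably acceptable.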
Final Check with Parallel Lint
I re-ran Parallel Lint and got 7 warnings about possible problems with access to pGlob_Loc, and 67 errors about an I/O problem.
1>dhry_1.c(88): warning #12251: possible data dependence from (file:dhry_1.c line:88) to (file:dhry_1.c line:91), due to "pGlob_Loc" may lead to incorrect program execution in parallel mode
1>dhry_1.c(105): error #12158: unsynchronized use of I/O statements by multiple threads
I'm not unduly worried by the pGlob_Loc warnings. However, the I/O issue is due to the fact that printf may be non-re-entrant. I'll continue my explorations in the next blog.
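One possible way to serialise console output, shown purely as a sketch, is to wrap the printf call in a named critical section. The function report_threads and the section name console_io are hypothetical, and the post does not confirm whether this approach satisfies Parallel Lint.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback so the sketch still builds without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical example: prints one line per thread and returns the number
   of lines printed. */
int report_threads(void)
{
    int lines = 0;

    #pragma omp parallel
    {
        int ThreadID = omp_get_thread_num();

        /* Only one thread at a time may call printf and update the
           shared counter. */
        #pragma omp critical(console_io)
        {
            printf("Thread %d reporting\n", ThreadID);
            lines += 1;
        }
    }
    return lines;
}
```

Giving the critical section a name keeps it separate from any unnamed critical sections elsewhere in the program, so console output does not contend with unrelated locks.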
Conclusions (so far)
I hope I've shown you that doing both static and dynamic analysis of a parallel application is worthwhile. Using tools such as Intel Parallel Inspector and Intel Parallel Lint can help boost the confidence you have in your parallel code.
There's still more to do on this code, and I have purposely ignored some errors that will need correcting. At least now I have an application that is going in the right direction.
Modified Files
A copy of the modified files can be found here:
dhry_correctness.h
dhry_1_correctness.c
dhry_2_correctness.c

