A Simple Example
The code in Listing One is serial code. It is simple and has no traps: Given one array of BusStops and one of Houses (both containing coordinates), we want to determine the closest bus stop to each house. If there are many houses and bus stops: Depending on the implementation of FindNearestBusStop, the runtime of this function can take up to several hours or more.
Listing One
// Global bus stop data
typedef struct
{
float x, y;
// ...
}oneBusStop;
oneBusStop *BusStops;
int BusStopCount;
// Global house data
typedef struct
{
float x, y;
int BusStop;
// ...
}oneHouse;
oneHouse *Houses;
int houseCount;
int FindNearestBusStop(oneHouse *theHouse)
{
int i = 0;
// Find nearest of all entries in BusStops
// ...
return i; // index in BusStops array
}
void FindBusStopsForAllHouses()
{
int i;
for(i = 0; i < houseCount; i++)
Houses[i].BusStop = FindNearestBusStop(&Houses[i]);
}
If we make sure that FindNearestBusStop does not modify any input data, parallelization of the loop is straightforward: We just divide the array of houses into four slices and assign each of those slices to a thread (Listing Two), using the Windows threading API. This kind of parallelized partition of work will scale very well with the number of processors.
Listing Two
DWORD WINAPI FindBusStops(void * threadData)
{
int i;
startData *data = (startData*) threadData;
for(i = data->first; i <= data->last; i++)
Houses[i].BusStop = FindNearestBusStop(&Houses[i]);
return 0;
}
#define THREAD_COUNT 4 //normally not hard-coded, but detected at run time
int noOfThreads = THREAD_COUNT;
void FindBusStopsForAllHouses()
{
int slizeCount, x;
HANDLE handles[THREAD_COUNT];
startData data[THREAD_COUNT];
sliceCount = houseCount / noOfThreads;
for(x = 0; x < noOfThreads; x++)
{
data[x].first = sliceCount * x;
if(x == noOfThreads - 1)
data[x].last = houseCount - 1;
else
data[x].last = sliceCount * (x + 1) - 1;
handles[x] = CreateThread(NULL, 0, FindBusStops, &data[x], 0, NULL);
}
WaitForMultipleObjects(THREAD_COUNT, handles, TRUE, INFINITE);
}
Solution using OpenMP
The same results can be achieved using OpenMP. All parallelism is introduced by simply inserting the line #pragma omp parallel for before the for loop (Listing Three).
Listing 3
void FindBusStopsForAllHouses()
{
int i;
#pragma omp parallel for
for(i = 0; i < houseCount; i++)
Houses[i].BusStop = FindNearestBusStop(&Houses[i]);
}
As can be seen, OpenMP makes the change virtually trivial.
Not all cases will be as easy to solve as this example, but it does illustrate that not all multithreading is difficult. Locking mechanisms and the question of thread contention for access to code and data are left to other tutorials.
Practical Tips
Configuration
To distinguish multithreading-specific failures from other problems, it is helpful to create a command-line switch to disable all multithreading (frequently done by allowing a specification that the maximum number of threads = 1). This allows you to determine whether a given defect is the result of multithreading or a bug in the serial code.
How many threads to use
Typically it's best to determine the number of execution pipelines (on x86 systems, this is the number of processors multiplied by the number of cores multiplied by 2 if HyperThreading is in use; see Using concurrency for scalability, an MSDN article by Joe Duffy, for further discussion). If all cores operate without blocking due to inter-thread interference as described earlier, the performance can scale almost linearly by the number of execution pipelines. In programs where much of the processed data fits into a processor cache, the performance gain may be much higher (super linear parallelization is described in Using OpenMP, by Barbara Chapman, Gabriele Jost, and Ruud van der pas).
Testing and Validation
Test the code on multiple multiprocessor machines; this way, any contention problems between threads that might be masked on one machine might become visible on another.
Debugging
When debugging multithreaded applications, be aware of an additional possible problem: If you hit a breakpoint, all threads of your application are suspended. But if you step through your code in many systems, all threads are resumed until the next line is fully executed. This may lead to a different thread hitting the first-mentioned breakpoint again before the stepped thread has reached the next line which can be rather confusing. This situation can be avoided by freezing all other threads in the debugger when hitting a breakpoint.
Conclusion
Multithreading is becoming the standard for desktop applications. It will become increasingly difficult to avoid this transition. If done carefully and step by step, the migration of a mature application can be done successfully. Start with the easy things and progress from there.
References
Using concurrency for scalability by Joe Duffy. MSDN Magazine, September 2006.
Using OpenMP, Barbara Chapman, Gabriele Jost, Ruud van der pas. MIT Press 2008, p. 186.
Multithreaded File I/O by Stefan Wörthmüller. Dr. Dobb's, 2009.
Eliminate False Sharing, by Herb Sutter. Dr. Dobb's, 2009.
Modern Multithreading, Richard H. Carver, Kuo-Chung Tai. Wiley-Intersience 2006.
Stefan Wörthmüller lives in Berlin where he develops mostly in C++.



