Example: An Internally Parallelized Algorithm
The first example is simple and obvious, but shows an important way to deliver scalable parallelism in a tactical way for an existing application. Consider the following sequential implementation of quicksort, simplified to its essentials:
// Example 1(a): Sequential quicksort
void Quicksort( Iter first, Iter last ) {
// If the range is small, another sort is faster.
if( distance( first, last ) < limit ) {
OtherSort( first, last );
return;
}
// 1. Pick a pivot and partition the data.
Iter pivot = Partition( first, last );
// 2. Recurse to sort the subranges.
Quicksort( first, pivot );
Quicksort( pivot, last );
}
This is a traditional divide-and-conquer recursive algorithm, performed sequentially. If we're trying to parallelize our application, this is just the sort (sorry) of function we're looking for: A compute-intensive function that contains natural parallelism, in this case in the algorithm. The subrange sort operations work on nonoverlapping subsets of the data, and their processing is probably fully independent as long as sorting the left subrange doesn't cause other side effects that could affect sorting the right subrange, and vice versa.
This turns out to be a perfectly simple case to illustrate the point of this article, because the natural way to parallelize this code inherently hides the parallelism from the caller (assuming we get the mechanical details right; more about avoiding mistakes and pitfalls in another article):
// Example 1(b): Parallelized quicksort
void Quicksort( Iter first, Iter last ) {
// If the range is small, <FONT COLOR="red">synchronous</font> sort is faster.
if( distance( first, last ) < limit ) {
OtherSort( first, last );
return;
}
// 1. Pick a pivot and partition the data <FONT COLOR="red">synchronously</font>.
Iter pivot = Partition( first, last );
// 2. Recurse to sort the subranges <font color="red">in parallel</font>.
<b><font color="red">future fut = pool.run( [=]{</font></b> Quicksort( first, pivot ); <b><font color="red">} );</font></b>
Quicksort( pivot, last );
<b><font color="red">fut.join();</font></b>
}
Note that the code may look slightly different in your environment, but we're simply saying to execute the two subrange sorts in parallel:
- The syntax pool.run(x) denotes creating a task x to run in parallel, which is common when using thread pools for example.
- The syntax [=]{ f(); } is C++0x lambda function syntax, a convenient syntactic sugar for writing a function object to be executed elsewhere. If you're using C#, the syntax looks like ()=>{ f(); }. If you're using Java, just create a runnable object by hand.
This function may spawn a lot of parallel work recursively, but everything is joined before we return to the caller. The function just happens to run faster when run on machine that has more cores. So sequential code that calls this library function also gets to reap the benefits of leveraging whatever hardware parallelism there happens to be on a given end user's machine, at least for this function, without having to know about the parallel work being done on its behalf.



