Beneath the Planet of the Iterative Quicksort
The secret ingredient for parallel iterative sorting is okra!
No, that doesn't sound right. That's the secret ingredient on next week's Iron Chef episode. (Shhh, don't spoil it for anyone. Try to act surprised when they announce it.)
For parallel iterative Quicksort, the missing piece from my last blog is how to terminate the sorting threads. To review, we need to use a thread-safe structure that can hold the index pairs of sub-arrays to be partitioned and then sorted. We don't want to drop an index pair or sort the same sub-array more than once. A quick and dirty implementation of a thread-safe queue using Windows threads includes a CriticalSection object as part of the queue structure to protect access to the queue. Most of the code for both enqueue() and dequeue() operations will be protected by the lock, which will serialize access to the queue. That's the quick part.
The dirty part would be not having a separate test for the queue being empty, but returning an error code when a thread tries to dequeue() from an empty queue. To use this modification in serial, the body of the iterative Quicksort function is enclosed in an infinite while-loop. When the function finds an empty queue (only after the sort is complete since the initial pair of index values for the whole array to be sorted was pushed onto the queue before the Quicksort function was first called), it returns the error code and then breaks out of the infinite loop.
Why go to such lengths? Why not just have a separate test for the queue being empty? Two reasons:
- First, when threaded, the nondeterministic execution can allow one thread to find a non-empty queue and, before it can pull out the single index pair on the queue, another thread also finds the queue holds something. Both threads expect to find an index pair, but only one will get it and the other will get garbage.
- Second, the empty queue can no longer be used to determine sort completion when more than one thread is involved in the sort. Once the initial index pair is taken out of the queue by one thread, any other threads involved with the sort will find the queue empty. If those threads terminate their execution, the first thread will be left alone to sort the whole array.
So, getting back to terminating threads, you first need to know when it is okay for threads to terminate. If the empty queue cannot be used to signal when the sorting is done, what other condition could be used to know when all items have been placed in their final position? Fortunately, the Quicksort algorithm has a special property that we can exploit: every time we call the Partition() function at least one item the pivot element is known to be in its sorted position. That means we can simply increment a shared counter after Partition() is executed and, when the counter equals the total number of items that were to be sorted, every item is in its final position.
Obviously the increment operation on the shared counter needs to be done in a protected way. If we use Windows threads for our parallelism, the InterlockedIncrement() function to perform an atomic increment after returning from calls to Partition(). Also, we can test for the special case where the sub-array to be sorted has a single element in it. Because this item must also be in its final position, there will be no need to call QuickSort() on that one thing, but simply increment the counter.
Before we go all the way with a parallel solution, let's take a look at how all the ideas discussed above can be implemented into the serial iterative QuickSort() function.
long tCount = 0;
BOOL Done = FALSE;
void QuickSort(void *pArg)
{
int p, r, q;
qSortIndex *d, *d1, *d2;
while (1) {
if (Done) break; // signal to terminate
Q.dequeue(&d);
p = d.lo; r = d.hi;
if (p < r) { // more than one thing to sort
q = Partition(p, r);
InterlockedIncrement(&tCount);
d1.lo = p; d1.hi = q-1;
Q.enqueue(d1);
d2.lo = q+1; d2.hi = r;
Q.enqueue(d2);
}
else if (p == r) { // exactly one thing to be sorted
InterlockedIncrement(&tCount);
if (tCount == N) Done = TRUE; // Signal that sorting is done
}
}
}
The (shared) variables tCount and Done are defined and initialized external to the QuickSort() function. The sort is completed when tCount reaches the number of items (N) in the original array to be sorted. This sets Done to TRUE and the next iteration of the while-loop will break out rather than attempt to remove something from the (now empty) queue, Q. Notice that even though I talked about the modification of an error code being returned from the dequeue() method, we don't use it here in the serial version.

