Channels ▼

Clay Breshears

Dr. Dobb's Bloggers

Rise of the Planet of the Iterative Quicksort

October 24, 2011

The genesis of my study of this less-than-intuitive implementation of Quicksort actually was a challenge from a colleague to do Iterative Quicksort with OpenMP. It's easy to use OpenMP tasks for parallelizing the recursive version of the algorithm. Voila!

void QuickSort(SomeDataType *A, int p, int r)
{
  if (p < r) {  // if there is one or more things to sort...
    q = Partition(A, p, r);
#pragma omp task
    QuickSort(A, p, q-1);
#pragma omp task
    QuickSort(A, q+1, r);
  }
}
. . . 
#pragma omp parallel 
{
  #pragma omp single
    QuickSort(A, 0, N); // initial call in parallel region
}

The initial call to get the Quicksort recursion started is done from a single region within a parallel region. That way, only one thread will actually execute the call to QuickSort() while all other threads on the team wait at the implicit barrier of the single construct. However, not wanting to leave threads spinning or twiddling their thumbs or knitting sweaters or doing whatever idle threads do, the OpenMP runtime will assign tasks spawned from QuickSort() to the blocked threads.

Using OpenMP on an iterative Quicksort can be as easy as I showed in the previous installment. Take a thread-safe queue, add a semaphore to keep count of items in the queue, and count as items are put into the final position to know when the sort is complete. The OpenMP thread team simply becomes a thread pool calling the iterative function as if they were explicitly spawned. If you haven't been there and done that, go back to my previous post and then come back. I'll wait.

Ready?

Okay, what can we do different? How about using an Intel Threading Building Blocks (TBB) concurrent_queue container? It may not sound much different than using a brute force queue implementation, but it does have at least one an advantage over the previous code I've posted. The TBB concurrent_queue object has a try_pop() method. This method looks into the queue and, if there is something there, it retrieves the item at the head of the queue; otherwise, it returns an error code. (Sound familiar?) And all of this is done atomically.

By using the TBB queue there is no need to allocate a semaphore object to track the number of items in the queue. However, we will need to set up a spin-wait loop to keep checking the queue status. Every time the queue is empty, threads will check the Done variable to determine if the empty queue is indicative of the sort being done. If Done has been set to TRUE, the sort is done and threads can return from QuickSort(); if not, check the queue again for an index pair of a sub-array to be sorted.

Everything else is pretty much the same as we've seen previously. Because we are using OpenMP, the protected update to tCount is done within an atomic region.

Here's the code for the OpenMP QuickSort() function. I've added the shared declaration of the TBB concurrent_queue to hold items of type qSortIndex (index pairs of integers defining a sub-array to be sorted), but not the inclusion of the proper TBB header file nor the calling of QuickSort() from a parallel region.

 tbb::concurrent_queue<qSortIndex> Q;
. . .
void QuickSort(SomeDataType *A, int N)
{
  int p, r, q;
  qSortIndex d, d1, d2;
  int t = 0;

  while (1) {
    while (!Q.try_pop(d)) { 
      if (Done) return;  // if Done, no more need to process anything
    }
    p = d.lo; r = d.hi;
    if (p < r) {
      q = Partition(A, p, r);
  #pragma omp atomic
      ++tCount;
      d1.lo = p; d1.hi = q-1;
      Q.push(d1);
      d2.lo = q+1; d2.hi = r;
      Q.push(d2);
    }
    else if (p == r) {
 #pragma omp atomic
      ++tCount;
      if (tCount == N) Done = TRUE;   // Signal threads that sorting is done
    }
  }
}

I know I said I was not a fan of spin-wait loops if there was an alternative. I've already provided an alternative, but where's the fun in rehashing something we've already done? Besides, another advantage of the above implementation is that there is no need to signal an external thread that sorting is done and have that external thread fill up the queue to "free" those threads sitting on some synchronization object waiting for something in the queue before they are able to check the status of the sort. This should yield simpler code. There is no need to spawn a team of one OpenMP thread to be the "external" thread and not have all threads deal with all the signals and objects of a spin-wait free version.

One last thing before I wind up this post. Below is a code snippet from an older version of the OpenMP algorithm I wrote. This is the two while-loops and one statement that follows the inner while-loop from the above function. The code works when run with multiple threads, but it looked funny when I was reviewing things to write up this post. (Helpful Hint to Readers: Include documenting comments so you won't be mystified by code you wrote just one month before.)

   while (1) {
    while (!Q.try_pop(d)) { //pause? or else spin-wait with no wait
      if (Done) break;  // if Done, no more need to process anything
    }
    if (Done) break;  // signaled to terminate
   . . .

The two conditional tests so close together looked redundant. I knew that one of them was required. However, when I tried to simplify the code by removing one or the other, the execution hung. So, let's end with a quiz. The quiz has two questions: 1) When running with two or more threads, why does the code hang if the first if (Done) break (line 3) is removed? 2) When running with two or more threads, why does the code hang if the second if (Done) break (line 5) is removed?

If you have an answer, show off and leave them in the comments.

Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 


Video