Channels ▼

Clay Breshears

Dr. Dobb's Bloggers

Quicksort Partition via Prefix Scan

July 02, 2012

In the three steps that I've just outlined here, I can create one of three subsets needed from the Partition operation for Quicksort. The other two partition subsets can be constructed in a similar manner, all in O(log M) time. After that the "less than" and "greater than" subsets are ready for sorting via the Quicksort algorithm.

There were a few ideas that occurred to me as I was writing the above. If you're in an indulgent mood, please read on. If you've got something else to get to soon or you have something cooking as you're reading this, simply skip down to the last two paragraphs.

First, the creation of the three masks can be done at the same time with a 3-part if-then-else structure to place a "1" in the corresponding position of the proper vector mask. Each processor is only looking at one element compared to the pivot value and it's still an O(1) operation. The prefix scan still needs to be done three times. (I suppose, if you're really tricky and you've written your own prefix scan subroutine, you can do all of them at the same time by handling each of the three vector masks all at once.)

Moving the data needs some care when building all three subsets concurrently. For the less-than subset, nothing changes from what was described earlier. If you are using separately allocated vectors for the other two subsets, these are handled just like the less-than vector. However, if you allocate a single vector the same length as the vector to be partitioned, you need to calculate the offsets into that one vector for the second and third subsets. The offsets are going to be found using the last element of the post-prefix scan vector masks. For example, look back to the example prefix scan results. The last value stored there is "3", which means that the pivot subset would start in the [4] (3 + 1) index position and the other elements equal to the pivot value will be indexed by adding 4 to the appropriate prefix scan vector values. The index of the first element in the greater-than subset will be the sum of the final elements from the less-than and equal prefix scan results plus the sum of the two final mask vector elements. (Only one of the three mask vectors will have a "1" in the final position and since I'm using an exclusive scan, the prefix scan on that mask vector will be one less than the actual count of the number of elements that match the criteria.)

Second, if you choose the pivot element to be the final vector element, you can skip the prefix scan for the equal subset run the partition algorithm on the first M-1 elements. Simply place that one pivot value in the proper position between the other two subset vectors when you've computed them. One of the partitioned subsets needs to hold any item from the unsorted vector that might be equal to the pivot value, too.

Finally, this variation of Quicksort is stable. By using the prefix scan and the method for choosing which elements go into which subset, the relative order of equal keys is preserved. Once the subsets become small enough, you simply need to use a stable sort algorithm to complete the process. If you pick the final element as the pivot to reduce the number of prefix scan operations (and because there may not be too many equal-valued keys), then the first subset must be "less than or equal to" in order to preserve the stability of this sort.

If you've been playing along with the other underlying theme of my last two postings, you should be able to see the advantages of this Partition variation in terms of work and span. In the traditional and the scan version of parallel Quickosrt, the work will be O(n log n). This shouldn't be a surprise since we are doing a sort. The big difference is that the span of the prefix scan version is much improved. With the traditional serial Partition code, the span for parallel Quicksort will be O(n) while the scan version is going to be only O(log^2 n).

With manycore processors, I hope you can see that there may be more parallelism available to speed up algorithms for which we already have parallel implementations on multi-core systems. Finding the algorithms that can take advantage of the bounty of cores will be a challenge and will require looking at things from a different perspective. Next up, I'll detail an actual implementation of the scan-based Partition operation described above.

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.