Channels ▼
RSS

Design

Parallel In-Place N-bit-Radix Sort


Data-Dependent Parallelism

Some algorithms exhibit performance that varies dramatically across input data variations. This may or may not be a desirable behavior. Table 4 and Figure 4 show performance measurements for various algorithms with all array elements being MAX_ULONG=0xffffffff. Performance of all elements of zeros is slightly better.

[Click image to view at full size]
Table 4: MAX_ULONG=0xffffffff in all array elements.

[Click image to view at full size]
Figure 4: MAX_ULONG in all array elements.

STL sort and TBB parallel_sort are significantly faster than the Radix family of algorithms for all but the smallest array size. STL sort varies as much as 137X with data input variation, with random input being slower. TBB parallel_sort varies as much as 55X.

Non-parallel Radix Sort algorithms (IPP Radix Sort and In-Place Radix Sort) improved in performance with non-random input, by up to 2X and 1.2X respectively. However, Parallel In-Place Radix Sort slowed down by 1.3X to 2.4X. Parallel In-Place Radix Sort is faster than the non-parallel version by up to 1.4X for larger array sizes (compared to 2.5X for random inputs); i.e., parallel speedup reduced with non-random inputs. Also, Intel IPP Radix Sort is faster than Parallel In-Place Radix Sort for non-random inputs, plus it uses a single processing core.

Thus, random input distribution may not lead to the worst case performance for sorting algorithms.

Other Data Types

So far, performance of sorting unsigned 32-bit arrays was studied. Non-parallel N-bit-Radix Sort performs well for other data type sizes as well, such as 8-bit, 16-bit and 64-bit (unsigned and signed), as in [1]. At 8-bit and 16-bit data types comparison to Parallel Counting Sort developed in [2 and 3] is possible. Listings 2 and 3 support various byte-divisible unsigned data type sizes. Tables 5-7 and Figures 5-7 show performance measurements for unsigned 8-bit, 16-bit and 64-bit data types, respectively.

[Click image to view at full size]
Table 5: Unsigned 8-bit array elements.

[Click image to view at full size]
Figure 5: Unsigned 8-bit array elements.

For arrays of unsigned 8-bit elements, Parallel In-Place N-bit-Radix Sort provides at most about 20% in performance gain for the largest of array sizes, and 10% performance loss for mid-size array. It consistently outperforms STL sort() for all array sizes, up to 5X for mid-size arrays and 4.6X for the largest array sizes. It lags in performance consistently for all but the smallest array size against Intel's IPP Radix Sort. It consistently outperforms TBB parallel_sort across all array sizes, from 1.4X to 3.4X, with 1.8X for the largest array sizes. Parallel Counting Sort is substantially faster than all other algorithms, especially for the largest array sizes, outperforming Parallel Radix Sort by up to 14X and 3X faster than Intel's IPP Radix Sort for the largest array sizes.

[Click image to view at full size]
Table 6: Unsigned 16-bit array elements.

[Click image to view at full size]
Figure 6: Unsigned 16-bit array elements.

For 16-bit arrays, Parallel In-Place Radix Sort provides up to 1.8X speed-up over the non-parallel implementation on a quad-core processor. It is consistently faster that STL sort, by up to 8.7X. It lags in performance to Intel's IPP Radix Sort, but outperforms TBB parallel_sort in all but one array size, by up to 2.5X. Parallel Counting Sort is nearly 10X faster than all other algorithms for the largest array sizes, but lags in performance for mid and small array sizes.

[Click image to view at full size]
Table 7: Unsigned 64-bit array elements.

[Click image to view at full size]
Figure 7: Unsigned 64-bit array elements.

For 64-bit arrays, Parallel In-Place Radix Sort provides nearly 2X speedup over the non-parallel version, and is consistently faster except for two smaller array sizes. It is up to 6X faster than STL sort, and is faster except for one smaller array size. It is also consistently faster than TBB parallel_sort, up to 1.6X. Thus, Parallel In-Place Radix Sort consistently provides speedup over the non-parallel implementation across all tested data type sizes of unsigned 8-, 16-, 32-, and 64-bit.

Parallel Radix Sort implementation makes it possible for an input array of 8-bit elements to run on multiple cores by using 4-bit-Radix Sort (or 2-bit-Radix, or 1-bit-Radix). When performance was measured, comparing to 8-bit-Radix Sort, 4-bit-Radix Sort was about 12% slower, and 2-bit-Radix Sort was nearly 2X slower. Most likely multiple passes over the input data (bandwidth) is the limiting factor and warrants future investigation (as it could be due to parallel overhead).


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video