Channels ▼
RSS

Parallel

Algorithm Improvement through Performance Measurement: Part 3


Hybrid

It is simple to convert the initial 256-Radix Sort implementation to a Hybrid implementation by letting Insertion Sort handle any recursion for a small number of elements. This strategy was used successfully by STL sort and by Hybrid Binary-Radix Sort in (see Algorithm Improvement through Performance Measurement: Part 2). In the case of Hybrid Binary-Radix Sort the performance was improved by about 30%. However, 256-Radix Sort has a substantially larger overhead than Hybrid Binary-Radix Sort or Insertion Sort, which shows up in measurements shown in Table 1 and Graph 1, where Insertion Sort is significantly faster for input data set under 1K elements. Table 2 and Graph 2 compare non-Hybrid and Hybrid implementations of 256-Radix Sort.

Table 2: Random 32-bit Unsigned Elements

Graph 2

Listing 2 shows the Hybrid 256-Radix Sort implementation. Note that the same value of 32 that was used in [1] for Hybrid Binary-Radix Sort is used as a threshold to decide whether to call Insertion Sort or not. Also, the decision is made immediately before calling Hybrid 256-Radix Sort, to avoid the overhead of executing the Hybrid 256-Radix Sort even once when the array size is smaller than the threshold.


// Listing 2
template< class _Type >
inline void insertionSortSimilarToSTLnoSelfAssignment( _Type* a, unsigned long a_size )
{
    for ( unsigned long i = 1; i < a_size; i++ )
   {
        if ( a[ i ] < a[ i - 1 ] )   // no need to do (j > 0) compare for the first iteration
                    {
                        _Type currentElement = a[ i ];
                        a[ i ] = a[ i - 1 ];
                        unsigned long j;
                        for ( j = i - 1; j > 0 && currentElement < a[ j - 1 ]; j-- )
                              {
                                  a[ j ] = a[ j - 1 ];
                              }
                              a[ j ] = currentElement;   // always necessary work/write
		}
           // Perform no work at all if the first comparison fails - i.e. never assign an element to itself!
            }
}
template< class _Type >
inline void _RadixSort_HybridUnsigned_Radix256( _Type* a, long last, _Type bitMask, unsigned long shiftRightAmount )
{
         const unsigned long  numberOfBins = 256;  // 256-radix (Bins 0 - 255)
          unsigned long count[ numberOfBins ];
          for( unsigned long i = 0; i < numberOfBins; i++ )
              count[ i ] = 0;
          for ( long _current = 0; _current <= last; _current++ )	// Scan the array and count the number of times each value appears
{
        _Type digit = ( a[ _current ] & bitMask ) >> shiftRightAmount;  // extract the digit sorting is based on
              count[ digit ]++;
}
long startOfBin[ numberOfBins ], endOfBin[ numberOfBins ], nextBin;
     startOfBin[ 0 ] = endOfBin[ 0 ] = nextBin = 0;
     for( unsigned long i = 1; i < numberOfBins; i++ )
          startOfBin[ i ] = endOfBin[ i ] = startOfBin[ i - 1 ] + count[ i - 1 ];
     for ( long _current = 0; _current <= last; )  // Move elements from the original array into the bins based on digit value
{
   _Type digit = ( a[ _current ] & bitMask ) >> shiftRightAmount;  // extract the digit we are sorting based on
            if ( endOfBin[ digit ] != _current )
            {
            _swap( a[ _current ], a[ endOfBin[ digit ]] );
            endOfBin[ digit ]++;
             }
                 else {
                 endOfBin[ digit ]++;   // leave the element at its location and grow the bin
                _current++;                 // advance the current pointer to the next element
              while( _current >= startOfBin[ nextBin ] && nextBin < numberOfBins )
                 nextBin++;
                while( endOfBin[ nextBin - 1 ] == startOfBin[ nextBin ] && nextBin < numberOfBins )
                     nextBin++;
                if ( _current < endOfBin[ nextBin - 1 ] )
                     _current = endOfBin[ nextBin - 1 ];
          }
}
bitMask          >>= 8;
shiftRightAmount  -= 8;
if ( bitMask != 0 )    // end recursion when all the bits have been processes
{
         for( unsigned long i = 0; i < numberOfBins; i++ )
         if      (( endOfBin[ i ] - startOfBin[ i ] ) >= 32 )
         _RadixSort_HybridUnsigned_Radix256( &a[ startOfBin[ i ]], endOfBin[ i ] - 1 - startOfBin[ i ], bitMask, shiftRightAmount );
          else if (( endOfBin[ i ] - startOfBin[ i ] ) >= 2 )
             insertionSortSimilarToSTLnoSelfAssignment( &a[ startOfBin[ i ]], endOfBin[ i ] - startOfBin[ i ]);
     }
}
// Listing 2
inline void RadixSortInPlace_HybridUnsigned_Radix256( unsigned long* a, unsigned long a_size )
{
       if ( a_size < 2 ) return;
       unsigned long bitMask = 0xFF000000; // bitMask controls how many bits we process at a time
       unsigned long shiftRightAmount = 24;
             if ( a_size >= 32 )
                 _RadixSort_HybridUnsigned_Radix256(a, a_size - 1, bitMask, shiftRightAmount );
        else
        insertionSortSimilarToSTLnoSelfAssignment( a, a_size );
}

Listing 2

Inside the Hybrid 256-Radix Sort template function (which recurses) the recursion threshold check is done at the end of the function and not at the beginning because this is more efficient by several percent. Otherwise, if the check whether to call Insertion Sort or not is done at the top of the template function, then 256-Radix Sort would be called only then to decide that Insertion Sort should have been called, adding an unnecessary function call overhead before every call to Insertion Sort.

These measurements show that a Hybrid 256-Radix Sort using Insertion Sort substantially improves the performance of the algorithm. This is due to Insertion Sort having much lower overhead for small input data sets (see Table 1 and Graph 1).


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video