# Parallel In-Place Radix Sort Simplified

### Sequential Implementation Details

The conceptual algorithm magically sized bins to hold all of their elements. Each bin knew where the bin started and where to place the next element. Algorithm implementation needs to keep track of these items. Listing 1 shows the simplified implementation of In-Place MSD N-bit-Radix sort algorithm. Figure 5 shows the steps of the algorithm, augmented with pointers along the way.

```// Copyright(c), Victor J. Duvanenko, 2010
// Function template In-place MSD Radix Sort implementation (simplified).
template< class _Type, unsigned long PowerOfTwoRadix, unsigned long Log2ofPowerOfTwoRadix, long Threshold >
{
for( unsigned long i = 0; i < PowerOfTwoRadix; i++ )     count[ i ] = 0;
for ( long _current = 0; _current <= last; _current++ )  count[ (unsigned long)(( a[ _current ] & bitMask ) >> shiftRightAmount ) ]++;	// Scan the array and count the number of times each value appears

long startOfBin[ PowerOfTwoRadix + 1 ], endOfBin[ PowerOfTwoRadix ], nextBin = 1;
startOfBin[ 0 ] = endOfBin[ 0 ] = 0;    startOfBin[ PowerOfTwoRadix ] = -1;			// sentinal
for( unsigned long i = 1; i < PowerOfTwoRadix; i++ )
startOfBin[ i ] = endOfBin[ i ] = startOfBin[ i - 1 ] + count[ i - 1 ];

for ( long _current = 0; _current <= last; )
{
unsigned long digit;
_Type _current_element = a[ _current ];	// get the compiler to recognize that a register can be used for the loop instead of a[_current] memory location
while( endOfBin[ digit = (unsigned long)(( _current_element & bitMask ) >> shiftRightAmount )] != _current )  _swap( _current_element, a[ endOfBin[ digit ]++ ] );
a[ _current ] = _current_element;

endOfBin[ digit ]++;
while( endOfBin[ nextBin - 1 ] == startOfBin[ nextBin ] )  nextBin++;	// skip over empty and full bins, when the end of the current bin reaches the start of the next bin
_current = endOfBin[ nextBin - 1 ];
}
if ( bitMask != 0 )						// end recursion when all the bits have been processes
{
else												shiftRightAmount  = 0;

for( unsigned long i = 0; i < PowerOfTwoRadix; i++ )
{
long numberOfElements = endOfBin[ i ] - startOfBin[ i ];
if ( numberOfElements >= Threshold )		// endOfBin actually points to one beyond the bin
else if ( numberOfElements >= 2 )
insertionSortSimilarToSTLnoSelfAssignment( &a[ startOfBin[ i ]], numberOfElements );
}
}
}

template< class _Type >
{
const long Threshold                      =  25;
const unsigned long PowerOfTwoRadix       = 256;	// Radix - must be a power of 2
const unsigned long Log2ofPowerOfTwoRadix =   8;	// log( 256 ) = 8
unsigned long shiftRightAmount = sizeof( _Type ) * 8 - Log2ofPowerOfTwoRadix;		// Create bit-mask and shift right amount
_Type bitMask = (_Type)( ((_Type)( PowerOfTwoRadix - 1 )) << shiftRightAmount );	// bitMask controls/selects how many and which bits we process at a time

if ( a_size >= Threshold )
else
insertionSortSimilarToSTLnoSelfAssignment( a, a_size );
}
```

Listing 1

[Click image to view at full size]
Figure 5: First level of recursion with pointers.

The implementation is split into the setup function template and a recursive function template. The first three for loops in the recursive function are responsible for determination of bin sizes, along with the starting point (index) of each bin within the source array. The first two for loops implement the counting portion of Counting Sort. The third for loop figures out the starting point of each bin from the counts. The startOfBin[ b ] indicates the starting point of bin b, and endOfBin[ b ] indicates where the next element would be placed within bin b. The endOfBin[ b ] starts out at the same value as startOfBin[ b ], ensuring that elements are added to each bin from the beginning. Whenever an element is added to bin b, endOfBin[ b ] gets incremented, growing the size of the bin by one. Once the startOfBin and endOfBin arrays have been initialized, the body for loop is executed (the forth for loop). This loops starts at the first array element and continues to process until the last element has been passed. The element at the _current index is picked up. The while loop is where the movement of array elements is performed by swapping. Inside the while loop condition, the digit of the _current_element used for sorting is extracted. Then endOfBin[ digit ] value is compared with _current array index. This comparison detects whether the swaps have created a loop, such that the _current_element gets back to the _current spot within the array. This is the same condition as we saw above where the element being processed ends up in the current bin. As long as no loop has been detected, the while loop continues to swap — i.e., to move the _current_element into its bin, and the element from that bin to _current_element. Once a loop has been detected, the while loop stops swapping and the _current_element is placed back in the array at the _current location. Note that the swap is performed between the _current_element and the end of the bin picked by the extracted digit. After the swap the endOfBin[ digit ] is incremented, growing that particular bin.

Once a loop has been found and completed by the while loop, the endOfBin[] of the current bin is grown, since an element has just been added. Now, the next element to be processed needs to be found, and the next while loop accomplishes this task. The condition in the while loop checks to see if the endOfBin[] of the current bin butts against the startOfBin[] of the next bin, and skips all bins that butt consecutively. This while loop continues until a gap between bins have been found, where unprocessed array elements have to be, or startOfBin[ radix ] has been reached. startOfBin[ radix ], or more precisely startOfBin[ PowerOfTwoRadix ], is a sentinel value. startOfBin[PowerOfTwoRadix] is not a real bin — the real bins are from zero to (radix - 1). startOfBin[PowerOfTwoRadix ] was initialized to -1 to ensure that equality is not possible (since array index of -1 is not a valid value). Thus, the while loop will exit either when it finds a gap between bins, or when the sentinel has been reached. The _current index will be set to the endOfBin[] value, which is either the index in between bins or endOfBin[ radix - 1 ].

Figure 5 also illustrates the steps taken by the algorithm in greater detail. Plain arrows indicate endOfBin[] values for bins containing at least one element. An arrow with a ball-tail indicates the position of _current and _current_element.

After the main for loop has finished processing the input array based on the most-significant-digit, the binMask moves right to select the next digit to be used for sorting. Once all digits have been used for sorting, (if (bitMask != 0)) test, the algorithm exits. The last loop is responsible for efficient recursive calls. The algorithm steps through all of the bins and calls itself recursively when a bin has more elements in it than the Threshold value set by the template function parameter. For smaller bins, an efficient version of Insertion Sort is called (developed in Part 1 of this series).

Insertion Sort is in-place and is faster than Radix Sort for small arrays. Of course, for bins with fewer than two elements, nothing needs to be done.

### More Insights

 To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.

# First C Compiler Now on Github

The earliest known C compiler by the legendary Dennis Ritchie has been published on the repository.

# HTML5 Mobile Development: Seven Good Ideas (and Three Bad Ones)

HTML5 Mobile Development: Seven Good Ideas (and Three Bad Ones)

# Building Bare Metal ARM Systems with GNU

All you need to know to get up and running... and programming on ARM

# Amazon's Vogels Challenges IT: Rethink App Dev

Amazon Web Services CTO says promised land of cloud computing requires a new generation of applications that follow different principles.

# How to Select a PaaS Partner

Eventually, the vast majority of Web applications will run on a platform-as-a-service, or PaaS, vendor's infrastructure. To help sort out the options, we sent out a matrix with more than 70 decision points to a variety of PaaS providers.

More "Best of the Web" >>