### Plain Function

Listing 2 shows the plain function implementation. It was inspired by the algorithms and coding style of Professor Sedgewick's "Algorithms in C++" book series. This implementation has only a few differences with the function template version. Recursive calls have been moved inside the recursive function, and are no longer in the setup function. Two of the three template parameters have been converted to constants, and the third has become an added function parameter. From users point of view the function template and plain functions have an identical interface (for sorting arrays of "unsigned long" 32-bit elements). However, the function template will also work for other native unsigned numeric data types, such as 8, 16, 32, 64, and so on. To support other data types the plain function would need to be replicated for each data type.

// Copyright(c), Victor J. Duvanenko, 2010 // Plain function In-place MSD Radix Sort implementation (simplified). const unsigned long PowerOfTwoRadix = 256; // constants in C++ are of static linkage - i.e. seen only by this source file (nice in this case!) const unsigned long Log2ofPowerOfTwoRadix = 8; inline void _RadixSort_Unsigned_PowerOf2Radix_Simple( unsigned long* a, long last, unsigned long bitMask, unsigned long shiftRightAmount, long Threshold ) { if ( last < Threshold ) { insertionSortSimilarToSTLnoSelfAssignment( a, last + 1 ); return; } unsigned long count[ PowerOfTwoRadix ]; for( unsigned long i = 0; i < PowerOfTwoRadix; i++ ) count[ i ] = 0; for ( long _current = 0; _current <= last; _current++ ) count[ (unsigned long)(( a[ _current ] & bitMask ) >> shiftRightAmount ) ]++; // Scan the array and count the number of times each value appears long startOfBin[ PowerOfTwoRadix + 1 ], endOfBin[ PowerOfTwoRadix ], nextBin = 1; startOfBin[ 0 ] = endOfBin[ 0 ] = 0; startOfBin[ PowerOfTwoRadix ] = -1; // sentinal for( unsigned long i = 1; i < PowerOfTwoRadix; i++ ) startOfBin[ i ] = endOfBin[ i ] = startOfBin[ i - 1 ] + count[ i - 1 ]; for ( long _current = 0; _current <= last; ) { unsigned long digit; unsigned long current_element = a[ _current ]; // get the compiler to recognize that a register can be used for the loop instead of a[_current] memory location while( endOfBin[ digit = (unsigned long)(( current_element & bitMask ) >> shiftRightAmount )] != _current ) _swap( current_element, a[ endOfBin[ digit ]++ ] ); a[ _current ] = current_element; endOfBin[ digit ]++; while( endOfBin[ nextBin - 1 ] == startOfBin[ nextBin ] ) nextBin++; // skip over empty and full bins, when the end of the current bin reaches the start of the next bin _current = endOfBin[ nextBin - 1 ]; } bitMask >>= Log2ofPowerOfTwoRadix; if ( bitMask != 0 ) // end recursion when all the bits have been processes { if ( shiftRightAmount >= Log2ofPowerOfTwoRadix ) shiftRightAmount -= Log2ofPowerOfTwoRadix; else shiftRightAmount = 0; for( unsigned long i = 0; i < PowerOfTwoRadix; i++ ) _RadixSort_Unsigned_PowerOf2Radix_Simple( &a[ startOfBin[ i ]], endOfBin[ i ] - startOfBin[ i ] - 1, bitMask, shiftRightAmount, Threshold ); } } inline void RadixSortInPlace_PowerOf2Radix_Simple( unsigned long* a, unsigned long a_size ) { unsigned long shiftRightAmount = sizeof( unsigned long ) * 8 - Log2ofPowerOfTwoRadix; // Create bit-mask and shift right amount unsigned long bitMask = (unsigned long)( ((unsigned long)( PowerOfTwoRadix - 1 )) << shiftRightAmount ); // bitMask controls/selects how many and which bits we process at a time const long Threshold = 25; _RadixSort_Unsigned_PowerOf2Radix_Simple( a, a_size - 1, bitMask, shiftRightAmount, Threshold ); }

Using constants was necessary, since local arrays (**count[]**,** startOfBin[]**, and **endOfBin[]**) are allocated based on the value of **PowerOfTwoRadix**. Some C++ compilers do not support allocation of local arrays (on the stack) based on an array size value passed as an argument to a function, as shown below:

void foo( const unsigned long array_size ) { unsigned long my_array[ array_size ]; // Compile ERROR! ... }

Some versions of gcc support this coding style natively. Also, an **alloca()** function could be used instead of constants, as using constants is constraining (and in the case of radix, needs to be varied).

### References

Parallel In-Place N-bit-Radix Sort

Quality of Random Numbers versus Performance

In-Place Hybrid Binary-Radix Sort

Optimizing Performance of Sorting Algorithms (Insertion and Selection Sort)

In-Place Hybrid N-bit-Radix Sort