Improved Template Function
So far, a simple template function has been used, which abstracted/generalized the data type only. Listing 3 abstracts the Radix, log2( Radix ), and Threshold, as well as the data type.
// Copyright(c), Victor J. Duvanenko, 2009 // Listing 3 template< class _Type > inline unsigned long extractDigit( _Type a, _Type bitMask, unsigned long shiftRightAmount ) { unsigned long digit = (unsigned long)(( a & bitMask ) >> shiftRightAmount ); // extract the digit we are sorting based on return digit; } template< unsigned long PowerOfTwoRadix, unsigned long Log2ofPowerOfTwoRadix, long Threshold, class _Type > inline void _RadixSort_StableUnsigned_PowerOf2Radix_2( _Type* a, _Type* b, long last, _Type bitMask, unsigned long shiftRightAmount, bool inputArrayIsDestination ) { const unsigned long numberOfBins = PowerOfTwoRadix; unsigned long count[ numberOfBins ]; for( unsigned long i = 0; i < numberOfBins; i++ ) count[ i ] = 0; for ( long _current = 0; _current <= last; _current++ ) // Scan the array and count the number of times each value appears count[ extractDigit( a[ _current ], bitMask, shiftRightAmount ) ]++; long startOfBin[ numberOfBins ], endOfBin[ numberOfBins ]; startOfBin[ 0 ] = endOfBin[ 0 ] = 0; for( unsigned long i = 1; i < numberOfBins; i++ ) startOfBin[ i ] = endOfBin[ i ] = startOfBin[ i - 1 ] + count[ i - 1 ]; for ( long _current = 0; _current <= last; _current++ ) b[ endOfBin[ extractDigit( a[ _current ], bitMask, shiftRightAmount ) ]++ ] = a[ _current ]; bitMask >>= Log2ofPowerOfTwoRadix; if ( bitMask != 0 ) // end recursion when all the bits have been processes { if ( shiftRightAmount >= Log2ofPowerOfTwoRadix ) shiftRightAmount -= Log2ofPowerOfTwoRadix; else shiftRightAmount = 0; inputArrayIsDestination = !inputArrayIsDestination; for( unsigned long i = 0; i < numberOfBins; i++ ) { long numOfElements = endOfBin[ i ] - startOfBin[ i ]; if ( numOfElements >= Threshold ) _RadixSort_StableUnsigned_PowerOf2Radix_2< PowerOfTwoRadix, Log2ofPowerOfTwoRadix, Threshold >( &b[ startOfBin[ i ]], &a[ startOfBin[ i ]], numOfElements - 1, bitMask, shiftRightAmount, inputArrayIsDestination ); else { insertionSortSimilarToSTLnoSelfAssignment( &b[ startOfBin[ i ]], numOfElements ); if ( inputArrayIsDestination ) for ( long j = startOfBin[ i ]; j < endOfBin[ i ]; j++ ) // copy from external array back into the input array a[ j ] = b[ j ]; } } } else { // Done with recursion copy all of the bins if ( !inputArrayIsDestination ) for ( long _current = 0; _current <= last; _current++ ) // copy from external array back into the input array a[ _current ] = b[ _current ]; } } template< class _Type > inline void RadixSortMSDStablePowerOf2Radix_unsigned( _Type* a, _Type* b, unsigned long a_size ) { const unsigned long Threshold = 100; // Threshold of when to switch to using Insertion Sort const unsigned long PowerOfTwoRadix = 256; const unsigned long Log2ofPowerOfTwoRadix = 8; unsigned long shiftRightAmount = sizeof( _Type ) * 8 - Log2ofPowerOfTwoRadix; _Type bitMask = (_Type)( ((_Type)( PowerOfTwoRadix - 1 )) << shiftRightAmount ); // bitMask controls/selects how many and which bits we process at a time if ( a_size >= Threshold ) _RadixSort_StableUnsigned_PowerOf2Radix_2< PowerOfTwoRadix, Log2ofPowerOfTwoRadix, Threshold >( a, b, a_size - 1, bitMask, shiftRightAmount, false ); else insertionSortSimilarToSTLnoSelfAssignment( a, a_size ); }
C++ provides a powerful template function capability, which allow not only function arguments to be passed to a template function (on the stack), but also generic values and data types [4]. The function arguments that are inside the parentheses () will be pushed on the stack, but generic parameters inside the pair of <> are not pushed on the stack. Sadly, the function arguments are allowed to have default values, where as generic parameters are not [4].
These generic values are not passed on the stack and thus don’t slow recursive function down. They can also be "true constants" to allow arrays to be allocated locally, and not slow things down by dynamic allocations. For example, the generic value PowerOfTwoRadix is used to allocate the count[] array of the Radix size locally (on the stack). Log2ofPowerOfTwoRadix and Threshold are used as "true constants" that are not passed on the stack. Note that _Type generic parameter is last, which allows it to not be specified, as is seen in the recursive function call to _RadixSort_StableUnsigned_PowerOf2Radix_2<>().
This implementation can also handle an arbitrary power-of-two radix -- i.e. any number of bits.