### Sequential Implementation Details

The conceptual algorithm magically sized bins to hold all of their elements. Each bin knew where the bin started and where to place the next element. Algorithm implementation needs to keep track of these items. Listing 1 shows the simplified implementation of In-Place MSD N-bit-Radix sort algorithm. Figure 5 shows the steps of the algorithm, augmented with pointers along the way.

// Copyright(c), Victor J. Duvanenko, 2010 // Function template In-place MSD Radix Sort implementation (simplified). template< class _Type, unsigned long PowerOfTwoRadix, unsigned long Log2ofPowerOfTwoRadix, long Threshold > inline void _RadixSort_Unsigned_PowerOf2Radix_L1( _Type* a, long last, _Type bitMask, unsigned long shiftRightAmount ) { unsigned long count[ PowerOfTwoRadix ]; for( unsigned long i = 0; i < PowerOfTwoRadix; i++ ) count[ i ] = 0; for ( long _current = 0; _current <= last; _current++ ) count[ (unsigned long)(( a[ _current ] & bitMask ) >> shiftRightAmount ) ]++; // Scan the array and count the number of times each value appears long startOfBin[ PowerOfTwoRadix + 1 ], endOfBin[ PowerOfTwoRadix ], nextBin = 1; startOfBin[ 0 ] = endOfBin[ 0 ] = 0; startOfBin[ PowerOfTwoRadix ] = -1; // sentinal for( unsigned long i = 1; i < PowerOfTwoRadix; i++ ) startOfBin[ i ] = endOfBin[ i ] = startOfBin[ i - 1 ] + count[ i - 1 ]; for ( long _current = 0; _current <= last; ) { unsigned long digit; _Type _current_element = a[ _current ]; // get the compiler to recognize that a register can be used for the loop instead of a[_current] memory location while( endOfBin[ digit = (unsigned long)(( _current_element & bitMask ) >> shiftRightAmount )] != _current ) _swap( _current_element, a[ endOfBin[ digit ]++ ] ); a[ _current ] = _current_element; endOfBin[ digit ]++; while( endOfBin[ nextBin - 1 ] == startOfBin[ nextBin ] ) nextBin++; // skip over empty and full bins, when the end of the current bin reaches the start of the next bin _current = endOfBin[ nextBin - 1 ]; } bitMask >>= Log2ofPowerOfTwoRadix; if ( bitMask != 0 ) // end recursion when all the bits have been processes { if ( shiftRightAmount >= Log2ofPowerOfTwoRadix ) shiftRightAmount -= Log2ofPowerOfTwoRadix; else shiftRightAmount = 0; for( unsigned long i = 0; i < PowerOfTwoRadix; i++ ) { long numberOfElements = endOfBin[ i ] - startOfBin[ i ]; if ( numberOfElements >= Threshold ) // endOfBin actually points to one beyond the bin _RadixSort_Unsigned_PowerOf2Radix_L1< _Type, PowerOfTwoRadix, Log2ofPowerOfTwoRadix, Threshold >( &a[ startOfBin[ i ]], numberOfElements - 1, bitMask, shiftRightAmount ); else if ( numberOfElements >= 2 ) insertionSortSimilarToSTLnoSelfAssignment( &a[ startOfBin[ i ]], numberOfElements ); } } } template< class _Type > inline void RadixSortInPlace_PowerOf2Radix_Unsigned( _Type* a, unsigned long a_size ) { const long Threshold = 25; const unsigned long PowerOfTwoRadix = 256; // Radix - must be a power of 2 const unsigned long Log2ofPowerOfTwoRadix = 8; // log( 256 ) = 8 unsigned long shiftRightAmount = sizeof( _Type ) * 8 - Log2ofPowerOfTwoRadix; // Create bit-mask and shift right amount _Type bitMask = (_Type)( ((_Type)( PowerOfTwoRadix - 1 )) << shiftRightAmount ); // bitMask controls/selects how many and which bits we process at a time if ( a_size >= Threshold ) _RadixSort_Unsigned_PowerOf2Radix_L1< _Type, PowerOfTwoRadix, Log2ofPowerOfTwoRadix, Threshold >( a, a_size - 1, bitMask, shiftRightAmount ); else insertionSortSimilarToSTLnoSelfAssignment( a, a_size ); }

The implementation is split into the setup function template and a recursive function template. The first three **for** loops in the recursive function are responsible for determination of bin sizes, along with the starting point (index) of each bin within the source array. The first two **for** loops implement the counting portion of Counting Sort. The third **for** loop figures out the starting point of each bin from the counts. The **startOfBin[ b ]** indicates the starting point of bin **b**, and **endOfBin[ b ] **indicates where the next element would be placed within bin **b**. The **endOfBin[ b ] ** starts out at the same value as **startOfBin[ b ]**, ensuring that elements are added to each bin from the beginning. Whenever an element is added to bin **b**, **endOfBin[ b ]** gets incremented, growing the size of the bin by one.
Once the **startOfBin** and **endOfBin** arrays have been initialized, the body **for** loop is executed (the forth **for **loop). This loops starts at the first array element and continues to process until the last element has been passed. The element at the **_current** index is picked up. The **while** loop is where the movement of array elements is performed by swapping. Inside the **while** loop condition, the digit of the **_current_element** used for sorting is extracted. Then **endOfBin[ digit ]** value is compared with **_current** array index. This comparison detects whether the swaps have created a loop, such that the** _current_element** gets back to the **_current** spot within the array. This is the same condition as we saw above where the element being processed ends up in the current bin. As long as no loop has been detected, the **while** loop continues to swap — i.e., to move the** _current_element** into its bin, and the element from that bin to **_current_element**. Once a loop has been detected, the **while** loop stops swapping and the **_current_element** is placed back in the array at the **_current** location. Note that the swap is performed between the **_current_element** and the end of the bin picked by the extracted digit. After the swap the** endOfBin[ digit ] **is incremented, growing that particular bin.

Once a loop has been found and completed by the **while** loop, the **endOfBin[]** of the current bin is grown, since an element has just been added. Now, the next element to be processed needs to be found, and the next **while** loop accomplishes this task. The condition in the **while** loop checks to see if the **endOfBin[]** of the current bin butts against the **startOfBin[]** of the next bin, and skips all bins that butt consecutively. This **while** loop continues until a gap between bins have been found, where unprocessed array elements have to be, or **startOfBin[ radix ]** has been reached. **startOfBin[ radix ]**, or more precisely **startOfBin[ PowerOfTwoRadix ]**, is a sentinel value. **startOfBin[PowerOfTwoRadix]** is not a real bin — the real bins are from zero to (**radix** - 1). **startOfBin[PowerOfTwoRadix ]** was initialized to -1 to ensure that equality is not possible (since array index of -1 is not a valid value). Thus, the **while** loop will exit either when it finds a gap between bins, or when the sentinel has been reached. The **_current** index will be set to the **endOfBin[]** value, which is either the index in between bins or **endOfBin[ radix - 1 ]**.

Figure 5 also illustrates the steps taken by the algorithm in greater detail. Plain arrows indicate **endOfBin[] **values for bins containing at least one element. An arrow with a ball-tail indicates the position of **_current **and **_current_element**.

After the main **for **loop has finished processing the input array based on the most-significant-digit, the **binMask **moves right to select the next digit to be used for sorting. Once all digits have been used for sorting, **(if (bitMask != 0))** test, the algorithm exits. The last loop is responsible for efficient recursive calls. The algorithm steps through all of the bins and calls itself recursively when a bin has more elements in it than the **Threshold **value set by the template function parameter. For smaller bins, an efficient version of Insertion Sort is called (developed in Part 1 of this series).

Insertion Sort is in-place and is faster than Radix Sort for small arrays. Of course, for bins with fewer than two elements, nothing needs to be done.