Algorithm Improvement through Performance Measurement: Part 1

Algorithm Improvement through Performance Measurement: Part 2

Algorithm Improvement through Performance Measurement: Part 3

Algorithm Improvement through Performance Measurement: Part 4

Algorithm Improvement through Performance Measurement: Part 5

The In-place Hybrid Binary-Radix Sort I presented in Algorithm Improvement through Performance Measurement: Part 2 sorted arrays of numbers (8, 16, 32 and 64-bit, signed and unsigned) in linear time. The algorithm was shown to outperform the STL sort, but not the Radix-Sort that's part of Intel's Integrated Performance Primitives (IPP) library. However, it offered the ability to perform sorting without using extra storage ("in-place"), whereas Intel's IPP Radix-Sort requires additional storage of equal size to the input array.

The In-place Hybrid Binary-Radix Sort algorithm sorted based of the most-significant-bit first, splitting the array into two bins (0's and 1's bin), in-place, and then recursively sorted each of these bins, splitting them into 0's and 1's bins, continuing until all bits have been used for sorting. To reduce the overhead of recursion, when the bins were below a certain size, Insertion Sort was used, which boosted the overall performance over a purely Binary-Radix Sort.

Intel's IPP Radix-Sort algorithm was significantly faster than In-place Hybrid Binary-Radix Sort: 20X for 8-bit, 8X for 16-bit, and 3X for 32-bit unsigned integers, but at the price of 2X the storage. It would be beneficial to develop a Radix Sort implementation that was in-place. That is the focus of this article.

### Radix-Sort

Radix Sort does not compare array elements to each other, but instead sorts based on the digits of each array element. It can either sort starting with the most-significant-digit (MSD) or the least-significant-digit (LSD). In the MSD case, the algorithm starts by sorting all array elements into bins based on the MSD. For example, for decimal-radix the array elements would be split into 10 bins based on the MSD. If an element's MSD was a 0, then it would be placed into bin 0. If an element's MSD was a 1, then it would be placed into bin 1, and so on. Each bin could then be copied back to the original array, starting with bin 0, then bin1 and so on. Each bin would then be split into sub-bins based on the next digit and so on, until all digits have been used for sorting.

Figure 1 illustrates the concept of splitting the array into bins and then those bins into sub-bins, and so on recursively. At the top level the original array is shown. At the next level down in the tree, the original array is split into bins, up to N-radix of them, based on the first digit. For example, if 256-radix (8-bits) was used, then the up to 256 bins would be created. At the next level, each of these bins is split further into sub-bins, based on the next digit. Once again, up to 256 sub-bins are created if 256-radix is used. Thus, at this level it is possible to have up to 256*256 = 64K sub-bins. This process repeats at the next level using the next digit and so on, until all of the bits within the element have been used for sorting.

For example, for an array made of 16-bit numbers, and using 256-radix sorting starting with the most-significant-digit, the first level would be created by taking the upper 8-bits of each element, looking at that value and placing the element into the bin based on that value. Since there are 256 possible values for 8-bits, then up to 256 bins would be created. Then the next 8-bits of each element are used to break the 256 bins further into 256 sub-bins. For 16-bit numbers, there would be only two levels of bins and sub-bins; i.e., 256*256 = 64K bins at the bottom level. Then a method is needed to collect these bins in the right order.

ff00_{16} 0001_{16} 0280_{16} 0030_{16} 5000_{16} 0201_{16}

To sort the above array of hex 16-bit unsigned numbers the following steps would be performed:

- The first level is based on the most-significant 8-bits of each element of the array:
- ff00
_{16}would be placed in bin ff_{16}(bin 256 in decimal), - 0001
_{16}and 0030_{16}would be placed in bin 0, - 0280
_{16}and 0201_{16}would be placed in bin 2, - 5000
_{16}would be placed in bin 50_{16}(bin 80 in decimal), - The rest of the 256 bins would not have any elements in them. The first level is done.

- ff00
- The second level is based on the next 8-bits of each element of the array:
- Bin ff
_{16}does not need to be sorted any further since it has only a single element in it, - Bin 0 has two elements and is divided into two bins: bin 1 and bin 30
_{16}(48 decimal), - Bin 2 has two elements and is divided into two bins: bin 80
_{16}(128 decimal) and bin 1, - Bin 50
_{16}does not need to be sorted any further, - The second level is done.

- Bin ff
- The elements are collected in the order of recursion starting with the lowest number bin on the first level and the second level and progressing to the highest bin on the first and second levels.

### The In-place Radix-Sort Concept

The main ideas of in-place Radix-Sort are:

- A pass is performed over the input array to determine how many elements are in each bin -- the size of each bin.
- Since the size of each bin has been determined, the starting index within the input array for each bin can be computed.
- Take the first element within the input array and figure out which bin it should move to based on its MSD. Swap it with the element that is at the start of that bin. Grow the size of that bin.
- Examine the new element that was swapped into the first element location. Figure out which bin it should move to. Swap it with the element in that bin. If that bin turns out to be the first bin, then keep the element there and grow the first bin, then process the next element in the array.
- If this next element is inside the next bin, then step over this next bin. If this bin butts against another bin, then skip it too, until an element is found that does not belong to any bin yet.
- Continue is such a manner until all elements within the input array have been placed in their bins.
- Once processing by MSD has been done, call the Radix-Sort itself recursively for each of the bins that has two or more elements in it and sort based on the next digit. Continue recursion until all digits have been used to sort with.

Listing 1 shows an initial in-place 256-radix implementation for arrays of 32-bit unsigned numbers.

// Listing 1 template< class _Type > inline void _RadixSort_initialUnsigned_Radix256_2( _Type* a, long last, _Type bitMask, unsigned long shiftRightAmount ) { const unsigned long numberOfBins = 256; // 256-radix (Bins 0 - 255) unsigned long count[ numberOfBins ]; for( unsigned long i = 0; i < numberOfBins; i++ ) count[ i ] = 0; for ( long _current = 0; _current <= last; _current++ ) // Scan array. Count number of times each value appears { _Type digit = ( a[ _current ] & bitMask ) >> shiftRightAmount; // extract digit sorting is based on count[ digit ]++; } long startOfBin[ numberOfBins ], endOfBin[ numberOfBins ], nextBin; startOfBin[ 0 ] = endOfBin[ 0 ] = nextBin = 0; for( unsigned long i = 1; i < numberOfBins; i++ ) startOfBin[ i ] = endOfBin[ i ] = startOfBin[ i - 1 ] + count[ i - 1 ]; for ( long _current = 0; _current <= last; ) // Move elements from original array into bins based on digit value { _Type digit = ( a[ _current ] & bitMask ) >> shiftRightAmount; // extract the digit we are sorting based on if ( endOfBin[ digit ] != _current ) { _swap( a[ _current ], a[ endOfBin[ digit ]] ); endOfBin[ digit ]++; } else { endOfBin[ digit ]++; // leave the element at its location and grow the bin _current++; // advance the current pointer to the next element while( _current >= startOfBin[ nextBin ] && nextBin < numberOfBins ) nextBin++; while( endOfBin[ nextBin - 1 ] == startOfBin[ nextBin ] && nextBin < numberOfBins ) nextBin++; if ( _current < endOfBin[ nextBin - 1 ] ) _current = endOfBin[ nextBin - 1 ]; } } bitMask >>= 8; shiftRightAmount -= 8; if ( bitMask != 0 ) // end recursion when all the bits have been processes { for( unsigned long i = 0; i < numberOfBins; i++ ) if (( endOfBin[ i ] - startOfBin[ i ] ) >= 2 ) _RadixSort_initialUnsigned_Radix256_2( &a[ startOfBin[ i ]], endOfBin[ i ] - 1 - startOfBin[ i ], bitMask, shiftRightAmount ); } } // Listing 1 inline void RadixSortInPlace_initialUnsigned_Radix256( unsigned long* a, unsigned long a_size ) { if ( a_size < 2 ) return; unsigned long bitMask = 0xFF000000; // bitMask controls how many bits we process at a time unsigned long shiftRightAmount = 24; _RadixSort_initialUnsigned_Radix256_2( a, a_size - 1, bitMask, shiftRightAmount ); }

This algorithm follows the recursive method above. A detailed explanation of the inner workings of this algorithm is provided below.