Initial Implementation: The Details
The initial implementation processes 8-bits of each array element at a time, which makes it a 256-radix algorithm. It uses the most-significant 8-bits during the first level of recursion, and then moves to the next 8-bits. Thus, for an array of 16-bit elements, two levels of recursion will be used. For an array of 32-bit elements, four levels of recursion will be used. The bitmask is set to select these most-significant 8-bits, and the shiftRightAmount will also be used to extract the particular 8-bits from each element.
A template function _RadixSort_initialUnsigned_Radix256() is then called, which will be generalized to handle 8, 16, 32 and 64-bit unsigned data types. This function allocates the count array of radix size (256 elements in this case), and initializes each of the counts to zero (in the first for loop). In the second for loop the array that was passed into the function is scanned from beginning to end. For each element the digit that we will be sorting based on is extracted (e.g., the upper 8-bit of each element).
This value (digit) is used as an index into the count array and that particular count is incremented. In other words, this for loop counts how many elements in the array have the most-significant-digit (MSD) equal to a zero, in which case count is incremented. If the digit is a one, then count is incremented, and so on for all other digits up to the digit value of 255, where count is incremented. After the for loop, the count array will hold a count of occurrence of each digit value. From this pass over the array the size of each bin has been determined. This pass is very similar to a histogram in image and video processing domain, where the occurrence of each value of red, green, and blue color component is determined and shown on a graph.
In the third for loop the starting location for each bin is determined, using the count array. Bin 0 starts at location (offset) zero and will hold count elements. Bin 1 will start right after the last element of bin 0 -- i.e., at location (offset) of zero plus count. Bin 2 will start at bin 1 starting point plus count, and so on for the rest of the bins. Keep in mind that if a bin has zero elements, then it will have a starting point starting point. However, the next bin will have the same starting point as well. The ending point for each bin (marked by endOfBin array) is kept track of as each bin is grown in size, when elements are added to it. Initially these ending points (endOfBin array) are set to the same value as the starting point of each bin, indicating that each array has zero elements in it. The difference between the starting and ending points (e.g., endOfBin -- startOfBin ) indicates how many elements are currently in that bin.
The fourth for loop performs sorting based on the current digit. The loop iterates through all elements within the array given to the function. For each element in the array, the first step is to extract the digit that is used to sort with by masking and shifting. The _current location within the array is where the current array element is being processed. This location starts out being at the same location as startOfBin and endOfBin. Next, the if statement compares the _current location with the endOfBin[digit], where the _current element should be moved to based on the extracted digit. If the _current element belongs to a bin with endOfBin[digit] that is different from _current, then the _current location within the array does not need to be advanced. The _current element is moved to the bin it belongs to and the element that was at that location is moved to the _current location (the swap operation). That bin is also grown. This iteration of the loop is done. The _current location within the array holds a new element that was swapped in. This new element will be processed in the next loop iteration.
The fourth for loop also has an else portion of the if statement, where processing occurs when the element at the current location belongs in the current Bin. In this case, the element does not need to be moved, and the current Bin is grown (by incrementing its endOfBin). The current location (_current) is incremented, at which point several conditions can occur. The _current location index could end up inside the next bin, which is the condition the first while loop looks for. This situation is illustrated in the diagram below:
Bin 0 starts with a0 which it contains, and the startOfBin points to (left arrow with a circular tail). The value at a1 is not part of bin 0 yet, and is the value being analyzed. The endOfBin pointer is set to the one beyond the end of bin 0, pointing to the next potential candidate. The _current pointer is located at a1 as well, which is the value the for loop is processing. Since the _current element has the upper 8-bit of value equal to zero (in bold) this element belongs in bin 0. No swapping or moving is needed. Bin 0 needs to be grown to include a1 element, and the algorithm does this by advancing the endOfBin for bin 0. The resulting situation is illustrated on the next line of the diagram.
In this case, bin 0 has been filled -- all of the elements that belong to bin 0 have been found and moved to bin 0. Now, bin 0 butts against bin 1 without any gaps. The _current pointer is selecting element a2 which belongs to bin 1. Thus, bin 1 needs to be skipped. This is the situation that the second while loop processes. In this case, several bins butt against each other with no gaps in between. The second while loop skips all of the bins until a gap is found with an element that does not yet belong to any of the bins, or all of the bins have been processed (filled). At this point the _current pointer will either point to the next unprocessed element in between bins or past the end of array -- past _last (and the main loop will exit):
Another scenario that can occur is illustrated in the diagram above. In this case, bin 0 has just been grown by adding a1 element, and current pointer was moved to a2 element. However, bin 1 contains zero elements in it (startofBin and endOfBin point to a2). This is the case that the first while loop takes care of. In this case, nextBin is advanced. NextBin keeps track on which bin is the next bin after the _current pointer. In the case above, since the _current pointer was at the same location as endOfBin -- i.e., the current bin was bin 0 -- the nextBin value was 1 (to indicated that bin 1 was next). By using the nextBin value, the while loop is able to check whether the _current pointer has reached the start of the next bin, which then needs to be skipped. This is the condition shown in the diagram above -- _current pointer is equal to startOfBin, and thus bin 1 needs to be skipped, which is done by incrementing nextBin. A while loop is used in this case, instead of an if statement, because several bins in a row could have zero elements in then and all of them would start at a2 element. This while loop will skip all of these zero size bins.
Performance measurements of this initial implementation are shown in Table 1, along with STL sort, Intel's IPP Radix Sort, Intel's sort, Hybrid Binary-Radix Sort, and Insertion Sort.
These measurements show the initial 256-Radix Sort implementation lagging in performance comparing to all other sorting algorithms, except Insertion Sort for large input data sets. Note that STL sort and Hybrid Binary-Radix Sort perform substantially better than other algorithms for small input data sets because they are hybrid algorithms and utilize Insertion Sort for small data sets .
The same correctness tests were performed on every algorithm described, as was performed in . The same performance tests were performed as in Algorithm Improvement through Performance Measurement: Part 2.