# Parallel Shellsort

This is a tale of two sorts: Insertion Sort and Shellsort. These aren't as different as Dickensian London and Paris; they're more like Minneapolis and St. Paul. Shellsort is a twist on the simple Insertion Sort. I'll start with a review of Insertion Sort and then take a look at the Shellsort variation and how I would make it parallel.

Insertion Sort is such a simple algorithm to remember and code that I've always used it as my "go to" sort whenever I needed a quick and dirty sort algorithm. I imagine the algorithm as picking up playing cards that are being dealt out for a game. The first card that is dealt will be in sorted order by itself. For each new card that is dealt, I pick up the new card, and scan the current cards in my hand from one side to the other until the place that the new card fits into the sequence is found. If I'm playing "Go Fish" I like to keep all the cards with the same value together and in sorted order by value (it's easier to tell that I have any threes if I know they will be between the twos and fours); if I'm playing bridge, I keep cards of the same suit together and then rank the cards by face value within each suit (rank is a second key for sorting).

Implementation of Insertion Sort starts with an unsorted list of items, which would be like having all the cards dealt out before picking them up to be sorted. This is typically an array, but it can be done with a linked list of items. At any time during the sort, the list of items is thought to be divided into two parts: the sorted portion and the remaining unsorted items. To insert a new item, the next unsorted item is chosen, and the sorted list is scanned from the biggest index toward the smallest. Items larger than the item to be inserted are shifted one place toward the higher end of the array. When the first item in the sorted portion that is not larger than the item to be inserted is found, the insertion position (within the current set of sorted items) has been found and the item is stored in the array. The code for sorting an array of integers with the Insertion Sort algorithm is given below.

```c
void InsertionSort(int N, int *A)
{
    int i, j, v;

    for (i = 1; i < N; i++) {
        v = A[i];                  // next unsorted item
        j = i;
        while (A[j-1] > v) {       // shift larger items one place right
            A[j] = A[j-1];
            j--;
            if (j <= 0) break;     // reached the front of the array
        }
        A[j] = v;                  // drop item into its sorted position
    }
}
```

Insertion Sort is an inherently serial algorithm. Items are inserted into the sorted part one at a time. For a concurrent version of Insertion Sort I once imagined starting in the middle and working out, giving two fronts that could each be handled by a different thread. As an insertion point is sought, items are moved backward or forward (depending on the thread). The big problem with this idea is when each thread's search for insertion points crosses the search of the other thread (much like Alfred Jingle's distressing of the Pickwickians). What protections of the shared array would be needed to do this correctly? How complex does the code become when dealing with a shifting end boundary to terminate the insertion of the "largest" or "smallest" seen item? On top of all that synchronization and overhead, the algorithm is restricted to using two threads and is not very scalable.

If you want to use Insertion Sort within a parallel application, I would recommend embedding it in a larger sorting algorithm that is itself parallel. For example, when the partitions of Quicksort get down to a certain size, rather than continuing the recursion or spinning off new sorting tasks on the small partitions, just apply Insertion Sort to quickly finish up the sort. A parallel Mergesort could also use the trick of employing Insertion Sort when partitions get small enough instead of continuing the creation of tasks for smaller and smaller partitions.
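The cutoff trick just described is easy to sketch. Below is a minimal serial illustration, assuming a Lomuto-style partition; the names `QuickSortCutoff`, `InsertionSortRange`, and the `CUTOFF` value are my own choices for this sketch, not from any particular library, and the threshold would need tuning in practice.

```c
/* Illustrative sketch: Quicksort that finishes small partitions with
   Insertion Sort. CUTOFF and all function names are assumptions. */

#define CUTOFF 16   /* partition size below which recursion stops; tunable */

/* Insertion Sort restricted to A[lo..hi]. */
void InsertionSortRange(int *A, int lo, int hi)
{
    for (int i = lo + 1; i <= hi; i++) {
        int v = A[i], j = i;
        while (j > lo && A[j-1] > v) {
            A[j] = A[j-1];
            j--;
        }
        A[j] = v;
    }
}

void QuickSortCutoff(int *A, int lo, int hi)
{
    if (hi - lo + 1 <= CUTOFF) {          /* small partition: finish directly */
        InsertionSortRange(A, lo, hi);
        return;
    }
    int pivot = A[hi], i = lo - 1;        /* Lomuto partition around A[hi] */
    for (int j = lo; j < hi; j++) {
        if (A[j] <= pivot) {
            int t = A[++i]; A[i] = A[j]; A[j] = t;
        }
    }
    int t = A[i+1]; A[i+1] = A[hi]; A[hi] = t;
    QuickSortCutoff(A, lo, i);            /* pivot now sits at i+1 */
    QuickSortCutoff(A, i + 2, hi);
}
```

In a parallel version, the two recursive calls are where tasks would be spawned; the cutoff branch is exactly where spawning stops and Insertion Sort takes over.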

And that brings us to the Shellsort algorithm. Shellsort is a version of Insertion Sort that does an "h-sort" across the whole array. That is, the algorithm performs an Insertion Sort in `h` separate, interleaved partitions at the same time. After each pass with a given `h`, the value of `h` is reduced and another pass over the list is done to `h`-sort the list. When the value of `h` reaches 1, a simple Insertion Sort of the whole list is done.

That's a pretty brief description and it might be a bit abstract if you've not seen this algorithm or if you haven't thought about it for a while. Let me attempt to clarify with a card game example. Start by picking up all the cards that have been dealt to you rather than putting them into your hand as they arrive in front of you. (If you have a slow dealer, you could freshen your drink while all the cards are being distributed.) For purposes of this explanation I will number the cards from left to right starting with "1" and use the value `h == 3`, which means I have three interleaved partitions.

The three leftmost cards in your hand are the first cards in their respective partitions and are sorted relative to the partition. Starting with card 4, compare it to card 1; if the two cards are out of order relative to each other, swap them. Do the same with cards 5 and 2, and then with cards 6 and 3. Card 7 is compared to the card in position 4 and swapped if out of order; then compared to card 1. This goes on until you have placed the rightmost card into its sorted position within its partition. Essentially you are executing three Insertion Sorts on the partitions defined by every third card, but doing it by cycling through partitions when a new unsorted card is considered. Once the rightmost card has been put into the sorted position within its partition, the value of `h` is decremented and the process is repeated with the new `h` value and the new set of `h` partitions. From a value of 3, the next value of `h` should be 1, which is simple Insertion Sort on the entire hand.
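A single h-sorting pass of the kind just walked through can be sketched in a few lines of C. The helper name `HSortPass` and the sample values are my own illustrations:

```c
/* One h-sorting pass: an Insertion Sort restricted to items h apart.
   With h == 3 this cycles through three interleaved partitions exactly
   as the card walk above describes. HSortPass is an illustrative name. */
void HSortPass(int N, int *A, int h)
{
    for (int i = h; i < N; i++) {       /* next unsorted "card" */
        int v = A[i], j = i;
        while (j >= h && A[j-h] > v) {  /* compare h positions back */
            A[j] = A[j-h];
            j -= h;
        }
        A[j] = v;                        /* drop into its partition */
    }
}
```

For example, after `HSortPass(9, A, 3)` on `{9, 5, 7, 3, 8, 2, 6, 1, 4}`, each of the three partitions (indices 0/3/6, 1/4/7, and 2/5/8) is sorted on its own, even though the array as a whole is not yet sorted.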

The advantage of Shellsort over Insertion Sort is that items are moved closer to their final position with fewer exchanges of items. While Insertion Sort exchanges adjacent items, Shellsort exchanges "adjacent" items that are actually `h` items apart. If the item with the lowest key value within an array of `N` items is located at the opposite end of the array, it will take at most `N/h` exchanges to place this element within the first `h` locations in the array.

The computational complexity of Shellsort is not well understood and relies heavily on the sequence of decreasing values for `h`. It does seem to perform better than some other sorts (`O(n^2)` for Insertion and Selection sorts). In *Algorithms* (Addison-Wesley, 1983), Robert Sedgewick notes that the sequence ..., 1093, 364, 121, 40, 13, 4, 1 has been shown to provide good speed. This sequence is built into the code examples used in this post.

Serial code for this method of Shellsort is given here:

```c
void Shellsort(int N, int *A)
{
    int i, j, h, v;

    h = 1;                          // build Sedgewick's sequence 1, 4, 13, 40, 121, ...
    while (h < N) h = 3*h + 1;

    for (h /= 3; h > 1; h /= 3) {   // h-sort with each decreasing increment
        for (i = h; i < N; i++) {
            v = A[i];
            j = i;
            while (A[j-h] > v) {    // shift larger items h places right
                A[j] = A[j-h];
                j = j - h;
                if (j < h) break;   // reached the front of this partition
            }
            A[j] = v;
        }
    }
    InsertionSort(N, A);            // final h == 1 pass cleans up
}
```
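When experimenting with either sort, a couple of small helpers make checking results painless. These are my own utility names, not part of the listings above:

```c
#include <stdlib.h>

/* Returns 1 if A[0..N-1] is in nondecreasing order, else 0. */
int IsSorted(int N, const int *A)
{
    for (int i = 1; i < N; i++)
        if (A[i-1] > A[i]) return 0;
    return 1;
}

/* Fills A with N pseudo-random values in [0, 1000) for exercising a sort. */
void FillRandom(int N, int *A, unsigned seed)
{
    srand(seed);
    for (int i = 0; i < N; i++)
        A[i] = rand() % 1000;
}
```

Fill an array with `FillRandom`, run `Shellsort` (or `InsertionSort`) on it, and assert `IsSorted` afterward; doing this over many seeds is a cheap sanity check before moving on to a parallel version.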