# Bubble, BubbleSort, and Trouble

*The earth hath bubbles as the water has,
And these are of them.
—*Macbeth,

*Macbeth*ACT I Scene 3

Bubblesort was the first sorting algorithm that I ever learned (and I programmed it in COBOL). It is easy to code and simple to understand. Here's the algorithm to sort an array of integers in serial.

void BubbleSort(int N, int *A) { int i, j; int temp; for (i = N-1; i > 0; --i) { for (j = 0; j < i; ++j) { if (A[j] > A[j+1]) { temp = A[j]; A[j] = A[j+1]; A[j+1] = temp; } } } }

If we assume that we are sorting to a monotonically increasing sequence of keys, those elements with low value keys will "bubble up" to the lower index elements of the array toward their final sorted position. For each iteration of the outer loop the element with the largest key value, not already at its final position, will "drop down" to its final sorted position. From the code you will notice that the inner loop passes through a decreasing set of elements from the array until there are only two elements left to consider. After verifying these are in the proper order, the sort will be complete. The total number of comparisons executed in the serial algorithm is the sum of the numbers `n`

to 1. If you remember solving recurrence relations or the story of Gauss summing numbers, you will realize that Bubblesort is an O(`n`

^{2}) algorithm.

An obvious optimization to this "brute force" version is to keep track of the number of exchanges done during each outer loop iteration. If no exchanges are done, then the array is sorted and there is no need for additional passes.

I can approach Bubblesort as a task decomposition, where a task is one full and complete pass through the data; i.e., a single iteration of the outer loop. However, if I want to have any kind of parallel execution, coordinating that full pass through the array by a single thread within a pool of threads will require some synchronization. Obviously, the iterations will be executed in the same order as they would be for the serial algorithm, but different threads cannot be making the comparison on the same element of the array at the same time. Thus, the start of each iteration will be controlled and the thread assigned to some iteration will only start after the previous thread has progressed some number of elements into the array. Using this delayed start method of assigning threads I hope to preserve the execution order of the serial algorithm, but still execute with multiple threads without data races.

The general algorithmic structure that I've just described is known as a *wavefront* approach. Like waves washing onto a beach, the threads sweep through the data one after the other. As long as one thread doesn't catch up to another, data races are avoided. You may be thinking that waves in the ocean don't "catch up" with the wave ahead of it, so how can independent threads catch up to one another? To illustrate the potential problem with threads running in a wavefront formation, let us imagine a long, long shelf of books that need to be sorted. This shelf is on the Scottish moors and we charge Banquo and Duncan (characters from William Shakespeare's Scottish Play) with the task of sorting those books using the Bubblesort algorithm. (I wanted to keep up the nautical theme started with the waves on the beach analogy, but it was too much of a stretch.) Banquo starts walking from one end of the shelf comparing adjacent pairs of books and interchanges any two books that are out of order relative to each other. After some time, Duncan starts at the same point following Banquo's path doing the same book comparison and exchange. When Banquo or Duncan finish they run back to starting point of the shelf and repeat the process until they both do no exchanges throughout one of their passes along the shelf.