Blogs

Clay Breshears

See more from this author

Bubble, BubbleSort, and Trouble

Bubblesort was the first sorting algorithm that I ever learned (and I programmed it in COBOL). It is easy to code and simple to understand.


The earth hath bubbles as the water has,
And these are of them.
Macbeth, Macbeth ACT I Scene 3

Bubblesort was the first sorting algorithm that I ever learned (and I programmed it in COBOL). It is easy to code and simple to understand. Here's the algorithm to sort an array of integers in serial.

void BubbleSort(int N, int *A)
{
  int i, j;
  int temp;
  for (i = N-1; i > 0; --i) {
    for (j = 0; j < i; ++j) {
      if (A[j] > A[j+1]) {
        temp = A[j]; A[j] = A[j+1]; A[j+1] = temp;
      }
    }
  }
}

If we assume that we are sorting to a monotonically increasing sequence of keys, those elements with low value keys will "bubble up" to the lower index elements of the array toward their final sorted position. For each iteration of the outer loop the element with the largest key value, not already at its final position, will "drop down" to its final sorted position. From the code you will notice that the inner loop passes through a decreasing set of elements from the array until there are only two elements left to consider. After verifying these are in the proper order, the sort will be complete. The total number of comparisons executed in the serial algorithm is the sum of the numbers n to 1. If you remember solving recurrence relations or the story of Gauss summing numbers, you will realize that Bubblesort is an O(n2) algorithm.

An obvious optimization to this "brute force" version is to keep track of the number of exchanges done during each outer loop iteration. If no exchanges are done, then the array is sorted and there is no need for additional passes.

I can approach Bubblesort as a task decomposition, where a task is one full and complete pass through the data; i.e., a single iteration of the outer loop. However, if I want to have any kind of parallel execution, coordinating that full pass through the array by a single thread within a pool of threads will require some synchronization. Obviously, the iterations will be executed in the same order as they would be for the serial algorithm, but different threads cannot be making the comparison on the same element of the array at the same time. Thus, the start of each iteration will be controlled and the thread assigned to some iteration will only start after the previous thread has progressed some number of elements into the array. Using this delayed start method of assigning threads I hope to preserve the execution order of the serial algorithm, but still execute with multiple threads without data races.

The general algorithmic structure that I've just described is known as a wavefront approach. Like waves washing onto a beach, the threads sweep through the data one after the other. As long as one thread doesn't catch up to another, data races are avoided. You may be thinking that waves in the ocean don't "catch up" with the wave ahead of it, so how can independent threads catch up to one another? To illustrate the potential problem with threads running in a wavefront formation, let us imagine a long, long shelf of books that need to be sorted. This shelf is on the Scottish moors and we charge Banquo and Duncan (characters from William Shakespeare's Scottish Play) with the task of sorting those books using the Bubblesort algorithm. (I wanted to keep up the nautical theme started with the waves on the beach analogy, but it was too much of a stretch.) Banquo starts walking from one end of the shelf comparing adjacent pairs of books and interchanges any two books that are out of order relative to each other. After some time, Duncan starts at the same point following Banquo's path doing the same book comparison and exchange. When Banquo or Duncan finish they run back to starting point of the shelf and repeat the process until they both do no exchanges throughout one of their passes along the shelf.





Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

DrDobbs encourages readers to engage in spirited, healthy debate, including taking us to task. However, DrDobbs moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing/SPAM. DrDobbs further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Features

  • Booting an Intel Architecture System, Part II: Advanced Initialization

    Once the processor is running and memory has been initialized, timers and devices must be started up and a memory map laid out. Only then, can the OS be loaded.

  • Booting an Intel Architecture System, Part I: Early Initialization

    The boot sequence today is far more complex than it was even a decade ago. Here's a detailed, low-level, step-by-step walkthrough of the boot up.

  • How Data Dependence Affects Performance

    Data dependence between statements is a straightjacket on the compiler's ability to optimize code for parallelism. So to get the maximum benefit from parallel code, data dependence must be carefully managed

  • The Intel Threading Building Blocks Flow Graph

    User feedback inspired the flow graph feature in Intel Threading Building Blocks, which allows programmers to express static and dynamic dependency graphs, as well as reactive or event-based graphs.

More Features >>

Go Parallel Video

Go Parallel Twitter Feed

More Twitter >>

Our Facebook Community