Channels ▼

Andrew Koenig

Dr. Dobb's Bloggers

Teaching C++ Badly — How To Sort Slowly

May 25, 2011

This post continues the discussion that I started in April about how not to teach C++. It is inspired by code like the following:

// Code like this should <i>never</i> appear in a textbook
for (int i = 0; i < arraysize – 1; ++i) {
     for (int j = 0; j < arraysize – 1; ++j) {
         if (array[j] > array[j+1]) {
             temp = array[j];
             array[j] = array[j+1];
             array[j+1] = temp;
         }
     }
}

I adapted this code from a well-known textbook. I have changed the names of the variables and reformatted it so that you can't figure out what textbook it is, because I don't want to embarrass the author or authors. What I do want to do is explain the comment that appears before the code — why it is such a bad idea to put code like this in a textbook. Ever.

This code is a kind of bubble sort, so called because the small elements float toward the beginning of the array being sorted, and the large elements sink to the end. It works by repeatedly comparing pairs of adjacent elements and swapping them if they are out of sequence. It is one of the simplest sorting algorithms, so textbooks often use it to teach beginners about algorithms.

Sorting, in turn, is one of the most important algorithms in all of computer science. I once saw an estimate that a quarter of all computer time used worldwide went into sorting. (This claim preceded the widespread use of computer graphics; I would guess that today the comparable task is moving bits from one part of a display to another.) So why not show students how to sort data as a fundamental part of their education?

The first argument against bubble sorting is that it is slow. This claim may come as a surprise to those of my readers who know that I think human time is more important than computer time. However, quadratic algorithms are so wretchedly slow that they often outweigh the importance of human time.

If you have a 10-element array, and increase it to a 10,000-element array, a bubble sort will take not 1,000 times as long to run, but 1,000,000 times as long. Bubble sorting is so slow that tasks that ought to be commonplace, such as sorting the words in a dictionary, are too slow to contemplate, even on modern computers. Moreover, making the computer faster makes the problem worse, because more speed is usually accompanied by more memory. If you replace your computer with one that runs twice as fast and has twice as much memory, sorting the biggest array that will fit in memory takes twice as long!

The usual argument for using bubble sort is that it is an easy algorithm to understand. I don't buy it, because my experience is that simplified though it might be, bubble sort is still too complicated for most beginners. If you disagree with me, your homework assignment is to look at the code above and determine whether it is actually correct — and then to explain, in terms beginners can understand, how you convinced yourself that the code is correct.

But even if you buy the argument that bubble sort is easy enough for beginners to understand, what you're essentially saying is that it is a good thing to teach beginners how to write a bad solution to a problem that the standard library solves well:

std::sort(array, array+arraysize);

This call is not only much easier to understand than my original example, but it also runs much faster. Moreover, using it also opens the opportunity to explain what I think is an even better alternative:

std::stable_sort(array, array+arraysize);

I would much rather spend time explaining to students what a stable sort is, and why they should use stable_sort instead of sort unless they have a good reason to do otherwise, than to explain to them how to do their own sorting — in ways they barely understand — by writing code that is too slow for any serious use.

For that matter, comparing sort with stable_sort gives the opportunity to discuss an important engineering trade-off with security implications. The standard requirements for sort are intended to allow implementations to use Tony Hoare's classic Quicksort algorithm , which usually runs in time proportional to n log(n). However, it is possible, if the input to Quicksort happens to be particularly unfortunately ordered, for Quicksort to run in quadratic time.

What is worse is that in 1999, Doug McIlroy discovered a simple way to reverse-engineer an implementation of Quicksort by giving it input that causes it to run in quadratic time. The implication of this discovery is that programs that use Quicksort are vulnerable to what is essentially a denial-of-service attack if it is possible for a malicious user to control the details of what is being sorted.

The standard requirements on stable_sort are intended to rule out Quicksort's occasional quadratic behavior, so that programs that use stable_sort will not run horrendously slowly, even in the face of maliciously chosen input.

Of course, many programs do not need to worry about sorting maliciously chosen data, and the risk of chance quadratic behavior is often acceptable. Be that as it may, I would much rather spend classroom time discussing these engineering tradeoffs than explaining to students how to write code in a style that I would never want them to use in practice.

Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 


Video