The Standard Librarian: Sorting in the Standard Library

By Matthew H. Austern, August 01, 2001

The Standard C++ library provides a half dozen different tools for sorting, and knowing when to use which one is an essential part of knowing how to use the library properly. In some cases, the choice of sort function can make an enormous difference in performance.

August 2001 C++ Experts Forum/The Standard Librarian

Whenever you have a sequence of values, one of the operations you'll most often want to perform is sorting. Sorting data makes it easier for humans to understand, and sorting is the first step in any number of algorithms — even such humble algorithms as calculating the sum of a list of numbers. Every programming system provides some form of sorting; the Standard C++ library provides six! (Or perhaps more, depending on how you count them.) How do they differ, and when should you use one instead of another?

Sorting with Algorithms

Clause 25 of the C++ Standard has a section called "Sorting and related operations." It includes many operations on sorted ranges, and three generic algorithms for sorting: sort, stable_sort, and partial_sort.

The first two, sort and stable_sort, have essentially the same interface: as usual with STL algorithms, you pass two Random Access Iterators that define the range to be sorted. You can also, optionally, provide a third parameter that defines how to compare elements: the third parameter is a function object that takes two arguments, x and y, and returns true if x should come before y. So, for example, if v is a vector of int:

std::sort(v.begin(), v.end());

sorts it in ascending order. To sort it in descending order instead, you need to supply a different comparison method:

std::sort(v.begin(), v.end(),
          std::greater<int>());

Note that we're supplying greater as the third argument, not greater_equal. This is important, and it holds for all of the STL sorting algorithms: the comparison function must return false if the two arguments are the same. To some extent, this is arbitrary: it's possible to express a sorting algorithm in terms of a comparison function that looks like < or in terms of a comparison function that looks like <=. It's necessary to make a definite choice, though, and to stick with it consistently. The Standard C++ library chose the former. If you pass sort a function object that returns true for equal arguments, you'll get unpredictable and implementation-dependent results; on some systems, it will appear to work, and on others it could cause an infinite loop or a memory protection fault.

For simple cases, you won't see much difference if you use stable_sort instead of sort. Like sort, stable_sort sorts a range [first, last) into ascending order by default, and again you can supply a comparison function to change the ordering. If you read the C++ Standard, you'll see two differences between sort and stable_sort:

As the name suggests, stable_sort is stable. If two elements compare equal, stable_sort won't change their relative order.
The performance guarantees are different.

The first point, stability, is straightforward. It's usually irrelevant for ints (one "42" is just as good as another), but sometimes very important for other data types. If you're sorting tasks by priority, for example, you probably want high priority tasks to come before low priority ones, but for tasks to appear in their original order when there's a tie. For sort there's no such guarantee, but for stable_sort there is.

What the Standard says about performance is less straightforward: the guarantees for sort and stable_sort are both complicated, and in neither case do they give the complete picture. The Standard says that sort makes O(N log N) comparisons "on the average," without saying what the worst case is, and that stable_sort makes O(N (log N)²) comparisons, but that "if enough extra memory is available," it, too, is O(N log N).

How does one make sense of all of these different cases?

The performance guarantees were written with specific implementation techniques in mind, and the guarantees make more sense if you know what those techniques are. They permit sort to be written in terms of quicksort (recursively partitioning a range) and stable_sort in terms of merge sort (recursively merging sorted subranges) [1]. Quicksort is one of the fastest known sorting methods, and it is almost always O(N log N), but there are some special sequences for which it is O(N²); if the Standard always required sort to be O(N log N), quicksort would be forbidden. Similarly, merging two subranges is easier if there is an external buffer to copy the results into; merge sort is O(N log N) if it can use an auxiliary buffer as large as the original range, and O(N (log N)²) if it can't obtain any auxiliary buffer. If it can only use a smaller auxiliary buffer, then its complexity is intermediate between the two — but, in realistic cases, closer to O(N log N).

That's an explanation of what's in the Standard, but it's still not complete. The Standard says that "if the worst case behavior is important," you shouldn't use sort. However, that advice is less true than it was when the Standard was first written. Many standard library implementations, including the ones that come with GNU g++ and Microsoft Visual C++, now use a new variation on quicksort, called introsort [2]. Introsort is just as fast as quicksort on average, but its complexity is always O(N log N). Unless you're worried about older standard library implementations, worst case behavior is no longer a reason to avoid sort. And, while stable_sort is (usually) O(N log N) too, that O glosses over many details.

In most cases, stable_sort is much slower than sort, because it has to do more work for each element; that's the price you pay for the guarantee of stability. Use stable_sort when it's important for equivalent elements to keep their original ordering (for example, when you're sorting by priority, but keeping items of equal priority in first-come first-served order), and use sort otherwise.

The other generic sorting algorithm is partial_sort (along with a minor variant, partial_sort_copy.) It has slightly different functionality and slightly different syntax: it finds and sorts the first n elements of a range. As usual the default is to sort in ascending order as determined by the < operator, but you can provide a function object to change that. So, for example, if you're interested only in the top 10 elements of a sequence, you can find them with

partial_sort(first, first+10, last, 
             greater<T>());

Afterwards the largest 10 elements will be contained in [first, first+10) (sorted in descending order), and the rest of the elements will be contained in [first+10, last).

There's an obvious limiting case: if you write partial_sort(first, last, last), then you're asking partial_sort to sort the entire range [first, last). So, while the syntax may be slightly different, you can still use partial_sort the same way you use sort. Is there a reason to? Not really. Looking at what the C++ Standard says about complexity, you'll see that sorting a range with partial_sort is O(N log N), just like sort, but, again, asymptotic complexity is an incomplete description. Two O(N log N) algorithms aren't necessarily the same speed; in this case, sort is usually much faster.

The reason partial_sort exists is for partial sorts. Suppose that you have a list of a million names and you need to find the first thousand, sorted in alphabetical order. You could get those thousand names by sorting the entire list and ignoring all but the first part of it, but that would be grossly wasteful. It would be much faster to use partial_sort or partial_sort_copy:

vector<string> result(1000);
partial_sort_copy(names.begin(), names.end(),
                  result.begin(), result.end());

Use partial_sort when you're only interested in the first part of the sorted range, and use sort when you need to sort all of a range.

As with sort and stable_sort, it's helpful to think about how partial_sort is implemented. The usual implementation, and the one that the C++ Standard authors had in mind, is in terms of heap sort: the input range is rearranged into a data structure called a heap, which is essentially a binary tree flattened into an array in such a way so that it's easy to find the largest element and so that it's easy to remove the largest element and still have a valid heap. If we successively remove elements from a heap, then we'll be left with the smallest n elements: just what we need from partial_sort. If we remove all of the elements from a heap, then we'll be left with a sorted range.

The Standard C++ library includes generic algorithms for manipulating heaps directly, so, instead of using partial_sort, another way to sort a range is by writing:

make_heap(first, last);
sort_heap(first, last);

This looks different from

partial_sort(first, last, last);

but it really isn't. In both cases, you're using heap sort; the two forms should have exactly the same speed.

Finally, there's one last "generic" sorting algorithm, inherited from C: qsort. I put "generic" in quotes because qsort really isn't as generic as sort, stable_sort, or partial_sort. You can't use it to sort objects that have constructors, virtual functions, base classes, or private member variables. C doesn't know about those features; qsort assumes it can copy objects bit for bit, which only works with the very simplest of classes. It's also hard to use qsort in templates, since you have to pass it a comparison function that takes arguments of type void*, but that internally knows the exact types of the objects to be sorted.

The C Standard gives no performance guarantees for qsort. In cases where qsort can be used, however, it usually turns out to be much slower than sort. (Largely for the simple reason that sort's interface was designed so that the comparison function could be inlined, and qsort's interface was not.) Except in legacy code, you should avoid qsort; sort has a simpler and safer interface, it's less restrictive, and it's faster.

Sorting with Special Containers

I started by discussing generic algorithms, because the basic principle of the Standard C++ library is to decouple things that don't have to be coupled. Algorithms are separate from containers. There's nothing about sorting in the container requirements; you sort a vector by using an algorithm that's separate from std::vector:

sort(v.begin(), v.end());

Nevertheless, any complete discussion of sorting in C++ must include containers. Containers in general provide no sorting methods, but some special containers do.

You can't sort a vector by writing v.sort(), since std::vector has no such member function, but you can sort a list by writing l.sort(). As usual, you can also explicitly provide a different comparison function: if l is a list of ints, then l.sort(greater<int>()) will sort it in descending order.

In fact, list::sort is the only easy way to sort a list: std::list only provides Bidirectional Iterators, but the standalone sorting algorithms, sort, stable_sort, and partial_sort, require the more powerful Random Access Iterators [3]. The special member function list::sort doesn't use iterators, but instead exploits the fact that lists are implemented in terms of linked nodes. It uses a variation of merge sort that works by linking and unlinking nodes, rather than by copying list elements.

Finally, sorting a set (or a multiset, if you need to have duplicate elements) is easiest of all: it starts out sorted! You can't write sort(s.begin(), s.end()), but you also never need to. A fundamental property of a set is that its elements are arranged in order. Whenever you insert a new element, it automatically gets put in the appropriate place to maintain the ordering. Internally, set uses a data structure (usually some kind of balanced tree) that enables fast (O(log N)) lookup and fast insertion.

Timing Tests

Where does this leave us? I've made some assertions about timing, and we can make some more intuitive observations: that inserting elements into a set should be slower than sorting a vector, for example, because set is a complicated data structure that uses dynamic memory allocation, or that sorting a list should be roughly as fast as using stable_sort, because both are versions of merge sort. However, intuition is no substitute for timing tests. Measurements are difficult (exactly what do you measure, and how?), but they are essential.

Listing 1 is a program that constructs a vector<double>, puts it in random order, and then successively sorts it using each of the methods we've discussed (except for qsort). We deliberately use call-by-value when passing the vector to each timing test: we don't want any of the tests to see a vector that has already been sorted!

Compiling the program with Microsoft Visual C++ 7 beta (results with g++ are similar), and running it on a 500 MHz Pentium III, gives the following results:

Sorting a vector of 700000 doubles.
 sorting method t (sec)
 sort    0.971
 qsort    1.732
 stable_sort    1.402
 heap sort    1.282
 list sort    1.993
 set sort    3.194

This is as expected: sort is the fastest; stable_sort, heap sort, and qsort are considerably slower; and sorting with a list or a set, which use dynamic memory allocation and complicated data structures, is slower still. (Actually, this is an unusually good showing for qsort: on g++, and on older versions of VC++, qsort is even slower compared to sort.)

But this doesn't deserve to be called a sorting benchmark — that's too bold a claim. I tested sorting a vector<double>, on one particular system, using a specific (random) initial ordering. Only experiment can determine how far these results may be generalized. At the least, we should try sorting something other than doubles.

Listing 2 sorts a vector<string>: it opens a file and copies each line into a separate string. I'm no longer including qsort in this test, because qsort cannot handle elements of types with constructors. With Project Gutenberg's free e-text of Anna Karenina [4] as input, these are the results:

Sorting a file of 42731 lines.
    sorting method t (sec)
    sort    0.431
   stable_sort    1.322
   heap sort    0.751
   list sort    0.25
   set sort    0.43

Suddenly, things have changed. We still see that sort is much faster than stable_sort or heap sort, but the results for list and set have changed dramatically. Using a set is just about the same speed as using sort, and list is substantially faster. What happened?

What's changed is that double is a primitive type, and std::string is a complicated class. Copying a string, or assigning one string to another, means invoking a function — it might even mean allocating dynamic memory or creating a mutex lock. The tradeoffs have changed; the number of assignments matters much more when you're sorting strings than when you're sorting doubles. Sorting with a list involves no assignments at all: the whole reason for defining a special list::sort member function is that it works by manipulating pointers to internal list nodes. Relinking a few list node pointers is faster than copying a string.

We've rediscovered an old piece of conventional wisdom: if you're dealing with large records, you never want to sort the records themselves, but just pointers to them. Using a list makes this automatic, but you can just as easily do it explicitly: instead of sorting the original vector<string>, create an index vector each of whose elements is a vector<string>::const_iterator, and sort the index vector. You'll have to tell sort how to compare the elements of the index vector (you have to make sure to compare the pointed-to values, not the iterators themselves), but that's only a minor nuisance. We just need to provide an appropriate comparison function object:

template <class Iterator>
struct indirect_lt {
  bool operator()(Iterator x, Iterator y) const 
    { return *x < *y; }
};

Listing 3 shows how to use indirect_lt and compares the speed of direct and indirect sorting. The speedup is substantial.

Sorting a file of 42731 lines.
  sorting method t (sec)
  indirect sort    0.29
  sort    0.401

Advice

The Standard C++ library provides six sorting methods: qsort, sort, stable_sort, partial_sort, list::sort, and set/multiset.

You shouldn't use qsort in new code. It's slower than sort. Its interface is ugly and, because it requires casting, error-prone. Writing a comparison function to pass to qsort can be nontrivial, especially in generic code. You can't use qsort to sort objects with constructors or virtual member functions, or to sort any data structure other than a C array. Although qsort isn't formally deprecated, its only real use is backward compatibility with C.

Of the other five sorting methods, the first three are generic algorithms and the last two exploit special features of certain containers. Each of these methods compares objects with operator< by default, but allows you to specify your own comparison function if necessary. Each provides special features:

sort	Generally the fastest.
stable_sort	Guarantees stability in ordering of equivalent elements.
partial_sort	Allows sorting only the first N elements.
list::sort	Manipulates pointers, instead of copying elements.
set	Allows fast insertion into, and deletion from, a sorted range.

If you don't need any of the special features of any of the other four methods, your first choice should usually be sort.

Notes

[1] For more details on quicksort and merge sort, and other sorting algorithms, see D. E. Knuth, The Art of Computer Programming, vol. 2, 1998.

[2] D. R. Musser. "Introspective sorting and selection algorithms," Software Practice and Experience, 27(8):983, 1997.

[3] This restriction is partly artificial: it is possible to implement quicksort and merge sort in terms of Forward Iterators, albeit at the cost of some complexity. Artificial or not, however, it's what the Standard requires.

[4] See <http://sailor.gutenberg.org/by-author/to2.html>.

About the Author

Matt Austern is the author of Generic Programming and the STL and the chair of the C++ standardization committee’s library working group. He works at AT&T Labs — Research and can be contacted at [email protected].

1 2 3 4 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

The Standard Librarian: Sorting in the Standard Library

Sorting with Algorithms

Sorting with Special Containers

Timing Tests

Advice

Notes

About the Author

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

The Standard Librarian: Sorting in the Standard Library

Sorting with Algorithms

Sorting with Special Containers

Timing Tests

Advice

Notes

About the Author

Related Reading

News

Commentary

Slideshow

Video

Most Popular

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content