STL Algorithms vs. Hand-Written Loops

By Scott Meyers, October 01, 2001

Itching to optimize? Not so fast! Scott presents yet more motivation to trust the library more than yourself.

October 2001/Article title

[This article is based on a book. S. Meyers, Effective STL: 50 Specific Ways to Improve Your Use of the Standard Template Library, Adapted from item 43. © 2001 Addison-Wesley. Reprinted permission of Pearson Education, Inc.]

Every algorithm takes at least one pair of iterators that specify a range of objects over which to do something. min_element finds the smallest value in the range, for example, while accumulate summarizes some information about the range as a whole and partition separates all the elements of a range into those that do and do not satisfy some criterion. For algorithms to do their work, they must examine every object in the range(s) they are passed, and they do this in the way you’d expect: they loop from the beginning of the range(s) to the end. Some algorithms, such as find and find_if, may return before they complete the traversal, but even these algorithms internally contain a loop. After all, even find and find_if must look at every element of a range before they can conclude that what they are looking for is not present.

Internally, then, algorithms are loops. Furthermore, the breadth of STL algorithms means that many tasks you might naturally code as loops could also be written using algorithms. For example, if you have a Widget class that supports redrawing,
class Widget {
public:
 ...
 void redraw() const;
 ...
};
and you’d like to redraw all the Widgets in a list, you could do it with a loop, like this,
list<Widget> lw;
...
for (list<Widget>::iterator i = 
 lw.begin();
 i != lw.end(); ++i) {
 i->redraw();
}
but you could also do it with the for_each algorithm:
for_each(lw.begin(), lw.end(),
 mem_fun_ref(&Widget::redraw));
For many C++ programmers, writing the loop is more natural than calling the algorithm, and reading the loop is more comfortable than making sense of mem_fun_ref and the taking of Widget::redraw’s address. Yet this article argues that the algorithm call is preferable. In fact, this article argues that calling an algorithm is usually preferable to any hand-written loop. Why?

There are three reasons:

Efficiency: Algorithms are often more efficient than the loops programmers produce.

Correctness: Writing loops is more subject to errors than calling algorithms.

Maintainability: Algorithm calls often yield code that is clearer and more straightforward than the corresponding explicit loops.

The remainder of this article lays out the case for algorithms.

From an efficiency perspective, algorithms can beat explicit loops in three ways, two major, one minor. The minor way involves the elimination of redundant computations. Look again at the loop we just saw:
for (list<Widget>::iterator i = 
 lw.begin();
 i != lw.end();
 ++i) {
 i->redraw();
}
I’ve highlighted the loop termination test to emphasize that each time around the loop, i will be checked against lw.end(). That means that each time around the loop, the function list::end will be invoked. But we don’t need to call end more than once, because we’re not modifying the list. A single call to end would suffice, and, if we look again at the algorithm invocation, we’ll see that that’s exactly how many times end is evaluated:
// this call evaluates lw.end() exactly
// once
for_each(lw.begin(), lw.end(),
 mem_fun_ref(&Widget::redraw));
To be fair, STL implementers understand that begin and end (and similar functions, such as size) are used frequently, so they’re likely to design them for maximal efficiency. They’ll almost certainly inline them and strive to code them so that most compilers will be able to avoid repeated computations by hoisting their results out of loops like the one above. Experience shows that implementers don’t always succeed, however, and when they don’t, the avoidance of repeated computations is enough to give the algorithm a performance edge over the hand-written loop.

But that’s the minor efficiency argument. The first major argument is that library implementers can take advantage of their knowledge of container implementations to optimize traversals in a way that no library user ever could. For example, the objects in a deque are typically stored (internally) in one or more fixed-size arrays. Pointer-based traversals of these arrays are faster than iterator-based traversals, but only library implementers can use pointer-based traversals, because only they know the size of the internal arrays and how to move from one array to the next. Some STLs contain algorithm implementations that take their deque’s internal data structures into account, and such implementations have been known to clock in at more than 20 percent faster than the “normal” implementations of the algorithms.

The point is not that STL implementations are optimized for deques (or any other specific container type), but that implementers know more about their implementations than you do, and they can take advantage of this knowledge in algorithm implementations. If you shun algorithm calls in favor of your own loops, you forgo the opportunity to benefit from any implementation-specific optimizations they may have provided.

The second major efficiency argument is that all but the most trivial STL algorithms use computer science algorithms that are more sophisticated — sometimes much more sophisticated — than anything the average C++ programmer will be able to come up with. It’s next to impossible to beat sort or its kin (e.g., stable_sort, nth_element, etc.); the search algorithms for sorted ranges (e.g., binary_search, lower_bound, etc.) are equally good; and even such mundane tasks as eliminating objects from vectors, deques, and arrays are more efficiently accomplished using the erase-remove idiom than the loops most programmers come up with.

If the efficiency argument for algorithms doesn’t persuade you, perhaps you’re more amenable to a plea based on correctness. One of the trickier things about writing your own loops is making sure you use only iterators that (a) are valid and (b) point where you want them to. For example, suppose you have an array, and you’d like to take each array element, add 41 to it, then insert it into the front of a deque. Writing your own loop, you might come up with this:
// C API: this function takes a pointer
// to an array of at most arraySize
// doubles and writes data to it. It
// returns the number of doubles written.
size_t fillArray(double *pArray, size_t arraySize);

// create local array of max possible size
double data[maxNumDoubles];

// create deque, put data into it
deque<double> d;
...

// get array data from API
size_t numDoubles =
 fillArray(data, maxNumDoubles);

// for each i in data, insert data[i]+41
// at the front of d; this code has a bug!
for (size_t i = 0; i < numDoubles; ++i) {
 d.insert(d.begin(), data[i] + 41);
}
This works, as long as you’re happy with a result where the newly inserted elements are in the reverse order of the corresponding elements in data. Because each insertion location is d.begin(), the last element inserted will go at the front of the deque!

If that’s not what you wanted (and admit it, it’s not), you might think to fix it like this:
// remember d’s begin iterator
deque<double>::iterator insertLocation = d.begin();

// insert data[i]+41 at insertLocation, then
// increment insertLocation; this code is also buggy!
for (size_t i = 0; i < numDoubles; ++i) {
 d.insert(insertLocation++, data[i] + 41);
}
This looks like a double win, because it not only increments the iterator specifying the insertion position, it also eliminates the need to call begin each time around the loop; that eliminates the minor efficiency hit we discussed earlier. Alas, this approach runs into a different problem: it yields undefined results. Each time deque::insert is called, it invalidates all iterators into the deque, and that includes insertLocation. After the first call to insert, insertLocation is invalidated, and subsequent loop iterations are allowed to head straight to looneyland.

Once you puzzle this out, you might come up with the following:
deque<double>::iterator insertLocation =
 d.begin();

// update insertLocation each time
// insert is called to keep the iterator valid,
// then increment it
for (size_t i = 0; i < numDoubles; ++i) {
 insertLocation =
 d.insert(insertLocation, data[i] + 41);
 ++insertLocation;
}
This code finally does what you want, but think about how much work it took to get here! Compare that to the following call to transform:
// copy all elements from data to the
// front of d, adding 41 to each
transform(data, data + numDoubles,
 inserter(d, d.begin()),
 bind2nd(plus<int>(), 41));
The “bind2nd(plus<int>(), 41)” might take you a couple of minutes to get right (especially if you don’t use STL’s binders very often), but the only iterator-related worries you have are specifying the beginning and end of the source range (which was never a problem) and being sure to use inserter as the beginning of the destination range. In practice, figuring out the correct initial iterators for source and destination ranges is usually easy, or at least a lot easier than making sure the body of a loop doesn’t inadvertently invalidate an iterator you need to keep using.

This example is representative of a broad class of loops that are difficult to write correctly, because you have to be on constant alert for iterators that are incorrectly manipulated or are invalidated before you’re done using them. Given that using invalidated iterators leads to undefined behavior, and given that undefined behavior has a nasty habit of failing to show itself during development and testing, why run the risk if you don’t have to? Turn the iterators over to the algorithms, and let them worry about the vagaries of iterator manipulation.

I’ve explained why algorithms can be more efficient than hand-written loops, and I’ve described why such loops must navigate a thicket of iterator-related difficulties that algorithms avoid. With luck, you are now an algorithm believer. Yet luck is fickle, and I’d prefer a more secure conviction before I rest my case. Let us therefore move on to the issue of code clarity. In the long run, the best software is the clearest software, the software that is easiest to understand, the software that can most readily be enhanced, maintained, and molded to fit new circumstances. The familiarity of loops notwithstanding, algorithms have an advantage in this long-term competition.

The key to their edge is the power of a known vocabulary. There are some 70 algorithm names in the STL — a total of over 100 different function templates, once overloading is taken into account. Each of those algorithms carries out some well-defined task, and it is reasonable to expect professional C++ programmers to know (or be able to look up) what each does. Thus, when a programmer sees a transform call, that programmer recognizes that some function is being applied to every object in a range, and the results of those calls are being written somewhere. When the programmer sees a call to replace_if, he or she knows that all the objects in a range that satisfy some predicate are being modified. When the programmer comes across an invocation of partition, she or he understands that the objects in a range are being moved around so that all the objects satisfying a predicate are grouped together. The names of STL algorithms convey a lot of semantic information, and that makes them clearer than any random loop can hope to be.

When you see a for, while, or do, all you know is that some kind of loop is coming up. To acquire even the faintest idea of what that loop does, you have to examine it. Not so with algorithms. Once you see a call to an algorithm, the name alone sketches the outline of what it does. To understand exactly what will happen, of course, you must inspect the arguments being passed to the algorithm, but that’s often less work than trying to divine the intent of a general looping construct.

Simply put, algorithm names suggest what they do. “for,” “while,” and “do” don’t. In fact, this is true of any component of the Standard C or C++ library. Without doubt, you could write your own implementations of strlen, memset, or bsearch, if you wanted to, but you don’t. Why not? Because (1) somebody has already written them, so there’s no point in your doing it again; (2) the names are standard, so everybody knows what they do; and (3) you suspect that your library implementer knows some efficiency tricks you don’t know, and you’re unwilling to give up the possible optimizations a skilled library implementer might provide. Just as you don’t write your own versions of strlen et al., it makes no sense to write loops that duplicate functionality already present in STL algorithms.

I wish that were the end of the story, because I think it’s a strong finish. Alas, this is a tale that refuses to go gentle into that good night. Algorithm names are more meaningful than bare loops, it’s true, but specifying what to do during an iteration can be clearer using a loop than using an algorithm. For example, suppose you’d like to identify the first element in a vector whose value is greater than some x and less than some y. Here’s how you could do it using a loop:
vector<int> v;
int x, y;
...
// iterate from v.begin() until an
// appropriate value is found or
// v.end() is reached
vector<int>::iterator i = v.begin();
for( ; i != v.end(); ++i) {
 if (*i > x && *i < y) break;
}
// i now points to the value
// or is the same as v.end()
It is possible to pass this same logic to find_if, but it requires that you use a nonstandard function object adapter like SGI’s compose2 [1]:
// find the first value val where the
// "and" of val > x and val < y is true
vector<int> iterator i =
 find_if(v.begin(), v.end(),
 compose2(logical_and<bool>(),
 bind2nd(greater<int>(), x),
 bind2nd(less<int>(), y)));
Even if this didn’t use nonstandard components, many programmers would object that it’s nowhere near as clear as the loop, and I have to admit to being sympathetic to that view.

The find_if call can be made less imposing by moving the test logic into a separate functor class (i.e., a class declaring an operator() member function):
template<typename T>
class BetweenValues:
 public std::unary_function<T, bool> {
public:
 // have the ctor save the
 // values to be between
 BetweenValues(const T& lowValue,
 const T& highValue) 
 : lowVal(lowValue), highVal(highValue)
 {}
 // return whether val is
 // between the saved values
 bool operator()(const T& val) const
 {
 return val > lowVal && val < highVal;
 }
private:
 T lowVal;
 T highVal;
};
...
vector<int> iterator i =
 find_if(v.begin(), v.end(),
 BetweenValues<int>(x, y));
But this has its own drawbacks. First, creating the BetweenValues template is a lot more work than writing the loop body. Just count the lines. Loop body: one; BetweenValues template: twenty-four. Not a very good ratio. Second, the details of what find_if is looking for are now physically separate from the call. To really understand the call to find_if, one must look up the definition of BetweenValues, but BetweenValues must be defined outside the function containing the call to find_if. If you try to declare BetweenValues inside the function containing the call to find_if, like this,
// beginning of function
{
 ...
 template <typename T>
 class BetweenValues:
 public std::unary_function<T, bool> { ... };
 vector<int>::iterator i =
 find_if(v.begin(), v.end(),
 BetweenValues<int>(x, y));
 ...
}
// end of function
you’ll discover that it won’t compile, because templates can’t be declared inside functions. If you try to avoid that restriction by making BetweenValues a class instead of a template,
// beginning of function
{
 ...
 class BetweenValues:
 public std::unary_function<int, bool> { ... };
 vector<int> iterator i =
 find_if(v.begin(), v.end(),
 BetweenValues(x, y));
 ...
}
// end of function
you’ll find that you’re still out of luck, because classes defined inside functions are known as local classes, and local class types can’t be bound to template type arguments (such as the type of the functor taken by find_if). Sad as it may seem, functor classes and functor class templates are not allowed to be defined inside functions, no matter how convenient it would be to be able to do it.

In the ongoing tussle between algorithm calls and hand-written loops, the bottom line on code clarity is that it all depends on what you need to do inside the loop. If you need to do something an algorithm already does, or if you need to do something very similar to what an algorithm does, the algorithm call is clearer. If you need a loop that does something fairly simple, but would require a confusing tangle of binders and adapters or would require a separate functor class if you were to use an algorithm, you’re probably better off just writing the loop. Finally, if you need to do something fairly long and complex inside the loop, the scales tilt back toward algorithms, because long, complex computations should generally be moved into separate functions, anyway. Once you’ve moved the loop body into a separate function, you can almost certainly find a way to pass that function to an algorithm (often for_each) such that the resulting code is direct and straightforward.

If you agree with this Item that algorithm calls are generally preferable to hand-written loops, and if you also agree that range member functions are preferable to loops that iteratively invoke single-element member functions [2], an interesting conclusion emerges: well-crafted C++ programs using the STL contain far fewer loops than equivalent programs not using the STL. This is a good thing. Any time we can replace low-level words like for, while, and do with higher-level terms like insert, find, and for_each, we raise the level of abstraction in our software and thereby make it easier to write, document, enhance, and maintain.

Notes and References

[1] To learn more about compose2, consult the SGI STL website (<http://www.sgi.com/tech/stl/>) or Matt Austern’s book, Generic Programming and the STL (Addison-Wesley, 1999).

[2] Range member functions are container member functions such as insert, erase, and assign that take two iterators specifying a range to e.g., insert, erase, or assign. A single call to a range member is generally much more efficient than a hand-written loop that does the same thing. For details, consult Item 5 of Effective STL.

Scott Meyers is one of the world’s foremost authorities on C++; Effective STL is his third C++ book. He has a Ph.D. in Computer Science from Brown University, sits on the technical advisory boards of several companies, and provides training and consulting services to clients worldwide. His website is <www.aristeia.com>.

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

STL Algorithms vs. Hand-Written Loops

Notes and References

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

STL Algorithms vs. Hand-Written Loops

Notes and References

Related Reading

News

Commentary

Slideshow

Video

Most Popular

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content