<algorithm>: for_each

In the first in his series on the Algorithms Library of the C++ Standard, Bjorn covers for_each, a terse alternative to writing your own loops.


July 01, 2005
URL:http://www.drdobbs.com/algorithm-foreach/184403895

July, 2005: <algorithm>: for_each

Bjorn is author of Beyond The C++ Standard Library: An Introduction to Boost. He can be contacted at [email protected].


The C++ Standard [1] defines not only the C++ language, but also the C++ Standard Library. In the C++ Standard Library, there are components for strings, containers, algorithms, input/output, and more. Every C++ programmer can benefit from knowing about these components, and to some extent, most do. However, for all those who stop at the doorstep of the library, which typically means that they know only of the most common classes—probably std::string, std::vector, and std::list—there is much more to learn. That is the purpose of this series of articles: to demonstrate and motivate one of the key components of the C++ Standard Library, and advocate the use of it. For the reader, the benefits of using the C++ Standard Library are very real:

For the C++ community, and the software industry relying on programs written in C++, the benefits of having as many programmers as possible use the tools from standard C++ are obvious.

In this series of articles, I will focus on clause 25 of the C++ Standard, which is the Algorithms Library. These algorithms are designed to work with iterators from the standard containers [2] (std::vector, std::map, and so forth) and other sequences. The first part of this series of articles covers the non-modifying sequence operations, the second part talks about the mutating sequence operations, and the third discusses sorting and related operations. These divisions are named after the categorization of algorithms defined in the C++ Standard.

for_each

The topic of this article is an algorithm called for each. Its purpose is to apply a function to every element in a sequence.

Here is how the C++ Standard defines for_each():

25.1.1 For each [lib.alg.foreach]

   template<class InputIterator, class Function>
Function for_each(InputIterator first, InputIterator last, Function f);

1 Effects: Applies f to the result of dereferencing every iterator in the range [first, last), starting from first and proceeding to last - 1.
2 Returns: f.
3 Complexity: Applies f exactly last - first times.
4 Notes: If f returns a result, the result is ignored.
Header: <algorithm>
Category: Non-modifying sequence operations
Namespace: std

The algorithm for_each() expects three arguments: an iterator [3] first to the first element of the sequence, an iterator last to the last element of the sequence, and a function or function object f that the elements in the range are applied to. Note that the element (if such an element exists: it is common to pass a past-the-end iterator for the container) for the last iterator is never dereferenced, nor is f applied to it. for_each() returns the function or function object that is passed to it. Note that the return value is almost always a copy of the Function f because the algorithm accepts its argument by value.

Ranges and Notation

The iterator range [first,last) consists of every iterator in the sequence starting with and including first, and up to but not including last. This is extremely important—the iterator last is not part of the range: it denotes the end of the range. It therefore follows that the last iterator does not need to be dereferenceable.

The notation for ranges originates from mathematics, where it is used to denote a half-open interval. A half-open interval includes the first element but not the last.

For example, suppose that you have a container with elements of type door. Each door needs to be locked using the free functionlock(), which accepts an argument of type reference to door. Rather than resorting to a loop in which the doors are locked, you should be calling for_each(), mainly because it will convey the intention of your code much better than the loop. Here is how:


#include <algorithm>
#include <vector>
// A class that models a door
class door
{
public:
   void lock()
   {
      // Locking code omitted
   }
};

// Free function for locking doors
void lock(door& d)
{
   d.lock();
}

// Start of the program
int main()
{
   typedef std::vector<door> door_container;
   door_container doors;
   // Code for adding doors omitted
   // Use for_each to call lock() on all doors
   std::for_each(doors.begin(),doors.end(),lock);
}

The call to for_each() passes an iterator for the first element of the vector [4] doors as the beginning of the range, a past-the-end iterator for doors as the end of the range, and a pointer to lock() as the function to call. Figure 1 shows a conceptual view of a std::vector<door> with four elements. When applying for_each(), the function or function object is applied to each element, starting with begin.

Basic for_each

In the preceding example, you saw std::for_each() used to call the function lock() for all elements in the container doors. Perhaps you have used other programming languages where there are language constructs for doing the same thing that for_each() doe—for example, both VB.NET and C# provide such facilities.

C++ doesn't, but it's obviously possible to perform the same task using another language feature, namely loop constructs. In C++, and without using for_each(), you would need a loop to do the same thing as was shown in the preceding example. Here's an equivalent implementation using a forloop:

#include <vector>
class door
{
public:
   void lock()
   {
   // Locking code omitted
   }
};

// Free function for locking doors
void lock(door& d)
{
   d.lock();
}
   int main()
{
   typedef std::vector<door> door_container;
   typedef door_container::iterator iterator;
   door_container doors;
   // Code for adding doors omitted
   
   // Use a loop to call lock() on all doors
   iterator end=doors.end();
   for (iterator it=doors.begin();it!=end;++it)
   {
      (*it).lock();
   }
}

The loop at the end of the program is where the work is done: each of the elements in the vector doors is locked by dereferencing the iterator it and invoking the function door::lock(). This works, but it's perhaps not immediately apparent that the loop is to do something with every element in the vector. With the free function lock(), the locking in the loop could also be done thusly:

lock(*it);

In the implementation that utilized for_each(), we were able to make the code much terser than the loop:

std::for_each(doors.begin(),doors.end(),lock);

It is indeed quite succinct, and the meaning of the code should be apparent to anyone reading it. The first two arguments denote the range of iterators. The last argument to for_each() is a unary function or function object (here, the free function lock()). The algorithm will make sure that lock() is called for every door in the range. Look at the two implementations side by side, and I think you'll agree that for_each() is the better choice here:

// Loop version
iterator end=doors.end();
for (iterator it=doors.begin();it!=end;++it)
{
   lock(*it);
}

// for_each version
std::for_each(doors.begin(),doors.end(),lock);

Whenever you have a situation involving applying a function to a range of elements, consider using std::for_each() rather than a loop construct. The only requirement besides having a sequence of elements is a function or function object that accepts an argument that is compatible—either the same type or one that is convertible—with the elements of the sequence. Of course, you don't have to operate on every element in a container: because the pair of iterators passed to for_each() denotes the range to which the function object is to be applied, you are free to create any range that fits your purposes.

Intermediate for_each

Of course, things aren't always as simple as the door example. To begin with, you might not have a free function lock() to call, just the member function door::lock(). It is still possible to use for_each(), but things get a little more involved, because you cannot pass a pointer to member directly to it. Instead, you must use an adaptor from the C++ Standard Library called mem_fun_ref_t [5]. An adaptor is responsible for transforming a type so that it conforms to certain requirements, often an interface. For example, the container adaptor class stack augments a container with the typical operations for a stack: push(), pop(), top(), and so on. The stack adaptor uses the underlying container to implement the stack operations, but adapts the container's interface to that of a stack. The mem_fun_ref_t adaptor makes it possible to call a member function on a reference to the type containing the member. mem_fun_ref_t is a template class, but rather than supplying the correct template argument, you will want to use the helper function mem_fun_ref(), which deduces the template argument and returns a correctly parameterized object of type mem_fun_ref_t. Here's how the code looks when using for_each() and having it call the member function lock():


std::for_each(
doors.begin(),
doors.end(),
std::mem_fun_ref(&door::lock));

This is still readable code, assuming that you're familiar with mem_fun_ref(), and know that the expression &door::lock yields a pointer to member [6] for lock(). However, in this case it would make good sense to provide the free function lock(), just to avoid having to use the mem_fun_ref_t adaptor. What if you need to do something more complex than calling a free function or a member function? Well, you should still consider providing a function, or better yet, a function object, that can be used with for_each(). The potential drawback compared to writing a loop is that the implementation is moved away from the location where it is to be used. A function object is a class that implements the function call operator, which makes it behave like a function. One of the best arguments for using function objects rather than free functions is that function objects can have state. The following code snippet shows the equivalent of the free function lock() implemented as a function object. A word of caution is in order here: although the following example will show a perfectly valid function object, the intended use of the function object invites programmers to write erroneous code. Keep reading to see why.

// Function object for locking doors
class door_locker
{
public:
   // Function call operator
   void operator()(door& d) const
   {
      d.lock();
   }
};

The implementation for a function object definitely involves more boilerplate code than does a free function. However, you would now be able to add functionality to this function object that would be hard to do with a free function. Most notably, it would be easy to add state. Before we go there, take a look at how this function object is used together with for_each():

std::for_each(doors.begin(),doors.end(),door_locker());

Rather than creating a temporary door_locker and pass that to for_each(), you could pass one that you've already created:

door_locker locker;
std::for_each(doors.begin(),doors.end(),locker);

For example, suppose that you are expected to add functionality that counts the number of calls to door::lock(). However, you only need to count the calls when operating on a sequence of doors, just like in the previous example. Rather than making an intrusive change to door, you'd be better served by augmenting door_locker with that functionality. Here's an improved version of door_locker that comes with a lock counter, a function lock_count() for retrieving the current lock count, and a function reset_lock_count() for resetting the counter:


// Function object for locking doors

class door_locker : public std::unary_function<door,void>
{
   int lock_count_;
public:
   door_locker() : lock_count_(0) {}
   // Function call operator
   void operator()(door& d)
   {
      d.lock();
      ++lock_count_;
   }

   // Get the current lock count
   int lock_count() const
   {
      return lock_count_;
   }
   
   // Reset the lock count
   void reset_lock_count()
   {
      lock_count_=0;
   }
};

Note that the function call operator can no longer be const, because it is modifying the door_locker member lock_count_, and that we need to add a default constructor that initializes the counter to 0. Before, the function call operator was declared const, because it did not change the state of door_locker: It only operated on the reference to door passed to it, and in fact, door_locker had no state whatsoever. Also, the door_locker class now inherits from std::unary_function [7]. The reason for this is to provide the function object with two typedefs: argument_type and result_type. These typedefs make the function object adaptable [8], which means that it can be used together with the C++ Standard Library's function objects.

Let's put this new door_locker to the test. The following program intends to print the number of calls to lock() to std::cout:

#include <vector>
#include <functional> // For std::unary_function
#include <algorithm>

// Definition of door omitted

// Definition of door_locker omitted

int main()
{
   typedef std::vector<door>  door_container;
   typedef door_container::iterator iterator;
   door_container doors;

   // Code for adding doors omitted
   door_locker locker;
   std::for_each(
      doors.begin(),
      doors.end(),locker);
   std::cout << "Lock count: " << locker.lock_count() << '\n';
}

The implementation seems reasonable enough, doesn't it? However, I assure you that no matter how many doors exist in the container doors, you will always get a lock count that is 0. This has vexed and perplexed many programmers, but the reason is really quite simple: std::for_each() accepts its function object by value, and will in the preceding example operate on a copy of locker. So, locker will have an initial lock count that is 0, a copy will be passed to for_each(), and yet another copy will be returned from for_each. The returned copy will indeed contain an updated counter. Because for_each() returns a copy of the function object, you could opt to rewrite the code like this in order to retrieve the correct lock count:

door_locker locker;
locker=std::for_each(
  doors.begin(),
  doors.end(),locker);

Alternatively, you could make the whole thing a one-liner, albeit a long one:

std::cout << "Lock count: " <<
   (std::for_each(
     doors.begin(),
     doors.end(),
     door_locker()).lock_count()) << '\n';

This is unnecessarily cluttered and hard to read, however, and I don't recommend it. The fact that a copy of the function object is returned by for_each() is important. You can use the returned function object to access any state that the function object may have. Even if the function object handles copying as poorly as door_locker in the preceding example, you will still be able to retrieve the correct state. Yet another way to get the correct value—using a technique that is best avoided—to explicitly instantiate for_each() with template arguments that state that the function object should be passed by reference, like so:

door_locker locker;
std::for_each< iterator,door_locker&>(
   doors.begin(),
   doors.end(),
   locker);

This produces the correct value in locker, but nobody wants to write code like that, and indeed, they shouldn't have to.

The best way to avoid any potential copying problems is to design function objects that cope with the copying by sharing the implementation details. The most common way to achieve such sharing is by using a reference-counted smart pointer, for example boost::shared_ptr [9,10]. For our door_locker class, we could simply store the lock_counter_ in a shared_ptr, but I prefer to create a nested class with all shared data (at the moment we only have one data member, but reality changes quickly):

// Function object for locking doors
class door_locker : public std::unary_function< door,void>
{
   // Implementation class
   class door_locker_impl
   {
      int lock_count_;
   public:
      door_locker_impl() : lock_count_(0) {}
   // Increase the lock count
   void lock(door& d)
   {
      d.lock();
      ++lock_count_;
   }

   // Get the current lock count
   int lock_count() const
   {
      return lock_count_;
   }

   // Reset the lock count
   void reset_lock_count()
   {
      lock_count_=0;
   }
 };

   // The shared implementation is stored in pimpl_
   boost::shared_ptr< door_locker_impl>  pimpl_;
public:
   door_locker() : pimpl_(new door_locker_impl()) {}

   // Function call operator
   void operator()(door& d)
   {
      pimpl_->lock(d);
   }

   // Get the current lock count
   int lock_count() const
   {
      return pimpl_->lock_count();
   }
   // Reset the lock count
   void reset_lock_count()
   {
      pimpl_->reset_lock_count();
   }
};

As you can see, the implementation is moved to the nested class door_locker_impl [11], and the corresponding member function in door_locker simply forwards the calls. Whenever a door_locker is copied, the reference counter of the shared_ptr is incremented, and the copy will use the same door_locker_impl. This implementation deftly avoids any surprises, and the original code now functions correctly:


#include < vector>
#include < functional>
#include < algorithm>

// Definition of door omitted

// Definition of door_locker omitted

int main()
{
   typedef std::vector< door>  door_container;
   typedef door_container::iterator iterator;
   door_container doors;
   
   // Code for adding doors omitted
   door_locker locker;
   std::for_each(
     doors.begin(),
     doors.end(),locker);
   
   std::cout << "Lock count: " << locker.lock_count() << '\n';
}

The counter in locker will be updated as for_each() executes, and the printed value will reflect the number of calls to door::lock(). It is extremely important to make function objects work correctly when copied, because they almost always will be when you pass them to the C++ Standard Library algorithms. Correctness is obviously the most important motive for how you design your function objects, but efficiency might also be a concern. If you implement function objects that are expensive to copy, they might cause your programs to run slow.

A big advantage of using function objects like the one you've seen here is that the code is reusable. Had you written the code directly in a loop, you'd either use copypaste or refactor the code and move the implementation to a separate class or function. With a function object like door_locker, any code that needs the functionality is free to declare such a variable and use it. Of course, there's another side to that coin: by moving the functionality to another location in the source code, it may become harder to understand the code. Programmers reading your code will need to locate the function object when they encounter the call to for_each(), whereas with a loop, the implementation is always right there. If you're looking for the most efficient solution, then you'll also want to use function objects rather than function pointers, because compilers are very good at inlining calls to a function object's function call operator (although modern compilers are getting better at replacing function pointers with inline code).

Related Topics

A closely related algorithm to for_each() is std::transform(), which is a mutating sequence operation that, compared to for_each(), takes an additional parameter for an output iterator where the result of applying the function is placed. Read more about transform() in the article dedicated to that algorithm.

Table 1 lists all of the algorithms in for_each()'s category. You should know them by heart, so you'll always be able to select the correct algorithm.

Summary

You have seen how the algorithm for_each() can help you write terser and more readable code. It is a very common programming task to perform some operation on every element in a range, so for_each() is extremely useful. Free functions and function objects that perform the operations improve code readability when using for_each(), and of course, it often makes sense to factor out non-trivial code from loops into separate functions anyway. Still, you can also use an adaptor to have for_each() call a member function on the container's element type. Always remember that your function objects will almost always be copied when they are sent to for_each() and the other algorithms in the C++ Standard Library, so be sure to design them to handle copying like we did with the door_locker class. As Scott Meyers points out in Item 43 of Effective STL [12], the algorithms in the C++ Standard Library are superior because they are often more efficient, less prone to errors, and more easily maintained than hand-coded versions in loops. I'd like to add that by using these algorithms, you focus on the algorithm that solves the problem rather than the code, which makes for more elegant programs. And believe me, elegance does have its place in programming.

Acknowledgements

I have chosen to acknowledge not only the people who have directly contributed to this article, but also those who have made the STL what it is today. There are more than those mentioned, of course, and I appreciate if you are able to make me aware of contributors to whom I have failed to give due credit.

References

[1] The C++ Standard, ISO/IEC 14992-98, often referred to as "C++98", is available for purchase from the ANSI standards store.
[2] To start learning about Standard Template Libray (STL) containers, I would suggest reading The C++ Standard Library: A Tutorial and Reference, by Nicolai Josuttis.
[3] Any book that talks about the STL will discuss iterators, and extraordinary lucid explanations are available in Matthew Austern's Generic Programming and the STL: Using and Extending the C++ Standard Template Library.
[4] Another excellent book that teaches the STL from the ground up is STL Tutorial and Reference Guide: C++ Programming with the Standard Template Library (Second Edition), by David Musser et al.
[5] There are actually better alternatives to the adaptors and binders from the C++ Standard Library, most notably Boost.Bind.
[6] Note that unlike function pointers, pointers to members must always be formed by applying an explicit and operator. In practice it's clearer and more consistent to apply it to function pointers too.
[7] std::unary_function and std::binary_function are defined in the header .
[8] If you want, you can also define the typedefs argument_type and result_type in your class rather than deriving from std::unary_function. That will also make the function object adaptable, but an extra benefit of using derivation from unary_function is that it nicely documents that your class is adaptable.
[9] boost::shared_ptr is a part of the Boost.Smart_ptr library .
[10] I devote one of the chapters in my book Beyond the C++ Standard Library: An Introduction to Boost to Boost.Smart_ptr, including of course one of its key components, boost::shared_ptr. [11] Moving the private parts of a class into a separate class is often done to insulate clients from unnecessary knowledge about the class internals. It is called "the pimpl idiom" or the "Cheshire Cat". Read more about it in Scott Meyer's Effective C++ : 55 Specific Ways to Improve Your Programs and Designs (Third Edition).
[12] Read more about Scott Meyers' Effective STL in the following STL Bibliography.

July, 2005: <algorithm>: for_each

Figure 1: A conceptual view of a std::vector<door> with four elements.

July, 2005: <algorithm>: for_each

Table 1: Non-modifying sequence operations.

Terms of Service | Privacy Statement | Copyright © 2024 UBM Tech, All rights reserved.