Dr. Dobb's | STL & Generic Programming: STL Function Objects and Their Adaptors

STL & Generic Programming: STL Function Objects and Their Adaptors

STL is more than just algorithms, containers, and iterators. Function objects and adaptors greatly enhance your ability to use STL effectively.

June 01, 2002
URL:http://www.drdobbs.com/stl-generic-programming-stl-function-ob/184401527

June 2002/STL & Generic Programming

Introduction

In one of the first installments of this column almost two years ago, I said that the STL rests on three pillars: containers, iterators, and algorithms. Since then, I have tried to pass on to you what I have learned about these three topics in my own work as a software engineer. Perhaps I was able to save you from some of the cuts and bruises that we suffered when we first started using the STL in our shop. I am now almost ready to move on to the next chapter in my overall plan for this column, namely, generic programming techniques that are not specifically related to the STL. However, before I can do that, I must conduct a little mopping-up operation and talk about one last group of classes and functions that the STL provides, often overlooked but really nifty and useful nonetheless: function objects and their adaptors.

Overloading operator()

Let me begin by reminding you that one of the many operators that C++ lets you overload is operator(), the function call operator. More precisely, if we define a class Func as follows:
class Func
{
public:
  int operator()(int i) const
  { return i * 42; }
};
then we can use any object of type Func just like a function:
// Writes 84 to standard out
Func foo;
std::cout << foo(2);
The expression foo(2) is simply a syntactically different way of saying:
foo.operator()(2);
We can also use an unnamed temporary local object of type Func to make the same function call. This leads to the following syntactic construct, which you may find confusing at first:
std::cout << Func()(2);
The Func() part creates an unnamed temporary object of type Func, and the (2) part causes the call to operator(). Again, we could be more verbose and write
Func().operator()(2);
Needless to say, operator() can take any number of arguments of any type, and it can have any result type. In fact, you can do almost anything with operator() that you can do with an ordinary member function, like overload it (i.e., write multiple versions of it with different parameter lists) or make it virtual, private, or what not. However, I have never used or seen anybody use operator() as anything other than a plain old public, non-virtual, non-overloaded member function. In fact, at this point, you are probably asking yourself what reason anybody could possibly have for writing a function call operator for some class and then using objects of that type as if they were functions. Other than to satisfy a C++ programmer’s notorious desire for being syntactically cute, why would anybody do such a thing? In the course of this column, I’ll give you several good reasons for using such function objects instead of old-fashioned functions.

There is just one more thing I need to address, and that is terminology. The gurus of C++ and the STL commonly use the term “function object” for objects that are used like functions, and the term “functor” for both classes of this kind and their objects. Chuck Allison, senior editor of CUJ, has raised the objection that the word functor already has a rigorously defined meaning in mathematics, and that this meaning is not exactly compatible with the new use of the word in the C++ community. As a former research mathematician who has had more than a brush with category theory where functors abound, I have to agree with his point. On the other hand, the dictionary defines “functor” as “something that performs a function or an operation,” which is a perfect fit for C++-style function objects. Moreover, it seems to me that mathematics and C++ programming are sufficiently disjoint disciplines that they can each have their own definition of the term “functor” without causing confusion or misunderstanding. In short, I will agree with Chuck Allison’s point, but I will not accept it as a reason for not referring to classes with an overloaded operator() and their objects as functors.

STL Functors

The STL provides us with a number of predefined functors. Perhaps not surprisingly, they are all class templates. I will not list them all here, letting you look them up in your favorite STL book instead. Let us look at std::plus as an example. On the surface, plus is nothing but a functor whose function call operator adds two things of a templatized type:
template<typename T>
class plus
{
  // Some more stuff...
public:
  T operator()(
    const T& x,
    const T& y
    ) const
  { return x + y; }
};
When I saw this for the first time, I was a bit annoyed. Hadn’t they told me that the hottest thing since sliced bread was to overload operator+? If, for example, I am to write a matrix class, I’m supposed to overload its operator+ so that I can write:
Matrix A,B,C;
[...]
C = A + B;
rather than something like:
C = A.add(B);
or:
C = add(A, B);
Aren’t we going in circles here? First, we make it so that writing A + B calls operator+ of the matrix class, and then we introduce a functor std::plus so that we can write:
plus<Matrix>()(A, B);
to have the function call operator of class plus call A + B. In fact, if you were to use std::plus in that manner, writing:
plus<Matrix>()(A, B);
instead of A + B, then you would be going in circles. But that’s not what std::plus is for. All the good reasons for overloading operator+, or operator- and operator* for that matter, remain valid. Functors such as std::plus are useful for specific purposes in the context of generic programming, the prime example being the customization of STL-style algorithms. Suppose, for example, that you have two STL vectors of equal size containing objects of class Matrix, and operator+ is defined for these matrices. Now you want to add to each element in the first vector the corresponding element of the second vector. That is, you want to do what this code does:
for(int i=0; i<vect1.size(); ++i)
  vect1[i] += vect2[i];
As good STL users, we prefer algorithms to explicit loops, so we should do it like this:
std::transform(
  vect1.begin(),
  vect1.end(),
  vect2.begin(),
  vect1.begin(),
  std::plus<Matrix>()
  );
If you’re not sure, look up std::transform in your documentation to verify that what I’m doing here is correct and safe. Notice how we are passing an unnamed temporary object of type std::plus to the algorithm. This object will be used as a function object to perform the desired addition.

Now we have seen an example where the functor std::plus is put to good use: an object of type std::plus is passed to an STL algorithm to customize the action of the algorithm. This makes it possible to have an algorithm like std::transform perform an enormous variety of tasks. Interestingly, though, I have still not given you a good reason why std::plus is a functor rather than a plain old non-member function. The algorithm std::transform accepts anything that can be called like a function as its last argument, and the same is true for other STL algorithms that can be similarly customized. Therefore, function pointers work fine here. One great reason to prefer functors over function pointers in a situation like that is, somewhat surprisingly, efficiency. When the compiler generates the instantiation of std::transform with the template arguments of the example above, it is very likely that the call to std::plus’s operator() will be inlined. Function pointers, on the other hand, inhibit inlining. Therefore, contrary to the common wisdom that object-oriented design comes at a performance cost, STL-style algorithms used with functors often perform better than their C-style counterparts such as qsort, because the latter use function pointers that prevent the compiler from inlining.

Another great reason for making std::plus a functor rather than a non-member function becomes apparent when we start using STL function object adaptors.

STL Function Object Adaptors

Let us look at a slight modification of the vector example that we used earlier. Suppose we have a vector of Matrix objects, where, as before, operator+ is defined for matrices, and we want to add a constant value, say C, to each element of the vector, like this:
for(int i=0; i<vect.size(); ++i)
  vect[i] += C;
Again, since STL algorithms are to be preferred over explicit loops, you want to write something like this instead:
std::transform(
  vect1.begin(),
  vect1.end(),
  vect1.begin(),
  add_const<Matrix>(C)
  );
where add_const is a functor that takes one argument and returns the result of adding to this argument a value that is specified upon construction:
template<typename T>
class add_const
{
public:
  add_const(const T& c) : m_c(c) {}
  T operator()(const T& x) const
  { return x + m_c; }

private:
  const T m_c;
};
There is a more “generically correct” way of doing this, which I’ll get to in a moment, but let me first point out that we have just stumbled upon another reason to prefer functors over non-member functions: functors are full-fledged classes, and therefore they can store state. In this case, we used state to hold the constant value that the functor adds to its argument. Another thing that I find myself doing quite frequently is to use functors to query a collection of objects for information. For example, if you have a collection whose elements are themselves collections, like a vector of vectors, then you can obtain the collections’ maximum sizes by calling std::for_each with a functor that stores the maximum size of the collections that it has seen. Listing 1 shows the code. Notice how the algorithm std::for_each takes its last argument by value, but it returns a copy of that argument so that it can be inspected when the algorithm is finished.

So now you have two reasons to familiarize yourself with functors and use them with STL-style algorithms. But it gets better. Let’s go back to the example of adding a constant value to every element in a vector of matrices. The STL way of doing this is as follows:
std::transform(
  vect1.begin(),
  vect1.end(),
  vect1.begin(),
  std::bind2nd(std::plus<Matrix>(), C)
  );
Listing 2 shows a typical implementation of std::bind2nd. It is really just one of those little generating functions that the STL often uses to create objects. In this case, the object created is of type std::binder2nd. std::binder2nd is a typical example of an STL function object adaptor. It takes two constructor arguments: a functor func that takes two arguments, and a value secondArg for the second argument. The resulting object is a new functor that takes one argument. Calling this new functor with an argument x results in a call to the original functor func, with x as the first argument and secondArg as the second argument. In short, std::binder2nd turns a binary functor into a unary functor by fixing the second argument to a user-supplied value.

This takes us to the most important reason why things like std::plus are functors rather than non-member functions. If you take one quick look at std::binder2nd’s definition, you’ll notice immediately that the adaptor std::binder2nd needs quite a bit of type information about its adaptee func: it uses func’s result type as well as its two arguments’ types. When using functors instead of functions, this information can easily be provided in the form of typedefs. The STL has, of course, a standard way of providing this type information. Every time you write a unary functor, you should derive your class from std::unary_function, and when you write a binary functor, you should derive from std::binary_function. In both cases, you must specify the template parameters in the obvious way. Look at std::binder2nd in Listing 2 for an example. Your functor then provides all necessary type information via the typedefs result_type, first_argument_type, and second_argument_type. In STL terminology, your functor has become an adaptable functor, and it is thus ready to be used with STL-style adaptors such as std::binder2nd.

Just as with algorithms, iterators, functors, and the like, the STL provides only the most commonly needed function object adaptors, and you are encouraged to write your own. As a matter of fact, I personally find that the STL’s supply of adaptors is a bit on the meager side. For example, since much of my software development is scientific and mathematical in nature, I often need the composition of two functors, which the STL does not have. Matt Austern mentions composition adaptors as a “frequent addition to the STL” in Sections 15.8.6-15.8.8 of his book [1]. The ones that I wrote for myself can be found, together with some other stuff that you may find useful, in the code that came with my article on STL extensions in the June 2000 issue of CUJ [2]. Slightly different versions of the composition adaptors are available from the Boost website [3]. I would recommend that you use the Boost versions, because they have an excellent chance of being included in the STL in the future.

STL Member Function Adaptors

There is one more group of adaptors in the STL that I want to mention, namely, member function adaptors. As usual, I will not provide you with a complete documentation here. There are two reasons for mentioning these adaptors. First, they are so important that you simply must be familiar with them. Second, under certain circumstances, they bring out a rather ugly problem with the STL, which you must be warned about.

Suppose you wish to perform some action on each element in a container; say, you want to serialize each object in the container. The STL way of doing this is to call std::for_each with the begin and end iterators of the container and a suitable functor. Therefore, we would expect the code to look something like this:
std::vector<SomeClass> vect;

// Add elements to the vector
[...]

std::for_each(
  vect.begin(),
  vect.end(),
  FuncSerialize()
  );
Here, FuncSerialize would be a unary functor that takes an argument of type reference to SomeClass. The code above creates an unnamed temporary object of type FuncSerialize, and the algorithm std::for_each calls this function object on each element of the vector. Now is the time to remember that, despite the fact that generic programming is really cool, the primary paradigm of software that is written in C++ is still the object-oriented paradigm. Serializing an object should not be done by calling a non-member function that takes the object as an argument. It should be done by calling a member function. At this point, it looks as if we have to abandon the STL and write an explicit loop:
for(int i=0; i<vect1.size(); ++i)
  vect[i].Serialize();
Fortunately, this is not necessary. The STL provides us with a way to reconcile generic programming and object-oriented programming. Here’s what you must write to call the member function SomeClass::Serialize on each object in our vector vect:
std::for_each(
  vect.begin(),
  vect.end(),
  std::mem_fun_ref(SomeClass::Serialize)
  );
Even without looking at source code, it is not hard to guess how std::mem_fun_ref works here. It is a helper function that returns a unary functor. This functor stores the address of the member function SomeClass::Serialize. The functor takes one argument of type SomeClass&. When called with an argument x of type SomeClass, the functor will call x.Serialize.

The STL has a whole zoo of member function adaptors of this type. Their names all have “mem_fun” in them, and you should familiarize yourself with them. The STL implementation that comes with Microsoft Visual C++ 6.0 is incomplete in this area insofar as you cannot adapt const member functions, and you cannot adapt member functions that return void. The code that came with my article on STL extensions [2] provides what’s missing.

In the serialization example above, I assumed that the member function SomeClass::Serialize takes no arguments. In real life, you will probably encounter serialization functions that are declared like this:
void Serialize(Archive& ar);
Here, the Archive argument specifies the destination of the serialization process. The task of serializing each object in a container has just become a little more complex, but the STL can still handle it. Suppose ar is the archive to which you want to serialize each element of the vector vect. Here’s what you must write:
std::for_each(
  vect.begin(),
  vect.end(),
  std::bind2nd(std::mem_fun1_ref
              (SomeClass::Serialize), ar)
  );
std::mem_fun1_ref works much like std::mem_fun_ref, which we looked at earlier. The difference is that std::mem_fun1_ref adapts member functions that take one argument. The resulting functor takes two arguments. The first one is a reference to the object on which the member function is to be called. The second one is the argument that is to be passed to the member function. In other words, if x is an object of type SomeClass, and ar is an object of type Archive, then the functor call:
std::mem_fun1_ref(SomeClass::Serialize)
                 (x, ar);
results in the call:
x.Serialize(ar);
Since we already know all about std::bind2nd, it should be clear now how the for_each example above works. The algorithm simply calls:
x.Serialize(ar);
on each element x of the vector vect — so far, so good. This is exactly the way we’re supposed to do this. Except, the code won’t compile under any STL implementation that I am aware of. To see why, let us take a second look at binder2nd’s class definition in Listing 2. Specifically, look at the type of the second constructor argument. It is a const reference to the second argument type of the underlying binary functor. In our example, what is the underlying binary functor? It is the member function adaptor that std::mem_fun1_ref has created. And what is the second argument type of this functor? It is Archive&. Ouch. What binder2nd is presenting to the compiler here is the “type” Archive&const&. The compiler politely and rightfully declines to deal with this: references to references are not allowed.

Another area where the reference-to-reference problem tends to rear its ugly head is locales. In my last column, I had an example that used the STL algorithm std::transform to read characters from the standard input stream, upcase them, and write them back to standard out. To avoid having to address the reference-to-reference problem, I used the musty old toupper function from the C runtime library to achieve the upcasing. The STL way to do it would have been to create a functor that uses the locale object stored with the cin object. Do yourself a favor and use your STL documentation and literature to fully understand the following line of code, which creates such a functor:
std::bind2nd(
  std::ptr_fun(std::toupper<char>),
  std::cin.getloc()
  );
Then try to compile this and understand why you get a reference-to-reference error.

This problem is extremely annoying, and if you use the STL extensively, as you should, then you are practically guaranteed to run into it. Is there a solution to this problem? Yes, there is. Rather than grabbing the argument types of its underlying binary functor and slapping references on them, which will likely result in references to references, what binder2nd should do is this: it should ask each of these types if it is a reference. If so, the type is used as is. If not, a reference to the type is used. Does this remind you of something? It should. It’s a compile-time if-else of types. If some type is a reference, do one thing. Else, do another. It’s the kind of problem that can be solved with a traits class. I’ll explain the solution in detail in my next column. In the meantime, if you want to do some research on your own, check out Boost’s call_traits [4] or Andrei Alexandrescu’s type_traits [5]. Notice, however, that you cannot fix the reference-to-reference problem that we encountered when combining std::bind2nd and std::mem_fun1_ref by just downloading some code and adding it to your project. The source code of classes such as std::binder2nd needs to be changed. Boost has appropriate replacements for the STL function object adapters [6]. Notice that for all this to work, you need a compiler that supports partial template specialization. Most compilers these days do, the most notable exception being Microsoft’s Visual C++.

References

[1] Matthew H. Austern. Generic Programming and the STL (Addison Wesley, 1998).

[2] Thomas Becker. “STL & Generic Programming: Generic Extensions to the STL,” C/C++ Users Journal, June 2000. The code is available at <www.cuj.com/code/archive.htm> under June 2000.

[3] <www.boost.org/libs/compose>

[4] <www.boost.org/libs/utility/call_traits.htm>

[5] Andrei Alexandrescu. Modern C++ Design (Addison-Wesley, 2001), Section 2.10.

[6] <www.boost.org/libs/functional>

Thomas Becker works as a senior software engineer for Zephyr Associates, Inc. in Zephyr Cove, NV. He can be reached at [email protected].

June 2002/STL & Generic Programming/Listing 1

Listing 1: Querying a vector of vectors for the maximal size

max_size<std::vector<Something> > the_max =
std::for_each(
  vect_of_vects.begin(),
  vect_of_vects.end(),
  max_size<std::vector<Something> >()
  );
size_t m = the_max.get_max();

template<typename T>
class max_size
{
public:
  max_size() : m_max_size(0) {}
  
  size_t get_max() const
  { return m_max_size; }
  
  void operator()(const T& cont)
  { m_max_size = std::_cpp_max(cont.size(), m_max_size); }

private:
  size_t m_max_size;
};
— End of Listing —

June 2002/STL & Generic Programming/Listing 2

Listing 2: A typical implementation of bind2nd and binder2nd

template<class BinFunc>
class binder2nd : public unary_function<
  BinFunc::first_argument_type,
  BinFunc::result_type
  > 
{
public:
  binder2nd(
    const BinFunc& func,
    const BinFunc::second_argument_type& secondArg
    ) : m_func(func), m_secondArg(secondArg) {}
  
  result_type operator()(const argument_type& first) const
  { return m_func(first, m_secondArg); }

protected:
  BinFunc m_func;
  BinFunc::second_argument_type m_secondArg;
};

template<class BinFunc, class SecondArg> inline
binder2nd<BinFunc> bind2nd(
  const BinFunc& func, 
  const SecondArg& secondArg
  )
{
  return binder2nd<BinFunc>(
    func,
    BinFunc::second_argument_type(secondArg)
    ); 
}
— End of Listing —