Channels ▼
RSS

Custom Containers & Iterators for STL-Friendly Code


March, 2005: Custom Containers & Iterators for STL-Friendly Code

Ethan McCallum is a freelance technology consultant who keeps busy with C++, Java, and Linux. He can be reached at ethanqm@penguinmail.com.


Be it a spoken or software language, vocabulary and syntax alone only go so far. Knowledge of idioms and other subtleties are what separate the native from the tourist.

One element of software language proficiency, then, is creating constructs that are more natural within the language realm. Developers frequently create functors to take advantage of C++'s STL algorithms; but what about the other side of the coin—creating custom containers? Client code could use these classes to fetch and store data, and manipulate their contents with STL algorithms.

There are two ways to do this: Add functionality to containers to make a class more STL friendly, or create new containers and iterators from scratch to make legacy code available to STL algorithms. In this article, I examine both methods: when they are useful, and the decisions you face when designing them. What makes this endeavor interesting is that it doesn't involve traditional inheritance. I've tested the code samples under Fedora Core 1 (GCC 3.3.2) and Fedora 3 (GCC 3.4.2). The first example should build (and run) under any STL-capable compiler, whereas the second example is based on a POSIX library call. The design advice for both examples is portable.

Value-Adding an Existing Container

Consider a WidgetBucket class, the purpose of which is to associate some information with a group of Widget objects:

  private:
  std::list<Widget> widgets_ ;
  std::string name_ ;
  // ... etc ...
}

Say the list of Widgets is meant for public consumption, though understandably, you don't want to directly expose it as a public member. Both friend access and Visitor [1] would be overkill. You could provide a getList() member function that returns a reference to the list, then follow that with a call to the ever-familiar for_each:

std::list<Widget>& lw = bucket.getList() ;
std::for_each(
   lw.begin() ,
   lw.end() ,
   SomeFunctor()
) ;

One problem with this approach is that it exposes a WidgetBucket implementation detail, notably that it uses a list<>. A public typedef to obscure the list<> would make the class only slightly less awkward to use.

Given that WidgetBucket is a value-added list, it would be more natural to access the Widgets as though WidgetBucket itself were a container:

std::for_each( bucket.begin() ,
  bucket.end() , SomeFunctor() ) ;

It's straightforward to do this: Simply expose pass-throughs to the underlying list<>'s const_iterator typedef and its iterator access functions:

class WidgetBucket {
  public:
  typedef
  std::list<Widget>::const_iterator
  const_iterator ;
  ...
  const_iterator begin() const {
    return( widgets_.begin() ) ;
  }
  ...
}

(The end() function is similar to begin().)

Now client code can access the list<> members, pass them through STL algorithms, and so on, and no one's the wiser that they are talking to an abstraction layer. Client code refers only to WidgetBucket functions and typedefs, thereby hiding the internals.

Congratulations. You've just created a container class, and you didn't have to inherit from list<> to do it. Some may call this cheating—it's just containment, after all—but it's how STL containers work.

STL Container "Inheritance," Isn't

Traditional polymorphism reflects an "is-a" relationship: An object of a derived class "is a(n)" object of its base class. You can exchange a pointer or reference to a base class object with one of a derived class, and client code is none the wiser.

STL container classes are described in terms of "concepts" [2] rather than interfaces. Like interfaces, concepts outline function signatures and typedefs expected by client code. Classes that implement these functions and typedefs are said to conform to that concept.

At the top of the hierarchy is the Container concept, which has members such as value_type, begin(), and size(). Refinements of Container include Associative Container, for key-based element access, and Sequence Container, which represents linear storage. As with abstract bases, a class may conform to several concepts to achieve greater functionality.

Performance aside, code that calls members of the Sequence Container concept can transparently swap a vector<> for a list<>. The same goes for map<>, set<>, and "Associative Container." If a code block only calls Container's size() member function, then it is oblivious as to whether it is operating on a vector<>, or map<>, or anything else.

Unlike the explicit, compiler-enforced "is-a" relationship between a base and derived class, containers share an informal "acts-like-a" relationship. There's no "Container" class from which to inherit, only an idea. (Hence the name "concept"—the relationship is conceptual.)

This lack of compiler-policed conformity gives you tremendous leeway in how container-like classes are implemented. WidgetBucket only provides begin() and end(), though it could also provide pass-through operations to all list<> members. (Case in point: WidgetBucket conspicuously lacks a means to add Widgets.) How do you decide which members to implement?

The purists may say, "all of them," per the Principle of Least Surprise. It's not unreasonable to expect a class that provides push_back() and begin() to also provide empty().

Then again, a surprise (compiler error) can yield enlightening conversation: "I omitted push_back() for a reason: That's a read-only container." Sometimes you can add functionality by removing it.

Furthermore, there are reasons to create special versions of concept members in lieu of strict pass-throughs. A selective WidgetBucket class's push_back() could, say, confirm the Widget meets some criteria before passing it to the underlying list<>.

Your class must also make sense to you at a code level. For example, what purpose would clear() serve in pseudocollection that is prepopulated at construction, such as a wrapper around a database library's result set?

Clearly, there are myriad ways to do this, all of which are "right."

All of these points share a common theme, that of client code expectations. For a private section of code, WidgetBucket's begin() and end() may be enough. A general-release library would be more strict—and less surprising—in its conformity to the concept.

Creating New Containers

The "acts-like-a" relationship of STL containers needn't be expressed in terms of containment and pass-throughs. Nor must a "container" hold true objects—it can be an abstraction layer for some idea. One reason to create a new container-like class is to put an STL-friendly face on other code, such as to use a legacy library's linked list with for_each() or a database API's result set with accumulate().

The theory is the same—expose container-related typedefs and member functions—but the mechanics are different. With no underlying STL container to handle pass-throughs, you must implement the member functions and iterators yourself. I'll use the POSIX glob() call as an example of what questions arise when a container is more of an abstraction than a pass-through.

glob() expands shell wildcards, such as * and ?, into lists of filenames, and stores the results in a glob_t data structure. Several such expansions may be stored in a single glob_t. glob_t.gl_pathv provides access to the list of matched files.

This has several features befitting a Container; the STLGlob class (Listing 1) is one such implementation:

  • Respectively, the constructor and destructor create and destroy the glob_t by calling glob() and globfree().
  • push_back() calls glob() to add elements to the glob_t. It throws an exception if the provided pattern fails to match anything.
  • size() returns the value of glob_t.gl_pathc, which is the number of files currently matched by the expansion.

glob_t structures must have a single owner, and as such, STLGlob's copy constructor and assignment operator are private and unimplemented [3]. (A more robust implementation could use reference counting.)

STLGlob closely matches the general Container concept. While it uses push_back() to add elements, it lacks several other functions expected of a Back Insertion Sequence Container; for example, there's no erase() or pop_back() because glob() provides no way to remove elements.

Custom Iterator Theory

Creating a glob()-based container is simple enough, but element access requires that STLGlob have an iterator.

It's said that iterators abstract client code from the container type; this is true, but the container type, iterator, and client code expectations still relate to one another. If client code drives the design, that influences the iterator's implementation (and, indirectly, that of the iterator's parent container). On the other hand, if the container drives the design, then that will dictate what client code can do with the container.

Iterators are to containers what pointers are to arrays: They abstract a sequence's element access. Increment (operator++()), and in some cases decrement (operator—()), functionality is available to position the iterator over a different element, similar to moving a database cursor through a table. The abstracted value (that is, the current sequence element) is available via dereference (operator*()) and member access (operator->()) functions. Iterators must be comparable (operator==(), operator!=()) if they are to be of any use in a loop.

Language semantics bind raw pointers to the type they abstract. Iterators, by comparison, describe themselves and the values they abstract through members of the std::iterator_traits<> template: iterator_category, value_type, difference_type, pointer, and reference. (iterator_traits<> is specialized for raw pointers, which is why those work as STL iterators.)

iterator_category describes the type of iterator using tag classes: std::forward_iterator_tag for forward iterators, std::output_iterator_tag for output iterators, and so on. In lieu of traditional, compiler-enforced inheritance, client code uses the tags to determine an iterator's capabilities and react accordingly.

(Don't be fooled. There exists a base class std::iterator<>, but it provides no function interface.)

value_type is the type of the value abstracted by the iterator. This needn't be the same as the type used internally by the container; instead, it's the data type presented to client code for iterator dereference and pointer-member access functions. For example, STLGlobIterator (see Listing 2) manages C char* strings internally, but returns C++ std::strings to client code on dereference.

difference_type measures the distance between two iterators. It is for use with random-access iterators, which support the distance() operation.

reference and pointer define the data types returned to clients on iterator dereference and member-access, respectively. Take, for example, a basic operator->():

// ... specialization of iterator_traits<SomeIter> ...
class SomeIter ... {
  ...
  // this could be simplified with a typedef
  std::iterator_traits<SomeIter>::pointer operator->() {
  ....
}

Part of the challenge of creating custom iterators is that they rely on a much more strict adherence to concepts. This is based, once again, on the notion of client code expectations: Whereas people use containers, iterators are more often passed through (preexisting black box) algorithms and are thus less forgiving of deviations from the norm.

Iterators in Practice: STLGlobIterator

Applying the previous section's discussion to STLGlob yields the aptly (an unimaginatively) named STLGlobIterator. The full code is available for download, and an excerpt is provided in Listing 2.

STLGlobIterator inherits from std::iterator<> (Listing 3), a convenience class that specializes std::iterator_traits<> and reflects those traits in the derived class as shortened typedefs. Only the category and value type must be provided; the template provides reasonable defaults for the rest. iterator<> provides no member function declarations.

Client code is expected to only use STLGlobIterator in single-pass algorithms, so forward_iterator_tag is the iterator_category trait. Redefining the class as a reverse iterator is left as an exercise, the first point of which would be to determine why STLGlobIterator would need decrement and distance() functionality.

STLGlobIterator is a lightweight wrapper around another iterator, the char** expansion list in the parent STLGlob's glob_ member variable. (This value is stored in STLGlobIterator as sequence_.) While not quite a straight pass-through, it's simple enough to translate iterator operations to pointer equivalents.

STLGlobIterator's default (that is, no-args) constructor initializes sequence_ to NULL, indicating the past-the-end element. The constructor overload accepts a char**, the glob_t.gl_pathv member of the parent STLGlob's glob_ member variable. This value is owned by the parent STLGlob, and as such, STLGlobIterator's destructor does not release sequence_'s memory. Changes made to the parent STLGlob container will likely invalidate STLGlobIterator.

Dereferencing sequence_ yields the current value. This is stored in a local std::string member variable, current_, such that operator->() and operator*() don't yield pointer ownership problems. Technically, list_ may change outside of the iterator and, thus, current_ will not reflect the value it copies; but that would happen only if the parent STLGlob container were to change, in which case the pointer would be invalid anyhow.

The copy constructor and assignment operator assign list_ and current_ to the target object. It's safe to assign the pointer because it is owned by the parent STLGlob object, and iterator operations do not affect its value.

The increment operators advance sequence_ and update current_ accordingly. glob()'s match list (and hence, sequence_) is NULL terminated. When this value is reached, sequence_ itself is set to NULL to match the past-the-end iterator.

Other iterators require a similar end-marker, past which the iterator is invalid. At this point, you have a choice on what to do when iterator functions are called—throw exceptions or dereference the (now invalid) pointer as the client code requests. (The latter should yield a segmentation fault.) This decision depends largely on your (or your team's) coding practices and stance on compensating for developer error.

Comparison requires special attention. Iterators don't just abstract some value, but some value at a particular position in a particular container. A value may exist in several containers or several positions in a single container. Hence, comparison is more involved than simply comparing two iterators' dereferenced values.

STLGlobIterator's comparison is greatly simplified because it can compare the memory addresses of sequence_'s members. glob_ts that contain the same expansion should point to different areas of memory. Comparison of iterators from different STLGlob parents should thus return false.

The sample program example_2 (see http://www.cuj.com/code/) demonstrates the use of STLGlob and STLGlobIterator. Patterns (command-line arguments) are pushed into an STLGlob object, then the container's contents are passed through std::for_each() using STLGlobIterators.

./example_2 '/tmp/*' '/home/*'

Note that the patterns are enclosed within single quotes, such that they are passed verbatim to the program and are not interpreted by the shell. Listing 4 is an excerpt of example_2.

Conclusion

In this article, I've presented a technique that makes existing code—even legacy code—more accessible to STL algorithms. While it can be a challenge to implement, end users of the new classes (fellow developers) will hardly notice. In a sense, that's the goal: Producing constructs that are indistinguishable from natural STL classes eases their learning curve and thus enhances their adoption.

By no means do I include everything you need to know on this subject. Review the Standard [4], sections 23 (containers) and 24 (iterators), or The C++ Programming Language [5], chapters 16 and 19, for more details. The sample code's comments are formatted for Doxygen [6].

Resources

[1] Gamma, Erich et al. Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, 1995.

[2] Standard Template Library Programmer's Guide (http://www.sgi.com/tech/stl/).

[3] Meyers, Scott. Effective C++: 50 Specific Ways to Improve Your Programs and Design, Item 27, Addison-Wesley, 1997.

[4] The C++ Standard, Sections 23 and 24.

[5] Stroustrup, Bjarne. The C++ Programming Language, Chapters 16 and 19, Addison-Wesley, 2000.

[6] Doxygen tool for API doc generation (http://www.stack.nl/~dimitri/doxygen/).


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video