Dr. Dobb's | Pointer Containers

Pointer Containers

Smart containers are useful and safe utilities that can lead to flawless object-oriented programming.

October 01, 2005
URL:http://www.drdobbs.com/cpp/pointer-containers/184406287

October, 2005: Pointer Containers

Thorsten is a coowner of Dezide (http://www.dezide.com/), which specializes in troubleshooting programs based on Bayesian-network technology. He is currently writing a second major in computer science in the area of expert systems. He can be contacted at [email protected] or [email protected].

Object-oriented programming in C++ has always been a bit awkward for me because of the need to manage memory when filling containers with pointers. Not only must we manage memory, but the memory management strategy implies lots of extra syntax and typedefs that unnecessarily clutter the code. Ignoring garbage collection, there are two approaches to memory management:

Smart pointers, as in Listing One(a).
Making your own implementation, such as Listing One(b).

The good thing about smart pointers is that they are safe and fast to implement (just a typedef). On the downside, they are inefficient (in space and time) and have a clumsy syntax. Consequently, in this article, I describe the design and implementation of the second approach, which I call "pointer containers," that has all of the benefits of smart pointers but without the downside. The complete source code that implements this technique is available electronically; see "Resource Center," page 4.

Exception Safety

When dealing with memory, exception safety is always a concern. Basically, the concept can be summarized as follows (from stronger to weaker):

The "nothrow" guarantee means code will not throw.
The "strong" guarantee means code has roll-back semantics in case of exceptions.
The "basic" guarantee means code will only preserve invariants in case of exceptions—in all cases, no resources leak.

I start by extending the interface of the container, as in Listing Two(a). Granted, it is easy to implement push_back(), as in Listing Two(b), but vector<T*>::push_back() might throw an exception if a new buffer cannot be allocated (leading to a leak of it). Consequently, you have to change the implementation, as in Listing Two(c). Now, push_back() has the strong exception-safety guarantee. Although this seems simple, even experienced programmers seem to forget it.

Listing Three(a) is a naïve implementation of the iterator range insert(). You must heap allocate copies because the container takes ownership of its pointers. This is not only a naïve implementation, but it is also horribly flawed. First, the elements are inserted in reverse order—if they ever get inserted! Second, before might be invalidated by vector<T*>::insert(), leading to illegal code. Third, there are potential memory leaks (as with push_back()). And fourth, it is potentially slow because you might cause several reallocations of a larger buffer. Listing Three(b) is better.

That's it, right? Well, yes and no. Yes, because I have removed all the stupid errors; and no, because you might still get several reallocations. So why not just call:

vec_.reserve(distance(first,last)+
vec_.size() );

as the first line in insert()? The answer is that reserve() can invalidate before. (As a caveat, the iterator type I might be an InputIterator, which means you cannot determine the distance between them, so you would need to do your own tagging. I will ignore that issue for now, although the implementation must not.) Furthermore, if you decided to implement several pointer containers in one generic wrapper, you cannot call container-specific operations, such as reserve(), without extra traits. Still, you have only achieved the basic exception-safety guarantee (which would be acceptable), you're left with a clumsy implementation, and inserting pointers one-at-a-time means you're not taking advantage of the fact that you know that N pointers must be inserted.

What you really would like to do is to use the iterator range version of insert() in vector. One idea would be to make an iterator wrapper whose operator*() would return a newly allocated copy. Iterators that do something extra are sometimes called "smart iterators" (see "Smart Iterators and STL," by Thomas Becker, C/C++ Users Journal, September 1998). This would again lead to basic exception safety; however, the C++ Standard does not exactly specify how and when insertions take place and I could imagine prohibited memory errors occurring. In particular, the Standard does not guarantee that one element is inserted before the next iterator indirection happens.

Therefore, I decided to use a different strategy and go for the strong exception-safety guarantee. The only thing it requires is a temporary exception-safe, fixed-sized buffer to hold the new pointers while they are being allocated. This is my scoped_deleter class and its role is similar to auto_ptr<T> in the implementation of push_back(): It holds pointers and deletes them if an exception occurs. Once it is created, the implementation looks like Listing Four. There you have it—a generic, strongly exception-safe, and elegant implementation. Because copying a T* cannot throw, vec_.insert() is guaranteed to have the strong exception guarantee; hence, you never insert pointers that will also be deleted by the scoped_deleter if an exception occurs. The price you pay is one extra heap allocation for the internal buffer in scoped_deleter. An extra heap allocation is normally expensive, but it is acceptable here because you make N heap allocations while copying the pointers (and you could optimize this by using stack space for small buffers).

You can then continue in this fashion to ensure that the insert(), assign(), erase(), and clear() interfaces from std::vector are reimplemented in ptr_vector in an exception-safe manner.

Iterator Design

You might have wondered why new(T(*first )) would compile; after all, if the iterator is defined as the iterator of std::vector<T*>, you would certainly need two indirections (unless the constructor took a pointer argument).

The reason is that you want normal iterators to be indirected by default, and the "old" style iterators to be available if needed. Therefore, you have Listing Five in ptr_vector, along with similar versions for const iterators.

The indirect_iterator adaptor from Boost (http://www.boost.org/) will most likely be part of the new Standard. The adaptor does all the hard work for you and has several good consequences:

First of all, it promotes pointerless programming and gives code a clean look. For example, compare Listing Six(a) to Listing Six(b). If you allowed direct pointer assignment, you would at least have a memory leak and (most likely) a pending crash if the same pointer is deleted twice.
The second good consequence is that it allows for seamless integration between ptr_vector<T> and (for example) std::vector<T>. This can be useful if we deal with copyable, nonpolymorphic objects. In that case, you could say something like Listing Six(c). Of course, it will also work the other way around from v2 to v.
The third benefit is that it is just safer to use standard algorithms.
The fourth benefit is that you can use normal functors directly without speculating with wrappers that do the indirection. However, there are also situations where you want to use the old ptr_iterator. These situations are characterized by using mutating algorithms from the Standard Library, which need to copy objects. Copying the object is expensive (compared to copying a pointer) or simply prohibited (as in most polymorphic classes). So, if copying is cheap and possible, then you can just stick to the normal, safer iterators.

In short, iterators promote safety and interoperability while not restricting access to the nonindirected iterators when needed.

The Clonable Concept

Even though we have a solid iterator range insert(), we still demand that new T( *first ) is a valid expression. There are several reasons why this is not ideal. The first example I can think of involves objects that are created by object factories. A second example involves polymorphic objects that are not copyable—a polymorphic object should very rarely be copyable. In both cases, you cannot use the call new T( *first ), so you need a hook of some kind.

The required indirection is given by the "Clonable" concept; let t be an object of type T, then T is clonable if new T( t ) is a valid expression or if allocate_clone() has been overloaded or explicitly specialized for T; see Listing Seven. What the implementation of ptr_vector now has to ensure is that the right version of allocate_clone is actually called. There are several pitfalls here and they all resemble the problems that the Standard has with calling std::swap(). One solution is simply to call allocate_clone() unqualified within ptr_vector and to define the overloaded version in T's namespace. This way, you rely on argument-dependent lookup and replace the function template specialization with a simpler overloaded function. (Another possibility would be to add another template parameter to ptr_vector—the parameter could then be a type with allocate_clone() as a static function.)

Domain-Specific Functions

Because the pointer container manages resources, there is suddenly a whole new interface that makes sense—an interface that deals with releasing, acquiring, and transferring ownership. Consider first the possibility in Listing Eight(a) to release a single object, which makes it simple and safe to hand over objects in the container. Should the caller forget to save the return value, you still have no leaks.

While all standard containers have copy semantics, providing the same for ptr_vector would be a bit strange—after all, the objects you insert into the ptr_vector are only required to be clonable. The fact that copying (or cloning) a whole ptr_vector can be expensive also suggests that you need something different. Hence, you add the two functions in Listing Eight(b). The first makes a deep copy of the whole container using allocate_clone() and is easily made strongly exception safe. The second simply releases ownership of the whole container and it is also strongly exception safe. Notice that it cannot have the "nothrow" guarantee because you must allocate a new ptr_vector for the returned auto_ptr. The new constructor is then used to take ownership of the result of clone() or release(), and it gives the "nothrow" guarantee. What is really elegant about these functions is that they let you return whole containers as results from functions in an efficient and exception-safe manner.

Recall that the iterators prohibited direct pointer assignment. This is certainly a good thing, but you lack the ability to reset a pointer within the container. Therefore, you add two functions to ptr_vector; see Listing Eight(c). The first is a rather general version that makes sense on other containers, too, whereas the last only makes sense on random-access containers such as ptr_vector.

Now you can add these functions to your container; see Listing Nine. The idea behind these functions is that they let you transfer objects between different pointer containers efficiently without messing with the pointers directly. In a way, they are similar to splice() in std::list and can easily fulfill the strong exception-safety guarantee.

Caveats

There are still several issues that must be dealt with—support for custom deleters, for instance. Because you can specify how allocation is performed with allocate_clone(), you should also be able to specify deallocation with deallocate_clone(). The introduction will affect the implementation greatly because you need to use it whenever deletion is done; this means that you must scrap std::auto_ptr<T> because it cannot support a custom deleter, and change scoped_deleter<T> similarly.

There are certain cases where you still want to use the old-fashioned ptr_iterators with mutating algorithms. The good news is that some mutating algorithms can be used with an indirecting functor; that is, a functor that compares the objects instead of the pointers. The bad news is that "some" is not "all," and algorithms such as remove() and unique() just copy elements instead of swapping them. This leads to both memory leaks and undefined behavior (see "The Standard Librarian: Containers of Pointers," by Matt Austern, http://www.cuj.com/documents/s=7990/ cujcexp1910austern/). There is a workaround for remove() and its cousin remove_if(), but it is not very practical (see "A remove_if for vector<T*>," by Harold Nowak, C/C++ Users Journal, July 2001). Though I have not yet decided how to deal with this, I am leaning toward implementing these few error-prone functions as member functions.

While I've focused on the ptr_vector class, the same rules apply to all the standard containers. Thus, you have a wide range of pointer containers that suit almost any special situation. For sequences, the default choice should (as usual) be ptr_vector<T>. A ptr_list<T> should be reserved for the rare cases where you have a large container and insertions/deletions are done at places other than the two ends. By a "large container," I mean one that holds more than 100 elements (but this is only a rough estimate and the guideline differs from platform to platform). You have to remember that a list node often (but not always with special allocators) has to be on the heap each time an insertion takes place. Such an allocation can easily cost the same as moving 100 pointers in a ptr_vector<T>.

Conclusion

Clearly, implementing your own pointer containers is not a walk in the park. However, once done, you have a useful and safe utility that enables flawless object-oriented programming.

If you consider all the extra work that you have to do to make containers exception safe, it should not be surprising that garbage collection can be just as fast as manual memory management. The downside to garbage collectors, of course, is that they waste more memory (for example, a compacting garbage collector will probably double the memory used). It is ironic that with the use of smart pointers and pointer containers, we're close to not needing garbage collection at all.

DDJ

Listing One (a)

typedef vector< boost:: shared_ptr<T> > ptr_vector;

(b)

template< class T >
class ptr_vector
{
    std::vector<T*> vec_;
public:
    ~ptr_vector(); // delete objects
    // ... much more
   typedef <something> iterator;
};

Back to article

Listing Two (a)

void  push_back( T* );
template< class I > 
void  insert( iterator before, I first, I last );

(b)

void push_back( T* t )
{ vec_.push_back( t ); }

(c)

void push_back( T* t )
{ 
  std::auto_ptr<T> p( t ); 
  vec_.push_back( t );
  p.release(); 
}

Back to article

Listing Three (a)

void insert( iterator before, I first, I last )
template< class I >
{ 
  while( first != last ) 
  {
      vec_.insert( before, new T( *first ) );
      ++first; 
  } 
}

(b)

void insert( iterator before, I first, I last )
template< class I >
{ 
 while( first != last ) 
 {
  auto_ptr<T> p(new T( *first ));
  before = vec_.insert( before, p.get() );
  ++before; // to preserve order
  ++first; 
  p.release(); 
 }
}

Back to article

Listing Four

void insert( iterator before, I first, I last )  
{
 size_t n = distance(first, last);
 scoped_deleter<T> sd( n ); 
 for( ; first != last; ++first )
     sd.add( new T( *first ) ) );
 vec_.insert( before, sd.begin(), sd.end() );
 sd.release(); // cannot throw      
}

Back to article

Listing Five

typedef boost::indirect_iterator<ptr_iterator>::type iterator;
typedef boost::indirect_iterator<ptr_iterator> iterator;/
ptr_iterator ptr_begin();
ptr_iterator ptr_end();
iterator     begin();
iterator     end();

Back to article

Listing Six (a)

( *v.begin() )->foo(); 
*v.front() = X;
v[4]       = &X; // doh!
v[3]       = new X; // ooops!

(b)

v.begin()->foo();.
v.front()  = X;
v[3]       = &X; // compile error

(c)

vector<int>     v;
ptr_vector<int> v2;
 ...
std::copy( v.begin(), v.end(), std::back_inserter( v2 ) );

Back to article

Listing Seven

// primary template:
template< class T > 
T* allocate_clone( const T& t )
{ return new T( t ); }

// overloaded function template
template< class T >
X<T>* allocate_clone( const X<T>& t )
{ return factory::clone( t ); }
// function template specialization
template<>
Polymorphic* allocate_clone<Polymorphic >( const Polymorphic& p )
{ return p->clone(); }

Back to article

Listing Eight (a)

auto_ptr<T> release_back();
auto_ptr<T> release( iterator );

(b)

auto_ptr<ptr_vector<T> > clone() const;
auto_ptr<ptr_vector<T> > release();
ptr_vector( auto_ptr<ptr_vector<T> > );

(c)

void replace( iterator where, T* x );
void replace( size_type idx, T* x );

Back to article

Listing Nine

template< class I >
void transfer( iterator before, I first, I last, ptr_vector<T>& from );
void transfer( iterator before, I i, ptr_vector<T>& from );

Back to article