Extending the Reference-Counting Pattern

It's often nice to have storage management that's a bit smarter than you get with auto_ptr but rather less complex than automatic garbage collection. Here's one style of smart pointers that strikes such a careful balance.


September 01, 1998
URL:http://www.drdobbs.com/extending-the-reference-counting-pattern/184403543

September 1998/Extending the Reference-Counting Pattern

Introduction

Talks and heated discussions about conventional C/C++ pointers go far beyond the C/C++ programming community. Some people love them, some hate them. This situation is understandable, as pointers are difficult creatures to handle. Too many things can go wrong:

The situation is aggravated by a false sense that the pointer itself is the object. However, it is a separate object that stores the memory address of an object (or maybe not). Marginally different syntax adds to the confusion:

The above problems lead to memory corruption and memory leaks:

Object* p1;
Object* p2 = new Object;
Object* p3 = new Object;

p3 = p2; //Original p3 has been lost

delete p1; // Memory corruption.
delete p2; // p3 is invalid now.
delete p3; // Memory corruption

The STL template class auto_ptr, originally proposed by Greg Colvin, helps to address several of these problems. The template class provides a safer alternative to conventional pointers. Code using auto_ptr is often cleaner, shorter, and less prone to the memory management problems shown above:

auto_ptr<Object> p1;
auto_ptr<Object> p2 = new Object;
auto_ptr<Object> p3 = new Object;

p3 = p2; // Original p3 has been
         // deleted.
         // p3 owns the object
         // pointed to by p2.

// data will be automatically
// deleted when auto_ptrs
// go out of visibility scope.

auto_ptr introduces the notion of exclusive data ownership. Only one auto_ptr instance owns the data even if there are several instances pointing at the same object. The auto_ptr instance that owns the data deletes the data when it gets reassigned, or goes out of visibility scope.

Unfortunately, the exclusive ownership concept of the auto_ptr class differs in a subtle way from the shared ownership concept of conventional pointers. The code fragment shown in Figure 1 illustrates a case where the pointers work as expected but the auto_ptr objects do not.

The problem arises because the ownership of the object is first transferred to a copy of obj created on the stack and then to the object tmp within the function foo. When tmp goes out of scope it deletes the object! Therefore, when we return to func from foo, obj no longer owns the controlled object. Worse, it does not even point at valid data. The use of template class auto_ptr creates a false sense of security that it will handle memory-related problems for us. But in order to be used safely, the auto_ptr concept has to be modified to support the notion of shared ownership, as conventional pointers do.

The technique for implementing shared ownership here is called the Reference Counting Pattern. The idea behind the technique is to create "smart" pointers that keep track of all instances referring to (owning) an object. No single instance of a smart pointer is solely responsible for object management. You might say that the object manages itself. When no references to the object are left, among all the smart pointers that can refer to it, it will be destroyed automatically.

There are several benefits to this approach:

Greg Colvin submitted his version of smart pointers (template class counted_ptr) to the C++ Standards Committee along with auto_ptr. But counted_pointer was eventually removed from the draft C++ Standard. So I present here my version of smart pointers, using the reference-counting concept. It differs from the usual way of implementing the pattern (see [1-3]) in several ways:

Implementation

Listings 1 and 2 show my implementation of smart pointers using the reference-counting pattern. It uses a two-tiered design composed of template classes Handle and Counted. This approach decouples two fundamentally different tasks and allows them to be implemented separately and cleanly. Reference counting is managed by Counted and lightweight presentation and data sharing are managed by Handle.

The specialization Handle<T> requires that the class T define a default constructor T::T() and a copy constructor T::T(const T&). I consider this not a restriction but good programming style — part of Coplien's requirements for the "canonical" class implementation (see [5]). The internal structure is a little unconventional in that an object of class T is itself embedded in class Counted<T> instead of a pointer to the instance:

class Counted
{.....
   T          _instance;
   uint _num_references;
};

class ConventionalCounted
{.....
   T*         _instance;
   uint _num_references;
};

This approach has the following advantages:

1. Only one block of memory is allocated to hold a T instance together with a reference counter, instead of two separate blocks — for a T instance and a ConventionalCounted instance. The conventional approach incurs at least double the memory overhead of the Counted solution.

2. Counted does not introduce an additional level of indirection as the ConventionalCounted does, so it is at least marginally quicker.

3. A Handle<Data> instance is a natural pointer to a Data instance. Therefore, using a Handle<Data> object instead of Data* pointer is safe in mixed or legacy environments, and in situations such as:

extern void foo(int, ...);

Data* d = new Data;
Handle<Data> h; 

foo(5, o);
foo(5, h); // No Handle-to-Data
           // conversion invoked

4. A Data instance is created and destroyed along with each Handle<Data> instance. The existence of a Handle<Data> object thus guarantees the existence of a Data object. The ConventionalCounted approach inherits the usual uncertainty that comes with conventional pointers. It might represent no data at all. For example:

Data* p;
auto_ptr<Data> a;
Handle<Data> h;

p->data_method(); // Disaster
a->data_method(); // Disaster
h->data_method(); // OK

For the sake of safe and reliable Data resource management, the Handle<Data> object is solely responsible for the creation, accessing, and deletion of a Data object. Such a strong association gives the (accurate) impression that a Handle<Data> object is a Data object, eliminating the need to manipulate and address Data objects directly.

One drawback to this approach, however, is that the internal Data objects are being created with a default constructor only. If a Data object requires special initialization, it has to be done as follows:

Data* p = new Data(name, value);
auto_ptr<Data> a = new Data(name, value);
Handle<Data> h = Data(name, value);

or

Handle<Data> h;
h.initialize(name, value);

From my experience, this limitation is rarely an issue. The overwhelming majority of classes enjoy the benefits of safe and reliable management that the Handle class provides. There are methods of providing customized data initialization (other than default constructors) but such a discussion is beyond the scope of this article.

The reference-counting mechanism is encapsulated within template class Counted, so the Handle<T> class interface is simple and easy to understand. However, I should still say a few words about the two constructors:

Handle(const T&);
Handle(const T*);

Their existence is very unfortunate as they are T-to-Handle converters and copy an externally provided T instance to an internally stored T instance. Ideally they are not needed, since T instances are created within Handle<T>. But in real life, where T instances may be created explicitly, bypassing the Handle<T> mechanism, the constructors have to be used as a bridge. For example, third-party libraries that we are not able to modify may use T directly instead of Handle<T>. Quite a few old C library functions return pointers to statically allocated internal storage. The constructors take care of such unfortunately designed cases:

extern T* lib_func();

T* t = lib_func(); // Unusable without
                   // immediate copying
Handle<T> h = lib_func(); // OK to use

From my experience, the use of the constructors is limited to such areas where new meets old. Such situations can be very well encapsulated and localized.

Not Too Conventional

Being a smart pointer, Handle<Data> is transparent in its interactions with Data. It works very much the same way conventional pointers do, thanks to a family of converters and dereference operators:

operator         T& ();
operator         T* ();
operator   const T& () const;
operator   const T* () const;
const T* operator-> () const;
T*       operator-> ();

The operators properly extend the Handle<Data> transparency into the area of const objects. A const Data* pointer accepts no changes to the data it points at. const Handle<Data> behaves the same. Unfortunately, by comparison, auto_ptr uses a single dereference operator:

T* operator->() const;

Therefore, it does not propagate its constness onto the data it represents and cannot represent const data at all:

class Data
{
   ...
   void change_data();
};

const Data* d = new Data;
const Handle<Data> h;
const auto_ptr<Data> p(d); // Does not compile
const auto_ptr<Data> p(new Data);

d->change_data(); // Correctly does not compile
h->change_data(); // Correctly does not compile
p->change_data(); // Wrongly compiles

The converters always return pointers or references to an internal instance of a Data class. Even non-const converters do that. Here they differ from Kenneth Ngai's implementation published previously in CUJ [2] where non-const converters duplicate the data. In order to coexist with conventional pointers, a Handle instance has to behave like conventional pointers when used in their place. If a non-const pointer to an object is passed to a function, it is expected that the function can change the content of the object. That is the way Handle works when used in the pointer-oriented environment (functions handling Data* directly). It simulates a Data object being passed by reference:

void
change0(Data* data)
{
   data->change(); // changes the Data instance.
}

void
change1(Handle<Data> data)
{
   data->change(); // changes the Data instance
}

void
function()
{
   Data* d = new Data;
   Handle<Data> h;

   change0(d); // changes 'd'
   change0(h); // changes 'h'
   change1(h); // changes 'h'
}

Unfortunately, the converters violate the encapsulation of an internal Data instance within Handle<Data> and expose the instance to the cruelty of the world outside Handle's care. The whole discussion above about the Handle(const Data*) and Handle(const Data&) constructors applies as well to the converters. They serve as the second unfortunate but necessary part of the bridge to legacy libraries that deal with Data* pointers directly:

extern void old_func1(Data*);
extern void old_func1(const Data*);

Handle<Data> h;

old_func1(h); // OK
old_func2(h); // OK

The converters remain largely invisible to the programmer. Sadly, their existence itself is misleading — especially if someone has not grown accustomed to having a Handle<Data> representing a Data instance and tries to access the instance directly:

void
unfortunate(Handle<T> h)
{
   extern void do_something(const T*);

   T* p = h; // Completely unnecessary,
             // but guaranteed by converters.

   do_something(p);  // Unnecessary.
// do_something(h); // Much better and safer as p does
                    // not hang around
                    // and "Handle-to-T" conversion is
                    // implicit.
   delete p;        // Trouble.
}

Therefore, converters are there, but please do not use them. Luckily, developers of a T class can prepare their class for the integrated use within Handle<T> and prevent such an unfortunate usage by prohibiting public creation and deletion of T objects and letting Handle<T> manage resources instead:

class Foo
{
   ...
   friend class Handle<Foo>; 

   private:

  ~Foo ();
   Foo ();
};

or

class Foo
{
   ...
   private:

   operator    new (); // Not implemented.
   operator delete (); // ditto.
};

Handle<T> keeps the tradition of mimicking conventional pointers in assignment operators. The pointers just forget the data they were pointing at before assignment and certainly do not modify the data. Handle<T> does the same. Additionally, if there are no Handle instances left referencing the data, the data will be deleted. This is often a quick, safe, and clean memory-management solution that does not need special garbage-collection mechanisms:

void
func()
{
   Data* p0 = new Data;
   Data* p1 = new Data;
   Handle<Data> h0;
   Handle<Data> h1;

   p1 = p0; // Original p1 data lost unmodified.
   h1 = h0; // Original h1 data deleted unmodified.
}

There are no comparison operators. Implicit conversions are applied, so, comparisons work as expected:

Handle<Data> h0;
Handle<Data> h1 = h0;
Handle<Data> h2 = h0.duplicate();
Handle<Data> h3;

if (h0 == h1) ... // true
if (h0 == h2) ... // false
if (h0 == h3) ... // false

There is no implementation of the address-of operator:

Handle<T>* operator& ();

to prevent its being used. The program should have no access to the address of a Handle<T> instance, since such access bypasses the reference-counting mechanism. Handle<T> instances are passed by value (after all, they are only the size of a single pointer). Well, to be more precise, Handle<T> can be passed by reference. But doing so requires a lot of care because it disables the reference-counting mechanism. Consequently, it is theoretically quicker (++_num_references and --_num_references do not cost much). Therefore, it is not advisable to try to gain the questionable advantage of passing Handle<T> instances by-reference. However, such an operation is needed occasionally:

void
by_value (Handle<Data> h)
{
   Handle<Data> brand_new;

   // h is a stack-based copy
   // The copy(!) is reassigned
   // The original is unchanged 

   h = brand_new;
}

void
by_reference (Handle<Data>& h)
{
   Handle<Data> brand_new;

   // h is the original from the 'function'
   // The original is changed.

   h = brand_new;
}

void
function()
{
   Handle<Data> h;

   by_value(h); // Does not change h
   by_reference(h); // Changes h
}

An Error Processing Extension

A convenient extension to the standard reference counting has been implemented in the area of error processing. Conventional pointers have a nice feature — they have a built-in singular or error value (0, the null pointer). Every time something goes wrong, 0 or an error pointer value can be returned instead of a pointer to data. I find this feature extremely convenient for handling "soft" errors, as opposed to "hard" errors that are best handled by the exception mechanism. Therefore, I have introduced an _error variable within the Handle<T> class (at the expense of the _num_references variable), and a HandleError class for a friendlier error-processing interface.

Two levels of error processing are available, error indication and error reporting. With the former, we do not bother specifying what happened and just indicate an error:

Handle<T>
do_something()
{
   ...
   if (bad) return Handle<T>::error();
   ...
}

void
func()
{
   Handle<T> h = do_something();

   if (h.is_error()) ... // do_something() failed
}

With the latter, we can return a sophisticated report of what actually went wrong:

HandleError bad_value;
HandleError bad_format;
Handle<T>
do_something()
{
   ...
   if (bad-value)  return Handle<T>.error(bad_value);
   if (bad-format) return Handle<T>.error(bad_format);
   ...
}

void
func()
{
   Handle<T> h = do_something();

   if (h.is_error()) ... // An error
   {
      if (h.get_error() ==  bad_value)
         ... // Process bad-value  error.
      if (h.get_error() == bad_format)
         ... // Process bad-format error.
   }

or:

   if (h.get_error() ==   no_error)
         ... // Everything is OK
   else if (h.get_error() ==  bad_value)
         ... // Process bad-value  error
   else if (h.get_error() == bad_format)
         ... // Process bad-format error
}

The area where the Handle<T> class might be used is potentially very large. Obvious examples are situations where functions have to return something more complex than just an integer, like dynamically allocated strings, structures, etc. An example of a simple String class is shown in Listing 3 together with the way of handling various error conditions using Handle.

Bibliography

[1] Scott Meyers. More effective C++ (Addison-Wesley, 1996).

[2] Kenneth Ngai. "A Template for Reference Counting," CUJ, August 1997.

[3] R. G. White. "Advantages and Disadvantages of Unique Representation Patterns," C++ Report, September 1996.

[4] Jonathan S. Shapiro. C++ Toolkit (Prentice Hall, 1991).

[5] James O. Coplien. Advanced C++ (Addison-Wesley, 1992).

[6] Vladimir Batov. "A Quick And Simple Memory Allocator," CUJ, January 1998.

Vladimir Batov is a senior software engineer working for Raytheon Systems Company (Marlborough, MA) and specializing in Air Traffic Control software development. He has 15 years of experience in software development using C/C++. He can be reached at [email protected].

September 1998/Extending the Reference-Counting Pattern/Figure 1

Figure 1: Illustrates case where pointers work as expected but auto_ptr objects do not

void foo(Object* obj)
{
   Object* tmp = obj;
   tmp->something();
}
void func()
{
   Object* obj = new Object;
   foo(obj);
   obj->something();
}

void foo(auto_ptr<Object> obj)
{
   auto_ptr<Object> tmp = obj;
   tmp->something();

   // tmp owns the Object data
   // and deletes them when goes
   // out of visibility scope.
}
void func()
{
   auto_ptr<Object> obj = new Object;
   foo(obj);

   // Object data has been deleted
   // within foo().
   // obj points nowhere!

   obj->something(); // A problem.
}
//End of File
September 1998/Extending the Reference-Counting Pattern/Listing 1

Listing 1: Header file for smart pointers implementation

// Copyright (c) 1996-1997 Vladimir Batov
//
// Permission to use, copy, modify, distribute and sell this
// software and its documentation for any purpose is hereby
// granted without fee, provided that the above copyright notice
// appear in all copies and that both that copyright notice and
// this permission notice appear in supporting documentation.
// I make no representations about the suitability of this software
// for any purpose. It is provided "as is" without express or
// implied warranty.

// Handle<T> requires T class to have T::T() and
// T::T(const T&) constructors.

#ifndef BATOV_HANDLE_H
#define BATOV_HANDLE_H

typedef unsigned int uint;

class HandleError
{
   public:

      ~HandleError () {}
       HandleError () : _error(_allocate()) {}
       HandleError (uint err) : _error(err) {}
   operator   uint () const { return _error; }
   bool operator== (const HandleError& e) const
      { return _error == e._error; }

   static HandleError no_error;
   static HandleError  unknown;

   private:

   uint _error;
   uint _allocate();
};

template<class T/*, class Allocator = MemoryAllocator
                    When supported*/>
class Handle
{
   public:

   typedef HandleError Error;

   private:

   class Counted
   {
      public:

     ~Counted () {}
      Counted ();
      Counted (const T&);

      void dismiss () { if (!--_num_references) delete this; }
      void     use () { ++_num_references; }

      operator         T& ()       { return  _instance; }
      operator   const T& () const { return  _instance; }
      operator         T* ()       { return &_instance; }
      operator   const T* () const { return &_instance; }
      T*       operator-> ()       { return &_instance; }
      const T* operator-> () const { return &_instance; }

//    Possible memory optimization. See [6]. 
//    void*   operator new (size_t)
//       { return   _allocator.allocate(); }
//    void operator delete (void* storage)
//       { _allocator.deallocate(storage); }

      bool  is_shared () const
         { return 1 < _num_references ? true : false; }
      bool   is_error () const { return _error ? true : false; }
      Error get_error () const { return _error; }                
      void  set_error (Error);                                   

      private:

      T               _instance;
      uint _num_references : 16; // Reference counter.
                                 // (<=65535 looks enough)
      uint          _error : 16; // Error conditions flag.
                                 // (<=65535)

//    Possible memory optimization. See [6]. 
//    static MemoryAllocator _allocator;
   };

   public:

  ~Handle ();
   Handle ();
   Handle (const T&);
   Handle (const T*);
   Handle (const Handle<T>&);

   operator T& () { return _counted->operator T&(); }
   operator T* () { return _counted->operator T*(); }
   operator const T& () const { return _counted->operator T&(); }
   operator const T* () const { return _counted->operator T*(); }
   const T* operator-> () const
      { return _counted->operator ->(); }
   T* operator-> () { return _counted->operator ->(); }
   Handle<T>& operator = (const T&);
   Handle<T>& operator = (const T*);
   Handle<T>& operator = (const Handle<T>&);

   static Handle<T> error (Error =Error::unknown);
   void             set_error (Error =Error::unknown);
   Error     get_error () const { return _counted->get_error(); }
   bool      is_error () const { return _counted-> is_error(); }
   bool      is_shared () const { return _counted->is_shared(); }
   Handle<T> duplicate () const; // Convenience method to create 
                                 // a copy of a "T-Handle<T>" pair.
   private:

   Counted* _counted;

   // User has NO ACCESS TO THE ADDRESS OF A HANDLE instance
   // as such access bypasses reference counter mechanism.
   // User is forced to pass it by value (Handle) that is 
   // supervised by the mechanism.

   Handle<T>* operator& (); // Not defined to prevent usage.
};

template<class T>
inline
Handle<T>::Counted::Counted() : _num_references(0), 
                                _error(Error::no_error), 
                                _instance()
{
   // Preferred form of T instance creation.
}

template<class T>
inline
Handle<T>::Counted::Counted(const T& from)
                              : _num_references(0),
                                _error(Error::no_error),
                                _instance(from)
{
   // Should be avoided as an internal copy of 'from'
   // is created.
}

template<class T>
inline
void
Handle<T>::Counted::set_error(Error err)
{
   _error = err < Error::unknown
          ? err
          : Error::unknown;
}

template<class T>
inline
Handle<T> 
Handle<T>::error(Error num) 
{ 
   Handle<T> h; 
   h.set_error(num); 
   return h; 
}

template<class T>
inline
void         
Handle<T>::set_error(Error err) 
{ 
   _counted->set_error(err); 

}

template<class T>
inline
Handle<T>::~Handle()
{
   _counted->dismiss();
}

template<class T>
inline
Handle<T>::Handle() : _counted(new Counted())
{
   _counted->use();
}

template<class T>
inline
Handle<T>::Handle(const T& copy) : _counted(new Counted(copy))
{
   _counted->use();
}

template<class T>
inline
Handle<T>::Handle(const T* copy) 
   : _counted(copy ? new Counted(*copy) : new Counted())
{
   _counted->use();
}

template<class T>
inline
Handle<T>::Handle(const Handle<T>& ref) : _counted(ref._counted)
{
   _counted->use();
}

template<class T>
inline
Handle<T>&
Handle<T>::operator=(const T& src)  //8.
{
   if (*this != &src)
   {
      _counted->dismiss();
      _counted = new Counted(src);
      _counted->use();
   }
   return *this;
}

template<class T>
inline
Handle<T>&
Handle<T>::operator=(const T* src) //8.
{
   if (*this != src)
   {
      _counted->dismiss();
      _counted = src ? new Counted(*src) : new Counted();
      _counted->use();
   }
   return *this;
}

template<class T>
inline
Handle<T>&
Handle<T>::operator=(const Handle<T>& src)
{
   if (this->_counted != src._counted)
   {
      _counted->dismiss();
      _counted = src._counted;
      _counted->use();
   }
   return *this;
}

template<class T>
inline
Handle<T>    
Handle<T>::duplicate() const 
{ 
   return Handle<T>(operator->()); 
}

#endif // BATOV_HANDLE_H

// End of File
September 1998/Extending the Reference-Counting Pattern/Listing 2

Listing 2: HandleError class implementation file

#include "include/handle.h"

HandleError HandleError::no_error(0);

// max of 16 bits
HandleError HandleError:: unknown(65535); 

uint
HandleError::_allocate()
{
   static uint next = 0;
   return ++next;
}
//End of File
September 1998/Extending the Reference-Counting Pattern/Listing 3

Listing 3: An example of Handle class use with a String class


#include <stdio.h>
#include <string.h>
#include <iostream.h>
#include <vbtools/handle.h>

class _String 
{
   public:

   typedef int      Size;
   typedef char* Address;

       ~_String () { delete _text; } 
        _String () : _text(0), _size(0) {}
        _String (Address, Size =0);
   Address text () { return _text; }
   Size    size () { return _size; }
   void    text (Address, Size);

//   private:

   Address _text;
   Size    _size;
};

_String::_String(Address addr, Size sz) : _text(0), _size(0)
{
   text(addr, sz ? sz : addr ? strlen(addr) : 0);   
}

void
_String::text(Address addr, Size size)
{
   delete _text;
   _text = (Address) memcpy(new char[_size = size], addr, size);
}

// String instances are of 'void*' size. Easy to pass around.
// _String resources are handled automatically.

typedef Handle<_String> String;

HandleError string_is_empty;
HandleError string_too_long;

String
foo(String input)
{
   String s;

   if (!input->size()) 
   {
      String err = String::error(string_is_empty);

      // A value can be set to error instance as well.

      err->text("Usage: foo argument.\n");
      return err;
   }
   if (4 < input->size()) return s.error(string_too_long);

   s->text("Changed");

   return s;
}

int 
main(int argc, char** argv)
{
   String  input(1 < argc ? argv[1] : (char*) 0);
   String   orig = input;
   String output = foo(input);

   if (output.is_error())
   {
      if (output.get_error() == string_is_empty)
      {
         // A value and an error condition can be 
         // passed at the same time.

         cout << output->text() << endl; 
      }
      else if (output.get_error() == string_too_long)
      {
         cout << "Argument is too long.\n";
      }
   }
   else
   {
      cout << orig-> text() << endl; // argv[1] printed.
      cout << input->text() << endl; // "Changed" printed.
   }
}

//End of File

Terms of Service | Privacy Statement | Copyright © 2024 UBM Tech, All rights reserved.