Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Extending the Reference-Counting Pattern


September 1998/Extending the Reference-Counting Pattern

Introduction

Talks and heated discussions about conventional C/C++ pointers go far beyond the C/C++ programming community. Some people love them, some hate them. This situation is understandable, as pointers are difficult creatures to handle. Too many things can go wrong:

  • Pointers can point nowhere, being created but not assigned.
  • Having been assigned to, they do not guarantee the validity of data they point at. The data can be deleted, leaving the pointer unaware of the deletion.
  • Pointers have no strong associations with the data they represent and can lose track of the data, being reassigned to point somewhere else.

The situation is aggravated by a false sense that the pointer itself is the object. However, it is a separate object that stores the memory address of an object (or maybe not). Marginally different syntax adds to the confusion:

  • data.something means that data is an object of some class.
  • data->something means that data is a pointer that designates an object of some class.

The above problems lead to memory corruption and memory leaks:

Object* p1;
Object* p2 = new Object;
Object* p3 = new Object;

p3 = p2; //Original p3 has been lost

delete p1; // Memory corruption.
delete p2; // p3 is invalid now.
delete p3; // Memory corruption

The STL template class auto_ptr, originally proposed by Greg Colvin, helps to address several of these problems. The template class provides a safer alternative to conventional pointers. Code using auto_ptr is often cleaner, shorter, and less prone to the memory management problems shown above:

auto_ptr<Object> p1;
auto_ptr<Object> p2 = new Object;
auto_ptr<Object> p3 = new Object;

p3 = p2; // Original p3 has been
         // deleted.
         // p3 owns the object
         // pointed to by p2.

// data will be automatically
// deleted when auto_ptrs
// go out of visibility scope.

auto_ptr introduces the notion of exclusive data ownership. Only one auto_ptr instance owns the data even if there are several instances pointing at the same object. The auto_ptr instance that owns the data deletes the data when it gets reassigned, or goes out of visibility scope.

Unfortunately, the exclusive ownership concept of the auto_ptr class differs in a subtle way from the shared ownership concept of conventional pointers. The code fragment shown in Figure 1 illustrates a case where the pointers work as expected but the auto_ptr objects do not.

The problem arises because the ownership of the object is first transferred to a copy of obj created on the stack and then to the object tmp within the function foo. When tmp goes out of scope it deletes the object! Therefore, when we return to func from foo, obj no longer owns the controlled object. Worse, it does not even point at valid data. The use of template class auto_ptr creates a false sense of security that it will handle memory-related problems for us. But in order to be used safely, the auto_ptr concept has to be modified to support the notion of shared ownership, as conventional pointers do.

The technique for implementing shared ownership here is called the Reference Counting Pattern. The idea behind the technique is to create "smart" pointers that keep track of all instances referring to (owning) an object. No single instance of a smart pointer is solely responsible for object management. You might say that the object manages itself. When no references to the object are left, among all the smart pointers that can refer to it, it will be destroyed automatically.

There are several benefits to this approach:

  • It safely replaces conventional pointers by a class that closely models their behavior.
  • It allows for pointer objects that are reasonably transparent and non-intrusive.
  • It optimizes resource usage through shared access to objects.
  • It simplifies memory management much as auto_ptr does.

Greg Colvin submitted his version of smart pointers (template class counted_ptr) to the C++ Standards Committee along with auto_ptr. But counted_pointer was eventually removed from the draft C++ Standard. So I present here my version of smart pointers, using the reference-counting concept. It differs from the usual way of implementing the pattern (see [1-3]) in several ways:

  • It allocates no additional memory blocks.
  • It does not introduce an extra level of indirection.
  • It forms a natural pointer to data and is therefore reasonably safe to use in mixed or legacy environments.
  • It is extended with error processing facilities.

Implementation

Listings 1 and 2 show my implementation of smart pointers using the reference-counting pattern. It uses a two-tiered design composed of template classes Handle and Counted. This approach decouples two fundamentally different tasks and allows them to be implemented separately and cleanly. Reference counting is managed by Counted and lightweight presentation and data sharing are managed by Handle.

The specialization Handle<T> requires that the class T define a default constructor T::T() and a copy constructor T::T(const T&). I consider this not a restriction but good programming style — part of Coplien's requirements for the "canonical" class implementation (see [5]). The internal structure is a little unconventional in that an object of class T is itself embedded in class Counted<T> instead of a pointer to the instance:

class Counted
{.....
   T          _instance;
   uint _num_references;
};

class ConventionalCounted
{.....
   T*         _instance;
   uint _num_references;
};

This approach has the following advantages:

1. Only one block of memory is allocated to hold a T instance together with a reference counter, instead of two separate blocks — for a T instance and a ConventionalCounted instance. The conventional approach incurs at least double the memory overhead of the Counted solution.

2. Counted does not introduce an additional level of indirection as the ConventionalCounted does, so it is at least marginally quicker.

3. A Handle<Data> instance is a natural pointer to a Data instance. Therefore, using a Handle<Data> object instead of Data* pointer is safe in mixed or legacy environments, and in situations such as:

extern void foo(int, ...);

Data* d = new Data;
Handle<Data> h; 

foo(5, o);
foo(5, h); // No Handle-to-Data
           // conversion invoked

4. A Data instance is created and destroyed along with each Handle<Data> instance. The existence of a Handle<Data> object thus guarantees the existence of a Data object. The ConventionalCounted approach inherits the usual uncertainty that comes with conventional pointers. It might represent no data at all. For example:

Data* p;
auto_ptr<Data> a;
Handle<Data> h;

p->data_method(); // Disaster
a->data_method(); // Disaster
h->data_method(); // OK

For the sake of safe and reliable Data resource management, the Handle<Data> object is solely responsible for the creation, accessing, and deletion of a Data object. Such a strong association gives the (accurate) impression that a Handle<Data> object is a Data object, eliminating the need to manipulate and address Data objects directly.

One drawback to this approach, however, is that the internal Data objects are being created with a default constructor only. If a Data object requires special initialization, it has to be done as follows:

Data* p = new Data(name, value);
auto_ptr<Data> a = new Data(name, value);
Handle<Data> h = Data(name, value);

or

Handle<Data> h;
h.initialize(name, value);

From my experience, this limitation is rarely an issue. The overwhelming majority of classes enjoy the benefits of safe and reliable management that the Handle class provides. There are methods of providing customized data initialization (other than default constructors) but such a discussion is beyond the scope of this article.

The reference-counting mechanism is encapsulated within template class Counted, so the Handle<T> class interface is simple and easy to understand. However, I should still say a few words about the two constructors:

Handle(const T&);
Handle(const T*);

Their existence is very unfortunate as they are T-to-Handle converters and copy an externally provided T instance to an internally stored T instance. Ideally they are not needed, since T instances are created within Handle<T>. But in real life, where T instances may be created explicitly, bypassing the Handle<T> mechanism, the constructors have to be used as a bridge. For example, third-party libraries that we are not able to modify may use T directly instead of Handle<T>. Quite a few old C library functions return pointers to statically allocated internal storage. The constructors take care of such unfortunately designed cases:

extern T* lib_func();

T* t = lib_func(); // Unusable without
                   // immediate copying
Handle<T> h = lib_func(); // OK to use

From my experience, the use of the constructors is limited to such areas where new meets old. Such situations can be very well encapsulated and localized.

Not Too Conventional

Being a smart pointer, Handle<Data> is transparent in its interactions with Data. It works very much the same way conventional pointers do, thanks to a family of converters and dereference operators:

operator         T& ();
operator         T* ();
operator   const T& () const;
operator   const T* () const;
const T* operator-> () const;
T*       operator-> ();

The operators properly extend the Handle<Data> transparency into the area of const objects. A const Data* pointer accepts no changes to the data it points at. const Handle<Data> behaves the same. Unfortunately, by comparison, auto_ptr uses a single dereference operator:

T* operator->() const;

Therefore, it does not propagate its constness onto the data it represents and cannot represent const data at all:

class Data
{
   ...
   void change_data();
};

const Data* d = new Data;
const Handle<Data> h;
const auto_ptr<Data> p(d); // Does not compile
const auto_ptr<Data> p(new Data);

d->change_data(); // Correctly does not compile
h->change_data(); // Correctly does not compile
p->change_data(); // Wrongly compiles

The converters always return pointers or references to an internal instance of a Data class. Even non-const converters do that. Here they differ from Kenneth Ngai's implementation published previously in CUJ [2] where non-const converters duplicate the data. In order to coexist with conventional pointers, a Handle instance has to behave like conventional pointers when used in their place. If a non-const pointer to an object is passed to a function, it is expected that the function can change the content of the object. That is the way Handle works when used in the pointer-oriented environment (functions handling Data* directly). It simulates a Data object being passed by reference:

void
change0(Data* data)
{
   data->change(); // changes the Data instance.
}

void
change1(Handle<Data> data)
{
   data->change(); // changes the Data instance
}

void
function()
{
   Data* d = new Data;
   Handle<Data> h;

   change0(d); // changes 'd'
   change0(h); // changes 'h'
   change1(h); // changes 'h'
}

Unfortunately, the converters violate the encapsulation of an internal Data instance within Handle<Data> and expose the instance to the cruelty of the world outside Handle's care. The whole discussion above about the Handle(const Data*) and Handle(const Data&) constructors applies as well to the converters. They serve as the second unfortunate but necessary part of the bridge to legacy libraries that deal with Data* pointers directly:

extern void old_func1(Data*);
extern void old_func1(const Data*);

Handle<Data> h;

old_func1(h); // OK
old_func2(h); // OK

The converters remain largely invisible to the programmer. Sadly, their existence itself is misleading — especially if someone has not grown accustomed to having a Handle<Data> representing a Data instance and tries to access the instance directly:

void
unfortunate(Handle<T> h)
{
   extern void do_something(const T*);

   T* p = h; // Completely unnecessary,
             // but guaranteed by converters.

   do_something(p);  // Unnecessary.
// do_something(h); // Much better and safer as p does
                    // not hang around
                    // and "Handle-to-T" conversion is
                    // implicit.
   delete p;        // Trouble.
}

Therefore, converters are there, but please do not use them. Luckily, developers of a T class can prepare their class for the integrated use within Handle<T> and prevent such an unfortunate usage by prohibiting public creation and deletion of T objects and letting Handle<T> manage resources instead:

class Foo
{
   ...
   friend class Handle<Foo>; 

   private:

  ~Foo ();
   Foo ();
};

or

class Foo
{
   ...
   private:

   operator    new (); // Not implemented.
   operator delete (); // ditto.
};

Handle<T> keeps the tradition of mimicking conventional pointers in assignment operators. The pointers just forget the data they were pointing at before assignment and certainly do not modify the data. Handle<T> does the same. Additionally, if there are no Handle instances left referencing the data, the data will be deleted. This is often a quick, safe, and clean memory-management solution that does not need special garbage-collection mechanisms:

void
func()
{
   Data* p0 = new Data;
   Data* p1 = new Data;
   Handle<Data> h0;
   Handle<Data> h1;

   p1 = p0; // Original p1 data lost unmodified.
   h1 = h0; // Original h1 data deleted unmodified.
}

There are no comparison operators. Implicit conversions are applied, so, comparisons work as expected:

Handle<Data> h0;
Handle<Data> h1 = h0;
Handle<Data> h2 = h0.duplicate();
Handle<Data> h3;

if (h0 == h1) ... // true
if (h0 == h2) ... // false
if (h0 == h3) ... // false

There is no implementation of the address-of operator:

Handle<T>* operator& ();

to prevent its being used. The program should have no access to the address of a Handle<T> instance, since such access bypasses the reference-counting mechanism. Handle<T> instances are passed by value (after all, they are only the size of a single pointer). Well, to be more precise, Handle<T> can be passed by reference. But doing so requires a lot of care because it disables the reference-counting mechanism. Consequently, it is theoretically quicker (++_num_references and --_num_references do not cost much). Therefore, it is not advisable to try to gain the questionable advantage of passing Handle<T> instances by-reference. However, such an operation is needed occasionally:

void
by_value (Handle<Data> h)
{
   Handle<Data> brand_new;

   // h is a stack-based copy
   // The copy(!) is reassigned
   // The original is unchanged 

   h = brand_new;
}

void
by_reference (Handle<Data>& h)
{
   Handle<Data> brand_new;

   // h is the original from the 'function'
   // The original is changed.

   h = brand_new;
}

void
function()
{
   Handle<Data> h;

   by_value(h); // Does not change h
   by_reference(h); // Changes h
}

An Error Processing Extension

A convenient extension to the standard reference counting has been implemented in the area of error processing. Conventional pointers have a nice feature — they have a built-in singular or error value (0, the null pointer). Every time something goes wrong, 0 or an error pointer value can be returned instead of a pointer to data. I find this feature extremely convenient for handling "soft" errors, as opposed to "hard" errors that are best handled by the exception mechanism. Therefore, I have introduced an _error variable within the Handle<T> class (at the expense of the _num_references variable), and a HandleError class for a friendlier error-processing interface.

Two levels of error processing are available, error indication and error reporting. With the former, we do not bother specifying what happened and just indicate an error:

Handle<T>
do_something()
{
   ...
   if (bad) return Handle<T>::error();
   ...
}

void
func()
{
   Handle<T> h = do_something();

   if (h.is_error()) ... // do_something() failed
}

With the latter, we can return a sophisticated report of what actually went wrong:

HandleError bad_value;
HandleError bad_format;
Handle<T>
do_something()
{
   ...
   if (bad-value)  return Handle<T>.error(bad_value);
   if (bad-format) return Handle<T>.error(bad_format);
   ...
}

void
func()
{
   Handle<T> h = do_something();

   if (h.is_error()) ... // An error
   {
      if (h.get_error() ==  bad_value)
         ... // Process bad-value  error.
      if (h.get_error() == bad_format)
         ... // Process bad-format error.
   }

or:

   if (h.get_error() ==   no_error)
         ... // Everything is OK
   else if (h.get_error() ==  bad_value)
         ... // Process bad-value  error
   else if (h.get_error() == bad_format)
         ... // Process bad-format error
}

The area where the Handle<T> class might be used is potentially very large. Obvious examples are situations where functions have to return something more complex than just an integer, like dynamically allocated strings, structures, etc. An example of a simple String class is shown in Listing 3 together with the way of handling various error conditions using Handle.

Bibliography

[1] Scott Meyers. More effective C++ (Addison-Wesley, 1996).

[2] Kenneth Ngai. "A Template for Reference Counting," CUJ, August 1997.

[3] R. G. White. "Advantages and Disadvantages of Unique Representation Patterns," C++ Report, September 1996.

[4] Jonathan S. Shapiro. C++ Toolkit (Prentice Hall, 1991).

[5] James O. Coplien. Advanced C++ (Addison-Wesley, 1992).

[6] Vladimir Batov. "A Quick And Simple Memory Allocator," CUJ, January 1998.

Vladimir Batov is a senior software engineer working for Raytheon Systems Company (Marlborough, MA) and specializing in Air Traffic Control software development. He has 15 years of experience in software development using C/C++. He can be reached at [email protected].


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.