C/C++

Safe and Economical Reference-Counting in C++

By Vladimir Batov, June 01, 2000

Smart pointers keep getting smarter.

The functionality is replicated for all possible const and non-const combinations of the two arguments (lines 39-42 in Listing Two and lines 18-21 in Listing Three). These sets of functions help to deliver arguments to an appropriate Data constructor without losing const attributes. For example,

class Data
{ ...
   // Subtly different constructors.
   Data(const Arg1&, const Arg2&);
   Data(      Arg1&, const Arg2&);
};

Arg1             arg;
const Arg2 const_arg;

Handle<Data> dh(arg, const_arg);

The object dh is created with Handle(Arg1&, const Arg2&) (line 19 in Listing Three), which invokes and transfers the arguments to Counted(Arg1&, const Arg2&) (line 57 in Listing Four), which creates an internal Data instance with the Data constructor that best matches provided argument types (line 53 Listing Four). In the example above the constructor will be Data(Arg1&, const Arg&).

Lisiting Four: The Counted class implementation.

 class Counted
 {
    public:
 
   ~Counted() {}
    Counted() : _num_references(0), _instance() {}
 
    void dismiss () { if (!--_num_references) delete this; }
    void     use () { ++_num_references; }
 
    operator   Data& () { return  _instance; }
    operator   Data* () { return &_instance; }
    Data* operator-> () { return &_instance; }
 
    template<class Derived>
    void dyn_cast() const
    {
       dynamic_cast<Derived&>(_instance);
    }
 
    template<class Derived>
    void dyn_cast()
    {
       dynamic_cast<Derived&>(_instance);
    }
 
    private:
 
    typedef unsigned int uint;
 
    mutable uint _num_references; // Reference counter.
    Data               _instance;
 
    Counted(const Counted&); // Not implemented 
 
    public: // New stuff.
 
    // One argument.
 
    template<class Arg1>
    Counted(const Arg1& arg1) 
    : _num_references(0), _instance(arg1) {}
 
    template<class Arg1>
    Counted(Arg1& arg1) 
    : _num_references(0), _instance(arg1) {}
 
    // Two args.
 
 #define TEMPLATE template<class Arg1, class Arg2>
 #define CONSTRUCTOR(Arg1, Arg2)     \
    Counted(Arg1& arg1, Arg2& arg2)  \
    : _num_references(0), _instance(arg1, arg2) {}
 
 TEMPLATE CONSTRUCTOR(const Arg1, const Arg2)
 TEMPLATE CONSTRUCTOR(const Arg1,       Arg2)
 TEMPLATE CONSTRUCTOR(      Arg1, const Arg2)
 TEMPLATE CONSTRUCTOR(      Arg1,       Arg2)
 
 #undef TEMPLATE
 #undef CONSTRUCTOR
 
    // Three args require  8 constructors.
    // Four  args require 16 constructors.
    // Five  args require 32 constructors.
    // Implement when needed.

 };

The simplified versions of the include files (Listings Two and Three) handle up to three arguments. Although the files are likely to grow (according to the maximum number of incoming arguments you need to support), they have a very regular and easy-to-follow structure. Add support for more arguments when you need it.

It is the very general nature of Handle template constructors (Listing Three) that makes it possible to use the following syntax:

// Creates internal Data instance
// using Data::Data(int)
Handle<Data> dh(1);

// Creates internal Data instance
// using Data::Data(int, double)
Handle<Data> dh(1, 2.3);

Unfortunately, that friendly Data management syntax (it specifies how internal Data are created) overlaps with the syntax reserved for Handle copy-constructors. For example:

class Data
{
   ...
   Data(Handle<Other>);
};

// Create a new Handle-Other pair (a new
// Other instance and the first Handle
// pointing to it).
Handle<Other> oh(args);

// Create a new Handle-Data pair (a new
// Data instance and the first Handle
// pointing to it) using
// Data(Handle<Other>) constructor.
Handle<Data> dh = 
   Handle<Data>::create(oh);

// The following four lines do not 
// create new Handle-Data pairs (as the
// previous lines do) but rather create
// additional handles that point to the
// same data of the Other type as oh
// points to.

// Uses "unusual" template 
// copy-constructor.
Handle<Data> dh1(oh);   // 1.
// Uses basic copy-constructor.
Handle<Data> dh2(dh1);  // 2.
// Uses "unusual" template 
// copy-constructor.
Handle<Data> dh3 = oh;  // 3.
// Uses basic copy-constructor.
Handle<Data> dh4 = dh1; // 4.

Handle copy constructors are partial specializations of

template<class Arg1> 
Handle(const Arg1&);
template<class Arg1> Handle(Arg1&);

declared in unofficial.h (lines 3-9 in Listing Three). Therefore, the dh1-dh4 handles shown above are merely copies of oh. They are created using Handle copy constructors and point to the data initially created together with oh. What's more, if Other is not derived from Data, the lines will even fail to compile.

For this reason I introduced the Handle::create(...) functions — to provide a consistent interface for construction. It is the only interface that is able to create a new Handle-Data pair with a Data(Handle<...>) constructor.

That unfortunate inconsistency (and the only one, to my knowledge) "dethroned" the friendly Handle constructor-based interface and made it "unofficial." Nevertheless, I do prefer and use that syntax for its brevity and expressiveness. (Just keep in mind the special case.)

A Not-So-Conventional Pointer

Despite being similar to conventional pointers, Handle<Data> has a far stronger association with Data it represents. For the sake of performance and safe Data resource management, Handle<Data> is solely responsible for the complete life cycle of an associated Data instance — creation, access, and deletion. Therefore, a Handle<Data> instance guarantees the existence and validity of Data:

Data*          dp; // Unassigned, unusable.
auto_ptr<Data> ap; // Unassigned, unusable.
Handle<Data>   dh; // Creates a Data instance if the
                   // class has default constructor.
                   // Else, simply does not compile.
dp->modify(); // Trouble
ap->modify(); // Trouble
dh->modify(); // OK

The strong bond between Data and Handle<Data> supports the notion that objects should be declared when they are needed and, therefore, initialized. It differs from unassigned C-style declarations. This difference must be remembered when making the transition from raw pointers and auto_ptrs to Handles.

Data*          dp1, dp2, dp3;
auto_ptr<Data> ap1, ap2, ap3;
Handle<Data>   dh1, dh2, dh3;

Although the three lines look quite similar, the third one is far from being a mere replacement for the first two. This line actually creates three Handle<Data> instances together with Data instances. Thus, it is a proper replacement for:

Data* dp1 = new Data();
Data* dp2 = new Data();
Data* dp3 = new Data();

I understand that under rare circumstances the initialize-when-declared rule is difficult and/or inefficient to enforce. If that is the case, the function Handle::null (lines 103-108 in Listing One) comes to the rescue:

// Create an empty Handle instance
// No Data are associated with the 
// handle
Handle<Data> h = Handle<Data>::null();

h->access_data(); // Trouble.
...
if (something)
   h = Handle<Data>::create(arg1);
else
   h = Handle<Data>::create(arg2, arg3);

h->access_data(); // OK.

Handle::null returns a special Handle instance — an analog of null pointer — that is not associated with any data. The instance is potentially dangerous, as attempts to access non-existing data are obviously not valid. Therefore, the construction of the instance is explicit and highly visible. So, if you use Handle::null instances and face a mysterious memory corruption problem, start looking for Handle::null calls and then make sure that the corresponding handles are used properly.

Performance

Additional functionality carries additional performance overhead. The cost of passing a Handle<Data> to a function is roughly twice as much as simply passing a Data * pointer. In other words, it takes twice as long to call an empty func(Handle<Data>) as to call an empty func(Data*). Most often the overhead is negligible comparing to the application's overall operational costs. For example, when sorting of an array of ten integer elements, the function qsort is roughly 100 times as expensive as the overhead required to pass in a pointer to the array.

Also, it is often possible to pass Handle by reference. A func(const Handle<Data>&) call does not activate the reference-counting mechanism and does not incur any additional overhead. However, Handle was not designed to be passed by reference as a general technique; you have to understand the implications of doing so.

To Be Developed

For various reasons, I left out some functionality that I would still like to mention briefly.

Traditional C-style error processing (in which a function returns an error status rather than throws an exception) was one of the "features" of the original implementation in my last article. Since then, I grew up to accept Stroustrup's arguments (in The C++ Programming Language, Third Edition, pg. 355-356) favoring the C++ exception mechanism over the traditional C-style error handling. And Handle::null looks sufficient to handle "exceptions that are not errors" (The C++ Programming Language, Third Edition, pg. 374). In other words, conditions that are not "exceptional" enough to signify an error (for example, to indicate a not-found result of a find request). However, if you feel the need for Handle-based C-style error processing, take a look at the counter.h file posted on CUJ web site for a way to incorporate the functionality into Handle without performance sacrifice.
Handle in its presented form employs only the default memory allocation mechanism (global new and delete). The integration of custom allocators in the spirit of STL should be the next logical step in Handle development:

template<class Data, class Allocator =allocator<Data> >
class Handle {...};

Environment

The code for this article was developed and tested on SPARC Solaris 2.6 using gcc-2.95.2. I wouldn't be surprised if older compilers failed to support some of the newer features of the C++ Standard used by this implementation.

Summary of Advantages and Known Limitations

The Handle<Data> implementation here has several advantages over more traditional implementations. The following list is a recap of the advantages mentioned at the beginning of this article:

Handle<Data> is fully automatic.
It is safe.
It imposes no requirements on the Data class (such as modification to include a reference count).
It closely matches the syntax and behavior of conventional pointers.

In addition, Handle has some advantages not yet mentioned:

Handle allocates no additional memory blocks to accommodate reference-counting infrastructure (unlike Kevin S. Van Horn's "Smart Pointers" and Kenneth Ngai. "A Template for Reference Counting.") Therefore, it is quicker (allocates/deletes one memory block instead of two) and uses considerably less memory.
It uses one level of indirection to access both application data and reference-counting infrastructure.
Handle instances are truly lightweight objects, having a size of just one pointer (also unlike the aforementioned articles).
Handle clears up the eternal puzzle of the const Data* const syntax, in which it is difficult to remember which const modifies what. Handle<const Data> represents a non-constant handle to constant Data and const Handle<const Data> is clearly a constant handle to constant Data. Although not terribly important, this feature helps readability.
Handle provides "pointer semantics" and for many applications is a safe and trouble-free replacement for conventional pointers and explicit memory management (new and delete). I often find Handle to be an easier and safer alternative to the standard auto_ptr as well.

Following are a couple of limitations of the current design:

It is not capable of supporting multiple inheritance:

class A {...};
class B {...};
class C : public A, public B {...};

// Upcast.
Handle<A> ah = Handle<C>(); // OK
Handle<B> bh = Handle<C>(); // Problem

However, from my experience, multiple inheritance is an exceptionally rare beast. For my applications, the benefits of Handle generally outweigh this deficiency. However, if your situation is different, you must consider other alternatives.

Reference counting, as a programming concept, does not have many weak points. One of the most often mentioned is that it cannot handle cyclical references. A cyclical reference can occur when two or more objects possess reference-counted handles to each other. If the chain of references starting with any object leads back to that object, it is a cyclical reference. An example would be two instances having embedded handles pointing to each other:

class A
{ ...
   Handle<A> _other;

   void reference(Handle<A> ot) { _other = ot; }
}

Handle<A> a1;
Handle<A> a2;

a1->reference(a2);
a2->reference(a1);

Such a system will fail to handle destruction properly. In this case, there will be a memory leak when the handles go out of scope, because neither will be able to destroy its owned instance of A.

In the current design I do not implement exception handling. In failing to do so, I am not being deliberately malicious. Writing exception-safe software is not a simple task. Analysis of implications caused by exceptions would likely require an article of its own. Therefore, I deliberately avoided the issue. Although I currently use the Handle in its presented form, I consider the implementation being a reasonably well working concept rather than an industrial-strength product. Therefore, criticism and improvements are most welcome.

Additional References

Marshall Cline. C++ FAQ (part 5 of 9). http://www.faqs.org/faqs/C++-faq/part5 and http://www.cerfnet.com/~mpcline/c++-faq-lite/freestore-mgmt.html#[16.22].

Paul T. Miller. "Reference-Counted Smart Pointers and Inheritance," C++ Report, October 1999

Greg Colvin. Specifications for auto_ptr and counted_ptr. comp.std.c++, posted on 25 May 1995.

Vladimir Batov is a senior software engineer currently working for Raytheon Systems Company (Marlborough, MA). During his 16-year career, he has participated in various software development projects including a full-scale nuclear power station simulator, Air Traffic Control systems, and high-availability communication, monitoring, and financial systems in Unix using C/C++. Batov has written several other articles on C++ programming for C/C++ Users Journal. He can be reached at [email protected].

Previous 1 2 3 4 5 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

C/C++

Safe and Economical Reference-Counting in C++

Lisiting Four: The Counted class implementation.

A Not-So-Conventional Pointer

Performance

To Be Developed

Environment

Summary of Advantages and Known Limitations

Additional References

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

C/C++ Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

C/C++

Safe and Economical Reference-Counting in C++

Lisiting Four: The Counted class implementation.

A Not-So-Conventional Pointer

Performance

To Be Developed

Environment

Summary of Advantages and Known Limitations

Additional References

Related Reading

News

Commentary

Slideshow

Video

Most Popular

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

C/C++ Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content