Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

C++ Made Easier - Handles and Exception Safety, Part 2: Intrusive Reference Counting


October 2002/C++ Made Easier


Introduction

We began our last article by showing how a handle class can simplify memory management in object-oriented programs. We continued by noting that it is possible for such classes to have problems with exception safety, particularly in contexts that involve creating new objects as copies of existing ones.

We learned that it is often possible to forestall such problems by following a general rule: do everything that might throw an exception before doing anything that changes the state of the system. That way, if the exception occurs, there is no need to undo any state changes as part of dealing with the exception.

We also saw that our handle class has a potentially serious drawback: copying a handle always copies the corresponding object. In this article, we’ll see how to remove that drawback while maintaining exception safety. The classic solution to this problem is reference counting. Interestingly, reference counting does just what we want and gives us exception safety almost for free, provided that we use an intrusive strategy — one that makes the reference count part of the object itself. Intrusive reference counting is a fine strategy if we are able to change the class associated with the objects that we are counting so as to include a reference count. Our next article will deal with non-intrusive reference counts, and there we shall see that exception safety is much harder to attain. For the time being, we’ll stick with the simpler intrusive technique.

Our basis for this series of articles is the classic example of a hierarchy of geometric shapes rooted in a common base class:

class Shape_base {
public:
   virtual void draw() const = 0;
   virtual Shape_base* clone() const = 0;
   virtual ~Shape_base();
};

Here, the draw function does whatever is needed to display a shape, and the clone function makes a new copy of whatever kind of shape the object is and returns a pointer to the copy.

Each shape in the hierarchy is responsible for defining these two functions appropriately:

class Circle: public Shape_base {
public:
   // Every kind of shape must
   // define these two functions
   virtual void draw() const;
   virtual Shape_base* clone() const;
private:
   double diameter;
};

As a convenience to our users, we defined a handle class, named Shape, to manage objects of classes derived from Shape_base:

class Shape {
public:
   Shape(Shape_base* p0): p(p0) { }
   ~Shape() { delete p; }

   void draw() const { p->draw(); }
   // other functions as needed


   Shape(const Shape&);
   Shape& operator=(const Shape&);
private:
   Shape_base* p;
};

and we arranged for the copy constructor and assignment operator to copy the corresponding objects as needed:

Shape::Shape(const Shape& s):
   p(s.p->clone()) { }

Shape& Shape::operator=
   (const Shape& source)
{
   Shape_base *new_p =
      source->clone(); // might throw

   // No exceptions possible beyond
   // this point
   delete p;
   p = new_p;
   return *this;
}

A guiding design principle for these classes is that every Shape object corresponds to a single object of a class derived from Shape_base. For example, if we execute:

Shape s = new Circle;

then we have the following situation:

and if we then execute:

Shape s1 = s;

the situation will look like this:

Although this principle of making every Shape object correspond to an object from our hierarchy is easy to implement in code, it also copies objects needlessly. For example, so long as our program modifies neither s nor s1, there is no reason for s and s1 to refer to distinct Circle objects. We could do just as well with a data structure like this:

The remainder of this article will discuss one way of achieving this state of affairs.

Reference Counting

The difficulty in having multiple handles pointing to a single object is in knowing when to delete the object. For instance, we might continue our most recent example by destroying s1 and then s. How does the program know that it cannot destroy the Circle object when we destroy s1, but can (and should) do so after we destroy s?

The classical solution to this problem is to add a member to the base class; this member counts how many handles are attached to the corresponding object. So, for example, we might make our Shape_base class look like this:

class Shape_base {
   friend class Shape;      // added
public:
   Shape_base(): refcnt(0) { }  // added
   virtual void draw() const = 0;
   virtual Shape_base* clone() const = 0;
   virtual ~Shape_base();
private:
   unsigned refcnt;         // added
};

Every Shape_base object will now have a member, named refcnt, which will keep track of how many handles are attached to that particular object. We made the counter unsigned because the number of references can never be negative. We also added a constructor to Shape_base to initialize the counter to 0, because when we create a shape object, there are not yet any handles attached to it. Finally, we have made the Shape handle class a friend, because we intend for operations on Shape objects to be able to affect the value of the reference count.

Let us now turn our attention to the Shape class. We will first change the constructor and destructor to take notice of the reference count:

class Shape {
public:
   // We changed these two functions
   Shape(Shape_base* p0): p(p0) {
      ++p->refcnt;
   }
   ~Shape() {
      if (--p->refcnt == 0)
         delete p;
   }
   void draw() const { p->draw(); }

   // other functions as needed

   // We intend to change these
   // two functions also
   Shape(const Shape&);
   Shape& operator=(const Shape&);

private:
   Shape_base* p;
};

When we construct a Shape object from a pointer to an object of a class derived from Shape_base, we are attaching a handle to the object that did not previously exist. Therefore, we increment the object’s reference count. For example, we might execute:

Shape s = new Circle;

Here, the newly allocated Circle object has a reference count of zero (because we added a constructor to class Shape_base that sets the reference count to zero), and after we have finished constructing s, the reference count will be one. As another example, we might write:

Shape_base* p = new Circle;
Shape s = p;
Shape s1 = p;

Here, we wish to make s and t refer to the same object. Accordingly, when we allocate the object, the reference count will be zero, and after we have initialized s and s1, the reference count will be two:

When we destroy s1, the Shape destructor will decrement the reference count, because doing so does not bring the reference count to: zero, it will not destroy the circle object itself:

At this point, when we destroy s, the Shape destructor will discover that the reference count is now zero and will delete the Circle object accordingly.

The copy constructor for the Shape class is simplicity itself:

Shape::Shape(const Shape& s): p(s.p) {
   ++p->refcnt;
}

We are creating a Shape object as a copy of another one, so we know that the copy will refer to the same object as the original, and that the reference count will be one more than it was previously. Accordingly, we initialize p (i.e., this->p) as a copy of s.p and increment the reference count of the object to which p points.

It is only the assignment operator that requires any thought to speak of:

Shape& operator=(const Shape& s)
{
   ++s.p->refcnt;
   if (--refcnt == 0)
      delete p;
   p=s.p;
   return *this;
}

To understand what’s happening here, let’s start with the assumption that s and t refer to distinct objects:

Now, let’s see what happens when we execute:

s = t;

The first thing that happens is that we increment the reference count of the assignment’s source, in this case the object referred to by t, because that object will eventually have one more handle referring to it than it did previously:

Next, we decrement the reference count corresponding to the assignment’s destination, in this case the object referred to by s:

Doing so brings the reference count to zero, so we delete that object:

Finally, we make the destination’s pointer point at the source object:

What we did not mention about the assignment operator was the importance of incrementing t.p->refcnt before decrementing p->refcnt. To understand this importance, look at what happens if we begin with a single Shape object:

and attempt to assign it to itself by executing:

s = s;

First, we increment s.p->refcnt:

Then, we decrement p->refcnt, which in this case refers to the same object as s.p->refcnt:

When the smoke has cleared, we are back where we started. However, if we had decremented p->refcnt before incrementing s.p->refcnt, self-assignment would lead to disaster, because we would have decremented p->refcnt to zero, deleted the object, and then attempted to increment its reference count again.

Exception Safety and Copying

If we now take a step back and look at these classes together, we make a surprising discovery: there are no exception-safety problems anywhere in this code! The reason is simple: nowhere have we done anything that is capable of throwing an exception. The only operation that is capable of throwing an exception is creating an object of one of the classes in the Shape_base hierarchy, and we never create such objects — only our users do.

Of course, this happy state of affairs comes about only because we have surrendered a facility that our earlier solution had. In that version, copying a Shape object copied the object to which it was attached. In the present version, nothing we can do will ever copy an object from the Shape_base hierarchy. Clearly that won’t do.

A classical solution to this problem is to make it possible for the user of a Shape object to say “I would like to be able to write to the object to which this Shape is attached without affecting any other part of the system.” That requirement, in turn translates into “If the reference count of an object to which this Shape is attached is not 1, make a new copy of that object and attach my Shape to it.” A plausible name for such an operation is make_unique. So, for example, if we have the following situation:

and we call s.make_unique, we would then like the situation to be:

We can achieve this state of affairs by adding a declaration of make_unique to our Shape class:

class Shape {
public:
   Shape(Shape_base* p0): p(p0) {
      ++p->refcnt;
   }
   ~Shape() {
      if (--p->refcnt == 0)
         delete p;
   }

   void make_unique();
   void draw() const { p->draw(); }

   // other functions as needed
   Shape(const Shape&);
   Shape& operator=(const Shape&);

private:
   Shape_base* p;
};

and defining this function as follows:

void Shape::make_unique()
{
   if (p->refcnt > 1) {
      Shape_base* new_p = p->clone();
      --p->refcnt;
      ++new_p->refcnt;
      p = new_p;
   }
}

Why did we define new_p as a local variable rather than doing the following?

// What is wrong with this code?
   if (p->refcnt > 1) {
      --p->refcnt;
      p = p->clone();
      ++p->refcnt;
   }

The reason is that p->clone is the one operation that can possibly throw an exception, and the easiest way to make our code exception safe is to do everything that might throw an exception before we do anything that changes the state of our system. If we change p->refcnt before we call p->clone, and p->clone throws an exception, we have changed the system in a way that is difficult to undo without catching the exception. By calling p->clone first and saving its result, we are assured that throwing an exception will cause no harmful consequences.

Discussion

Our last article discussed a handle class with the property that copying the handle always copies the corresponding object. Such classes are easy to write and to use, but it is better to avoid needless copies if possible.

We can avoid such copies by adding a reference count to the objects to which we attach our handles. Such a reference count is called intrusive, because it requires a change to the subject class, not just to the handle. Once we have added such a reference count, all of our exception-safety problems go away, because we don’t copy any objects.

In practice, however, we cannot get away with never copying any objects. In particular, a program may wish to modify an object without affecting any other handles that might be attached to it. A logical name for such an operation is make_unique.

Because make_unique potentially copies an object, we must consider exception safety when we design it. As we saw in our previous article, we can avoid problems by following the general principle of doing everything that might throw an exception before making any changes to the state of our system.

In our next article, we will change our reference-counting strategy so that it is no longer intrusive. This change makes it possible to design a reference-counted handle class that can be attached to classes written by others. However, it turns out to be significantly more difficult to make such a class exception-safe.

Andrew Koenig is a member of the Large-Scale Programming Research Department at AT&T’s Shannon Laboratory, and the Project Editor of the C++ standards committee. A programmer for more than 30 years, 15 of them in C++, he has published more than 150 articles about C++ and speaks on the topic worldwide. He is the author of C Traps and Pitfalls and co-author of Ruminations on C++.

Barbara E. Moo is an independent consultant with 20 years’ experience in the software field. During her nearly 15 years at AT&T, she worked on one of the first commercial projects ever written in C++, managed the company’s first C++ compiler project, and directed the development of AT&T’s award-winning WorldNet Internet service business. She is co-author of Ruminations on C++ and lectures worldwide.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.