"Just Enough" Thread Safety

By Herb Sutter, September 01, 2004

When it comes to thread-safety, "good enough can be just that.

September, 2004: "Just Enough" Thread Safety

Herb Sutter (http://www.gotw.ca/) is a leading authority and trainer on C++ software development. He chairs the ISO C++ Standards committee and is a Visual C++ architect for Microsoft, where he is responsible for leading the design of C++ language extensions for .NET programming. His two most recent books are Exceptional C++ Style and C++ Coding Standards (forthcoming October 2004).

This column is a bit odd because it's about a nonproblem. It's a nonproblem, however, that people keep reporting to me as though it were a problem—including bright and experienced people, including even multiple fellow CUJ columnists. So I decided to write a column about why the problem isn't real.

In More Exceptional C++ [1] Items 13-16 and Appendixes A and B, I covered in some detail the key issues related to making copy-on-write (COW) strings thread-safe. This month's column presents the essence of the code in question, describes two common forms of the reported problem, which claims that the code apparently still isn't thread-safe for a subtle reason, and then explains why the problem is a phantom for an equally subtle but more basic reason.

Along the way, we'll recall the key division of responsibilities and related techniques for writing thread-safe code.

Let's dive right in.

Motivating Example: COW Strings, Single Threaded

Just to keep things interesting for the many readers for whom this background section is review, I'll gratuitously inject some new idioms into the code. For example, you'll notice that copy assignment takes its parameter by value, that Swap returns a reference to *this for the convenience of many of its callers, and that EnsureUnique's implementation is restructured to flatten it out and remove nesting. I also decided to make EnsureUnique a const member function and data_ a mutable member; even though it doesn't matter directly in the excerpted functions shown, in general EnsureUnique should be called at the beginning of every possibly mutating operation, and some of those can be const operations. (For example, const char& operator[]( size_t ) const yields a reference into the String that the user could use to change the String's contents via a const_cast. Although such usage style might be questionable, in a complete String we would probably want to support it.)

Consider a few essential functions of a sample copy-on-write (reference-counted) String class. This example is based on one of the examples given in [1]:

// Example 1: Parts of a basic COW String
// (Note: NOT safe for multithreaded use)
//
class String {
public:
  String();
  String( const String& );
  String& operator=( String );
  ~String();
  String& Swap( String& other ) /* throw() */;
  void Append( char );
  // ... eliding other stuff, like copy 
  // assignment, Length, operator[], etc. ...
private:
  void EnsureUnique( size_t ) const;
  struct StringBuf {
    // ... I've elided a constructor that takes 
   // a size hint, various helper functions, 
    // and the string's representation here ...
    size_t refs;	// integral reference 				// count
  };
  mutable StringBuf* data_;
};

In a single-threaded world, implementing the Big Four special member functions is straightforward:

// Example 1, continued
//
String::String() : data_( new StringBuf ) { }

String::String( const String& other ) {
  data_ = other.data_; // shallow copy; just 	          		// point at the same data_
  // update shared reference count
  ++data_->refs;		
}
String& String::operator=( String temp ) {	// NB: by value
  return Swap( temp );
}
String::~String() {
  if( —data_->refs < 1 ) // last one out, 
    turn off the lights
    delete data_;
}
String& String::Swap( String& other ) 
  /* throw() */ {
  // swap our guts with others guts
  swap( data_, other.data_ ); 
  return *this;   // this return can be 			        // convenient for callers
}

That's it for the Big Four special member functions and their canonical nofail-safe Swap helper. (For further discussion of the nofail and other error-safety guarantees, see last month's column [2].)

Next is our sample mutating member function Append, which has to call EnsureUnique to force a deep copy before it modifies the string:

// Example 1, continued
//
void String::Append( char c ) {
  // perform deep copy if needed, and request  
  // room for an additional char
  EnsureUnique( data_->Length() +1 );
  // ... do the work of appending to data_ ... 
}

Finally, the EnsureUnique function is where all the deep copying happens. EnsureUnique takes a hint at what the split-off representation's minimum capacity should be, and passes that onward to a helpful StringBuf constructor that takes the minimum-capacity hint (not shown):

// Example 1, continued
//
void String::EnsureUnique( size_t n ) const {
  // if we're not shared, and our existing capacity
  // is already sufficient, then there's nothing to do
  if( data_->refs < 2 && data_->Capacity() >= n )
    return;
  StringBuf* newdata = new StringBuf( *data_, n );
  if( —data_->refs < 1 )	// last one out, turn off the lights
    delete data_;
  data_ = newdata;
}

Enter the Hydra: COW Strings, Multithreaded

What does it take to make this String safe for use in multithreaded environments? The answer, in short, is this:

"We must make String exactly as safe to use as it would be if it didn't use copy-on-write and do under-the-covers data sharing. It's wasteful to do more, and just plain wrong to do less."

Note: Doing more than this minimum, such as serializing all access to all member functions in an attempt to allow calling code to avoid the need to do any locking whatsoever, is technically possible but almost certainly misguided and therefore strongly discouraged. Synchronizing all member functions is not only hideously wasteful and almost always unnecessary, but it's almost always insufficient to boot. (See Appendix A of [1] for an example.) The well-intentioned authors of the Java 1.0 containers learned this lesson the hard way.

So String has to do just enough extra serialization work to enable calling code to work correctly as long as the calling code does its usual thing for thread-safe operation, namely to serialize access to individual visible String objects that the calling code knows are or can be actually shared across threads.

To illustrate: Say that a program has only two String objects, s1, which is shared across threads and s2, which is never shared. Then the calling code has to perform locking whenever it wants to manipulate s1, but it must not be required to do anything special when manipulating s2. That works fine by default with non-COW string types, but the COW String in Example 1 fails miserably here because the two distinct String objects s1 and s2 might share data under the covers, and thus invisibly be not fully distinct at all. Serializing all manipulations of s1 is no longer enough for the program to ensure that it's serializing all manipulations of s1's contents, because those contents can also be got at via s2. In fact, even if the calling code was aware of possible sharing issues, the calling code still can't lock all the right things because it has no way of knowing that (or when) two apparently distinct String objects aren't really fully distinct. In our example, there might not even be a single piece of code that knows about both strings s1 and s2; they could easily be in modules that don't even know about each other and could still be sharing data under the covers if they both happen to start life as copies of a common third String s3 somewhere else. (For a detailed example of how this can happen, see the error-reporting subsystem example in Appendix A of [1].)

So, what is the minimum responsibility that our COW String has to accept to buck up and behave itself? String must take responsibility, not for solving all possible thread-safety problems, but only for solving its share: If it uses copy-on-write, then it must do enough extra serialization work of its own to make itself as safe to use as if it wasn't sharing data under the covers.

Example 2 shows one of the least inefficient ways to do it. The key observation is that we only need to serialize access to the reference count itself, and this can be done most efficiently using system-specific atomic integer operations, rather than more heavyweight options such as mutexes (which would still be correct but are needlessly expensive). I'll postulate the following three atomic integer functions of interest:

IAIncr increments the integer value by 1 and returns the new value.
IADecr similarly decrements it by 1 and returns the new value.
IAComp returns -1, 0, or 1 if the integer is less than, equal to, or greater than a given value, respectively, much like strcmp.

Most platforms have these functions under various names. (For alternative strategies and implementations other than atomic integer operations, and an analysis of relative costs and opportunities, see [1].)

Here's the corrected code, showing only the three functions that need to change from Example 1:

// Example 2: Changes to Example 1's COW String
// to make it safe for multi-threaded use
//
String::String( const String& other ) {
  data_ = other.data_;
  IAIncr( data_->refs );
}
String::~String() {
  if( IADecr( data_->refs ) < 1 )
    delete data_;
}

That's two out of the three that need to change. Finally, the only other copying code is the actual deep copy in EnsureUnique. EnsureUnique must be called by other String member functions before every possibly mutating operation, here Append. After all, we wouldn't want appending a character to one caller-visible String object to also append it to some other distinct String object, just because the two were sharing state and forgot to unshare it in time. So that's where all the actual deep-copying logic is found, and it's the only other function that needs to change:

// Example 2, continued
//
void String::EnsureUnique( size_t n ) const {
  // if we're not shared, and our existing capacity
  // is already sufficient, then there's nothing to do
  if( IAComp( data_->refs, 2 ) < 0 && data_->Capacity() >= n )
    return;
  StringBuf* newdata = new StringBuf( *data_, n );
  if( IADecr( data_->refs ) < 1 ) 	// Line A
    delete data_;
  data_ = newdata;
}
// ... all other functions are the same as in Example 1 ...

Note that the test on Line A now protects against two different kinds of conditions. First, as in Example 1, it protects against the case where we're allocating a new StringBuf even when the String isn't shared (data_->refs < 2) because we need to grow to accommodate the new character being appended (data_->Capacity() < n).

The second condition is a subtle race condition that arises only in multithreaded use: What if two String objects s1 and s2 are sharing this representation (that is, the reference count is 2), and one thread calls Append on s1 at about the same time as another thread calls Append on s2? It could be that both threads get to Line A at the same time. Of course, both threads can't perform the IADecr simultaneously; that's the whole reason IADecr exists after all. But suppose that thread 1 performs its IADecr and then is immediately interrupted, thread 2 performs its IADecr and runs to the end of the function, and then thread 1 resumes: Even though the reference count was 2 at the beginning of the function, it's 0 now that this thread is letting go. That means that after its call to IADecr, thread 1 discovers that it has got two copies of the StringBuf buffer, both of which it now owns uniquely. What the threads have done is to each take their own copy of the StringBuf, only to discover that they're going to throw away the original StringBuf. Maybe, in a sense, one of the deep copies was useless because if thread 1 had somehow known this race condition would happen, it could have avoided taking its copy only to throw away the original StringBuf; but, since there's no way it could know, we accept that in such rare cases, we might incur a bit of needless deep copying work. Eh bien, c'est la vie.

That's it; we're done. The other functions in Example 1 are fine as written.

Segue: What About Swap?

Did you notice that we didn't have to change Swap? That might bear some explanation, and it's a useful segue into our ultimate discussion of the (alleged) bugs.

It might be tempting to reason that, as currently written, Swap only performs a pointer swap, but even that can't be assumed to be an atomic operation. Fortunately that doesn't matter because, as it turns out, Swap doesn't need additional protection within String even if it were to become considerably more complex than it is in Example 1. Why not? Consider the two main scenarios where Swap is used: Either calling code calls it directly, as in:

a.Swap( b );

or else it calls it indirectly via the copy assignment operator, as in:

a = b;

where a and b are distinct, visible String objects in both cases. But clearly such code knows that two String objects are involved, and therefore the calling code is required to serialize each of the two above statements with respect to all other uses of a and b, and so there is no problem. (Even if there are other String objects sharing representations with either a or b, that doesn't matter; we're not affecting reference counts or anything here.)

(Alleged) Bug #1: Copy Construction

The foregoing discussion about Swap leads nicely into the discussion about the nonbugs in Example 2 because it turns out that all of the so-called "bugs" stem from confusion about who is responsible for what part of thread-safety.

To see the first alleged bug, recall the copy constructor from Example 2:

// Repeated from Example 2
//
String::String( const String& other ) {
  data_ = other.data_;
  IAIncr( data_->refs );
}

The objection is that the above code is not safe for multithreaded use and that the IAIncr should be performed first. "After all, say other's reference count is originally 1," the reasoning goes. "Then what if thread 1 executes this code and successfully assigns data_ = other.data_; but is immediately interrupted before setting the reference count to 2, then another thread 2 destroys the other object, which will drop other.data_->refs to 0 and therefore delete other.data_, and then thread 1 resumes and tries to increment data_->refs where data_ is now pointing to freed memory?" If that could happen, it certainly would be a problem. Fortunately, it can't happen—not if everyone is playing by the rules.

The fallacy in this argument is that the problem scenario can't happen if the calling code is meeting its required and normal responsibility for thread-safety. Remember, calling code that knows an object is shared across threads is responsible for serializing all access to it. In this case, calling code is responsible for serializing access to all visible String objects that are shared across threads. If it does so, then the above problem scenario can't happen. Why not? Because the problem scenario assumes that thread 1 is executing the copy constructor to take a copy of a String object s—while thread 2 is simultaneously destroying that same visible object s! It is the calling code, and not String, that is thread-unsafe if such a scenario can really happen. Calling code is required to prevent such simultaneous access to the same String object.

The bottom line: A COW String doesn't have to solve all thread-safety issues. It only has to make itself as safe for multithreaded use as it would have been if it wasn't using COW.

(Alleged) Bug #2: Append While Copying

Now consider a similar alleged bug in the EnsureUnique deep-copying function:

// Repeated from Example 2
//
void String::EnsureUnique( size_t n ) const {
  // if we're not shared, and our 
  // existing capacity is already
  // sufficient, then there's nothing to do
  if( IAComp( data_->refs, 2 ) 
    < 0 && data_->Capacity() >= n )
    return;
  StringBuf* newdata = new 
     StringBuf( *data_, n );
  if( IADecr( data_->refs ) < 1 ) // Line A
    delete data_;
  data_ = newdata;
}

Here, the objection is similar: "Say the reference count is originally 1," the reasoning goes. "Then what if thread 1 is executing this code on a String object s and successfully calls IADecr in Line A to drop refs to 0 but then is immediately interrupted, then thread 2 takes a copy of the same string (for example, with code like String s2 = s;), which increments the reference count again to bring it back to 1, and then thread 1 resumes and deletes data_, which leaves thread 2's copy of the string with a data_ that points to freed memory?" Again, if that could happen, it certainly would be a problem.

By now, the fallacy is probably obvious: This scenario presumes that thread 1 is executing some possibly mutating function on a String object s (for example, with code like s.Append('a');) while thread 2 is simultaneously taking a copy of that same visible object s (for example, with code like String s2 = s;)! This scenario can only occur if the calling code fails to correctly serialize access to the visible String object s, because s is used by multiple threads and therefore the code that's using s is responsible for making sure that s is never used simultaneously on different threads. It should definitely not be appending to s on one thread while taking a copy of s on another.

Summary

The issues described in this column apply not only to copy-on-write techniques for strings, but to techniques in any class that cause shared data under the covers between two different observable objects that appear distinct but really aren't deeply distinct.

In the presence of such implementation sharing, such a class must take responsibility, not for solving all possible thread-safety problems, but only for solving its share: If it uses copy-on-write, then it must do enough extra serialization work of its own to make itself as safe to use as if it wasn't sharing data under the covers.

Code that uses an object is responsible for knowing whether the object is shared across threads and, if so, for serializing access to the object. This is the normal duty of care required for thread-safe coding in general, and it applies to any shared object, no matter whether it is some kind of string object, or an STL container like a vector, or an object of some newfangled type that was authored yesterday by your team lead's great-aunt's youngest second cousin once removed. Specifically, it applies regardless of whether the object is covertly sharing state with other objects under the covers; if the shared object is doing such things, then it is responsible for guaranteeing that the sharing shenanigans don't matter.

The shared object isn't responsible for all possible thread-safety or for solving world hunger. It's only responsible for restoring "just enough" thread safety to make it possible for calling code to correctly fulfill its usual duty of care by serializing access to shared visible objects. And this is one of the rare cases in software development where "good enough" really is good enough.

References

[1] Sutter, H. More Exceptional C++, Addison-Wesley, 2002.
[2] Sutter, H. "When and How To Use Exceptions," C/C++ Users Journal, 22(8), August 2004.

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

"Just Enough" Thread Safety

Motivating Example: COW Strings, Single Threaded

Enter the Hydra: COW Strings, Multithreaded

Segue: What About Swap?

(Alleged) Bug #1: Copy Construction

(Alleged) Bug #2: Append While Copying

Summary

References

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

"Just Enough" Thread Safety

Motivating Example: COW Strings, Single Threaded

Enter the Hydra: COW Strings, Multithreaded

Segue: What About Swap?

(Alleged) Bug #1: Copy Construction

(Alleged) Bug #2: Append While Copying

Summary

References

Related Reading

News

Commentary

Slideshow

Video

Most Popular

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content