

Associate Mutexes with Data to Prevent Races

Solution: Associate Data with Locks

Observation: If we knew which mutex covered which data, we could verify at test time that when we access shared data we're holding the right locks. So let's explicitly group the mutex and the data it protects together in code, in the same struct or class behind an interface that we can check at test time:

// Example 1: Illustrating the mutex association pattern
struct MyData {
    // provide access to data
    vector<int>&  v()         {  assert( m_.is_held() );  return v_;  }
    Widget*       w()         {  assert( m_.is_held() );  return w_;  }
    // provide access to mutex
    void          lock()      {  m_.lock();  }
    bool          try_lock()  {  return m_.try_lock();  }
    void          unlock()    {  m_.unlock();  }
private:
    // hide
    vector<int>   v_;
    Widget*       w_;
    mutex_type    m_;
};

We require that callers always access the shared object through the accessor methods. That's enough to let us provide automated checking on every use of the shared object, and find many latent race conditions deterministically at test time based on code coverage alone.
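Example 1 assumes a mutex_type that offers an is_held() member. std::mutex doesn't provide one, so that type has to come from somewhere. Here is a minimal sketch of one way to build it, assuming it's enough to record the id of the owning thread; the name checked_mutex is hypothetical.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the article's mutex_type: wraps std::mutex and
// records which thread owns it, so is_held() can answer the question
// "does the calling thread currently hold this lock?"
class checked_mutex {
public:
    void lock() {
        m_.lock();
        owner_.store(std::this_thread::get_id(), std::memory_order_relaxed);
    }
    bool try_lock() {
        if (!m_.try_lock()) return false;
        owner_.store(std::this_thread::get_id(), std::memory_order_relaxed);
        return true;
    }
    void unlock() {
        // Clear the owner before releasing, while we still hold the lock.
        owner_.store(std::thread::id(), std::memory_order_relaxed);
        m_.unlock();
    }
    // True only when called by the thread that currently holds the lock.
    bool is_held() const {
        return owner_.load(std::memory_order_relaxed) == std::this_thread::get_id();
    }
private:
    std::mutex m_;
    std::atomic<std::thread::id> owner_{std::thread::id()};
};
```

The owner is kept in an atomic because is_held() may be called by a thread that doesn't hold the lock, which is exactly the case we want to diagnose. A thread can only ever read its own id from owner_ if it stored it there and hasn't yet cleared it, so relaxed ordering is sufficient for this check.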

Let's consider a few examples of how this helps verify that our program is race-free. First, what if we accidentally try to access the data without taking a lock at all? Consider:

MyData data1 = …, data2 = …;
// Later, but without having taken any locks:
data1.v().push_back( 10 );      // A: error: good, will assert
data2.w()->ProcessYearEnd();    // B: error: good, will assert

In line A, the call to data1.v() first performs assert( data1.m_.is_held() ), and because the lock isn't held the program will stop with a test-time error and point directly to the offending line. Note that we will catch this error automatically and deterministically based on code coverage alone; as long as our testing exercises line A, we will discover the error the first time we try to execute the line. Similarly, in line B the call to data2.w() first performs assert( data2.m_.is_held() ), which will fail deterministically and point directly to the offending line.

What if our testing doesn't reach the full code coverage that would let us ship knowing we have no races? For example, what if during testing we fail to exercise lines A and B at all, or they are callable along multiple paths (each of which must take the lock) and we fail to exercise each path at least once? Even with incomplete testing, the worst case is that the error will be consistently diagnosed in production the very first time the offending code path is actually exercised, with a clear diagnostic that points directly to the offending line, whether or not the potential race condition actually manifests by having another thread perform a conflicting access at the same time. Unlike today, even in the worst case (including no testing at all), the faults are never intermittent, timing-dependent, or hard to reproduce.
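One caveat is worth making explicit here: the standard assert macro expands to nothing when NDEBUG is defined, which is how release builds are commonly configured, so the production-time diagnosis only happens if the check stays enabled. A minimal sketch of an always-on check follows. The names check_lock_held and CHECK_LOCK_HELD are hypothetical, and throwing std::logic_error is just one possible reaction; logging and terminating would be another.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical always-on check. Unlike assert(), it is not compiled out when
// NDEBUG is defined, so a release build still diagnoses a missing lock the
// first time the offending path runs.
inline void check_lock_held(bool held, const char* file, int line) {
    if (!held) {
        throw std::logic_error(std::string("lock not held at ") + file + ":" +
                               std::to_string(line));
    }
}

// Records the source location where the check itself is written, typically
// inside an accessor such as MyData::v().
#define CHECK_LOCK_HELD(m) check_lock_held((m).is_held(), __FILE__, __LINE__)
```

With this in place, an accessor could use CHECK_LOCK_HELD( m_ ) where Example 1 uses assert( m_.is_held() ).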

Next, what if we try to access the data while having taken a lock, but it's the wrong lock? Consider:

{    // acquire lock on data1
     lock_guard<MyData> hold( data1 );
     data1.v().push_back( 10 );      // C: ok
     data2.w()->ProcessYearEnd();    // D: error: good, will assert
}    // release lock on data1

In line C, the call to data1.v() again first performs assert( data1.m_.is_held() ), and this time because the lock is held the program continues normally, having validated that the access to data1.v_ is not a race.

In line D, however, the call to data2.w() first performs assert( data2.m_.is_held() ), which fails because although we are holding a lock, we're not holding the one on data2. Again, because the lock isn't held the program will stop with a test-time error and point directly to the offending line.
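Both mistakes can be exercised in an ordinary unit test. The sketch below is a cut-down variant of Example 1 under a few stated assumptions: owner_mutex is a hypothetical lock that remembers its owning thread, the check throws instead of asserting so a test can observe the failure without aborting, and only the vector accessor is kept. Note that lock_guard<MyData> works because std::lock_guard only requires its argument to provide lock() and unlock().

```cpp
#include <atomic>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical lock that remembers its owner; std::mutex has no is_held().
class owner_mutex {
public:
    void lock()   { m_.lock(); owner_.store(std::this_thread::get_id()); }
    void unlock() { owner_.store(std::thread::id()); m_.unlock(); }
    bool is_held() const { return owner_.load() == std::this_thread::get_id(); }
private:
    std::mutex m_;
    std::atomic<std::thread::id> owner_{std::thread::id()};
};

// Cut-down MyData: the accessor verifies the lock before handing out the data.
struct MyData {
    std::vector<int>& v() { require_held(); return v_; }
    void lock()   { m_.lock(); }
    void unlock() { m_.unlock(); }
private:
    void require_held() const {
        if (!m_.is_held())
            throw std::logic_error("MyData accessed without holding its lock");
    }
    std::vector<int> v_;
    owner_mutex      m_;
};

// Returns true if the access passed the check, false if it was diagnosed.
inline bool try_push(MyData& d, int value) {
    try {
        d.v().push_back(value);
        return true;
    } catch (const std::logic_error&) {
        return false;
    }
}
```

A test can then call try_push( data1, 10 ) with no lock held (the line A case), and again inside a lock_guard<MyData> on data1 for both data1 and data2 (the line C and line D cases), and compare the results against what the pattern promises.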

There is only one specific abusive caller pattern that can't be checked automatically using this technique. Callers that remember a direct pointer or reference to the object for later use must be prevented by programmer education and team policy:

// The only hole: Require and enforce that programmers don't do this.
vector<int>* sneaky = nullptr;
{              // enter critical section
     lock_guard<MyData> hold( <b>data1</b> );
     sneaky = &data1.v();          // E: compiles, but avoid doing this
     sneaky->push_back( 10 );    // F: ok, but not checked
<font color="red">sneaky->push_back( 10 );      // G: error: race, but won't assert</font>

In line E, we correctly verify that data1.m_.is_held() and allow access to the member, but the caller remembers a sneaky pointer or reference to the object. That's unsafe. It's not inherently always wrong: for example, the access in line F is still correct and not a race, but the problem is that we can no longer verify that it's correct. It's unsafe because it opens up the one hole demonstrated in line G, where the call to sneaky->push_back( 10 ) happens after the lock has been released and so is a race, yet it still compiles and runs and won't be caught because it bypasses the validation.

The mutex association pattern catches both the "Oops, forgot to lock" and "Oops, took the wrong lock" mistakes. As a bonus, it also provides a reasonable migration path for existing source code, where the impact on existing calling code can sometimes be made as small as just adding () or a similar minor syntactic change.

Finally, and this can't really be overemphasized: this is a vast improvement over intermittent, timing-dependent, hard-to-reproduce bug reports from customers.
