Channels ▼
RSS

Concurrent Access Control & C++


January 04:

In multithreaded environments, several components often access shared resources concurrently. When multiple threads read and write to them without proper synchronization, the shared resources will be in an inconsistent state. For this reason, concurrent access control has been studied extensively in various languages and systems. A preliminary solution is to bracket each critical section in a proper acquire-lock-or-wait and release-lock pair. For example:

mutex lock;
lock.do_lock();
  do something;
lock.do_unlock();

This solves the problem of race condition, but it is tedious and error prone. Monitor [1] is a simple and efficient solution that automatically wraps a critical section in an acquire-lock-or-wait and release-lock pair. Many languages provide the implementations of monitor, including Java's synchronized [2], Modula-3's LOCK [3], and Mesa's MONITOR [4]. For example:

Object lock;
synchronized(lock)
{
   // do something
}

Unfortunately, C++ has no such facility because Standard C++ is silent on the subject of threads. In this article, we show that Standard C++ does let you implement the monitor facility efficiently in only 50 simple lines — and without any extensions to Standard C++.

The Basic Idea

Looking at the acquire-lock-or-wait and release-lock pair again, anyone familiar with C++ immediately realizes that lock is a resource and you should use the "resource acquisition is initialization" idiom [5] and leverage destructors for automatic resource deallocation. For each critical section, there is a corresponding object. The constructor of that object tries to acquire the lock-or-wait and the destructor releases the lock. For example:

class auto_lock
{
public:
    auto_lock(Lock& lock)
        : _lock(&lock) 
		{ _lock->do_lock(); }
    ~auto_lock()
        { _lock->do_unlock(); }
private:
    Lock* _lock;
};

where Lock is some existing Mutex class or something similar that supports do_lock() and do_unlock() operations. The implicitly generated copy constructor can easily lead to errors: The destructors of the copied-to and copied-from objects perform too many do_unlock() operations. Generally, you may declare private copy constructors and assignment operators to ensure that the lock is released exactly once. But we won't do it here. The reason will be clear later.

To serialize the access to a shared resource, you must declare an auto_lock object before any other statements at the beginning of a critical section:

Mutex lock;
{
   auto_lock lock(&lock);
   // do something
}

Since auto_lock releases lock only in the destructor, the lock is held in the entire lifetime of auto_lock. Thus, a critical section must be included in a proper block in which auto_lock is declared as auto variable at the first statement. When the block is finished, auto_lock is automatically destroyed, then the lock is also released.

This approach works fine, but in the real world, it isn't very neat. The problems with the "resource acquisition is initialization" technique in this context are:

  • Users must explicitly declare auto_lock objects.
  • Users must control the lifetime of the auto_lock objects the same way they would control the critical sections.

A Real-World Approach

When you have to handle a race condition in the real world, you may not want to explicitly declare an object and worry about its lifetime. This can be easily solved. Recall that if you declare an object in an if statement, C++ automatically frees it from the stack when the if block is done. Consequently, you can define this simple macro:

#define synchronized(lock) \
  if (const auto_lock lock_##__LINE__(lock))

where you declare the lock object as lock_##__LINE__ to handle the potential name conflict in case of nested synchronized statements.

You also need to extend auto_lock with an explicit type conversion operator bool() that always returns true:

class auto_lock
{
public:
   // ...
  operator bool() const
    { return true; }
  // ...
};

Now you can tackle a race condition in C++ the same way as you would in Java:

Mutex lock;
synchronized(lock)
{
   // do something
}

Generic<Programming>::Monitor

It seems that the monitor has been well implemented. However, you still have a lot of work to do if you want to write a template version of auto_lock. Application programmers use different locks (such as spin lock, mutex, semaphore, read-write lock, and so on) for different situations.

You can obtain a template version of the aforementioned auto_lock immediately:

template < typename Lock_T >
class auto_lock
{
public:
    auto_lock(Lock_T& lock)
        : _lock(&lock) 
		{ _lock->do_lock(); }
    ~auto_lock()
        { _lock->do_unlock(); }
    operator bool() const
        { return true; }
private:
    Lock_T * _lock;
};

where you still assume that the type Lock_T provides the operations do_lock() and do_unlock(). It is easy to drop this requirement with the help of some adapter classes. You just need to modify the definition of synchronized:

#define synchronized(Lock_T, lock) \
   if (const auto_lock<Lock_T>lock_##__LINE__(lock))

This new awkward definition requires users to provide one more parameter — the type of lock:

Mutex mlock;
Spin_lock slock;
synchronized(Mutex, mlock)
{
    // do something
}
synchronized(Spin_lock, slock)
{
    // do something
}

If users want to change the type of a lock, they need to modify all the synchronized statements besides the declaration of the lock. Again, this is a tedious and error-prone process. To make it easy to use synchronized, we present a helper function that creates and returns an auto_lock object (recall that we do not prevent copying auto_lock objects, so we can return an auto_lock that causes an implicit object copy):

template < typename Lock_T >
inline auto_lock<lock_T>
make_lock(const Lock_T& lock)
{
   return auto_lock<Lock_T>(lock);
}

where auto_lock is dervived from lock_block:

class lock_block
{
public:
    operator bool() const
        { return true; }
};
template < typename Lock_T >
class auto_lock : public lock_block
{
public:
    auto_lock(Lock_T& lock)
        : _lock(&lock) 
		{ _lock->do_lock(); }
    ~auto_lock()
        { _lock->do_unlock(); }
private:
    Lock_T * _lock;
};

By virtue of the C++ compiler, which deduces template arguments for template functions, we do not need to specify template arguments for make_lock() to create an auto_lock object. lock_block is a general class and does not need any type parameters. Thus, you may have a better definition of synchronized:

#define synchronized(lock) \
   if (const lock_block& lock_##__LINE__ = \
      make_lock(lock))

Things are looking better, but someone may claim that there is a bug in the aforementioned definition. They might argue that we need to declare a virtual destructor for lock_block. They might also point out that we declare a lock_block object that actually refers to an auto_lock object. Without the virtual destructor, the lock is not released when the object is destroyed because the destructor of lock_block instead of auto_lock is called.

This may seem like a serious problem: but we do not need to worry about it because we can achieve polymorphic behavior of the destructor without a virtual destructor! According to the C++ Standard, a const reference initialized with a temporary value makes that temporary value live for the lifetime of the reference itself. If you write:

Mutex lock;
const lock_block& magic_lock = make_lock(lock);

where make_lock creates a temporary auto_lock object. This temporary object is assigned to the const reference magic_lock. The language rule — that the temporary value lives at least as long as the reference — ensures that the correct destructor is called when the temporary value is destroyed. In turn, the destructor releases the lock. We also achieve high efficiency because there is no virtual call involved. This trick was first used in Andrei Alexandrescu and Petru Marginean's "Generic <Programming>" column (CUJ, December 2000) [6] for exception safety.

At this point, we have implemented a powerful monitor to control concurrent accesses.

ACID Transactions

Based on the same idea, we can also solve another important problem — ACID transactions. In the context of database transactions, ACID is short for "Atomic, Consistent, Isolation, and Durable." Transactions provide a simple model of success or failure. A transaction either commits (that is, all its actions happen) or aborts (all its actions are undone). This all-or-nothing quality makes for a simple programming model.

When applying the monitor idea to transactions, you need be more careful because you now have two choices if the transaction is over — either commit or abort. Consequently, we extend the lock_block to a trans_block that has the Boolean variable _abort to track whether a transaction finishes or aborts:

class trans_block
{
public:
    trans_block()
        : _abort(true) { }
    operator bool() const
        { return _abort; }
    void finish() const
        { _abort = false; }
protected:
    mutable bool _abort;
};

Derived from trans_block, class auto_trans starts a transaction in the constructor and rolls back or commits it in the destructor depending on whether it aborts:

template < typename Trans_T >
class auto_trans : public trans_block
{
public:
    auto_trans(Trans_T& trans)
        : _trans(&trans) { _trans->start(); }
    ~auto_trans()
    {
        if (_abort) _trans->rollback();
        else _trans->commit();
    }
private:
    Trans_T* _trans;
};

You can now define the counterpart of synchronized, the macro transaction:

#define transaction(trans) \
  for (const trans_block& auto_trans_##__LINE__ \
             = make_trans(trans); \
       auto_trans_##__LINE__; \
       auto_trans_##__LINE__.finish())

where make_trans() creates and returns an auto_trans object. To understand how the macro transaction works, examine this example:

database_session dbs;
transaction(dbs)
{
    // do something
    if ( something is wrong )
        break;
    // do something
    if ( something is wrong )
        throw string("error");
    // do something
}

where transaction(dbs) is actually a for statement whose initial part creates an auto_trans object with member variable _abort initialized to true. If everything goes well, the for statement finishes the first time loop, and the method finish(), which assigns false to _abort, is called. Then the second (test) part (that is, the type conversion operator bool() of auto_trans) is executed. Since operator bool() returns the value of _abort, which is now false, the for loop is not executed again, and the auto_trans object is destroyed. In the destructor of auto_trans, the transaction is committed because the variable _abort is false.

If something is wrong, you can stop executing the rest of the transaction with a break statement. In this case, the for loop is over and the finish() method is not called. Thus, _abort is still true. Consequently, the transaction rolls back when the auto_trans object is destroyed. You can also throw an exception. In this case, the auto_trans object is destroyed during the stack-unwinding process and, for the same reason, the transaction rolls back.

Efficiency

We've presented a flexible, general, and convenient monitor to control concurrent access. However, it would be useless if it has a serious impact on efficiency. To estimate this impact, we measured the runtime to synchronize a critical section 10 million times, by the monitor or the preliminary lock approach:

for ( size_t i = 0; i < 10*1000*1000; )
{
  // Monitor Approach
  synchronized(lock) {
      ++i; 
  }
  // or Preliminary Lock Approach
  lock.do_lock();
  ++i;
  lock.do_unlock();
}

Thus, the measurements show the overhead of the monitor compared to the preliminary lock approach. To consider the impact of the compiler and machine architecture, we ran the tests on five different machine architectures. On all five machines, we used g++ (four different versions) without any optimization. We ran the tests 100 times. Table 1 lists the average runtimes (in seconds) and standard deviations.

The results show that our implementation is about two times slower than the preliminary lock approach. However, for nontrivial C++ code, the most obvious difference was between runs using different levels of optimization [7]. Recall that we declare every function inline and use a trick to avoid virtual calls. Besides, operator bool() always returns the constant value true. So, you can expect that a good optimizer will eliminate all overhead and the monitor should not actually perform any more computation than the preliminary lock. After turning on the optimization option (-O3), we ran the test again and obtained the improved results shown in Table 2.

Clearly, there are no significant differences between the cost of the preliminary lock and the cost of the monitor on all five machines. (We did not compare the runtimes on different machines because they are totally different hardware.) In short, our implementation seems quite efficient.

References

[1] Hoare, C.A.R. "Monitors: An Operating System Structuring Concept," Communications of the ACM, 1974.

[2] Lea, Douglas. Concurrent Programming in Java, Addison-Wesley, 1997.

[3] Nelson, G. (editor). Systems Programming with Modula-3, Prentice Hall, 1991.

[4] Mitchell, J.G., W. Maybury, and R. Sweet. Mesa Language Manual. XEROX PARC, 1979.

[5] Stroustrup, Bjarne. The C++ Programming Language, Third Edition, Addison-Wesley, 1997.

[6] Alexandrescu, A. and P. Marginean. "Change the Way You Write Exception-Safe Code — Forever," C/C++ Users Journal, December 2000.

[7] Stroustrup, Bjarne. "Wrapping C++ Member Function Calls," The C++ Report, June 2000.


Haifeng Li is a Ph.D. candidate in the department of computer science and engineering, University of California, Riverside. Keshu Zhang is a Ph.D. candidate in the department of elecrical engineering at the University of New Orleans. They can be contacted at hli@cs.ucr.edu and kzhang1@uno.edu, respectively.



Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video