Channels ▼


Developing Lightweight, Statically Initializable C++ Mutexes

Developing Class QLock

Next, I looked at the Pthread-w32 library. The code I discovered in that library's implementation of pthread_once (pthread_once.c, CVS version 1.16) used an approach radically different from either of the two techniques just discussed: If multiple threads enter pthread_once simultaneously, the first thread that has to wait (the second thread overall) creates a synchronization object (a Windows event object, in this case). The last thread to leave the routine destroys the object. An atomic reference count keeps track of the number of threads involved. All of the problems of the previous two approaches seemed to be avoided. Unfortunately, a closer examination of the code uncovered a race condition. After a few attempts at fixing the problem (see pthread-w32 mailing list archive if you are interested in the details;, it became clear that this general approach is fundamentally flawed and cannot be fixed.

Although reference counting used by the pthread_once implementation did not work, trying to fix it gave me some ideas. What if each thread that had to be suspended (waiting for another thread to complete the initialization of a Singleton), created and then destroyed its own synchronization object? For instance, I knew how the MCS implementation of spin-locks, to reduce contention, makes each thread wait spinning on its own flag. Could I use a similar technique, but instead of spinning, make each thread wait on its own semaphore or event object? It turned out that it was not only possible to make such a technique work, but the resulting code was fairly simple, too. Listing Nine (available electronically, see mutex.) uses this specialized waiting technique to implement my Singleton.

The central idea of this new approach is encapsulated in class QLock. Each instance of this class represents a lock held on the mutex that was passed as a constructor argument. A mutex itself, which is represented by the QLock::Mutex type, is just a pointer, which is set to zero when the mutex is not locked. When the mutex is locked, it points to the tail of the queue of threads waiting on this mutex. Each thread in the queue is represented by the instance of class QLock that the thread created to obtain the lock. In other words, the mutex is a pointer to the QLock instance created by the last thread waiting in the queue. This QLock instance is linked to another QLock instance, which was created by the previous thread in the queue, and so on. The QLock instance at the head of the queue corresponds to the thread that currently holds the lock.

A Closer Look at QLock

Table 1 lists the QLock class data members, while Table 2 lists the values that d_readyFlag and d_nextFlag may have. The constructor QLock::QLock atomically exchanges the mutex pointer for this. If the mutex pointer was 0 prior to the exchange, the lock is obtained and the constructor completes. If the mutex pointer was not zero, this instance of QLock has to join the queue and wait for the ready flag.

Data Member Description
d_mutex Pointer to the mutex locked by this QLock instance.
d_next Pointer to the next instance of QLock in queue waiting to obtain the lock.
d_readyFlag Flag used to indicate that the lock has been released by the predecessor.
d_nextFlag Flag used to indicate that the next pointer (d_next) has been set by a successor.

Table 1: QLock class data members.

Value Description
0 Initial value indicating that the flag is not set.
-1 Indicates that the flag was set before an event object had been installed into the flag.
Neither 0 nor -1 Indicates that an event object has been installed into the flag, which now holds the handle of the event object.

Table 2: Values that d_readyFlag and d_nextFlag may have.

The destructor QLock::~QLock, using one atomic compare-and-exchange operation, checks if the mutex pointer contains the value of this pointer. If it does, it resets the mutex pointer to zero, thus releasing the mutex. If the mutex no longer contains the value of this pointer, the queue has been formed, and the destructor must pass the lock to the next instance in the queue. The destructor first waits on the d_nextFlag to make sure that the next pointer has been set, then sets the d_readyFlag.

The algorithms used in QLock's constructor/destructor are basically the same as those used to lock/unlock MCS spin-locks. The setFlag and waitOnFlag methods are where we make our important deviation from MCS locks. Instead of spinning (as is appropriate for a spin-lock), the waitOnFlag routine:

  1. Creates an event object.
  2. Atomically checks that the flag has not yet been set and installs the event object's handle into the flag.
  3. Proceeds to wait on the event object.

Now compare our QLock-based solution for the Singleton problem with the two previous solutions (spin-lock-based and named-mutex-based). Similar to the named-mutex-based solution, you avoid the unpredictable behavior of the spin-lock: Instead of spinning, waiting threads are suspended by the kernel, and resumed as soon as the mutex is available. Additionally, you avoid the problems with the named mutex: Event objects used for synchronization are process local, do not require artificial names, and are created only when threads need to be suspended due to contention. When there is no contention (when only one thread enters the Singleton::instance method at the time of initialization), no synchronization objects are created: The overhead is simply that of a few atomic operations.


After using the synchronization mechanisms implemented by the QLock class to solve the Singleton problem on Windows (and more generally, the problem with the static initialization of mutexes on Windows), I discovered that QLock has some other important properties that made it attractive for other applications (including on UNIX). QLock::Mutex has a small memory footprint (one pointer), and a small initialization cost (that of initializing a pointer with 0). Additionally, when there is no contention, QLock is fast. If no wait is necessary, locking a QLock requires just one atomic instruction and no system calls. If there are no waiting threads, unlocking is also just one atomic instruction.

Furthermore, when there is contention, the cost of constructing event objects (or semaphores on UNIX) can be virtually eliminated by using a lock-free pool of allocated event objects. This pool works particularly well because the number of event objects required never exceeds the number of waiting threads—an inherently small number.

One common scenario in which QLock appears to be of great use is when you have a large collection of some kind of items that are accessed from multiple threads. To increase concurrency, it may be logically possible and desirable to maintain a mutex per item of the collection (instead of maintaining a single mutex for the entire collection). It is often the case, though, that the memory footprint and the initialization cost of a standard mutex or critical section would be prohibitive due to the (sometimes very large) size of the collection. QLock, because of its small size and negligible initialization cost, may be able to provide a viable solution to fine-grain locking.

Vladimir is a senior architect and head of application infrastructure at Bloomberg LP. He can be contacted at

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.