Claudio Taglienti is a wireless data architect at U.S. Cellular. He can be contacted at ctaglientiameritech.net.
Producer/consumer tasks, threads, and processes are common in operating systems. Applications of the producer/consumer problem, also known as the "bounded buffer problem," range from compilers producing assembly code consumed by assemblers, to the pipe command implemented in the UNIX shell. In this article, I present a C++ template implementation of a bounded buffer as a circular queue. I also show how to make the circular queue compatible with STL algorithms by providing a special iterator used as an interface between the algorithms and the circular queue.
The bounded buffer problem requires:
- Mutually exclusive access by the producer and consumer to the buffer.
- Synchronization between the producer and consumer when the buffer is full or empty.
The solution I present here addresses the mutually exclusive access problem in the context of an examplea timer thread that does not require compliance with the second condition. (The complete source code that implements this timer and technique is available at http://www.cuj.com/code/.) A typical implementation of a circular queue utilizes the STL deque container, since insertions/deletions at either end of the deque are efficient, taking place in constant time. The producer adds elements by invoking the push_back member function, while the consumer removes items by invoking pop_front. The problem, however, is that STL containers are not thread-safe. The only assumptions across STL implementations that can be made with respect to concurrency are that multiple readers are safe, and that multiple writers to different containers are safe .
To address the thread-safety problem in the deque example, you can use operating-system synchronization to lock access to the container for the duration of the call to the push_back and pop_front member functions . However, using operating-system synchronization primitives to provide mutually exclusive access to nonthread-safe code has disadvantages:
- It can't be utilized in interrupt-service routines where the code is not allowed to block (that is, cause suspension of the currently executing environment).
- Performance overhead when invoking operating-system primitives. This is especially true in operating systems that implement kernel and user mode (Windows NT and UNIX, for example).
- Moving the code to new operating systems requires a port for a different implementation of the synchronization primitive.
Luckily, there is no need to utilize any operating-system synchronization primitives to provide mutually exclusive access to a circular queueas long as the number of concurrent producers and consumers is limited to one each. The fundamental problem  that leads to race conditions in multithreaded environments occurs when two or more threads sharing a variable also try to modify it.
The circular queue implementation solves this problem using the classic "bounded buffer In/Out" pointers approach, in which the producer adds items to the queue via the pointer In, which is not shared with the consumer. The In pointer represents the next available position where an element can be added to the buffer. The consumer removes items from the queue using an Out pointer, also not shared with the producer. The Out pointer represents the position of the next element to be removed from a nonempty buffer. The queue is implemented as an array with Capacity=N+1, where N represents the maximum number of items allowed in the queue and Capacity is the physical size of the buffer. The queue is empty when In == Out, which ensures that the consumer can only consume what is produced. This implies that the invariant (Out + 1) % Capacity <= In is guaranteed by the circular queue for the consumer. The queue is full when (In +1) % Capacity == Out, which ensures that the producer output is bounded. Figure 2 is an example of a circular queue that is full.
The Circular Buffer
The circular buffer (see Figure 1) is a template class that parameterizes the type of elements that can be stored in the buffer, thus allowing creation of a buffer capable of holding elements of any data type. The circular buffer is an "unbounded buffer" because it never gets full. What makes the circular buffer appear to be of infinite size is the overwrite member function. Adding new elements when the physical size of the buffer is exceeded causes the oldest element in the buffer to be replaced by the new added element.
The overwrite member function removes the oldest element, then adds the new element. But herein lies the problem. The producer should not modify the Out pointeronly the consumer can. However, the overwrite member does exactly that, thus breaking the cardinal rule of ensuring that shared variables among threads are not modified. Consider the scenario in Figure 2, where a full circular buffer is accessed concurrently by a producer and a consumer. The producer adds a new element E to a full buffer and the consumer consumes two elements from the buffer. Then either of two conditions should be True if access to the circular buffer code is thread-safe:
- The consumer runs twice and removes A and B from the buffer and Out equals 2. The producer runs and adds element E in position 4.
- The consumer removes A before the producer runs. The producer runs and adds element E in position 4. The consumer runs again and removes elements B from the buffer and Out equals 2.
- The producer runs and overwrites element A with E in position 4.
- The consumer runs twice and removes elements B and C from the buffer and Out equals 3.
Assume the following sequence of events occurs, starting with the full buffer containing elements A, B, C, D:
- The consumer invokes member function Remove, which executes item=data[Out]. The value of Out is read into a CPU register. Out == 0 and In == 4.
- The consumer loses the CPU. The producer runs and invokes the Add member function, which overwrites element A with new element E. Out == 1 and In == 0.
- The consumer gets the CPU again. Out == 0 from step 1's execution of item = data[Out], yielding Ethe element that is returned to the consumer.
- The consumer continues and executes Advance(Out) next and Out == 1 or 2, depending on whether the value of Out is retrieved from the register or read again from memory.
The consumer erroneously retrieves element E, then either B or C.
The Circular Queue
The circular queue (also a template class as in Figure 1) further specializes the behavior of the circular buffer by disallowing overwrite behavior when the circular buffer is full. Consequently, it makes buffer addition and removal operations thread-safe, and it makes the buffer bounded.
An example of a multiple-producers/ single-consumer problem is a timer thread that sleeps for a unit of time (timer resolution), wakes up, and processes start-timer/stop-timer requests from multiple threads. A typical implementation would use a single buffer protected by a mutually exclusive synchronization primitive that allows multiple producer threads to invoke the timer services by adding requests to a single shared buffer; see Figure 3.
However, this solution is inefficient not only because of the use of a synchronization primitive but also because it requires a buffer that allows deletions from the middle. This occurs when a producer thread terminates and all of its outstanding timer requests need to be flushed to prevent the timer from firing for a nonexistent thread . This is known as the "dead reference" problem . This new requirement precludes the use of the circular queue that only allows constant time deletions from one end of the queue. A more appropriate solution is to use an STL container such as a list. However, this solution would incur a higher cost for insertions and deletions than the circular queue.
Divide and Conquer
Another way to solve this problem is to allocate a buffer to each producer, thus turning the multiple-producers/single-consumer problem into multiple single-producer/single-consumer problems, as in Figure 4.
With this implementation , each producer adds timer requests to its own queue. The timer thread processes requests from all queues each time it wakes up. The problems presented in the previous solution are eliminated and you can utilize the circular queue. The requirement for a synchronization primitive is removed, as this is now a single-producer/single-consumer problem. The requirement for a producer to delete outstanding timer requests before terminating is accomplished by having the producer invoke the Flush operation of its own circular queue.
Circular Buffer And STL Algorithms
To make the circular buffer and circular queue accessible to STL algorithms such as find, for_each, copy, and the like, the circular buffer implements the required begin and end member functions, returning a circular buffer Iterator (see Figure 1), which points to the beginning and end of the buffer, respectively. The circular buffer Iterator acts as a smart pointer, thus satisfying the STL requirement that iterators behave just like pointers. It is important to note that, in general, access to containers via iterators makes code nonthread-safe. Iterators represent a snapshot in time of the beginning/end positions of a container. Changes to the container, both in terms of its content and the beginning/end positions, are not synchronized with the original view of the container provided by the original iterators.
In this article, I've presented a solution to the bounded buffer problem that's based on a circular queue, which, in the case of a single-producer/single-consumer problem, requires no synchronization primitive to guarantee mutual exclusive access to the buffer. The circular buffer is a building block for the creation of the circular queue. The circular buffera nonthread-safe containerfinds its use in situations where a thread needs to keep track of its own history (such as keeping track of the state transitions of this thread or keeping track of messages/requests received by this thread).
All the code for the circular buffer, circular queue, CircularBuffer Iterator, and examples are available at http://www.cuj.com/code/. Timer_Emulation.cpp uses Windows MFC Threads to exercise the circular queue in a producer/consumer environment, where a timer thread accepts "start/stop timer" requests from two producing threads. TestCQ shows how STL algorithms can be used with the circular queue.
Thanks to Mike Polyakov for reviewing both the article and the code.
-  See Item 12: "Having realistic expectations about the thread safety of STL containers" of Scott Meyers' Effective STL: 50 Specific Ways to Improve Your Use of the Standard Template Library (Addison-Wesley, 2001).
-  For an explanation of how to use different synchronization primitives to synchronize producer and consumer, see Andrew S. Tanenbaum's Modern Operating Systems (Prentice Hall, 1992).
-  Also known as the "readers/writers" problem, first discovered and solved by P.J. Courtois, F. Heymans, and D.L. Parnas, "Cuncurrent Control with 'Readers' and 'Writers,'" Communications of the ACM, October 1971.
-  In production code, both the timer and producers would be implemented as objects with the timer object holding pointers to producer objects in order to be able to invoke the appropriate timeout member functions of the producer object.
-  For a good description of this problem, refer to Modern C++ Design: Generic Programming and Design Patterns Applied, by Andrei Alexandrescu (Addison-Wesley, 2001).
-  This approach can be generalized to solve similar problems, as long as serialization of requests is a requirement on a per producer basis, as is the case for the timer thread. If serialization of requests is a requirement across producers, then this approach cannot be used.