The C++11 standard provides several long-requested concurrency features, such as std::thread and std::future. While those are a welcome addition to the language, in this article I will show that they are sufficient for only the most basic concurrency needs. I will argue that the primitives in C++11 are particularly ill-suited for modern applications that must deal with the concurrency imposed by I/O operations and exploit multicore at the same time.
Fortunately, many of these limitations can be addressed by augmenting C++11 futures with continuations, based on experience with the Parallel Patterns Library (PPL) at Microsoft. The reader is expected to have a working knowledge of C++11 and some experience writing parallel code, but familiarity with the PPL is optional.
Connected and Multicore
We take it for granted that the software we use daily is both connected to the Internet and able to harness multiple cores. It is natural to think of the Internet (or the cloud) and multiple cores not as two distinct capabilities of a program, but as a single elastic compute resource. As Herb Sutter puts it in Welcome To The Parallel Jungle, "The network is just another bus to more compute cores."
However, a developer building a modern connected multicore application faces two distinct challenges:
- Building a well-performing connected program requires dealing with the latency and unpredictability typical of I/O operations. This is difficult, and getting it right results in an application that is responsive and scalable, but not necessarily faster.
- Building a well-performing multicore program is a parallel programming job: a different challenge altogether, often requiring a different tool set and different skills. When this is done right, the program runs faster, although its speed is usually orthogonal to its responsiveness and scalability.
Taking the view of the cloud as a natural extension of multicore behooves us to find a programming model that, at the very minimum, gives us a way to efficiently compose the I/O operations and the multicore operations.
Concurrency in C++
In the last decade, the software industry has developed many tools for multicore programming in C++. Libraries such as Intel's TBB (Threading Building Blocks) or Microsoft's PPL (Parallel Patterns Library) are the state of the art. These tools excel at parallel decomposition: partitioning serial code into multiple "chores" that run on multiple cores.
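As a rough illustration of parallel decomposition, the sketch below splits a summation into several chores. It uses only the C++11 standard library (std::async) rather than TBB or the PPL, and the function name parallel_sum and its chores parameter are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper: sums `data` by partitioning it into `chores`
// contiguous slices and summing each slice on its own thread.
long long parallel_sum(const std::vector<int>& data, std::size_t chores) {
    assert(chores > 0);
    const std::ptrdiff_t slice =
        static_cast<std::ptrdiff_t>(data.size() / chores);

    std::vector<std::future<long long>> partials;
    for (std::size_t i = 0; i < chores; ++i) {
        const auto first = data.begin() + static_cast<std::ptrdiff_t>(i) * slice;
        // The last chore also picks up any leftover elements.
        const auto last = (i + 1 == chores) ? data.end() : first + slice;
        partials.push_back(std::async(std::launch::async, [first, last] {
            return std::accumulate(first, last, 0LL);
        }));
    }

    // Combine the partial results; each get() blocks until its chore is done.
    long long total = 0;
    for (auto& partial : partials) {
        total += partial.get();
    }
    return total;
}
```

Calling parallel_sum(values, 4) on a vector holding 1 through 1000 would run four chores and add up their partial sums. A production library would typically schedule chores on a thread pool instead of starting one thread per chore.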
But there is more to being "connected and multicore" than just parallelism. Well-performing concurrent programs must combine the connected components, with their inherent latency and unreliability, with the parallel components. Put another way, if parallelism is about decomposing the program into independent parts, concurrency is about both decomposing and composing the program from the parts that work well individually and together.
I believe that it's in the composition of connected and multicore components where today's C++ libraries are still lacking.
The Dreaded Wait
Most of the concurrency primitives in C++11 are composed via waiting. One can spawn a thread (by creating an instance of std::thread), then wait for it to finish by calling the join method. Likewise, the result of a future object (represented by std::future) can be retrieved by calling the get method, during which the calling thread waits for the result to become available.
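The two forms of waiting described above can be sketched as follows. The function compute_answer is a hypothetical stand-in for a long-running operation, and the two wrapper functions are names invented for this example.

```cpp
#include <cassert>
#include <future>
#include <thread>

// Hypothetical stand-in for a long-running computation.
int compute_answer() { return 6 * 7; }

// Composition by waiting, using std::thread.
int answer_via_thread() {
    int result = 0;
    std::thread worker([&result] { result = compute_answer(); });
    worker.join();  // the calling thread blocks here until the worker finishes
    assert(!worker.joinable());  // once join() returns, the thread is done
    return result;
}

// Composition by waiting, using std::future.
int answer_via_future() {
    std::future<int> pending = std::async(std::launch::async, compute_answer);
    return pending.get();  // the calling thread blocks until the value is ready
}
```

In both functions the caller can do nothing else between starting the work and receiving the result.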
Why is this a problem?
Waiting on the GUI thread means that the user of the application is rewarded with the "hourglass" or the "spinning donut" while the thread is waiting for an operation to complete. This is bad enough for a CPU-bound operation, but the length of an I/O-bound call can be truly unpredictable and potentially very long.
Clearly, the GUI thread of the application is a scarce resource, and we want to return it to the message pump as soon as possible. But let's not kid ourselves by thinking that all we need to do is offload the long-running operation to another thread. If we did that, how would we synchronize the two threads without waiting?
The woes of composition-by-waiting are not limited to GUI programs. By default, at creation, a thread reserves 1 MB of stack space on Windows and 8 MB on Linux. This value is configurable, but reducing the stack size may break programs with deep call chains or multiple stack-allocated objects. In other words, it is not only GUI threads that are expensive; all threads are. This can be felt acutely in a multithreaded server application. If many threads decide to block at the same time, waiting can bring the server to its knees very quickly.
Continuations
In C/C++, continuations are commonly known as "callbacks," and they are often used for asynchronous programming. This is not to say that the concept is unique to C++, but because the language has been a laggard in adopting mainstream functional programming features such as lambda expressions, C++ libraries that use continuations consistently are still rare.
The concept of the continuation was pioneered by Scheme, which introduced the style of programming where, instead of returning a value, a function takes an additional parameter (the continuation) that is invoked to process the return value of the function. Naturally, the continuation itself is also a function that can take continuations, and so on.
Continuations make the flow of control explicit: instead of invoking a function and waiting for it to complete, a program written in a continuation-passing style specifies explicitly what to do with the return value when it is available.
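A minimal sketch of the continuation-passing style in C++11 might look like the following. All function names here are hypothetical, and the continuations are invoked synchronously; the example only illustrates the shape of the style, not an asynchronous implementation.

```cpp
#include <cassert>
#include <functional>

// Direct style: the value is returned to the caller.
int add(int a, int b) { return a + b; }

// Continuation-passing style: the result is handed to `next`
// instead of being returned.
void add_cps(int a, int b, const std::function<void(int)>& next) {
    assert(next);  // a continuation is required
    next(a + b);
}

void square_cps(int x, const std::function<void(int)>& next) {
    assert(next);
    next(x * x);
}

// Composition: computes (a + b) squared by chaining continuations.
// The continuation passed to add_cps itself takes a continuation.
void add_then_square_cps(int a, int b, const std::function<void(int)>& next) {
    add_cps(a, b, [&next](int sum) {
        square_cps(sum, next);
    });
}
```

A caller states up front what should happen to the result, for example add_then_square_cps(2, 3, [](int v) { /* use v */ });, rather than receiving the value and deciding afterward.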
For concurrent programs, continuations are a boon because they allow us to avoid blocking waits which, as I stated above, greatly hinder responsiveness and scalability.
JavaScript has made continuations ubiquitous in Web programming. Because JavaScript is single-threaded, waiting for the server to produce the data would freeze the browser. Instead, JavaScript uses a technique known as AJAX, where the act of issuing a request to the server is separated from the act of handling the data retrieved from the server:
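var http = new XMLHttpRequest();
http.open("GET", "customer.html");
// The continuation: invoked whenever the state of the request changes.
http.onreadystatechange = function() {
    if (http.readyState == 4) { // 4 means the request has completed
        var serverResponse = http.responseText;
        // process serverResponse here: ...
    }
};
http.send(); // issue the request; this call returns without waiting for the response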
More recently, Node.js has been very successful at capturing the mindshare of the developer community thanks to its use of continuations for server-side programming. In Windows Runtime, which powers the Metro-style apps in Windows 8, the concept of continuations is used holistically for all potentially long-running operations. Continuations are, in fact, the only way of working with asynchronous operations.
Tasks, Futures, and Promises
A beloved child has many names, as the saying goes. The concept of the "task" (also known as the future or the promise, depending on the language and library) represents a relatively straightforward idea.



