volatile vs. volatile

A tale of two similar but different tools


January 08, 2009
URL: http://www.drdobbs.com/cpp/volatile-vs-volatile/212701484

Herb is a bestselling author and consultant on software development topics, and a software architect at Microsoft. He can be contacted at www.gotw.ca.


What does the volatile keyword mean? How should you use it? Confusingly, there are two common answers, because, depending on the language you use, volatile supports one or the other of two different programming techniques: lock-free programming, and dealing with 'unusual' memory. (See Figure 1.)

Figure 1: A tale of two requirements.

Adding to the confusion, these two different uses have overlapping requirements and impose overlapping restrictions, which makes them appear more similar than they are. Let's define and understand them clearly, and see how to spell them correctly in C, C++, Java and C# -- and not always as volatile.

Table 1: Comparing the overlapping but different requirements.

Case 1: Ordered Atomic Variables For Lock-Free Programming

Lock-free programming is all about doing communication and synchronization between threads using lower-level tools than mutex locks. In the past and even today, however, these tools are all over the map. In rough historical order, they include explicit fences/barriers (e.g., Linux's mb()), order-inducing special API calls (e.g., Windows' InterlockedExchange), and various flavors of special atomic types. Many of these tools are tedious and/or difficult, and their wide variety means that lock-free code ends up being written differently in different environments.

The last few years, however, have seen a major convergence across hardware and software vendors: The computing industry is coalescing around sequentially consistent ordered atomic variables as the default or only way to write lock-free code using the major languages and OS platforms. In a nutshell, ordered atomic variables are safe to read and write on multiple threads at the same time without doing any explicit locking because they provide two guarantees: their reads and writes are guaranteed to be executed in the order they appear in your program's source code; and each read or write is guaranteed to be atomic, all-or-nothing. They also have special operations such as compareAndSet that are guaranteed to be executed atomically. See [1] for further details about ordered atomic variables and how to use them correctly.
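To make those guarantees concrete, here is a minimal sketch using today's std::atomic spelling of C++0x's atomic<T> (the variable, values, and function name are illustrative assumptions, not taken from a real program):


#include <atomic>

std::atomic<int> a(0);	// an ordered atomic variable

void example() {
    a = 1;		// atomic write; executes in source code order
    int seen = a;	// atomic read; all-or-nothing, never torn
    int expected = 1;
    // compareAndSet-style operation: atomically replace 1 with 2,
    // but only if the current value is still 1.
    a.compare_exchange_strong(expected, 2);
    (void)seen;		// silence unused-variable warnings in this sketch
}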

Ordered atomic variables are available in Java, C# and other .NET languages, and the forthcoming ISO C++ Standard, but under different names:

- In Java, volatile variables are ordered atomic.
- In C# and the other .NET languages, volatile likewise means ordered atomic.
- In C++0x, the forthcoming ISO C++ Standard, they are spelled atomic<T> (with C-compatible atomic_T names); a prototype implementation is available from Boost. [2]

A Word About Optimization

We're going to look at how ordered atomics restrict the optimizations that compilers, CPUs, cache effects, and other parts of your execution environment might perform. So let's first briefly review some basic rules of optimization.

The most fundamental rule of optimization in pretty much any language is this: Optimizations that rearrange ('transform') your code's execution are always legal if they don't change the meaning of the program, so that the program can't tell the difference between executing the original code and the transformed code. In some languages, this is also known as the 'as if' rule -- which gets its name from the fact that the transformed code has the same observable effects 'as if' the original source code had been executed as written.

This rule cuts two ways: First, an optimization must never make it possible to get a result that wasn't possible before, or break any guarantees that the original code was allowed to rely on, including language semantics. If we produce an impossible result, after all, the program and the user certainly can tell the difference, and it's not just 'as if' we'd executed the original untransformed code.

Second, optimizations are permitted to reduce the set of possible executions. For example, an optimization might make some potential (but not guaranteed) interleavings never actually happen. This is okay, because the program couldn't rely on them happening anyway.

Ordered Atomics and Optimization

Using ordered atomic variables restricts the kinds of optimizations your compiler and processor and cache system can do. [3] There are two kinds of optimizations to consider:

First, all of the ordered atomic reads and writes on a given thread must execute exactly in source code order, because that's one of the fundamental guarantees of ordered atomic variables. However, we can still perform some optimizations, in particular, optimizations that have the same effect as if this thread just always executed so quickly that another thread didn't happen to ever interleave at certain points.

For instance, consider this code, where a is an ordered atomic variable:


a = 1;		// A
a = 2;		// B

Is it legal for a compiler, processor, cache, or other part of the execution environment to transform the above code into the following, eliminating the redundant write in line A?


		// A': OK: eliminate line A entirely
a = 2;		// B

The answer is, 'Yes.' This is legal because the program can't tell the difference; it's as if this thread always ran so fast that no other thread accessing a concurrently ever got to interleave between lines A and B to see the intermediate value. [4]

Similarly, if a is an ordered atomic variable and local is an unshared local variable, it is legal to transform


a = 1;		// C: write to a
local = a;	// D: read from a

to


a = 1;		// C: write to a
local = 1;	// D': OK, apply "constant propagation"

which eliminates the read from a. Even if another thread is concurrently trying to write to a, it's as if this thread always ran so fast that the other thread never got to interleave between lines C and D to change the value before we read our own write back into local.

Second, nearby ordinary reads and writes can still be reordered around ordered atomic reads and writes, but subject to some restrictions. In particular, as described in [3], ordinary reads and writes can't move upward across (from after to before) an ordered atomic read, and can't move downward across (from before to after) an ordered atomic write. In brief, such motion could move them out of a critical section of code, and you can write programs that can tell the difference. For more details, see [3].
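To see why, consider this sketch of a simple spin lock built from an ordered atomic flag (the names and the lock itself are illustrative assumptions, again using today's std::atomic spelling):


#include <atomic>

std::atomic<bool> locked(false);	// ordered atomic lock flag
int shared_data = 0;			// ordinary variable protected by the lock

void increment() {
    while (locked.exchange(true)) { }	// atomic read-write: acquire the lock
    // This ordinary write may not move upward across the atomic read above,
    // nor downward across the atomic write below; either motion would drag
    // it outside the critical section, where another thread could see a
    // lost or torn update.
    shared_data += 1;
    locked = false;			// atomic write: release the lock
}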

That's it for lock-free programming and ordered atomics. What about the other case that some "volatiles" address?

Case 2: Semantic-Free Variables for "Unusual" Memory Semantics

The second requirement is to deal with "unusual" memory that goes beyond a given language's memory model, where the compiler must assume that the variable can change value at any time and/or that reads and writes may have unknowable semantics and consequences. Classic examples include:

- Hardware registers: for example, a memory-mapped location M whose reads and writes have side effects, such as a register that auto-increments every time it is written.
- Memory visible through more than one address: for example, two addresses A1 and A2 that alias the same underlying location, so that writing through one changes the value subsequently read through the other.

Variables in such memory locations are unoptimizable variables, because the compiler can't safely make any assumptions about them at all. Put another way, the compiler needs to be told that such a variable doesn't participate in the normal type system, even if it appears to have a given type. For example, if memory location M or A1/A2 in the aforementioned examples is typed as an "int" in the program, what does that really mean? It means at most that it has the size and layout of an int, but it cannot mean that it behaves like an int -- after all, ints don't autoincrement themselves when you write to them, or mysteriously change their values when you write to what looks like a different variable at some other address.

We need a way to turn off all optimizations on their reads and writes. ISO C and C++ have a portable, standard way to tell the compiler that this is such a special variable that it must not optimize: volatile.
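For instance, a memory-mapped device register is typically declared along these lines in C++ (the address and the ready-bit layout are hypothetical assumptions for illustration):


#include <cstdint>

// Hypothetical memory-mapped device status register.
volatile uint32_t* const status =
    reinterpret_cast<volatile uint32_t*>(0x40000000);

void wait_until_ready() {
    // Because *status is volatile, the compiler must perform every read:
    // it may not cache the value in a register or hoist the load out of
    // the loop, since the hardware can change it at any time.
    while ((*status & 0x1u) == 0) {
        // spin until the (hypothetical) ready bit is set
    }
}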

Java and .NET have no comparable concept. Managed environments are supposed to know the full semantics of the program they execute, after all, so it's not surprising that they don't support memory with "unknowable" semantics. But both Java and .NET do provide escape hatches to leave the managed environment and invoke native code: Java provides the Java Native Interface (JNI) and .NET provides Platform Invoke (P/Invoke). The JNI specification [5] is silent about volatile, however, and doesn't mention either Java volatile or C/C++ volatile at all; similarly, the P/Invoke documentation doesn't mention interactions with .NET volatile or C/C++ volatile. So to correctly access an unoptimizable memory location in either Java or .NET, you must write C/C++ functions that use C/C++ volatile to do the needed work on behalf of their managed calling code, make sure they entirely encapsulate and hide the volatile memory (i.e., neither take nor return anything volatile), and call those functions through JNI and P/Invoke.
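A minimal sketch of such a wrapper, callable from Java through JNI, might look like this (the Java class name, native method, and register address are all hypothetical; the encapsulation pattern is the point):


#include <jni.h>
#include <cstdint>

// The volatile-qualified location lives entirely inside this native file;
// nothing volatile is taken from or returned to the managed caller.
static volatile uint32_t* const device_reg =
    reinterpret_cast<volatile uint32_t*>(0x40000000);	// hypothetical address

// Matches a hypothetical Java declaration:
//   package dev;  class Hw { static native int readStatus(); }
extern "C" JNIEXPORT jint JNICALL
Java_dev_Hw_readStatus(JNIEnv*, jclass) {
    return static_cast<jint>(*device_reg);	// the volatile read happens in C++
}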

Unoptimizable Variables and (Not) Optimization

All reads and writes of unoptimizable variables on a given thread must execute exactly as written; no optimizations are allowed at all, because the compiler can't know the full semantics of the variable and when and how its value can change. This is a stronger statement than for ordered atomics, which only need to execute in source code order.

Consider again the two transformations we considered before, but this time replacing the ordered atomic variable a with the unoptimizable (C/C++ volatile) variable v:


v = 1;	// A
v = 2;	// B

Is it legal to transform this as follows to remove the apparently redundant write in line A?


	        // A': invalid, cannot eliminate write
v = 2;	// B

The answer is no, because the compiler cannot possibly know that eliminating line A's write to v won't change the meaning of the program. For example, v could be a location accessed by custom hardware that expects to see the value 1 before the value 2 and won't work correctly otherwise.

Similarly, if v is an unoptimizable variable and local is an unshared local variable, it is not legal to transform


v = 1;		// C: write to v
local = v;	// D: read from v

to


v = 1;		// C: write to v
local = 1;	// D': invalid, cannot do "constant propagation"

to eliminate the read from v. For example, v could be a hardware address that automatically increments itself every time it's written, so that writing 1 will yield the value 2 on the next read.

Second, what about nearby ordinary reads and writes -- can those still be reordered around unoptimizable reads and writes? Today, there is no practical portable answer because C/C++ compiler implementations vary widely and aren't likely to converge anytime soon. For example, one interpretation of the C++ Standard holds that ordinary reads can move freely in either direction across a C/C++ volatile read or write, but that an ordinary write cannot move at all across a C/C++ volatile read or write -- which would make C/C++ volatile both less restrictive and more restrictive, respectively, than an ordered atomic. Some compiler vendors support that interpretation; others don't optimize across volatile reads or writes at all; and still others have their own preferred semantics.

Summary

To safely write lock-free code that communicates between threads without using locks, prefer to use ordered atomic variables: Java/.NET volatile, C++0x atomic<T>, and C-compatible atomic_T.

To safely communicate with special hardware or other memory that has unusual semantics, use unoptimizable variables: ISO C/C++ volatile. Remember that reads and writes of these variables are not necessarily atomic, however.

Finally, to express a variable that both has unusual semantics and has any or all of the atomicity and/or ordering guarantees needed for lock-free coding, only the ISO C++0x draft Standard provides a direct way to spell it: volatile atomic<T>.
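In today's spelling, that combined case might look like this minimal sketch (the shared-with-hardware scenario is an illustrative assumption; std::atomic's member operations have volatile-qualified overloads for exactly this purpose):


#include <atomic>
#include <cstdint>

// Hypothetical: an atomically updated counter that also lives in "unusual"
// memory, e.g., shared with a hardware agent -- both volatile and atomic.
volatile std::atomic<uint32_t> counter(0);

void tick() {
    counter.fetch_add(1);	// atomic, ordered, and never optimized away
}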

Notes

[1] H. Sutter. "Writing Lock-Free Code: A Corrected Queue" (DDJ, October 2008). Available online at www.ddj.com/hpc-high-performance-computing/210604448.

[2] See www.boost.org.

[3] H. Sutter. "Apply Critical Sections Consistently" (DDJ, November 2007). Available online at www.ddj.com/hpc-high-performance-computing/202401098.

[4] A common objection at this point is: "In the original code, it was possible for another thread to see the intermediate value, but that is impossible in the transformed code. Isn't that a change in observable behavior?" The answer is "No," because the program was never guaranteed to ever actually interleave just in time to see that value; it was already a legal outcome for this thread to just always run so fast that the interleaving happened to never manifest. Again, what this optimization does is reduce the set of possible executions, which is always legal.

[5] S. Liang. Java Native Interface: Programmer's Guide and Specification. (Prentice Hall, 1999). Available online at java.sun.com/docs/books/jni/.
