Herb is a bestselling author and consultant on software development topics, and a software architect at Microsoft. He can be contacted at www.gotw.ca.
What does the volatile keyword mean? How should you use it? Confusingly, there are two common answers, because depending on the language you use volatile supports one or the other of two different programming techniques: lock-free programming, and dealing with 'unusual' memory. (See Figure 1.)
Adding to the confusion, these two different uses have overlapping requirements and impose overlapping restrictions, which makes them appear more similar than they are. Let's define and understand them clearly, and see how to spell them correctly in C, C++, Java and C# -- and not always as volatile.
Case 1: Ordered Atomic Variables For Lock-Free Programming
Lock-free programming is all about doing communication and synchronization between threads using lower-level tools than mutex locks. In the past and even today, however, these tools are all over the map. In rough historical order, they include explicit fences/barriers (e.g., Linux's mb()), order-inducing special API calls (e.g., Windows' InterlockedExchange), and various flavors of special atomic types. Many of these tools are tedious and/or difficult, and their wide variety means that lock-free code ends up being written differently in different environments.
The last few years, however, have seen a major convergence across hardware and software vendors: The computing industry is coalescing around sequentially consistent ordered atomic variables as the default or only way to write lock-free code using the major languages and OS platforms. In a nutshell, ordered atomic variables are safe to read and write on multiple threads at the same time without doing any explicit locking because they provide two guarantees: their reads and writes are guaranteed to be executed in the order they appear in your program's source code; and each read or write is guaranteed to be atomic, all-or-nothing. They also have special operations such as compareAndSet that are guaranteed to be executed atomically. See  for further details about ordered atomic variables and how to use them correctly.
Ordered atomic variables are available in Java, C# and other .NET languages, and the forthcoming ISO C++ Standard, but under different names:
- Java provides ordered atomics under the volatile keyword (e.g., volatile int), and solidified this support in Java 5 (2004). Java additionally provides a few named types in java.util.concurrent.atomic, such as AtomicLongArray, that you can use for the same purpose.
- .NET mostly added them in Visual Studio 2005, also under the volatile keyword (e.g., volatile int). These are suitable for nearly all lock-free code uses, except for rare examples similar to Dekker's algorithm. .NET is fixing these remaining corner cases in Visual Studio 2010, which is in early beta as of this writing.
- ISO C++ added them to the C++0x draft Standard in 2007, under the templated name atomic<T> (e.g., atomic
). They started to become available beginning in 2008 in the Boost project and other implementations. . The ISO C++ atomics library also provides a C-compatible way to spell those types and their operations (e.g., atomic_int), and these appear to be likely to be adopted by ISO C in the future.
A Word About Optimization
We're going to look at how ordered atomics restrict the optimizations that compilers, CPUs, cache effects, and other parts of your execution environment might perform. So let's first briefly review some basic rules of optimization.
The most fundamental rule of optimization in pretty much any language is this: Optimizations that rearrange ('transform') your code's execution are always legal if they don't change the meaning of the program, so that the program can't tell the difference between executing the original code and the transformed code. In some languages, this is also known as the 'as if' rule -- which gets its name from the fact that the transformed code has the same observable effects 'as if' the original source code had been executed as written.
This rule cuts two ways: First, an optimization must never make it possible to get a result that wasn't possible before, or break any guarantees that the original code was allowed to rely on, including language semantics. If we produce an impossible result, after all, the program and the user certainly can tell the difference, and it's not just 'as if' we'd executed the original untransformed code.
Second, optimizations are permitted to reduce the set of possible executions. For example, an optimization might make some potential (but not guaranteed) interleavings never actually happen. This is okay, because the program couldn't rely on them happening anyway.
Ordered Atomics and Optimization
Using ordered atomic variables restricts the kinds of optimizations your compiler and processor and cache system can do.  There are two kinds of optimizations to consider:
- Optimizations on the ordered atomic reads and writes themselves.
- Optimizations on nearby ordinary reads and writes.
First, all of the ordered atomic reads and writes on a given thread must execute exactly in source code order, because that's one of the fundamental guarantees of ordered atomic variables. However, we can still perform some optimizations, in particular, optimizations that have the same effect as if this thread just always executed so quickly that another thread didn't happen to ever interleave at certain points.
For instance, consider this code, where a is an ordered atomic variable:
a = 1; // A a = 2; // B
Is it legal for a compiler, processor, cache, or other part of the execution environment to transform the above code into the following, eliminating the redundant write in line A?
<FONT COLOR="FF000"> // A': OK: eliminate line A entirely</FONT><br> a = 2; // B
The answer is, 'Yes.' This is legal because the program can't tell the difference; it's as if this thread always ran so fast that no other thread accessing a concurrently ever got to interleave between lines A and B to see the intermediate value. 
Similarly, if a is an ordered atomic variable and local is an unshared local variable, it is legal to transform
a = 1; // C: write to a local = a; // D: read from a
a = 1; // C: write to a local = <FONT COLOR="FF000">1</FONT>; <FONT COLOR="FF000"> // D': OK, apply "constant propagation</FONT>"
which eliminates the read from a. Even if another thread is concurrently trying to write to a, its as if this thread always ran so fast that the other thread never got to interleave between lines C and D to change the value before we can read our own back into local.
Second, nearby ordinary reads and writes can still be reordered around ordered atomic reads and writes, but subject to some restrictions. In particular, as described in , ordinary reads and writes can't move upward across (from after to before) an ordered atomic read, and can't move downward across (from before to after) an ordered atomic write. In brief, that could move them out of a critical section of code, and you can write programs that can tell the difference. For more details, see .
That's it for lock-free programming and ordered atomics. What about the other case that some "volatiles" address?