Parallel programming is gaining favor as multicore architectures become ubiquitous. On processors based on the x86 architecture, running threads in parallel presents memory problems beyond the typical concern of keeping threads from interfering with each other's data. This article looks at a few subtler memory-access issues that developers need to keep in mind as they design and implement multithreaded software.
At any given instant in a purely sequential program, memory has a well-defined state. This property is called sequential consistency. In parallel programs, the apparent state of memory depends on which thread is observing it. Two writes to memory by one hardware thread may be seen in a different order by another thread. The reason is that when a hardware thread writes to memory, the written data passes through a series of buffers and caches before reaching main memory (that is, RAM). Because of these way stations along the path, a later write may reach main memory sooner than an earlier one. Similar effects apply to reads. If one read requires a fetch from main memory and a later read finds its data in cache, the processor may allow the faster read to "pass" the slower read. Likewise, reads and writes might pass each other.
A processor has to see its own reads and writes in the order it issues them; otherwise, even sequential programs would break. But the processor does not have to guarantee that other processors see those reads and writes in the original order. Systems that allow this reordering are said to exhibit relaxed consistency.
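The write-ordering hazard described above can be made concrete with a small sketch. It assumes a C++11 compiler and uses std::atomic with release/acquire ordering, which is one possible remedy and not something the discussion above prescribes. The names payload, ready, and run_message_passing are hypothetical, chosen only for illustration. One thread performs two writes (data, then a flag), and a second thread observes them.

```cpp
#include <atomic>
#include <thread>

// Hypothetical example: one thread publishes a value, another consumes it.
// If `ready` were a plain bool, the code would contain a data race, and the
// consumer could observe the flag as set before the payload write became
// visible to it.
int run_message_passing() {
    int payload = 0;                 // ordinary data, written first
    std::atomic<bool> ready{false};  // flag, written second
    int observed = -1;

    std::thread producer([&] {
        payload = 42;  // write 1
        // write 2: the release store keeps write 1 from being
        // observed after it by a thread that acquires the flag
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        // The acquire load pairs with the release store above.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        observed = payload;  // sees the value written before the flag
    });

    producer.join();
    consumer.join();
    return observed;
}
```

Calling run_message_passing() returns the published value. The release/acquire pair is what ties the order of the two writes to what the second thread is allowed to observe.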
Because relaxed consistency relates to how hardware threads observe each other's actions, it is not an issue for programs running on a single hardware thread. Inattention to consistency issues can result in concurrent programs that run correctly on single-threaded hardware, but fail when run on multithreaded hardware with disjoint caches.
The hardware is not the only cause of relaxed consistency. Compilers are often free to reorder instructions, and this reordering is critical to most major compiler optimizations. For instance, compilers typically move loop-invariant reads out of a loop, so that the read is done once for the whole loop instead of once per iteration. Language rules typically grant the compiler license to presume the code is single-threaded, even if it is not. This is particularly true for older languages such as Fortran, C, and C++. Languages based on C syntax, such as Java, C#, and of course C/C++, can make the compiler more circumspect about these assumptions by use of the keyword volatile. Unlike hardware reordering, compiler reordering can affect code even when it is running on a single hardware thread. Thus, the programmer must be on the lookout for reordering by both the hardware and the compiler.
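The loop-invariant optimization mentioned above can be sketched by hand. The three functions below are hypothetical illustrations, not output from any particular compiler: the first is the loop as a programmer might write it, the second shows the transformation an optimizer is permitted to make, and the third uses volatile to require a fresh read on every iteration. The sketch is single-threaded, assumes scale points to a valid int, and shows only the compiler side of the problem.

```cpp
#include <cstddef>

// As written: the source reads *scale on every iteration.
int sum_scaled(const int* data, std::size_t n, const int* scale) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i] * *scale;
    }
    return total;
}

// What the optimizer may effectively produce: nothing in the loop writes
// to *scale, so the read is loop-invariant and can be done once up front.
int sum_scaled_hoisted(const int* data, std::size_t n, const int* scale) {
    const int s = *scale;  // single read, hoisted out of the loop
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i] * s;
    }
    return total;
}

// With volatile, the compiler must perform the read on each iteration
// and may not assume the value is unchanged between iterations.
int sum_scaled_volatile(const int* data, std::size_t n,
                        const volatile int* scale) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i] * *scale;
    }
    return total;
}
```

In a single-threaded run all three return the same result. They can differ only when something outside the loop, such as another thread, changes the value behind scale while the loop is running, which is exactly the case the single-threaded presumption ignores.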