The Tell-Tale Compiler
In my last post, I warned about the need to be aware of the memory models of both the programming language and the hardware when writing and executing threaded applications. In this post, I want to remind you to also consider the problems that the compiler can introduce into your parallel code.
Normally I think of a compiler as being a "friend" to me and my application. I'm old enough to remember assembly language programming being taught to budding Computer Science students. This topic has disappeared, for the most part, from modern CS curricula. It might be featured in a computer architecture and organization course, but no in-depth study is offered. Why? Compilers have gotten so good at code optimization that trying to eke out a slightly faster execution for a small portion of your application by inlining some assembly language is typically more trouble than it is worth. And it is that same optimization by the compiler that can trip you up in parallel code.
To illustrate, recall the example I previously used to show how the memory model of a processor could result in incorrect execution. One thread, threadZero, updates a shared variable, and another thread, threadOne, reads the newly updated value. To force the ordered execution I use a shared flag as synchronization, as shown in the following code fragment.
// Shared declarations
int Done = FALSE;
int N;

void threadZero(void *pArg)
{
    . . .
    N = SomeLocalValue;
    Done = TRUE;
    . . .
}

void threadOne(void *pArg)
{
    . . .
    while (!Done) {} // spin-wait
    SomeOtherLocalValue = N;
    . . .
}
If I use a smart compiler, it could notice that the value of Done is not modified inside the spin-wait loop of threadOne; that is, Done is loop invariant. In other words, the compiler is free to assume that the value of Done does not change while the function executes the (empty) while loop, so it can read Done once before the loop instead of on every iteration. Thus, the compiler is justified in transforming the relevant code in threadOne to the following:
void threadOne(void *pArg)
{
    . . .
    int tmp = Done;
    while (!tmp) {} // spin-wait
    SomeOtherLocalValue = N;
    . . .
}
Unless the value of Done is already non-zero when threadOne executes the code above, threadOne will enter an infinite loop, because tmp is never refreshed from Done. If you were to analyze this code for data races with some software tool, it would point out that there is a race on the Done variable. But you already know about that data race; in fact, the original source code relies on the data race to work correctly. The compiler has turned what should have been a benign data race into a fatal hang, in effect a deadlock. (Here is more evidence that even benign data races should be protected with mutually exclusive access synchronization.)
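One way to keep the flag-based handoff while ruling out the hoisting shown above is to make Done an atomic variable. The following is only a sketch under some assumptions that are not in the original fragment: it uses C11 atomics and POSIX threads (so the thread functions return void * rather than void), it passes the local values through pArg, and run_handoff is a hypothetical driver added purely to make the example self-contained. Note that this swaps in an atomic flag rather than the mutual-exclusion lock mentioned above; a mutex around the accesses to Done would be another reasonable choice.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

// Shared declarations
static atomic_bool Done;  // atomic flag: each loop iteration must reload it
static int N;             // ordinary data, published by the release store below

static void *threadZero(void *pArg)
{
    int SomeLocalValue = *(int *)pArg;  // assumption: value arrives via pArg
    N = SomeLocalValue;
    // Release store: the write to N becomes visible before Done reads as true.
    atomic_store_explicit(&Done, true, memory_order_release);
    return NULL;
}

static void *threadOne(void *pArg)
{
    // Acquire load: the compiler may not replace this with a single read
    // hoisted out of the loop, and the later read of N cannot move above it.
    while (!atomic_load_explicit(&Done, memory_order_acquire)) {} // spin-wait
    *(int *)pArg = N;  // plays the role of SomeOtherLocalValue = N
    return NULL;
}

// Hypothetical driver (not part of the original fragment): starts both
// threads and returns the value that threadOne observed.
int run_handoff(int value)
{
    pthread_t t0, t1;
    int received = 0;

    atomic_store(&Done, false);
    N = 0;

    int rc = pthread_create(&t1, NULL, threadOne, &received);
    assert(rc == 0);
    rc = pthread_create(&t0, NULL, threadZero, &value);
    assert(rc == 0);

    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return received;
}
```

With this version, a call such as run_handoff(42) should return 42 once both threads finish. The spin-wait itself is unchanged; the difference is that the reads and writes of Done are now synchronization operations the compiler has to respect rather than plain accesses it may assume no other thread touches.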

