Improving Data Dependence Analysis
In the data dependence problems considered so far, memory was accessed through a single variable. In general, however, compilers must be aware of complications that arise due to aliasing: the situation where the same memory locations can be accessed through multiple variables. This situation is best illustrated with the following example:
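A minimal sketch of such a function, assuming the signature `copy(s32 *p, s32 *q, s32 n)` and an `s32` typedef (the names `p`, `q`, `i`, and the `s32` type are taken from later in this article; the exact original listing is an assumption):

```c
#include <stdint.h>

typedef int32_t s32;  /* assumed definition of the s32 type used in the text */

/* Copy n elements from the region addressed by q
   into the region addressed by p. */
void copy(s32 *p, s32 *q, s32 n) {
    for (s32 i = 0; i < n; i++) {
        p[i] = q[i];
    }
}
```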
Due to pointer arithmetic in C, the pointer variables p and q can be associated with identical or partially overlapping memory regions. This potential aliasing allows for data dependences in all directions, illustrated as follows with three different invocations of the function:
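Hypothetical invocations illustrating those directions, assuming a `copy()` that performs an element-wise forward copy (these particular calls are an illustration, not the article's original listing):

```c
#include <stdint.h>

typedef int32_t s32;  /* assumed element type */

/* Assumed copy() consistent with the discussion in the text. */
void copy(s32 *p, s32 *q, s32 n) {
    for (s32 i = 0; i < n; i++)
        p[i] = q[i];
}

void invocations(void) {
    s32 a[100] = {0};

    copy(a, a, 100);     /* p == q: harmless loop-independent
                            self-dependence                                */
    copy(a + 1, a, 99);  /* write a[i+1], read a[i]: loop-carried flow
                            (true) dependence                              */
    copy(a, a + 1, 99);  /* write a[i], read a[i+1]: loop-carried anti
                            dependence                                     */
}
```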
Because compilers make conservative assumptions about the possibility of such forms of aliasing in a program, the resulting data dependence information is often inaccurate. The inaccuracy due to assumed data dependences can be remedied, however, with one or more of the following methods: compiler hints, aliasing analysis, and dynamic data dependence analysis.
Compiler Hints for Data Dependences
As an example of the first method, many compilers support the keyword restrict, which can be used to assert that a pointer variable provides exclusive access to its associated memory region. If the function copy() is only ever applied to distinct arrays, for example, this non-aliasing information can be conveyed to the compiler by qualifying the pointer arguments with restrict. Note that using such a language extension may require an additional compiler switch.
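A sketch of a restrict-qualified version, assuming the same copy() signature used throughout this article; with GCC, for instance, compiling with -std=c99 (or any later standard mode) makes the C99 keyword available:

```c
#include <stdint.h>

typedef int32_t s32;  /* assumed element type */

/* The 'restrict' qualifiers assert to the compiler that, within each call,
   the regions accessed through p and q do not overlap. If this assertion
   is violated at run time, the behavior is undefined. */
void copy(s32 *restrict p, s32 *restrict q, s32 n) {
    for (s32 i = 0; i < n; i++)
        p[i] = q[i];
}
```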
Even though compiler hints can improve the performance of a program in a relatively simple manner, hints must always be used with care since incorrect usage may ultimately cause an undesired change in semantics.
State-of-the-art compilers perform intraprocedural (within a single function) and interprocedural (between several or all functions in a program) aliasing analysis to gather information on all variables that could potentially provide access to the same memory locations. Simple approaches are based on the observation that only a variable whose address is taken can be associated with a pointer variable.
In the function copy(), for instance, local variables such as the loop index i cannot be modified through the pointer variables p and q, because the address of these local variables is never taken. More advanced approaches also analyze the way addresses propagate through the whole program. For example, if all calls to copy() associate addresses of disjoint data structures with the formal arguments p and q, the compiler can safely discard the possibility of aliasing in this function.
Some programming languages pose further restrictions on aliasing. The Fortran standard, for example, states that as long as an actual argument is associated with more than one formal argument, no modifications to the actual argument may occur. This implies that potential data dependences between array accesses that use different formal array arguments can be safely discarded. Incorporating such language-specific restrictions in the alias analysis substantially improves the accuracy of the data dependences that are eventually reported. (A detailed presentation of aliasing analysis can be found in Muchnick's textbook Advanced Compiler Design and Implementation, Chapter 10.)
Dynamic Data Dependence Analysis
In situations where the previous methods fail to resolve assumed data dependences, the compiler can resort to dynamic data dependence analysis, where the outcome of a run-time test decides between a data dependent and a data independent version of the same code, each of which is subsequently optimized accordingly. A simple overlap test is based on the observation that for two half-open intervals A = [a, b) and B = [c, d) with a, b, c, d ∈ ℤ, a < b, and c < d, the property

A ∩ B ≠ ∅ ⇔ a < d ∧ c < b

defines a necessary and sufficient condition for overlap between A and B. This observation gives rise to the following method to resolve assumed data dependences in a well-behaved loop. First, for every memory reference in the loop, an approximated address interval of the accessed memory locations is constructed. Some sample address intervals for memory references through a pointer s32 *r in a well-behaved loop with index i and upper bound U are shown in Table 2.
Table 2: Sample address intervals for memory references.
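As an illustration of such intervals (these particular references are assumptions, not necessarily the table's actual entries): a read of r[i] for 0 ≤ i < U accesses the half-open interval [r, r+U), and r[i+1] accesses [r+1, r+U+1), measured in elements. A small sketch of such intervals together with the overlap property:

```c
#include <stdint.h>

typedef int32_t s32;

/* Approximated half-open address interval [lo, hi), in elements. */
typedef struct { s32 *lo; s32 *hi; } interval;

/* Hypothetical examples of the kind of entries Table 2 contains:
   intervals for the references r[i] and r[i+1] with 0 <= i < U. */
interval interval_ri(s32 *r, s32 U)  { return (interval){ r,     r + U     }; }
interval interval_ri1(s32 *r, s32 U) { return (interval){ r + 1, r + U + 1 }; }

/* Overlap property: [a,b) and [c,d) intersect iff a < d && c < b. */
int overlaps(interval A, interval B) {
    return A.lo < B.hi && B.lo < A.hi;
}
```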
Subsequently, for every pair of memory references in the loop that may cause a data dependence, the compiler generates code that tests overlap between the corresponding address intervals using the property described in the previous paragraph. The outcome of these run-time tests determines whether a data dependent or data independent version of the loop is executed.
For the function copy() discussed earlier, for example, the assumed data dependences between *p and *q in the i-loop can be resolved as follows.
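A hand-written sketch of such multi-version code for the assumed copy() signature; the uintptr_t casts are one way to express the address comparisons portably, since C does not define relational comparisons between pointers into different objects:

```c
#include <stdint.h>

typedef int32_t s32;  /* assumed element type */

void copy(s32 *p, s32 *q, s32 n) {
    uintptr_t pa = (uintptr_t)p, qa = (uintptr_t)q;
    uintptr_t sz = (uintptr_t)n * sizeof(s32);

    /* Overlap test on the byte-address intervals [pa, pa+sz) and
       [qa, qa+sz): they intersect iff pa < qa+sz && qa < pa+sz.
       Short-circuit evaluation of && jumps to the second loop as soon
       as one conjunct fails, i.e., as soon as independence is proven. */
    if (pa < qa + sz && qa < pa + sz) {
        /* Possible overlap: conservative, data dependent version. */
        for (s32 i = 0; i < n; i++)
            p[i] = q[i];
    } else {
        /* Proven disjoint: data independent version, which the compiler
           can optimize (e.g., vectorize) aggressively. */
        for (s32 i = 0; i < n; i++)
            p[i] = q[i];
    }
}
```

Both versions compute the same result as written; the payoff is that the compiler can apply aggressive loop optimizations to the second loop under the guarantee of independence.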
Short-circuit evaluation of the condition ensures that the second loop is executed as soon as data independence has been proven. The compiler can eventually optimize this loop under the safe assumption of data independence. Although generating this kind of multi-version code can be very effective, a potential drawback of dynamic data dependence analysis is that the performance gains obtained by better optimization may be outweighed by the testing overhead introduced, since potentially every left-hand side memory reference has to be tested against every other left-hand side and right-hand side memory reference. Therefore, in general it is best to restrict dynamic data dependence analysis to loops with only a few memory references. Furthermore, even though the overlap test provides a convenient way to resolve assumed data dependences for memory references with simple subscripts, other sorts of run-time tests may be required to deal with more complicated situations, where approximated address intervals become too conservative.
Some programming languages pose restrictions on aliasing that can be exploited in the run-time tests. For example, because in the Java programming language reference variables (which are effectively the same as pointer variables in C) are associated with either disjoint or identical memory regions, an overlap test between two reference variables can simply be done by testing the two for equality, as exploited in some Java JIT compilers. The loop in the function copy() could even be vectorized unconditionally in Java, because the memory regions are either disjoint (p != q) or identical (p == q), where the latter case gives rise only to the harmless loop-independent self-dependence.
As can be seen, the more aware developers are of data dependences in code that executes often or needs optimal performance, the better the results will be.
Aart J.C. Bik is currently a Senior Staff Engineer at Intel Corporation, working on development of high performance Intel C++ and Fortran compilers. He received an Intel Achievement Award (the highest company award available) for his work in making Intel's Streaming SIMD Extensions 2 (SSE2) easier to use through automatic vectorization.
This article is based on material in Bik's book, The Software Vectorization Handbook, and it was edited by Andrew Binstock.