Data Dependence Problems
Consider a program fragment with an occurrence of an n-dimensional array a at the left-hand side of an assignment statement Sx in a nested loop with loop indices i1,...,idx and iteration space I, and with another occurrence of the same array at the right-hand side of an assignment statement Sy in a nested loop with loop indices j1,...,jdy and iteration space J:

Sx: a[f1(i1,...,idx)]...[fn(i1,...,idx)] = ... ;
Sy: ... = a[g1(j1,...,jdy)]...[gn(j1,...,jdy)] ;
Then, deciding if the two statements can be involved in a flow or anti-dependence involves solving the problem of whether statement instances of Sx and Sy can reference the same memory location during execution of the program. This data dependence problem is formulated as the following system of equations, Constraints 1:

fk(i1,...,idx) = gk(j1,...,jdy) for 1 ≤ k ≤ n
Since the subscript functions fi and gi are typically affine functions of loop indices, Constraints 1 usually forms a system of linear equations. In fact, since solving an under-specified data dependence problem is conservative, non-affine subscript functions can simply be omitted from the system.
The iteration space constraints
(i1,...,idx) ∈ I and (j1,...,jdy) ∈ J give rise to a system of linear inequalities, Constraints 2, where, again, non-affine loop bounds can simply be left out to obtain an under-specified data dependence problem.
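In the simplest cases, the resulting data dependence problem can even be decided by exhaustively enumerating both iteration spaces and checking Constraints 1 directly; the following sketch illustrates this (the loop bounds and subscript functions used below are illustrative assumptions, not taken from the text above):

```python
from itertools import product

def depends(f, g, I, J):
    """Return True if some write instance in I and some read instance in J
    reference the same array element, i.e., f(i) == g(j) has a solution."""
    return any(f(i) == g(j) for i, j in product(I, J))

# Illustrative 2-D example (assumed subscripts): the write references
# a[i1][i2+1] and the read references a[j1][j2], over a 10x10 space.
I = J = list(product(range(10), range(10)))
write = lambda i: (i[0], i[1] + 1)   # subscript vector (f1, f2)
read  = lambda j: (j[0], j[1])       # subscript vector (g1, g2)

print(depends(write, read, I, J))    # a dependence exists here
```

Real compilers obviously cannot afford this enumeration, which is why the algebraic solvers discussed below operate on the equations and inequalities directly.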
Even more constraints arise when testing for a particular data dependence distance or direction vector. For example, if the statements have d ≤ min(dx,dy) of the outermost loops in common, so that ik and jk denote indices of the same loop for 1 ≤ k ≤ d, then specifically testing for a flow dependence between Sx and Sy that is carried by the id-loop gives rise to Constraints 3, because the write must be executed before the read:

ik = jk for 1 ≤ k < d, and id < jd
Testing for a similar antidependence requires replacing the latter constraint with jd < id. Testing for a flow dependence carried by less than four iterations of the id-loop uses the constraint 0 < jd − id < 4 instead. Such constraints are particularly useful to test the validity of vectorization with a particular short vector length.
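Such a distance constraint can be sketched with the same exhaustive style of check as before; the single loop and subscript functions below are assumptions for illustration only:

```python
def flow_dep_within(distance, f, g, n):
    """True if some pair i < j with j - i < distance satisfies f(i) == g(j),
    i.e., a flow dependence carried by fewer than `distance` iterations."""
    return any(f(i) == g(j)
               for i in range(n)
               for j in range(i + 1, min(i + distance, n)))

# Assumed fragment: a[i] = a[i-3] + ...  (write a[i], read a[j-3]).
f = lambda i: i        # subscript of the write
g = lambda j: j - 3    # subscript of the read
print(flow_dep_within(4, f, g, 100))  # distance-3 flow dependence found
print(flow_dep_within(3, f, g, 100))  # none within distance < 3
```

Under these assumed subscripts, vectorizing with vector length 3 would be valid, but vector length 4 would not.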
The systems of linear equations (Constraints 1) and linear inequalities (Constraints 2) together with any additional constraints (Constraints 3) form the data dependence system of the data dependence problem. A similar data dependence system results for the reversed situation where the array occurrences appear at the right-hand side of
Sx and the left-hand side of
Sy, with potential anti- and flow dependences, or for the situation where the array occurrences appear at the left-hand sides of both statements, with potential output dependences.
Data Dependence Solvers
If a data dependence system is provably inconsistent (that is, no solution exists), then the data dependence considered in the data dependence problem cannot exist. In all cases where inconsistency cannot be shown, the tested data dependence is assumed. Most state-of-the-art compilers attack the data dependence system with a series of data dependence solvers that progressively increase in accuracy as well as computational and storage requirements. If simple inexpensive solvers fail to prove inconsistency, the compiler resorts to more powerful but potentially expensive solvers, such as those based on Fourier-Motzkin elimination. Central to this elimination method is the observation that a variable
i can be eliminated from a system of linear inequalities by replacing each pair-wise combination of inequalities L ≤ c1·i and c2·i ≤ U that define a lower and an upper bound for that variable by the inequality c2·L ≤ c1·U, for two integer constants c1, c2 > 0.
After this elimination, which only requires integer arithmetic, another system of inequalities that does not involve the variable
i results. For a variable
i ∈ ℝ, the original system is consistent if and only if the resulting system is consistent. On the other hand, for a variable
i ∈ ℤ, the elimination given above may be inaccurate. For example, eliminating the variable i from the pair of inequalities 16 ≤ 3i and 2i ≤ 11 yields the consistent system 32 ≤ 33, whereas the original system has no integer solution for i, since the two inequalities force i ≥ 6 and i ≤ 5, respectively.
Solving a data dependence problem is illustrated by the following loop, which contains a single assignment statement S. This assignment statement is involved in a self-flow or antidependence if two statement instances S(i) and S(j) write and read the same element of array a, which can be formulated as the data dependence problem defined by subscript functions f(i) = i and g(j) = j + 30, that is, the single equation i = j + 30 together with the loop bound constraints.
Specifically testing for a flow or antidependence adds, respectively, either inequality i < j or inequality j ≤ i to this data dependence system. The latter constraint still includes the possibility of loop-independent self-antidependences. Although such simple data dependence systems, illustrated in Figure 2, can easily be handled by the inexpensive solvers alluded to above, the inconsistency of the former system is proven in the following with Fourier-Motzkin elimination. Note that i < j rewrites into i + 1 ≤ j for integers, so that substituting i = j + 30 yields j + 31 ≤ j, that is, 31 ≤ 0, which is inconsistent.
The latter system has several solutions for j ≤ i, although none for i = j, which can be summarized with a single static, loop-carried antidependence of statement S on itself.
Figure 2: Data dependence systems.
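This outcome can also be verified by enumeration. Because the loop itself is not reproduced above, the bound N = 40 below is an assumption (any bound above 30 behaves the same):

```python
N = 40  # assumed loop bound

# All solutions of i == j + 30 (Constraints 1) within the iteration space.
pairs = [(i, j) for i in range(N) for j in range(N) if i == j + 30]

flow = [(i, j) for i, j in pairs if i < j]   # write executed before read
anti = [(i, j) for i, j in pairs if j <= i]  # read executed before write

print(len(flow))  # no flow dependence, as proven by elimination
print(len(anti))  # antidependence solutions, for i = 30..39
```

Every solution satisfies j < i, confirming that only a loop-carried antidependence exists.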
Hierarchical Data Dependence Analysis
A structured method for finding static data dependences between a pair of memory references was proposed. Given a data dependence system defined by only the linear equations (Constraints 1) and inequalities (Constraints 2), the method first tests for data dependence direction vector (*,...,*), imposing no additional constraints on the system. If independence cannot be proven, one
*-component is refined into directions <, =, and >, which gives rise to three new data dependence systems, each of which has one additional constraint in the system (Constraints 3). The data dependence hierarchy that arises in this manner is illustrated in Figure 3 for
d = 2.
Figure 3: Data Dependence hierarchy.
If independence has been proven for a particular data dependence direction vector, this proof effectively prunes all further refinements. On the other hand, if a particular data dependence cannot be further refined but still gives rise to a consistent data dependence system, a flow, anti-, or output dependence with this data dependence direction vector is reported, depending on whether the data dependence is between a write and a read, a read and a write, or two writes, respectively. An implausible data dependence, as shown in Table 1, is first reversed into a plausible data dependence by reversing each non-equal component of the data dependence direction vector as < ↔ >, reversing a flow dependence into an antidependence, and vice versa.
This kind of hierarchical data dependence analysis is illustrated with the data dependence problem arising in a simple program fragment in which two statements, S1 and S2, appear in a common doubly nested loop.
Simultaneously testing for a potential flow or antidependence between statement instances S1(i1,i2) and S2(j1,j2) gives rise to an initial data dependence system consisting of Constraints 1 and 2 only.
Since this system is consistent, the data dependence direction vector (ρ1, ρ2) = (*,*) is refined into (<,*), (=,*), and (>,*), which gives rise to three new data dependence systems with additional constraints i1 < j1, i1 = j1, and i1 > j1, respectively. Since only the system belonging to
(>,*) is consistent, the data dependence systems for
(>,<), (>,=), and
(>,>) are considered, which further adds the constraints
i2 < j2, i2 = j2, and i2 > j2, respectively. Only one of these three systems is consistent, yielding a flow dependence. This implausible data dependence, by nature, reverses into the corresponding plausible antidependence before reporting. Note that this particular case only considers seven systems rather than the nine systems that would have been considered if all possible data dependence direction vectors were tested directly.
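Since the original fragment is not reproduced here, the subscripts below (S1 writing a[i1][i2] and S2 reading a[j1+1][j2-1]) are an assumption chosen to be consistent with the analysis described above; under that assumption, a small driver can sketch the hierarchical refinement:

```python
from itertools import product

N = 8  # assumed loop bounds for both loops
write = lambda i1, i2: (i1, i2)          # assumed subscripts of S1's write
read  = lambda j1, j2: (j1 + 1, j2 - 1)  # assumed subscripts of S2's read

REL = {'<': lambda a, b: a < b, '=': lambda a, b: a == b,
       '>': lambda a, b: a > b, '*': lambda a, b: True}

def consistent(d1, d2):
    """Is the dependence system with direction vector (d1, d2) consistent?"""
    return any(write(i1, i2) == read(j1, j2)
               and REL[d1](i1, j1) and REL[d2](i2, j2)
               for i1, i2, j1, j2 in product(range(N), repeat=4))

tested = [('*', '*')]
for d1 in '<=>':                  # refine the first *-component
    tested.append((d1, '*'))
    if consistent(d1, '*'):
        for d2 in '<=>':          # refine the second *-component
            tested.append((d1, d2))

leaves = [v for v in tested if '*' not in v and consistent(*v)]
print(len(tested))  # 7 systems considered in total
print(leaves)       # the single surviving direction vector
```

With these assumed subscripts, only the (>,*) branch survives the first refinement and only (>,<) survives the second, so seven systems are considered, matching the count given above; the surviving flow dependence is implausible and would be reported as its reversed antidependence.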