Channels ▼
RSS

Design

How Data Dependence Affects Performance


Improving Data Dependence Analysis

In the data dependence problems considered so far, memory was accessed through a single variable. In general, however, compilers must be aware of complications that arise due to aliasing — the situation where the same memory locations can be accessed through multiple variables. This situation is best illustrated with the following example:

Due to pointer arithmetic in C, the pointer variables p and q can be associated with identical or partially overlapping memory regions. This potential aliasing allows for data dependences in all directions, illustrated as follows with three different invocations of the function:

Because compilers make conservative assumptions about the possibility of such forms of aliasing in a program, usually inaccurate data dependence information results. The inaccuracy due to assumed data dependences can be remedied, however, with one or all of the following methods: compiler hints, aliasing analysis, and dynamic data dependence analysis.

Compiler Hints for Data Dependences

As an example of the first method, many compilers support the keyword restrict that can be used to assert that a pointer variable provides exclusive access to its associated memory region. If function copy() is only applied to distinct arrays, for example, this non-aliasing information can be conveyed to the compiler as follows. Note that using such a language extension may require an additional compiler switch, like -Qrestrict.

Even though compiler hints can improve the performance of a program in a relatively simple manner, hints must always be used with care since incorrect usage may ultimately cause an undesired change in semantics.

Aliasing Analysis

State-of-the-art compilers perform intraprocedural (within a single function) and interprocedural (between several or all functions in a program) aliasing analysis to gather information on all variables that could potentially provide access to the same memory locations. Simple approaches are based on the observation that only a variable whose address is taken can be associated with a pointer variable.

In the function copy(), for instance, neither n nor i can be modified through pointer variables p or q, because the address of these local variables is never taken. More advanced approaches also analyze the way addresses propagate through the whole program. For example, if all calls to copy() associate addresses of disjoint data structures to the formal arguments p and q, the compiler can safely discard the possibility of aliasing in this function.

Some programming languages pose further restrictions on aliasing. The Fortran standard, for example, states that as long as an actual argument is associated with more than one formal argument, no modifications to the actual argument may occur. This implies that potential data dependences between array accesses that use different formal array arguments can be safely discarded. Incorporating such language-specific restrictions in the alias analysis substantially improves the accuracy of the data dependences that are eventually reported. (A detailed presentation of aliasing analysis can be found in Muchnick's textbook Advanced Compiler Design and Implementation, Chapter 10)

Dynamic Data Dependence Analysis

In situations where the previous methods fail to resolve assumed data dependences, the compiler can resort to dynamic data dependence analysis, where the outcome of a run-time test decides between a data dependent and a data independent version of the same code, each of which is subsequently optimized accordingly. A simple overlap test is based on the observation that for two half open intervals

where a,b,c,d ∈ ⊄ with a < b and c < d, the property

A∩ B ≠ ø ⇔ a < d ∧ c < b

defines a necessary and sufficient condition for overlap between A and B. This observation gives rise to the following method to resolve assumed data dependences in a well-behaved loop. First, for every memory reference in the loop, an approximated address interval of the accessed memory locations is constructed. Some sample address intervals for memory references through pointer s32 *r in a well-behaved loop with index i and upper bound U are shown in Table 2.

Table 2: Sample address intervals for memory references.

Subsequently, for every pair of memory references in the loop that may cause a data dependence, the compiler generates code that tests overlap between the corresponding address intervals using the property described in the previous paragraph. The outcome of these run-time tests determines whether a data dependent or data independent version of the loop is executed.

For the function copy() discussed earlier, for example, the assumed data dependences between *p and *q in the i-loop can be resolved as follows.

Short-circuit evaluation of the condition ensures that the second loop is executed as soon as data independence has been proven. The compiler can eventually optimize this loop under the safe assumption of data independence. Although generating this kind of multi-version code can be very effective, a potential drawback of dynamic data dependence analysis is that the performance gains obtained by better optimization are outweighed by introducing testing overhead, since potentially every left-hand side memory reference has to be tested against every left-hand and right-hand side memory reference. Therefore, in general it is best to restrict dynamic data dependence analysis to loops with only a few memory references. Furthermore, even though the overlap test provides a convenient way to resolve assumed data dependences for memory references with simple subscripts, other sorts of run-time tests may be required to deal with more complicated situations, where approximated address intervals become too conservative.

Some programming languages pose restrictions on aliasing that can be exploited in the run-time tests. For example, because in Java programming language, reference variables — which effectively are the same as pointer variables in C — are either associated with disjoint or identical memory regions, an overlap test between two pointer variables can simply be done by testing equivalence of the two pointers, as exploited in some Java JIT compilers. The loop in function copy() could even be vectorized unconditionally in Java, because the memory regions are either disjoint (p != q) or equivalent (p == q,) where the latter case gives rise only to the harmless loop-independent self-data-dependence:

As can be seen, the more developers are aware of data dependence in code that is executed often or needs optimal performance, the better the results will be.


Aart J.C. Bik is currently a Senior Staff Engineer at Intel Corporation, working on development of high performance Intel C++ and Fortran compilers. He received an Intel Achievement Award (the highest company award available) for his work in making Intel's Streaming SIMD Extensions 2 (SSE2) easier to use through automatic vectorization.


This article is based on material in Bik's book, The Software Vectorization Handbook, and it was edited by Andrew Binstock.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video