Channels ▼
RSS

C/C++

How Data Dependence Affects Performance


An imperative programming language like C imposes a total order on the execution of statement instances in a program. Central to exploiting any form of implicit parallelism in a sequential program is the observation that this total order can be relaxed into the partial order that is defined by data dependences between statements without affecting the semantics of the original program. Data dependences, therefore, provide compilers with essential information on the validity of applying transformations that change the execution order in a program. In this article, I examine data dependences and briefly discusses data dependence analysis and its role on program operation.

Data Dependences

Data dependences impose essential execution order constraints on the statement instances in a program, as formalized in this section.

Data Dependence Definitions

Suppose that a programmer has written the following two statements in C:

S1: a = 100;
S2: b = 200;

The sequential semantics of the language pose the following execution order on the statements: first execute statement S1, which assigns the value 100 to variable a, and then execute statement S2, which assigns the value 200 to variable b. In this case, however, executing S2 before S1 or even executing the statement simultaneously has no effect on the final values of the variables. In contrast, suppose the statement sequence reads as follows.

S1: a = 100;
S2: b = a + 200;

The sequential execution order must now be respected to avoid a change in semantics. Swapping the two statements would yield the value 200 for b — assuming an initial value 0 for a — rather than the intended value 300. The difference with the first example is that, in the second case, a read-after-write data dependence exists between statements S1 and S2.

This concept can be generalized to statements that appear in well-behaved loops. Consider a statement Sx in a nested loop with loop indices i1,...,idx and statement Sy in a nested loop with loop indices j1,...,jdy, where both statements have d ≤; min(dx,dy) outer loops in common: ik Ξ jk for 1 ≤; k ≤; d. If sequential semantics define that statement instance Sx(i1,...,idx) must be executed before statement instance Sy(j1,...,jdy), then the following memory-based data dependences between the two statement instances prohibit changing this execution order:

  • A flow dependence, denoted by Sx(i1,...,idx) δf Sy(j1,...,jdy), occurs if the former instance writes to a memory location that is subsequently read by the latter — read-after-write.
  • An antidependence, denoted by Sx(i1,...,idx) δa Sy(j1,...,jdy), occurs if the former instance reads from a memory location that is subsequently overwritten by the latter — write-after-read.
  • An output dependence, denoted by Sx(i1,...,idx) δo Sy(j1,...,jdy), occurs if the former instance writes to a memory location that is subsequently overwritten by the latter — write-after-write.

Any change in execution order that preserves the partial order defined by the data dependences arising in the original sequential code does not affect the final values appearing in memory and, hence, the semantics of the original program. The necessary execution order on I/O statements is easily preserved within this framework by modeling the effect of such statements with a write to the corresponding file pointer variable.

Data Dependence Terminology

For a data dependence between two statement instances Sx(i1,...,idx) and Sy(j1,...,jdy) that have d outer loops in common, the data dependence distance vector (σ1,..., σd) defines each component as σk = jk − ik. The data dependence direction vector (ρ1,..., ρd) defines each component as follows:

A data dependence is loop-independent if every ρi is =, or loop carried otherwise. Furthermore, for x < y, x > y and x = y, the data dependence is called lexically forward, lexically backward, and a self-data-dependence, respectively. Lexically backward and self-data-dependence are always loop-carried, although a loop-independent, self-antidependence is possible by noting that all reads in a statement instance must occur prior to writing the result.

Consider, for example, the following well-behaved loop:

The data dependences S1(i) δf S2(i) that are caused by writing and reading the same element of array a in each iteration 0 ≤ i < U have a data dependence distance vector (0) and data dependence direction vector (=). These flow dependences are loop-independent, lexically forward data dependences. The data dependences S2(i) δa S2(i + 1) that are caused by reading an element of array c that is overwritten in the next iteration have a data dependence distance vector (1) and data dependence direction vector (<). These anti-dependences are loop-carried self-data-dependences.

Data dependences that arise in sequential programs are plausible as defined in Table 1 for a data dependence direction vector that is positive — first nonequal component is <; zero — all components are =, or loop-independent; or negative — first nonequal component is >. Note that implausible data dependences are nevertheless useful for solving data dependence problems. (Textbooks often define plausibility for nonzero direction vectors. The definition used here can also classify loop-independent data dependences.)

Table 1: Plausible and implausible data dependences.

Data Dependence Graphs

Because data dependences arise between statement instances, it is generally infeasible to represent all the data dependences that arise in a program. Therefore, usually only static data dependences are recorded by dropping all instance information and instead annotating the data dependence with a data dependence direction vector. For example, the following loop has 10,000 flow dependences S1(i) δf S2(i) for 0 ≤ i < 10,000 caused by writing and reading an element of array a in the same iteration and 9,999 flow dependences S2(i) δf S2(i + 1) for 0 ≤ i < 9,999 caused by writing an element of array c that will be read in the next iteration:

This large set of data dependences is represented compactly with only two static data dependences:

Information on static data dependences can be easily represented by a data dependence graph G = (V,E), where each vertex v ∈ V corresponds to a statement and each edge e ∈ E with E ⊆ V Å~ V represents one or more static dependences between two statements. Edges can be labeled with more information on the static data dependences, such as type — flow, anti-, or output — and data dependence direction vector. Figure 1 shows the actual data dependences in the example given above together with the following data dependence graph representation:

G = ( {S1,S2}, {(S1,S2),(S2,S2)} )

Figure 1: Data dependence graph.

Although a static representation may lose information by lumping all statement instances together, data dependence graphs usually still provide sufficient information on implicit parallelism in a program.

Data Dependence Analysis

Data dependence analysis involves solving the problem of whether two statement instances can refer to the same memory location where at least one of these memory references is a write. Although this problem is undecidable in general, in most practical cases the data dependence problem can be formulated as an integer linear programming problem. However, since these problems are NP-complete, most compilers trade off some accuracy for efficiency while solving the problem. In any case, the outcome of data dependence analysis must be conservative — it is safe to report a data dependence when in reality none exists — but independence must never be reported incorrectly.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video