Designing Parallel Algorithms: Part 2

Local Communication

A local communication structure is obtained when an operation requires data from a small number of other tasks. It is then straightforward to define channels that link the task responsible for performing the operation (the consumer) with the tasks holding the required data (the producers) and to introduce appropriate send and receive operations in the producer and consumer tasks, respectively.

For illustrative purposes, we consider the communication requirements associated with a simple numerical computation, namely a Jacobi finite difference method. In this class of numerical method, a multidimensional grid is repeatedly updated by replacing the value at each point with some function of the values at a small, fixed number of neighboring points. The set of values required to update a single grid point is called that grid point's stencil. For example, the following expression uses a five-point stencil to update each element of a two-dimensional grid X :

Equation 1

This update is applied repeatedly to compute a sequence of values and so on. The notation , denotes the value of grid point , at step t .

Figure 4: Task and channel structure for a two-dimensional finite difference computation with five-point update stencil. In this simple fine-grained formulation, each task encapsulates a single element of a two-dimensional grid and must both send its value to four neighbors and receive values from four neighbors. Only the channels used by the shaded task are shown.

Let us assume that a partition has used domain decomposition techniques to create a distinct task for each point in the two-dimensional grid. Hence, a task allocated the grid point must compute the sequence:

This computation requires in turn the four corresponding sequences:

which are produced by the four tasks handling grid points and , that is, by its four neighbors in the grid. For these values to be communicated, we define channels linking each task that requires a value with the task that generates that value. This yields the channel structure illustrated in Figure 4. Each task then executes the following logic:

```for  t=0
to  T-1
send <img src="http://twimgs.com/ddj/images/article/2010/1003/100303anl_pt2_eq9.gif"> to each neighbor
receive <img src="http://twimgs.com/ddj/images/article/2010/1003/100303anl_pt2_eq10.gif"> <img src="http://twimgs.com/ddj/images/article/2010/1003/100303anl_pt2_eq11.gif">  from neighbors
compute using Equation 1
endfor

```

We observed earlier that the best sequential and parallel solutions to a problem may use different techniques. This situation arises in finite difference problems. In sequential computing, Gauss-Seidel update strategies are often preferred over Jacobi strategies because they allow solutions of comparable accuracy to be obtained using fewer iterations. In a Gauss-Seidel scheme, elements are updated in a particular order so that the computation of each element can use the most up-to-date value of other elements. For example, the Jacobi update of Equation 1 may be reformulated as follows (notice the use of values and ):

Equation 2

While Jacobi schemes are trivial to parallelize (all grid points can be updated concurrently), this is not the case for all Gauss-Seidel schemes. For example, the update scheme of Equation 2 allows only an average of around N/2 points within an N N grid to be updated concurrently. Fortunately, many different Gauss-Seidel orderings are possible for most problems, and we are usually free to choose the ordering that maximizes available parallelism. In particular, we can choose to update first the odd-numbered elements and then the even-numbered elements of an array. Each update uses the most recent information, yet the updates to the odd-numbered points are independent and can proceed concurrently, as can the updates to the even-numbered points. This update strategy yields what is referred to as a red-black algorithm, since the points can be thought of as being colored as on a chess board: either red (odd) or black (even); points of the same color can be updated concurrently. Figure 5 illustrates both the Gauss-Seidel scheme of Equation 2 and a red-black scheme, and shows how the latter scheme increases opportunities for parallel execution.

Figure 5: Two finite difference update strategies, here applied on a two-dimensional grid with a five-point stencil. In both figures, shaded grid points have already been updated to step t+1 ; unshaded grid points are still at step t . The arrows show data dependencies for one of the latter points. The figure on the left illustrates a simple Gauss-Seidel scheme and highlights the five grid points that can be updated at a particular point in time. In this scheme, the update proceeds in a wavefront from the top left corner to the bottom right. On the right, we show a red-black update scheme. Here, all the grid points at step t can be updated concurrently.

This example indicates the important role that choice of solution strategy can play in determining the performance of a parallel program. While the simple Gauss-Seidel update strategy of Equation 2 may be appropriate in a sequential program, it is not ideal on a parallel computer. The Jacobi update strategy is efficient on a parallel computer but is inferior numerically. The red-black scheme combines the advantages of both approaches.

More Insights

 To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.