*Editor's Note: This multi-part are on parallel algorithm design is based on the book Designing and Building Parallel Programs by Ian Foster. Designing and Building Parallel Programs promotes a view of parallel programming as an engineering discipline, in which programs are developed in a methodical fashion and both cost and performance are considered in a design. Part 1 focuses on the topics of Methodical Design and Partitioning. Subsequent installments will focus on Communication, Agglomeration, and Mapping, before finally examining case studies of parallel algorithms in action. A special thanks to Ian Foster. *

Designing Parallel Algorithms: Part 1

Designing Parallel Algorithms: Part 2

Designing Parallel Algorithms: Part 3

Designing Parallel Algorithms: Part 4

VLSI is a process used to build electronic components such as microprocessors and memory chips comprising millions of transistors. The design of VLSI components is a computationally demanding process. Computers are used extensively to verify the correctness of a circuit design, to lay out a circuit in a two-dimensional area, and to generate the patterns used to test circuits once they have been fabricated. Many of these problems involve either an exhaustive or a heuristically guided search of a large space of possible solutions. Here, we consider a layout problem. The first stage of the VLSI design process typically produces a set of indivisible rectangular blocks called "cells". In a second stage, interconnection information is used to determine the relative placements of these cells. In a third stage, implementations are selected for the various cells with the goal of optimizing the total area. It is the third stage, **floorplan optimization**, for which we shall develop a parallel algorithm. This is an important part of the design process, since the cost of a chip is usually dominated by its area.

VLSI floorplan optimization can be explained by analogy with the problem of designing a kitchen. Assume that we have decided on the components the kitchen is to contain (this action is stage 1 of the VLSI design process) and how these components are to be arranged (stage 2). For example, we may wish to have a stove, refrigerator, table, and sink and may require that the stove be next to the refrigerator and the table next to the sink. Assume also that we can choose among several possible models for each of these components, with different models having different shapes but occupying the same floor area. In the floorplan optimization phase of our kitchen design, we select models so as make the best use of available floorspace.

In VLSI, a floorplan is represented as a pair of polar graphs, conventionally called the and ** G** and

**graphs. (A polar graph is a directed acyclic graph with a single source and a single sink. The term directed means that edges have a direction, and acyclic means that there are no cycles.) These graphs specify which cells are adjacent in the vertical and horizontal directions, respectively. Each arc denotes a cell, and nodes (other than the source and sink) link cells that must have touching edges.**

*H*

Although a cell has a fixed area, it may have several possible implementations with different aspect ratios. If we have **N** cells, and if cell **c( _{i})**has implementations, then the total number of possible floorplan configurations is

For example, Figure 27 shows a floorplan optimization problem with three cells and six possible configurations:

**Figure 27**: A floorplan optimization problem. The three cells A, B, and C, have 1, 3, and 2 implementations each, respectively. In (a) are the alternative implementations. In (b) are the

**and**

*G***graphs, which state that B must be above C, and that A must be to the left of B and C, respectively. In (c) are the**

*H***1 x 2 x 3 = 6**alternative floorplans that satisfy the constraints; each is labeled with its area. The lowest area floorplan is constructed from A, B0, and C1 and has an area of 130.

**Figure 28**: Solving a floorplan optimization problem. This is the search tree corresponding to the problem illustrated in Figure 27. Level 0 is the root. At level 1, an implementation has been chosen for A; the three level 2 subtrees represent the choices for B and the level 3 leaves the choices for C. The number in each tree node represents the area of the associated (partial) solution. The optimal configuration is (A,B0,C1) and has area 130.

The problem then is to identify the configuration with the lowest area, where area is defined as the product of the maximum horizontal and vertical extents. This identification can be achieved by using a search algorithm to explore a search tree representing all possible configurations. As shown in Figure 28, level i of this tree corresponds to the situation in which implementations have been chosen for **i** cells. We can explore this search tree by using Algorithm 1.1. An initial call search(root) causes the entire tree to be visited, with the path used to get to each leaf node reported as a solution.

**Algorithm 1.1**: A recursive formulation of a simple search algorithm. When called to expand a search tree node, this procedure checks to see whether the node in question represents a solution. If not, the algorithm makes recursive calls to the same procedure to expand each of the offspring nodes.

Algorithm 1.1 implements an exhaustive search that visits all nodes of the search tree. Unfortunately, this strategy is computationally infeasible for any but the smallest problems. For example, a problem with just 20 cells and 6 implementations per cell has a search space of **6 ^{20}≈ 4x10^{15}** nodes. Fortunately, the number of nodes explored can be reduced considerably by using a technique called branch-and-bound search. The basic idea is to keep track of the best (lowest area) solution found so far. Before "expanding'' a node (that is, looking at its subtrees), we check whether the area of the partial configuration represented by that node is already greater than that of the best known solution. If so, we know that this node cannot yield a better solution, and the subtree rooted at that node can be abandoned, or pruned (Figure 29). This approach is specified as Algorithm 2.2, with the global variable

**A**used to maintain a record of the best solution.

_{min}

**Figure 29**: Branch-and-bound search. This figure shows the nodes actually explored in the example problem, assuming a depth-first and left-to-right search strategy. The subtree rooted at the second node on level 2 is pruned because the cost of this node (170) is greater than that of the cheapest solution already found (130).

**Algorithm 2.2**: Branch-and-bound search is similar to Algorithm 1.1 (simple search), but uses a search bound

**A**to prune the search.

_{min}On a sequential computer, the **foreach** in Algorithm 2.2 can examine each subtree in turn, thereby giving a depth-first search algorithm that explores the tree depth-first and left-to-right. In this case, pruning can reduce the number of nodes explored enormously. In one experiment reported in the literature, the number of nodes explored in a typical 20-cell problem was reduced from 4x10^{15} to 6x10^{6}. As we shall see, efficient pruning is a difficult problem in a parallel environment and, to a large extent, determines the structure of our parallel algorithm.

In summary, the fundamental operation to be performed in the floorplan optimization problem is branch-and-bound search. This is an interesting algorithm from a parallel computing perspective because of its irregular computational structure: the size and shape of the search tree that must be explored are not known ahead of time. Also, the need for pruning introduces a need both to manage the order in which the tree is explored and to acquire and propagate global knowledge of computation state. In these respects this problem is typical of many algorithms in symbolic (nonnumeric) computing.