Editor's Note: This multi-part are on parallel algorithm design is based on the book Designing and Building Parallel Programs by Ian Foster. Designing and Building Parallel Programs promotes a view of parallel programming as an engineering discipline, in which programs are developed in a methodical fashion and both cost and performance are considered in a design. Part 1 focuses on the topics of Methodical Design and Partitioning. Subsequent installments will focus on Communication, Agglomeration, and Mapping, before finally examining case studies of parallel algorithms in action. A special thanks to Ian Foster.
VLSI is a process used to build electronic components such as microprocessors and memory chips comprising millions of transistors. The design of VLSI components is a computationally demanding process. Computers are used extensively to verify the correctness of a circuit design, to lay out a circuit in a two-dimensional area, and to generate the patterns used to test circuits once they have been fabricated. Many of these problems involve either an exhaustive or a heuristically guided search of a large space of possible solutions. Here, we consider a layout problem. The first stage of the VLSI design process typically produces a set of indivisible rectangular blocks called "cells". In a second stage, interconnection information is used to determine the relative placements of these cells. In a third stage, implementations are selected for the various cells with the goal of optimizing the total area. It is the third stage, floorplan optimization, for which we shall develop a parallel algorithm. This is an important part of the design process, since the cost of a chip is usually dominated by its area.
VLSI floorplan optimization can be explained by analogy with the problem of designing a kitchen. Assume that we have decided on the components the kitchen is to contain (this action is stage 1 of the VLSI design process) and how these components are to be arranged (stage 2). For example, we may wish to have a stove, refrigerator, table, and sink and may require that the stove be next to the refrigerator and the table next to the sink. Assume also that we can choose among several possible models for each of these components, with different models having different shapes but occupying the same floor area. In the floorplan optimization phase of our kitchen design, we select models so as make the best use of available floorspace.
In VLSI, a floorplan is represented as a pair of polar graphs, conventionally called the and G and H graphs. (A polar graph is a directed acyclic graph with a single source and a single sink. The term directed means that edges have a direction, and acyclic means that there are no cycles.) These graphs specify which cells are adjacent in the vertical and horizontal directions, respectively. Each arc denotes a cell, and nodes (other than the source and sink) link cells that must have touching edges.
Although a cell has a fixed area, it may have several possible implementations with different aspect ratios. If we have N cells, and if cell c(i)has implementations, then the total number of possible floorplan configurations is
For example, Figure 27 shows a floorplan optimization problem with three cells and six possible configurations:
The problem then is to identify the configuration with the lowest area, where area is defined as the product of the maximum horizontal and vertical extents. This identification can be achieved by using a search algorithm to explore a search tree representing all possible configurations. As shown in Figure 28, level i of this tree corresponds to the situation in which implementations have been chosen for i cells. We can explore this search tree by using Algorithm 1.1. An initial call search(root) causes the entire tree to be visited, with the path used to get to each leaf node reported as a solution.
Algorithm 1.1 implements an exhaustive search that visits all nodes of the search tree. Unfortunately, this strategy is computationally infeasible for any but the smallest problems. For example, a problem with just 20 cells and 6 implementations per cell has a search space of 620≈ 4x1015 nodes. Fortunately, the number of nodes explored can be reduced considerably by using a technique called branch-and-bound search. The basic idea is to keep track of the best (lowest area) solution found so far. Before "expanding'' a node (that is, looking at its subtrees), we check whether the area of the partial configuration represented by that node is already greater than that of the best known solution. If so, we know that this node cannot yield a better solution, and the subtree rooted at that node can be abandoned, or pruned (Figure 29). This approach is specified as Algorithm 2.2, with the global variable Amin used to maintain a record of the best solution.
On a sequential computer, the foreach in Algorithm 2.2 can examine each subtree in turn, thereby giving a depth-first search algorithm that explores the tree depth-first and left-to-right. In this case, pruning can reduce the number of nodes explored enormously. In one experiment reported in the literature, the number of nodes explored in a typical 20-cell problem was reduced from 4x1015 to 6x106. As we shall see, efficient pruning is a difficult problem in a parallel environment and, to a large extent, determines the structure of our parallel algorithm.
In summary, the fundamental operation to be performed in the floorplan optimization problem is branch-and-bound search. This is an interesting algorithm from a parallel computing perspective because of its irregular computational structure: the size and shape of the search tree that must be explored are not known ahead of time. Also, the need for pruning introduces a need both to manage the order in which the tree is explored and to acquire and propagate global knowledge of computation state. In these respects this problem is typical of many algorithms in symbolic (nonnumeric) computing.