AUG88: PROGRAMMING PARADIGMS

On the Rickie Lee Jones album The Magazine, a song called "Gravity" is preceded by an instrumental track called "Prelude to Gravity." I really believe that an understanding of alternative programming paradigms is so gravely important to American software developers today that this column could be called the gravity track of this magazine. But so far I've only been playing the prelude. To really develop the theme, I'll need to bring in other voices. This month I'll wrap up this number and set the stage for the polyphonic theme to come.

First, though, I'll beat this old drum once again. Certain facts are gathering to cast an ominous shadow on American software development.

    1. Widely known techniques exist today for decomposing large problems into small, discrete tasks that can be handed to a shop full of coders.

    2. Performance is rarely as important to a software client as prompt delivery.

    3. You do not have to understand English to write code.

    4. There is no shipping cost for software written offshore.

It seems inescapable that the American coder will increasingly be in competition for work with Third World talent. Any programmer whose work is primarily coding has as much reason for concern about the financial future as a Detroit auto worker.

Nobody wants to be the victim of change or to be on the wrong side of the door when it closes. Diversification and abstraction are among the few strategies that can be effective when you are up against uncertainty. Don't put all your eggs in one basket. Concentrate on the principle behind the procedure. If you can't find the answer, restate the question. Take the paradigmatic view. I believe that the American programmers who prosper in the 1990s will be those who can think in several programming paradigms.

This column has thus far proselytized for the paradigmatic view and danced around the edges of the issue. This month I present some parallel algorithms that I hope will be of interest. Next month will see the beginning of a series of discussions with, and summaries of lectures by, the experts in object-oriented design, Lisp and Prolog development, parallel processing, RISC-centered design, and other paradigms of programming.

Parallel Algorithms

"The difficulty in all parallel architectures is getting the correct data to the correct place at the correct time. Performing the computation is the easy part."

So said Tom Knight of MIT, quoted in Tony Durham's Computing Horizons. And it's not just the architectures: glancing at the tables of contents of some of the books on parallel programming, you can see that communication is a deep concern in parallel algorithms as well.

This month I'll present some elementary practical algorithms from the growing literature on parallel processing. I'll touch on some of the things that people who work with parallel algorithms have to say about the potential and present performance of these algorithms. Some of this discussion will doubtless be familiar, but it may be helpful to those who have limited experience with parallel algorithms. Also, the work of Ewald Speckenmeyer et al. on superlinearity may be new to many readers.

Sorting in Parallel

"With sobs and tears he sorted out Those of the largest size, Holding his pocket handkerchief Before his streaming eyes."

-Lewis Carroll, Through the Looking-Glass

Sorting is one of the most analyzed of computational problems. The first serious algorithm learned by a computer science student (with sobs and tears, as often as not) is likely to be a sorting algorithm. The working programmer is as likely to know, offhand, the performance characteristics of common sorting algorithms as of any algorithms. In his well-written book Algorithmics: The Spirit of Computing, David Harel presents parallel versions of a simple Mergesort (see Example 1) and discusses the theoretical performance of some sequential and parallel sort algorithms. Table 1 is based on his Figure 10.2, with some additions.

Example 1: Pseudocode representation of a parallelized Mergesort algorithm, based on Harel, p. 261. It sorts a list L of length N in O(N) time, using N processors. The recursive calls are executed in parallel. It is a straightforward parallelization of a sequential algorithm, and not a particularly good one: its product measure is O(N²) (see Table 1).

     function parallel-sort L
         if length (L)>1
         then
              split L into L1 and L2
              par
                   parallel-sort L1
                   parallel-sort L2
              merge L1 and L2 into L
         end if
         return L
     end function
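
To make the structure concrete, here is a minimal runnable sketch of Example 1 in Python. The rendering is mine, not Harel's: a thread per recursive call mirrors the par construct, though it illustrates the shape of the algorithm rather than any real speedup.

     # A sketch only (my rendering, not Harel's). Each recursive call gets
     # its own thread, mirroring the "par" construct of Example 1.
     import threading

     def parallel_sort(lst):
         if len(lst) <= 1:
             return lst
         mid = len(lst) // 2
         halves = [lst[:mid], lst[mid:]]
         results = [None, None]

         def work(i):
             results[i] = parallel_sort(halves[i])

         # par: sort the two halves concurrently
         threads = [threading.Thread(target=work, args=(i,)) for i in (0, 1)]
         for t in threads: t.start()
         for t in threads: t.join()

         # merge L1 and L2 into L (sequentially)
         a, b = results
         merged, i, j = [], 0, 0
         while i < len(a) and j < len(b):
             if a[i] <= b[j]:
                 merged.append(a[i]); i += 1
             else:
                 merged.append(b[j]); j += 1
         return merged + a[i:] + b[j:]

     print(parallel_sort([5, 3, 8, 1, 9, 2]))   # prints [1, 2, 3, 5, 8, 9]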

Table 1: Performance of some sequential and parallel sorting algorithms (after Harel). The Product column is the product of the Size (number of processors) column and the Time (worst-case performance) column. This product measure provides a way of comparing the performance of parallel and sequential algorithms.

Algorithm                           Size            Time            Product

Sequential sorting algorithms:
Bubblesort                          1               O(N²)           O(N²)
Mergesort                           1               O(N log N)      O(N log N)
Heapsort                            1               O(N log N)      O(N log N)
Quicksort                           1               O(N log N)      O(N log N)

---------------------------------------------------------------------

Parallel sorting algorithms:
Parallelized Mergesort (Example 1)  O(N)            O(N)            O(N²)
Odd-even sorting network            O(N(log N)²)    O((log N)²)     O(N(log N)⁴)
Iterative Mergesort (Example 2)     O(N)            O((log N)²)     O(N(log N)²)
"Optimal" sorting network           O(N)            O(log N)        O(N log N)

The figures for the product of the number of processors and the worst-case time are an important measure for assessing parallel algorithms. To some extent, you can trade off size and time in parallel implementations: in other words, you can buy more speed with more processors. The product thus gives a measure of overall efficiency, and it bears a nice relationship to the worst-case time figure for sequential algorithms. Any parallel algorithm can be sequentialized, with a total time on the order of magnitude of the sum of the times taken by all the processors in the parallel version. So the product measure for a parallel algorithm cannot be better than the lower bound on the problem's sequential-time complexity. A linear increase in speed with additional processors is the best you can expect, and communication overhead in the parallel implementation will normally result in something less than linearity. For sorting algorithms, the lower bound on the problem's sequential-time complexity is N log N, so that also bounds the product measure for a parallel sort.
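
To put numbers on the product measure (an illustration of mine, not Harel's): let N = 1,024, so that log N, base 2, is 10. A sequential Mergesort then takes on the order of N log N, about 10,000 steps. The parallelized Mergesort of Example 1 finishes in roughly 1,024 steps, but occupies roughly 1,024 processors to do it, for a product of about a million; a hundred times the sequential work, which is just what its O(N²) product measure warns of.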

Harel presents an odd-even sorting network algorithm, and Sara Baase (in the second edition of her Computer Algorithms: Introduction to Design and Analysis) presents a parallel Mergesort algorithm (Example 2). Both of these algorithms approach this N log N bound, with N processors and O((log N)²) time, and both have small "big-O" constants. Harel also discusses an "optimal" parallel sorting algorithm that breaks this N log N down into an order-N processor demand and a worst-case time on the order of log N. This algorithm is based on a sorting network, but it is unfortunately impractical for normal purposes because of outrageously large "big-O" constants.

Example 2: Sketch of an iterative parallel Mergesort algorithm, after Baase, p. 376. It sorts a list L of length N in O((log N)²) time, using N processors. The recursion of the algorithm in Example 1 is unwound here, with all length-1 sublists merged first, then all length-2 sublists, then all length-4 sublists, and so forth. The word "ceiling" refers to the ceiling function: ceiling(X) is the least integer greater than or equal to X.

     for pass = 1 to ceiling(lg N) do
          k := 2^(pass-1)
          par
               merge length-k sublists L[i] and L[i+1]
               merge length-k sublists L[i+2] and L[i+3]
               {etc., merging adjacent sublists pairwise}
     end for
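
Here is a sequential Python simulation of Example 2; the rendering is mine, not Baase's. Each pass merges adjacent length-k sublists pairwise, and on a parallel machine all the merges within a single pass would proceed at once.

     # Sequential simulation of the iterative Mergesort passes (a sketch,
     # not Baase's code). Each pass halves the number of sublists.
     from math import ceil, log2

     def merge(a, b):
         out, i, j = [], 0, 0
         while i < len(a) and j < len(b):
             if a[i] <= b[j]:
                 out.append(a[i]); i += 1
             else:
                 out.append(b[j]); j += 1
         return out + a[i:] + b[j:]

     def iterative_mergesort(lst):
         sublists = [[x] for x in lst]        # all length-1 sublists
         passes = ceil(log2(len(lst))) if len(lst) > 1 else 0
         for _ in range(passes):              # ceiling(lg N) passes
             nxt = []
             for i in range(0, len(sublists), 2):
                 if i + 1 < len(sublists):    # these merges are independent;
                     nxt.append(merge(sublists[i], sublists[i + 1]))
                 else:                        # a parallel machine would run
                     nxt.append(sublists[i])  # them all at once
             sublists = nxt
         return sublists[0] if sublists else []

     print(iterative_mergesort([5, 3, 8, 1, 9, 2, 7]))  # [1, 2, 3, 5, 7, 8, 9]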

Restaurant Algorithms

I got sick and tired of having to decide what kind of dessert I was going to have at the restaurant, so I decided it would always be chocolate ice cream, and never worried about it again--I had the solution to that problem.

- Richard Feynman, Surely You're Joking, Mr. Feynman!

It doesn't seem realistic that the development of parallel algorithms can rest entirely on parallelization of sequential algorithms like Mergesort. In some cases you may have to look at the problem anew.

Sometimes, that can mean changing the very question you are asking. Relaxing the pure algorithmic requirement of a guaranteed solution, or a solution guaranteed in a finite time, can radically change the search strategy. What you end up with is a solution to a different problem. Heuristics and probabilistic techniques can play a big part in parallel algorithms.

A classic problem in the allocation of resources is the problem of the "Dining Philosophers." In one version of this problem, N plates are set at a round dinner table. A single chopstick is placed between each pair of adjacent plates, so that it would be within the reach of a person seated before either of the two plates. On each plate is some rice. Don't ask how much rice; all you could ever want, a veritable mountain of rice. We are given to understand that this is an all-you-can-eat Chinese restaurant. At unpredictable intervals, any one of N philosophers will wander into this restaurant, sit down in an available chair, attempt to eat some rice from the plate, and equally unpredictably get up and leave the room, only to repeat this performance as unpredictably as before. You've probably seen the situation hundreds of times.

Clearly, the management has erred in supplying only N chopsticks for N philosophers; this is a recipe for deadlock. As most Chinese restaurateurs realize, if all N philosophers came to the table at once, they would all grab for chopsticks at once and they would all starve to death. The problem is to prevent the tragic spectacle of philosophers starving while their plates are laden with rice, and it can be shown that this "Dining Philosophers problem" does not admit of a fully distributed, fully symmetrical solution (unless the restaurant hires a doorman to keep the room's philosopher density down to N-1).

"Fully distributed" means no shared memory. The philosophers' protocols may use only distributed information: "Fully symmetric" means that all their protocols are identical. These are the kinds of constraints that can be expected from parallel processing and in select restaurants.

The philosophers will eventually starve unless...

Harel's presentation of the algorithm that saves the philosophers is lucid. Example 3, page 138, shows the protocol he prescribes for each philosopher. Note how he bends the constraint that the philosophers' protocols be identical. What they do is identical; what results from their actions may differ. A coin toss introduces an element of randomness that guarantees that the probability of a true deadlock is zero.

The "Dining Philosophers" protocol is not new, nor is the use of random information to improve the performance of a deterministic algorithm. A number of currently hot topics in parallel processing incorporate this simple insight. Simulated annealing, the Metropolis algorithm, neural networks, Boltzmann machines, and Parallel Distributed Processing are among the heuristic-based techniques that have attracted a legion of algorists as motley as the father of the H-bomb and the codiscoverer of the genetic code.

And it looks like the route to superlinearity.

Superlinearity

"I suppose Siegel and Shuster won't get any royalties from this, either."

- attributed to Jerry Pournelle

If one processor is good, wouldn't two processors be 2.46 times as good? Three West German researchers say "yes." Ewald Speckenmeyer, Burkhard Monien, and Oliver Vornberger have used probabilistic information to create a parallel algorithm that, they claim, achieves superlinear speedup. They got the speedup results listed in Table 2 in actual tests. They did not get these results by designing a new heuristic that works well in a parallel implementation but poorly in a sequential one: they parallelized an existing sequential algorithm known to work well in the case of a single processor.

Table 2: Superlinear performance in a parallel algorithm. Speckenmeyer et al. cite these results as evidence for greater-than-linear speedup with multiple processors.

Number of Processors     2         4         8

Average speedup          2.46      4.68      9.59

Example 3: Protocol for the "Dining Philosophers" (based on Harel, pp. 303-304). The algorithm deadlocks with probability zero, but only if the philosophers decide randomly which chopstick to pick up first. If they leave out the coin flip, they can eventually deadlock and starve.

repeat forever
     carry out private activities until hungry
     repeat until sated
          toss coin
          if heads
          then
               firstSide := left
               secondSide := right
          else
               firstSide := right
               secondSide := left
          end if
          wait until chopstick on firstSide is available
          lift chopstick on firstSide
          if chopstick on secondSide is not available
          then
               put down chopstick on firstSide
          else
               lift chopstick on secondSide
               eat until sated
               put down chopstick on secondSide
               put down chopstick on firstSide
          end if
     end repeat
end repeat
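
Here is a minimal Python sketch of the protocol (mine, not Harel's). Each chopstick is a lock; the coin toss picks which side to try first, and a philosopher who finds the second chopstick taken puts the first one back down and starts over.

     # A sketch of the randomized protocol (my rendering, not Harel's).
     import random, threading, time

     N = 5
     chopsticks = [threading.Lock() for _ in range(N)]

     def philosopher(i, meals=3):
         left, right = chopsticks[i], chopsticks[(i + 1) % N]
         eaten = 0
         while eaten < meals:                     # stand-in for "until sated"
             # toss coin: decide which chopstick to reach for first
             first, second = (left, right) if random.random() < 0.5 else (right, left)
             first.acquire()                      # wait for first chopstick
             if second.acquire(blocking=False):   # is the second available?
                 time.sleep(0.001)                # eat until sated
                 eaten += 1
                 second.release()
             first.release()                      # put it down either way

     threads = [threading.Thread(target=philosopher, args=(i,)) for i in range(N)]
     for t in threads: t.start()
     for t in threads: t.join()
     print("all philosophers sated")

The non-blocking try for the second chopstick means no one ever holds one chopstick while waiting on another, and the random choice of side is what drives the probability of everyone backing off forever to zero.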

The algorithm, discussed in Supercomputing 1987 (Houstis et al., eds.), is a backtracking algorithm for determining the satisfiability of a formula. A simple sequential version of the algorithm is shown in Example 4, below, along with a discussion of the way in which the authors parallelized it. The authors maintain that, although the parallel algorithm could be simulated on a sequential machine, this would not necessarily lead to an improved sequential algorithm, because of overhead costs.

Example 4: Sequential backtracking algorithm of Speckenmeyer et al. The algorithm determines the satisfiability of a formula. Some details (such as how an "appropriate" X is selected, how formulas and clauses are represented, and what constitutes satisfiability) are suppressed, but to see what the authors parallelized it is only necessary to focus on the for..do loop. F is split into two parts and, in effect, the pushes get done in parallel via separate processors.

     push input formula F
     repeat until stack is empty or satisfiability is reported
          pop formula F
          if F is empty
          then
               report "satisfiable"
          else
               choose "appropriate" variable X
               split F into F_X and F_X'
               for e in {X, X'} do
                    if the empty clause is not in F_e
                    then
                         push F_e
                    end if
               end do
          end if
     end repeat
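
As a concrete rendering of Example 4, here is a Python sketch. The clause representation and the naive variable-choice rule are my own stand-ins, not the authors':

     # A sketch of the sequential backtracking satisfiability test (the
     # representation and variable choice are mine, not the authors').
     # A formula is a list of clauses; a clause is a set of literals,
     # where literal x means "variable x is true" and -x means "false".

     def assign(formula, literal):
         """Simplify the formula under the assumption that `literal` is true."""
         out = []
         for clause in formula:
             if literal in clause:
                 continue                     # clause satisfied: drop it
             out.append(clause - {-literal})  # falsified literal removed
         return out

     def satisfiable(formula):                # assumes no empty input clause
         stack = [formula]                    # push input formula F
         while stack:                         # repeat until stack is empty
             f = stack.pop()                  # pop formula F
             if not f:
                 return True                  # report "satisfiable"
             x = next(iter(f[0]))             # naive stand-in for "appropriate" X
             for lit in (x, -x):              # split F into F_X and F_X'
                 fe = assign(f, lit)
                 if all(fe):                  # the empty clause is not in F_e
                     stack.append(fe)
         return False

     # (x1 or x2) and (not x1 or x2) and (not x2 or x3)
     print(satisfiable([{1, 2}, {-1, 2}, {-2, 3}]))  # True
     print(satisfiable([{1}, {-1}]))                 # False

The two pushes inside the for loop are the point of attack: instead of stacking F_X and F_X' for later, the parallel version hands them to separate processors.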

The parallel version of the algorithm is able to break the rules and achieve superlinearity, the authors say, because solutions are distributed nonuniformly on the average. This is apparently the first time that the nonuniformity of the solution density has been used in analyzing an algorithm. If the authors are right that the observed superlinearity is because of the nonuniformity in solution density, then they must also be right that any problem exhibiting such solution density nonuniformity could yield to a superlinear parallelization. Does this suggest an entirely new approach to difficult problems: start by attempting to characterize the solution space? Still unknown: how large is the class of problems that give superlinear speedup?

