Algorithm Alley

By Jon Bentley, April 01, 1999

This month, Jon presents tools and techniques for analyzing the performance of algorithms. Next month, he examines how code-tuning techniques speed up the various algorithms.

Apr99: Algorithm Alley

Jon is a Member of Technical Staff at Lucent Technologies's Bell Labs. He can be reached at [email protected].

You've just designed a nifty new algorithm. It's fast, but exactly how fast is it? To help you find out, this column presents tools and techniques for analyzing the performance of algorithms. It starts with a simple program for an important problem, and improves it through several versions.

I originally presented versions of these algorithms in Unix Review magazine (June, 1997). In the process, I observed huge performance speedups -- a factor of 30,000 -- but didn't take the time to understand the details of the run times. This column analyzes those algorithms, with an emphasis on methods that real programmers might use to understand performance.

The Problem

The Traveling Salesman Problem (TSP) calls for finding the shortest tour through a collection of cities. Figure 1 shows the shortest circuit through 14 cities that Abraham Lincoln visited when he rode the Illinois Eighth Circuit in 1850. This prototypical network design problem often arises in applications such as scheduling mechanical devices and factory production lines.

The fundamental mathematical abstraction for dealing with the TSP is a permutation -- an ordered sequence of objects. For instance, there are a total of six permutations of the three letters "a," "b," and "c":

abc acb bac bca cab cba

In general, there are n! ("n factorial") permutations of n letters, where n!=n×(n-1)× (n-2)×...×3×2×1. This is because any one of the n elements can go in the first position, any one of the remaining n-1 elements can go in the second position, and so on down to the single remaining element that must go in the last position. The factorial function grows very quickly:

10!=3,628,800,

20!2.43×10¹⁸, and

30!2.65×10³².

A TSP tour is a permutation of the input cities, implicitly closed by the tour by traveling from the last city back to the first. The program tsp1.c (and a sample input file, both available electronically; see "Resource Center," page 5) deals with n cities, where n is a global variable: int n;.

The code won't peek inside the representation of cities. Instead, it will access distances between cities with the distance function

typedef double Dist;

Dist d(int i, int j)

It will represent the tour in the vector p

int p[MAXN];

where p[0..n-1] is a permutation of the cities in the tour. For instance, a tour that goes from city 3 to 1 to 4 then 2 (and finally back to 3) would be represented by the permutation vector 3 1 4 2.

Algorithm 1: The Easiest Program

The first algorithm is easy: It will recursively generate all n! permutations, and choose the one with least cost. Because it inspects so many tours, we can't expect the algorithm to be fast; we'll soon analyze it to see exactly how slow it is. Assume that the variables are initialized with

for (i = 0; i < n; i++)

p[i] = i;

minsum = INF;

The name solve1 denotes that this is the first version:

void solve1()

{ search1(n);

}

The workhorse is the following search function. The parameter m tells how many elements of the vector p have yet to be permuted. It operates top-down by leaving p[m..n-1] fixed while it permutes p[0..m-1]:

void search1(int m)

{ int i;

if (m == 1)

check1();

else

for (i = m-1; i >= 0; i--) {

swap(i, m-1);

search1(m-1);

swap(i, m-1);

}

}

The swap function exchanges pairs in the vector p. The for loop swaps each element in turn to the end, recurs with size m decreased by one, then swaps the element back to its original place. This code is short but subtle, and is worth serious study before proceeding.

When a complete permutation is fixed, m is 1 and search1 calls the check1 function:

void check1()

{ int i;

Dist sum = d(p[0], p[n-1]);

for (i = 1; i < n; i++)

sum += d(p[i-1], p[i]);

save(sum);

}

This function computes the cost of the tour by summing the n-1 distances between adjacent points and adding in the distance from the last point to the first. The save function records the tour and its cost, if it is the best so far:

void save(Dist sum)

{ int i;

if (sum < minsum) {

minsum = sum;

for (i = 0; i < n; i++)

minp[i] = p[i];

}

}

The global variables minp and minsum record the best tour and its length.

How slow is this program? The run times listed in Table 1 were generated on a 200-MHz Pentium Pro machine. The first three times are observed values in seconds, and the final four times are back-of-the-envelope extrapolations. The program is indeed pokey.

A little mathematics shows why it is so slow. The code investigates all n! permutations, and performs n distance calculations for each (to calculate the cost of the tour). The total number of distance calculations used by the program is therefore n×n!.

The analysis is confirmed by looking at a profile. Table 2 describes one run of the program with n=9. This data confirms the analysis: The check1 function is called n! (or 362,880) times. The distance function is called n×n! (or 3,265,920) times. This profile also shows that the distance function is indeed the critical operation in this program: The dist function and the sqr and sqrt functions it calls account for almost 87 percent of the run time of the program.

Exercise 1: Try profiling a C implementation of an algorithm.

Algorithm 2: A Fixed Point

The next algorithm reduces the run time by examining fewer permutations. It fixes the last city in the permutation, and starts the search at one lower value:

void solve2()

{ search1(n-1);

}

This code no longer inspects all n! permutations, but still examines all possible tours. Algorithm 2 looks at only (n-1)! permutations, and performs n distance calculations at each. The total number of distance calculations is n×(n-1)!, or precisely n!.

To verify this, we'll augment the dist function with instrumentation code that gives key counts without the overhead of profiling:

Dist dist(int i, int j)

{

#if COUNTDISTS

distcnt++;

#endif

...

When the program turns on COUNTDISTS, it increments distcnt at each call. (If the constant is zero, no additional run-time overhead is incurred.) Table 3 shows the number of distance calculations performed by the function. The first and second columns show that the number of distance calculations is indeed n!. The third row divides the current number of distance calculations by the previous number. In this case, those ratios repeat precisely the first column, and characterize factorial performance.

Of course, it is always important to verify that theory translates into reality; see Table 4. Both algorithms increase their run times by about a factor of n as n increases, and the run time of Algorithm 2 on n is a bit more than Algorithm 1 on n-1. Both of these observations are consistent with the predicted values.

Algorithm 3: Sum Times Not

The next version of the algorithm reduces the overhead of computing the sum for each permutation from scratch. Instead, the code carries along the partial sum in a new parameter to the search3 function. It initially calls the function with the partial sum set to zero:

void solve3()

{ search3(n-1, 0);

}

It adds in the distance from the top city to the city next to it at each recursive call:

void search3(int m, Dist sum)

{ int i;

if (m == 1)

check3(sum + d(p[0], p[1]));

else

for (i = m-1; i >= 0; i--) {

swap(i, m-1);

search3(m-1, sum + d(p[m-1], p[m]));

swap(i, m-1);

}

}

The new check3 function adds in the distance from the first to the last city:

void check3(Dist sum)

{ sum += d(p[0], p[n-1]);

save(sum);

}

This algorithm still looks at (n-1)! permutations, and I guessed that it would average roughly one distance calculation for each. I therefore expected another speedup of about a factor of n. Table 5 lists the run times. Where I hoped for a speedup of around 10, I observed a speedup of just 2 or 3. What gives?

I enabled the operation counts, and examined the sequence c_m, which denotes the count of distance functions used when the first parameter to search is m. I started with small values:

c₁ = 2, c₂ = 6, c₃ = 21.

The sequence begins:

2, 6, 21, 88, 445, 2676, 18739, ...

My first approach was to check this sequence in Neil Sloane's online encyclopedia of integer sequences, but it wasn't there at the time.

Exercise 2: Try looking up the sequence at Sloane's web site at http:// www.research .att.com/~njas/sequences/.

The next step involves finding a recurrence relation to describe the sequence in terms of its earlier values. When m is one, there are two cities in the array and the algorithm makes two comparisons (between that pair and from the first to the last): c₁=2. When there are m>=2 cities, the algorithm swaps each of them to be the last, calls itself recursively on size m-1 (at cost c_m-1), and then makes one additional distance calculation for each: c_m=m×(c_m-1+1).

I verified the first few values by hand, then used a spreadsheet to check larger values against the instrumented runs.

Exercise 3: How would you construct that recurrence in a spreadsheet? You may want to warm up by constructing the sequence m!.

The recurrence matched the experiments. To understand the sequence, I toyed further with the spreadsheet. Because I expected c_m to be similar to factorials, the third column in Table 6 shows m!. I next played with several different comparisons of the second and third rows: I subtracted c_m from (m+1)! and took several ratios, all without insight. I finally stumbled across the ratio c_m/m! shown in the fourth column, in which I at long last recognized a pattern.

In high school, I memorized the natural logarithmic base e to accuracy 2.718281828459045...(Your guesses about my social life in high school may prove uncannily accurate.) My complete spreadsheet converged to 14 of those decimal digits at m=18, where floating-point accuracy finally gave out. This led immediately to the conjecture that:

c_m m!×(1+e)

But how does the ubiquitous e sneak into my TSP program? Subtracting one from the right column gives the sequence 1, 2, 2 1/2, 2 2/3,..., which I recognized as the partial sums of the Taylor series expansion of e^x evaluated at x=1. This suggests the theorem:

c_m=m!×(1+1/0!+1/1!+...+1/(m-1)!)

Exercise 4: Use mathematical induction to prove that this solution satisfies the recurrence relation.

Exercise 5: Play further with a spreadsheet to discover additional patterns in the sequence c_m.

Because Algorithm 3 makes the original call search3(n-1, 0), it uses c_n-1 or approximately (n-1)!×(1+e) calls to the distance function. The speedup is not n, but rather roughly n/(1+e), which closely matches the experimental run times.

Algorithm 4: Pruning the Search

"Don't keep doing what doesn't work" is good advice for programs as well as people. As the program progresses, sum can never decrease: Adding edges only increases the tour length. The program can therefore terminate the search when sum grows too large:

void search4(int m, Dist sum)

{ int i;

if (sum > minsum)

return;

if (m == 1)

... as before ...

Each previous algorithm runs in the same time, for any size n, regardless of the particular input data. The run time of Algorithm 4 varies dramatically with its input. Table 7 shows the number of distance calculations it uses on one sequence of inputs, where each input adds a single city of Lincoln's tour. By n=11, pruned Algorithm 4 uses about a factor of 10 fewer distance calculations than Algorithm 3. Furthermore, the ratios of distance calculations show that the run time does not increase by a factor of n at each stage, but rather by a smaller factor (apparently near 5 for this range). As Table 8 shows, run-time experiments confirm both predictions.

Algorithm 4 is much faster than Algorithm 3, and its run time grows more slowly.

Exercise 6: Analyze Algorithm 4 in greater detail across a wider variety of inputs.

Tools and Techniques

Programming tools and techniques for analyzing algorithms include:

Profilers. These general tools help us identify the key operations in code and their associated counts.
Operation Counts. After identifying distance calculations as a critical operation, we added instrumentation code to count those operations.
Integer Sequences. You can analyze some algorithms by studying their operation counts with tools such as ratios and Sloane's online encyclopedia of sequences.
Spreadsheets. Spreadsheets provide a convenient way for storing and analyzing program performance data.
Mathematics. Detailed analyses can sometimes provide real insight into real programs.
Experiments and Common Sense. Always run experiments to ensure that your theory matches practice.

Stay Tuned!

In my next column, I'll examine how code-tuning techniques speed up the various algorithms.

Solutions to Selected Exercises

Exercise 2. Although the sequence was not there when I originally looked it up in April, 1998, it has since been added. (Amazing how one sequence pops up in various contexts!) The encyclopedia now gives the recurrence, but not the approximation derived in this column.

Exercise 3. To compute m!, first set the A column to the integers 1, 2, 3,... Next set B1 to 1, then set B2 to =A2×B1, and drag it down the B column. To compute c_m, set the first column as before, set B1 to 2, set B2 to =A2×(B1+1), and drag it down.

Exercise 5. The first four columns in Table 9 are from the previous spreadsheet. The fifth column shows the difference between elements in the fourth column, and the fifth column is the inverse of the fourth column. I was delighted to see the factorials pop out in the fifth column. This is an alternate derivation of the fact that c_m/m!=1+1/0!+1/1!+...+1/(m-1)!.

DDJ

1 2 3 4 5 6 7 8 9 10 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

Algorithm Alley

The Problem

Algorithm 1: The Easiest Program

Algorithm 2: A Fixed Point

Algorithm 3: Sum Times Not

Algorithm 4: Pruning the Search

Tools and Techniques

Stay Tuned!

Solutions to Selected Exercises

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

Algorithm Alley

The Problem

Algorithm 1: The Easiest Program

Algorithm 2: A Fixed Point

Algorithm 3: Sum Times Not

Algorithm 4: Pruning the Search

Tools and Techniques

Stay Tuned!

Solutions to Selected Exercises

Related Reading

News

Commentary

Slideshow

Video

Most Popular

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content