Dennis Shasha is a professor of Computer Science at New York University. His latest puzzle book, Puzzles for Programmers and Pros, outlines some general approaches to programmatic puzzle solving. This column challenges you to find the solutions to the puzzles that lie at the core of some cool tough applications. He solicits your help in identifying such cool apps.
If you've been reading Dr. Dobb's for many years, you may remember Dr. Ecco's Omniheurist Corner. In that column, Professor Shasha posed puzzles that required computer solutions. Dennis is returning to Dr. Dobb's, but with a new twist on puzzles. The idea is to take inspiration from the challenging apps that readers have faced and to turn those into puzzles.
If you have written an application that required extensive use of heuristics (because no algorithm would solve it) or that is otherwise algorithmically interesting, please send mail to Dennis at firstname.lastname@example.org. Put "Tough Apps" in the subject line. If he likes it, he will contact you. Then he will try to understand what you've done and create a puzzle explaining the algorithmic core. If you want, you can write a little blurb about yourself and the problem.
In many sports, a factor of two from the world record is still respectable. If you can swim 400 meters in less than twice the time of Michael Phelps, climb El Cap half as fast as Lynn Hill, or windsurf at half the speed of Antoine Albeau, you are doing well.
Similarly, many problems that have no fast ("polynomial time") algorithm have fast approximation algorithms that give results within a factor of two of the optimal conceivable solution. Heuristics will often help improve beyond the factor of two, but starting with a half-as-good-as-optimal approach gives a foundation from which you can build.
Many such problems are scheduling problems. While the traveling salesman problem gets the most press, the mother of practical scheduling problems is the shop scheduling problem. The problem also enjoys the virtue of being relevant to parallel and distributed computing.
Shop scheduling goes like this: You have M machines and J jobs. Any job can be done on any machine, but some jobs take more time than others. How can you get all jobs done in as little time as possible?
Remarkably, the following simple algorithm is within a factor of two of the optimal. Put jobs in a list in some order (any order will do). When a machine is idle, assign the next job in the list to that machine. How can such an apparently naive algorithm give any guarantee at all? Let's see.
Suppose the last job to complete begins at time t, and let L be the time to do that last job. The total time to complete all the jobs using this algorithm is therefore t + L. Note that before time t, all machines are busy, because the naive algorithm assigns a job to every machine as it becomes idle; if some machine had been idle before t, the last job would have started earlier. So the total work across all jobs is at least M * t, which means that even an optimal schedule, spreading that work perfectly over the M machines, cannot use less than t time units. Further, the optimal schedule cannot complete before time L, because at least one job takes L time. So, if t >= L, then t + L <= 2t <= 2 * time of the optimal schedule. Similarly, if L >= t, then t + L <= 2L <= 2 * time of the optimal schedule. Either way, the naive algorithm yields a schedule that is no more than twice as long as the optimal possible schedule. What's cool is that you can prove this without even trying to find the optimal schedule.
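As a sketch (mine, not from the column), the naive list algorithm fits in a few lines of Python, using a heap to find the machine that becomes idle first:

```python
import heapq

def list_schedule(job_times, num_machines):
    """Greedy list scheduling: assign the next job in the list to
    whichever machine becomes idle first. Returns the makespan,
    the time at which the last job finishes."""
    # Each heap entry is the time at which a machine becomes free.
    machines = [0] * num_machines
    heapq.heapify(machines)
    finish = 0
    for t in job_times:
        free_at = heapq.heappop(machines)  # earliest-idle machine
        done = free_at + t
        finish = max(finish, done)
        heapq.heappush(machines, done)
    return finish

# Example: 5 jobs on 2 machines, taken in the given (arbitrary) order.
print(list_schedule([2, 3, 4, 1, 5], 2))  # prints 9
```

Here the greedy order gives a makespan of 9, while the best possible split of these jobs over two machines finishes at 8, comfortably within the factor-of-two guarantee.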
This analysis immediately suggests a heuristic: Put the most time-consuming job remaining on the next available machine. That will tend to avoid having the longest task scheduled near the end. It does not, however, guarantee an optimum.
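This longest-job-first heuristic is just the list algorithm run on a sorted list. A sketch under that reading (my code, not the column's):

```python
import heapq

def longest_first_schedule(job_times, num_machines):
    """Heuristic: sort the jobs in decreasing order of time,
    then greedily assign each to the machine that becomes
    idle first. Returns the makespan."""
    machines = [0] * num_machines  # time each machine becomes free
    heapq.heapify(machines)
    for t in sorted(job_times, reverse=True):
        heapq.heappush(machines, heapq.heappop(machines) + t)
    return max(machines)

# Same jobs as before: sorting first happens to reach the optimum here.
print(longest_first_schedule([2, 3, 4, 1, 5], 2))  # prints 8
```

On this particular input the heuristic achieves the optimal makespan of 8, but as question 1 below asks you to show, it does not always do so.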
1. Can you show an example in which this algorithm does not attain an optimal schedule (though it is within a factor of two)?
This factor-of-two idea works in a variety of contexts. Suppose you are sending machine parts from Trenton to Scranton on identical flatbed trucks. The width of each machine part extends the entire width of a truck. Further, no part can be stacked on top of another and the part must lie completely on the flatbed.
2. Can you guarantee to use no more than about twice the minimum conceivable number of trucks?
As long as we're in the transportation business, let's return to the traveling salesman problem. That problem goes like this. A salesman wants to travel to a set of towns by road, starting, say, from town A. The question is which route to take to minimize the total time spent in traveling. It's OK for the salesman to go through a town twice provided he visits every town at least once.
For the factor-of-two method to work, we need two assumptions. First, symmetry: it should take the same time to go from town x to town y as to go from y to x, for any towns x and y. Second, the triangle inequality: the time it takes to go from town x to town y through town z is greater than or equal to the time of the shortest path from x to y. That is, it shouldn't be faster, though it can be just as fast, to take a detour. As a consequence of these assumptions, a shortest route through the towns never needs to loop. So the shortest route is a special case of a tree, which is a graph structure without loops.
Finally, there is a very efficient algorithm to find a smallest-cost tree that connects all the towns. The minimum spanning tree algorithm, in the version developed by Joseph Kruskal in the 1950s, works like this: Consider the towns to be nodes and the roads to be edges. Starting from the empty set, form the set of minimum spanning tree edges, TreeEdges, as follows:
Until you have a tree that connects all nodes, consider the remaining edges in order of increasing traversal time. If TreeEdges does not yet contain a path between the endpoints x and y of the edge under consideration, add that edge to TreeEdges; otherwise discard it.
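As an illustration (my sketch, not from the column), here is Kruskal's algorithm with a simple union-find structure; the "does TreeEdges contain a path between x and y?" test becomes a check of whether x and y are already in the same connected component. The example road times are made up:

```python
def kruskal(num_nodes, edges):
    """Kruskal's minimum spanning tree. `edges` is a list of
    (cost, x, y) tuples; nodes are numbered 0..num_nodes-1.
    Returns the list of edges in the tree."""
    parent = list(range(num_nodes))  # union-find forest

    def find(v):
        # Follow parent links to the root of v's component.
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    tree_edges = []
    for cost, x, y in sorted(edges):      # cheapest edges first
        root_x, root_y = find(x), find(y)
        if root_x != root_y:              # no path yet between x and y
            parent[root_x] = root_y       # merge the two components
            tree_edges.append((cost, x, y))
        if len(tree_edges) == num_nodes - 1:
            break                         # the tree spans all nodes
    return tree_edges

# Hypothetical example: 4 towns; each tuple is (road time, town, town).
roads = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
mst = kruskal(4, roads)
print(sum(cost for cost, _, _ in mst))  # prints 6
```

A spanning tree on N nodes always has N - 1 edges, which is why the loop can stop early once that count is reached.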
As an example, Figure 1 shows a simple graph and Figure 2 shows a minimum spanning tree.
3. How can you use the minimum spanning tree to find a route that is no more than twice the optimal route in cost? How would you improve this?