Optimization with Simulated Annealing


November 2001


Introduction

The need to find the best solution to a problem, when many solutions are possible, is a universal challenge that arises in a wide variety of applications. Integrated circuit design, resource planning, financial analysis, and scheduling are just a few of the applications where optimization algorithms are facing new challenges and creating new opportunities. Dramatic increases in processing power and memory now make it possible to solve complex optimization problems that were once considered overwhelmingly difficult. Linear programming and other mathematical programming techniques have given way to more heuristic approaches including genetic algorithms, tabu search [1], and simulated annealing. In this article, I will point out the practical advantages of using simulated annealing. I will describe how simulated annealing works, and I will demonstrate how to implement a simulated annealing algorithm to solve the problem of assigning time-share units at a vacation resort. The code for solving this problem is presented with special emphasis on classes that can be reused in other simulated annealing applications.

While the name sounds complex, the theory behind simulated annealing is actually easier to understand than that of most optimization algorithms. The name comes from the process of annealing metals by slow cooling in stages. By analogy, the algorithm begins its search for the optimal solution by exploring many possible solutions. Slowly and in stages, as the algorithm "cools down," it restricts the search to only the most promising paths. This simple approach allows it to explore solutions that other algorithms might reject. Another advantage of simulated annealing is that it does not require you to formulate the problem in a very specific way to fit the solution method. It can therefore handle a wide variety of problems with complex constraints.

In order to appreciate the way simulated annealing and other heuristic search methods work, it helps to understand a little about the earlier, more formal approaches to optimization. The typical mathematical expression for the general optimization problem looks something like this:

Minimize cost(x1, x2, ..., xk)

subject to the following constraints:

g1(x1, x2, ..., xk) <= 0
g2(x1, x2, ..., xk) <= 0
 .
 .
gm(x1, x2, ..., xk) <= 0
gm+1(x1, x2, ..., xk) = 0
 .
 .
gn(x1, x2, ..., xk) = 0

The cost function can be just about anything you can calculate that depends on x, such as the number of wires in a circuit or the expected rate of return on an investment. If it's a function that you'd like to maximize, you can simply put a minus sign in front of it. In the real world, the constraints can be equalities or inequalities involving continuous or discrete variables, and the constraint logic can be arbitrarily complex. But traditional mathematical programming methods require the problem to be expressed in a certain way, and they are limited to solving only problems of a certain type and size.

Traditional mathematical programming methods can be categorized as linear and nonlinear. Linear programming methods are very effective when the cost function and all of the constraints are linear; integer and discrete variables require a special variation of this approach known as integer programming. Nonlinear methods are based on the fact that the optimum point is found where the partial derivatives of the cost function are zero. The constraints can be made part of the cost function through special multipliers whose derivatives must also be evaluated along with the other variables, resulting in a system of nonlinear equations that must be solved. These methods are most applicable when the cost function and the constraints are smooth and continuous, but even then, they are very computationally intensive for large-scale problems.

Heuristic search techniques are less formal but often more effective than the traditional mathematical programming approaches for most practical problems. The basic idea is to generate a solution, evaluate it, and use that information to generate a possibly better solution. If you repeat this process in a loop, you can usually find an optimal or near-optimal solution in a reasonable amount of time. With simulated annealing and tabu search, you will need to write your own application-specific heuristic for generating new solutions from the previous one. I view this as a strength rather than a weakness of these algorithms. By writing your own heuristic, you can take advantage of problem-specific knowledge to improve overall performance.

A potential problem with heuristic search is that it can become trapped in a local optimum. Figure 1 illustrates this point. Without any information about solutions that are outside the first dip in the cost function, the algorithm is trapped. Simulated annealing was specifically designed to address this problem. The simulated annealing algorithm accomplishes this by incorporating a small amount of randomness into the logic. Introducing the right amount of randomness is the key to successfully implementing a simulated annealing algorithm.

The SA Algorithm

The simulated annealing algorithm keeps track of three solutions to the problem at all times. In the steps shown in Listing 1, best(x) is the best solution found so far, trial(x) is the most recently generated trial solution, and curr(x) is the current solution. The SA algorithm generates trial solutions and evaluates them at each iteration. In the early stages of the search, the evaluation procedure is less discerning, and this allows the algorithm to actively explore many possible solutions. As the search progresses, the evaluation becomes more strict; only the best solutions are explored. This behavior is controlled by a parameter called the annealing temperature t(i). The annealing temperature is initially set to a high value. As this temperature is decreased, the search becomes more focused.

The evaluation procedure outlined in Listing 1 compares the cost of the trial solution to that of the best solution. If the trial solution is not as good as the best solution, it is not always discarded outright. Instead, it is compared to curr(x), the current solution. If it is better than the current solution, it becomes the current solution, and a new trial solution is generated from it. However, even if it's worse, it is still sometimes retained; this is how the algorithm escapes a local optimum. The decision to keep or discard an unpromising solution is determined by the annealing function and a randomly generated number in steps 11 through 14 of Listing 1.

There are actually several variations of the criteria to keep or discard, but the most common involves the annealing function:

anneal = exp((currCost-trialCost)/t(i))

where t(i) is the annealing temperature, which decreases with the iteration counter i as the search cools down. Note that whenever trialCost is less than currCost, it doesn't matter what the annealing temperature is: the annealing function will be greater than one. So you don't really need steps 8 through 10; they are shown only to clarify what's going on.
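
Listing 1 is not reproduced here, but a minimal sketch of this accept/reject decision, with assumed function and variable names, might look like the following; only the annealing formula itself comes from the discussion above.

#include <cmath>
#include <cstdlib>

// Hypothetical sketch of steps 8 through 14 of Listing 1. Returns true
// if the trial solution should become the new current solution.
bool AcceptTrial(double trialCost, double currCost, double t)
{
   double anneal = std::exp((currCost - trialCost) / t);
   // When trialCost < currCost, anneal > 1 and the trial is always kept.
   // Otherwise it is kept with probability anneal, which shrinks as the
   // temperature t cools or as the trial solution gets worse.
   double r = (double)std::rand() / RAND_MAX;
   return r < anneal;
}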

The procedure for setting an initial value for the annealing temperature and adjusting it as the algorithm progresses is a matter of debate, and it often depends on the application. The rule of thumb that I use is to set the initial temperature to 5 percent of the cost of the first trial solution generated in step 5. I then decrease the temperature by 10 percent of its current value in a step-wise fashion 20 times during the run. For example, if MaxIterations is 1,000, I reduce the temperature by 10 percent of its current value every 50 iterations. This is a good starting point, but you will probably want to fine-tune it for your own application. Also, you don't have to use a fixed number of iterations as the stopping criterion. You can use the annealing function or the bestCost value to decide when to stop. However, I think that using either a fixed or user-defined number of iterations is a good choice for many commercial applications, because it makes the performance of the application more predictable.
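
As a sketch of this rule of thumb (the helper names are hypothetical; only the 5-percent and 10-percent figures come from the text):

// Set the starting temperature to 5% of the first trial solution's cost.
double InitialTemperature(double firstTrialCost)
{
   return 0.05 * firstTrialCost;
}

// Reduce the temperature by 10% of its current value 20 times during the
// run, i.e., once every maxIterations/20 iterations (every 50 iterations
// when maxIterations is 1,000).
void CoolTemperature(double &t, int iteration, int maxIterations)
{
   int stepLength = maxIterations / 20;
   if (stepLength < 1)
      stepLength = 1;
   if (iteration > 0 && iteration % stepLength == 0)
      t *= 0.90;
}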

Example Problem

Sales of time-share condo units are an increasingly important source of revenue for today’s ski resorts. Most of these time-share contracts only allow the time-share owner to occupy a specific condo unit during a specific one-week period each year. However, some contracts are more flexible and allow the time-share owner to select a different week every year.

In this example, each condo building contains four identical condo units. Each condo owner owns 1 week during a 16-week ski season for a total of 64 owners per building. During any given week, 4 different owners and their family members occupy the building. Due to local fire and safety regulations, the maximum number of occupants (owners plus family members) in each building is 22.

The owners do not buy a fixed week. Rather, they must request a specific week each year, and they may not get their first choice, because 5 of the 16 weeks are twice as popular as the others. Each owner is obliged to make first, second, and third choices several months in advance of the ski trip. When an owner doesn't get the first choice, the entire family is compensated with free lift tickets: two tickets for each family member if the owner gets the second choice, and four for the third choice. In order to best satisfy the owners and also to sell more tickets, the resort seeks to minimize the total number of free lift tickets given away.

It is possible for an owner to get none of his three choices, so a cost must be associated with this possibility. When this happens, the owner is given the choice to stay in a renter's building for that week or to accept the algorithm's assignment along with a cash incentive and seven days of free skiing. The cost associated with each possible outcome is embodied in the CalcCost function shown in Listing 2. CalcCost arbitrarily assigns a very high cost of 1,000 lift tickets for making no assignment at all. This is a commonly used optimization technique for enforcing constraints that are difficult to satisfy. In the example data, some owners have up to eight family members, making it difficult to combine large families into one building and still satisfy the 22-person-per-building capacity constraint. Using a penalty cost such as this allows the algorithm to move more freely through the solution space by not strictly enforcing the constraint at all times.
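
Listing 2 is not reproduced here, but a sketch of the per-owner cost rules just described, with an assumed signature, could look like this:

// Hypothetical sketch of the cost rules embodied in CalcCost; the real
// function in Listing 2 may be structured differently.
int OwnerCost(int choiceGranted, int familySize)
{
   switch (choiceGranted)
   {
   case 1:  return 0;                // first choice: no giveaway
   case 2:  return 2 * familySize;   // two tickets per family member
   case 3:  return 4 * familySize;   // four tickets per family member
   default: return 1000;             // no assignment: penalty cost
   }
}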

Developing the Heuristic

In order to come up with a heuristic for generating good solutions to this problem, I thought about how I would assign the owners if I were doing it with pencil and paper. Large families will obviously be harder to assign than small families because of the building capacity constraint and because of the larger number of free tickets. So I would assign the largest families first. After everyone is assigned, I would scan the list looking for the most dissatisfied families whose assignments could be swapped with other families to reduce the total cost of the lift tickets.

The main body of TimeShareSolver.cpp is shown in Listing 3. Initialize generates a very simple initial solution in which all 64 owners are unassigned, at a cost of 64,000 lift tickets. In the main loop, AssignOwners assigns the owners one at a time, selecting the largest families first. SwapOwners looks for high-cost assignments and tries to swap them. At that point, the new trial solution has been established. EvaluateCost determines whether the new solution is better than the best solution so far, or at least whether it should be kept as the current solution, using the annealing function. Logically, this completes one iteration, and you may or may not have a new current solution at this point. In any event, the algorithm will generate a new trial solution from the current solution on the next iteration. Generating this new solution begins with the UnassignOwners function, which unassigns 20 percent of the owners so that the process can start all over again. Tearing apart the current solution in order to generate a new trial solution is typical in simulated annealing and is often referred to as "perturbing" the solution.
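
Listing 3 is not reproduced here, but the shape of the main loop, using the function names above (the loop scaffolding itself is assumed), is roughly:

Initialize();                  // all 64 owners start unassigned
for (i = 0; i < maxIterations; i++)
{
   AssignOwners();             // assign owners, largest families first
   SwapOwners();               // try to swap high-cost assignments
   EvaluateCost();             // compare trial to best/current solutions
   UnassignOwners();           // perturb: unassign 20% of the owners
}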

The Importance of Randomness

Randomness is not only used in the evaluation of the annealing function. Almost every important logical decision in this algorithm has an element of randomness in it. Without randomness, the algorithm wouldn't make any progress beyond the first iteration. For example, if AssignOwners always chose the very largest unassigned family, and SwapOwners always chose the highest-cost assignment, then the algorithm would continually generate the same solution at each iteration. This solution would almost certainly be suboptimal; that is a characteristic of greedy algorithms. SwapOwners avoids being too greedy by not always choosing the highest-cost assignment. Instead, it calculates the three highest-cost assignments and then selects one of them at random.

With simulated annealing algorithms, adding the right amount of randomness to the search nearly always increases their power. This seems counterintuitive to deterministic types of people like software developers. I’ve been working with SA algorithms for a couple of years, and I am still (pleasantly) surprised whenever I observe this. It seems like I’m getting something for nothing.

Incorporating this kind of randomness into the logic would be very cumbersome without the help of a couple of classes that are specifically designed for this purpose. RandomMin and RandomMax are two classes that I use to control the randomness of the search. Their purpose is to select one of the lowest or highest values in a series of index, value pairs. A code fragment from SwapOwners demonstrates how RandomMax is used.

RandomMax hiCost;                        // selects among the highest values
hiCost.Initialize(3);                    // track the three highest costs
for (i = 0; i < m_ownerCnt; i++)
{
   if (tryToSwap[i])                     // consider only swappable owners
      hiCost.Compare(i, m_trialCost[i]); // feed in an index, value pair
}
i = hiCost.RandomSelectIndex();          // pick one of the three at random

The sample size is set to 3 using the Initialize function. This tells the hiCost object to keep track of the three highest costs. Index, value pairs are fed to the hiCost object using the Compare function. Finally, an owner index is selected from among the three highest-cost owner assignments.

Several important considerations are built into RandomMin and RandomMax. Consider the following series of index, value pairs, and suppose you want to select from among the three highest values.

0,  1.3000000000001
1,  1.5000000000000
2,  1.2999999999998
3,  1.7500000000000
4,  0.5000000000000
5,  1.0000000000000
6,  1.2999999999999

At first glance, the three highest values appear to be:

3,  1.7500000000000
1,  1.5000000000000
0,  1.3000000000001

Actually, there is no significant difference between items 0, 2, and 6; the difference amounts to round-off error. Items 2 and 6 must somehow be included with the other three values before you select one at random. RandomMax stores the index, value pairs of equal-valued items as a group. It expands the sample size only when the lowest-valued group in the sample contains more than one item, but it weights each member of that group lower when making the final random selection. A partial listing of RandomMin.cpp showing the Compare function is given in Listing 4.
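
Listing 4 itself is not reproduced here. To illustrate the tie-grouping idea, the following is a simplified, hypothetical sketch of a RandomMax-like selector; the member names and tolerance value are assumptions, and the real class maintains its fixed-size sample incrementally rather than storing every pair as this version does.

#include <algorithm>
#include <cmath>
#include <cstdlib>
#include <utility>
#include <vector>

// Hypothetical sketch of a RandomMax-style selector.
class RandomMaxSketch
{
public:
   void Initialize(int sampleSize, double tol = 1.0e-9)
   {
      m_sampleSize = sampleSize;
      m_tol = tol;
      m_pairs.clear();
   }

   void Compare(int index, double value)
   {
      m_pairs.push_back(std::make_pair(value, index));
   }

   int RandomSelectIndex()
   {
      if (m_pairs.empty())
         return -1;

      // Sort descending by value, then group adjacent values whose
      // difference is within the round-off tolerance.
      std::sort(m_pairs.rbegin(), m_pairs.rend());
      std::vector< std::vector<int> > groups;
      groups.push_back(std::vector<int>(1, m_pairs[0].second));
      for (size_t i = 1; i < m_pairs.size(); ++i)
      {
         if (std::fabs(m_pairs[i].first - m_pairs[i-1].first) <= m_tol)
            groups.back().push_back(m_pairs[i].second);
         else
            groups.push_back(std::vector<int>(1, m_pairs[i].second));
      }

      // Pick uniformly among the top sampleSize groups, then uniformly
      // within the chosen group, so each member of a tie group carries a
      // proportionally lower weight.
      int nGroups = std::min((int)groups.size(), m_sampleSize);
      const std::vector<int> &g = groups[std::rand() % nGroups];
      return g[std::rand() % (int)g.size()];
   }

private:
   int m_sampleSize;
   double m_tol;
   std::vector< std::pair<double, int> > m_pairs;
};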

Round-off error is an important consideration when each decision is based on a million or more previous calculations. In order to get repeatable results, it must be handled correctly. RandomMin and RandomMax do this automatically. Round-off error is not an issue in this example problem because all of the costs are integral. However, there is one repeatability issue that the time-share solver must deal with. The random number generator must be properly seeded before each run. This is done in the CTimeShareSolver::Initialize function.
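
For example, the seeding is a one-line fragment; the fixed value shown here is a placeholder of mine, not the article's actual seed:

// In CTimeShareSolver::Initialize: seeding with the same value before
// every run makes the random decisions, and thus the results, repeatable.
std::srand(1);   // requires <cstdlib>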

Applying too much randomness can have a negative effect on the search. In general, the algorithm applies more randomness during the initial construction of the solution and less as it fine-tunes the solution. The time-share solver illustrates this concept. The construction of a new solution starts with the Unassign function. Unassign removes 15 percent of the owner assignments based on highest cost. But it also removes another 5 percent purely at random without regard to cost. The swapping code applies some randomness when selecting an owner to swap, but it does not apply any randomness when selecting the other owner to swap with. It always picks the other owner yielding the lowest cost swap. This implementation was arrived at by trial and error, turning randomness on and off in different areas and observing how well the algorithm converged for a few test cases. This is easy to do with RandomMin and RandomMax. You turn randomness off by setting the sample size to one with the RandomMax::Initialize routine. When the sample size is one, it will always choose the lowest or highest cost value respectively, except in the case of a tie, where it will select an item at random. But a random tiebreaker is often better than an arbitrary tiebreaker or no tiebreaker at all.
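
For instance, selecting the other owner in a swap greedily, with a random tiebreaker, might look like the following fragment, mirroring the earlier RandomMax example; the candidate test and cost array here are hypothetical:

RandomMin loCost;                     // selects among the lowest values
loCost.Initialize(1);                 // sample size 1: greedy selection
for (j = 0; j < m_ownerCnt; j++)
{
   if (canSwapWith[j])                // hypothetical candidate test
      loCost.Compare(j, swapCost[j]); // hypothetical cost of this swap
}
j = loCost.RandomSelectIndex();       // lowest cost; random only on ties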

Results

A test database was created and populated with data for 64 owners, their vacation week choices, and family sizes. The family size varied from three to eight members. Fifty percent of the owners selected one of the five most popular weeks as their first choice. Twenty percent had these same weeks as their first and second choice. Six percent selected these popular weeks for all three choices.

The maximum number of iterations for the algorithm was set to 1,000. The algorithm found its best solution, with a cost of 224 lift tickets, at the 261st iteration. The run time for the full 1,000 iterations was 4 seconds on a 750MHz Pentium with 500MB of memory. I am fairly certain that this was the optimal solution for this set of data: when I set the maximum number of iterations to 1,000,000, it ran for over an hour and did not find a better solution. The convergence was not as fast for other data sets; in some cases it took the algorithm over 10,000 iterations (approximately 40 seconds) to find a solution that could not be improved. But in these cases, the algorithm always had a near-optimal solution (within two or three percent of the optimum) within the first 1,000 iterations. The 22-person-per-building capacity constraint was satisfied in all cases.

Some characteristics of the solution are listed below:

  • 49 of the 64 owners got their first choice.
  • 12 of the 64 owners got their second choice.
  • 3 of the 64 owners got their third choice.
  • All 64 owners got one of their three choices.

The progress of the algorithm during these 1,000 iterations can be seen from the following list of new best solutions that it found.

Iteration  Cost
  1        2308
  2        1321
  3        1283
  4         302
  7         292
 11         286
 24         236
 98         230
261         224

During the run, a total of nine improved solutions were found. A lot of progress is typically made in the first few iterations when it’s trying to reduce the high penalty cost of unassigned owners. In the first iteration, two families were completely unassigned, but the algorithm was able to assign everyone by the fourth iteration. Gradual progress is made thereafter with the exception of one dramatic improvement at iteration 24.

Conclusion

In this article, I have shared some of my experience in developing simulated annealing algorithms. I have presented the basic framework for the algorithm and given an example of how to develop a heuristic for generating new solutions. Carefully introducing randomness into the algorithm is very important. The RandomMin and RandomMax classes should make it easier to do this. I have also pointed out that it is sometimes useful to model difficult constraints as penalty costs rather than force the algorithm to always satisfy them. Remember that you will have to fine-tune the algorithm to get good results. You can use the annealing temperature schedule that I have suggested as a guideline, but you may need to modify it for your own application. A good reference for this is [2].

Notes and References

[1] Tabu search, developed by Dr. Fred Glover at the University of Colorado, is similar to simulated annealing, but it doesn't use any randomness to generate new solutions. It generates new solutions through a process of tightening and relaxing constraints. The word tabu refers to the tightening of constraints that forces the algorithm to temporarily stay out of "forbidden" or "taboo" areas of the search.

[2] P. J. M. van Laarhoven and E. H. L. Aarts. Simulated Annealing: Theory and Applications (D. Reidel Publishing, 1987).

Mark Bucci is the president of Business Analytics Corp., a software consulting firm. His main interests are mathematical algorithms and performance issues, database design, and software project management. Mark has used applied mathematics in the development of a number of different commercial software applications involving linear programming, graphs and charting software for medical instruments, energy management, business forecasting, and most recently a workforce scheduling application that uses the simulated annealing algorithm. He can be reached at [email protected].

