Kenneth V. Price
by Bruce Schneier
How do you determine the best machine for a job?
Step #1. Express all the parameters of your machine using some kind of coding machine. Each particular machine will then be expressed as a list of these parameters.
Step #2. Generate some machines with random parameters.
Step #3. Test the machines against each other and select the few best ones.
Step #4. Generate new machines by combining parameters from the ones selected in Step #3, occasionally making random modifications in some of the parameters.
Step #5. Repeat Steps #3 and #4 until you're tired of watching the show.
What I've just described is a genetic algorithm, and it's done a pretty good job with life on this planet. Generation after generation of iterations, selecting the best few to reproduce and occasionally throwing in the odd mutation, has resulted in millions of different species. The end result is that there are lifeforms suited for every niche in every environment.
Genetic algorithms might not be the fastest way to generate the "best" machine for a job (it would be a lot easier to design a sentient lifeform from scratch than to wait for evolution to stumble across a human), but their main advantage is that they require no knowledge about the system. For instance, the traditional way of generating an aircraft wing is to study aerodynamics and spend months calculating airflows, lifts, stresses, and the like. Alternatively, you could just express a generic wing as a string of parameters (size, shape, weight, and so on), create some random wings, test them against each other, and let the fittest reproduce. A weekend of iteration on a supercomputer would probably yield a pretty good design.
Solving problems such as these falls into the category of "combinatorial" or "global" optimization. Other approaches to finding the "best" (according to some predefined criteria) configuration involve simulated annealing, a stochastic-search technique familiar to chip designers, who must determine the best geometrical arrangement for millions of circuits on microprocessors.
DDJ has examined both genetic algorithms and simulated annealing before (see "Genetic Algorithms" by Mike Morrow, April 1991 and "Simulated Annealing" by Michael P. McLaughlin, September 1989). In this month's column, however, Kenneth Price combines the two techniques, resulting in an approach called "genetic annealing" which takes the best from both the genetic and annealing worlds.
Genetic annealing is a hybrid, random-search technique that fuses simulated annealing and genetic methodologies into a more efficient algorithm. Like its genetic counterpart, the genetic-annealing algorithm creates new solutions by exchanging genetic material between members of a population. Instead of using a competitive tournament to decide which members will survive into the next generation, however, genetic annealing uses an adaptive, thermodynamic criterion based on a simple feedback scheme. With this strategy, the genetic-annealing algorithm can outperform the algorithms whose principles it embodies even though it is hardly more complex than running a suite of greedy algorithms in parallel.
To evaluate the effectiveness of a random-search algorithm, it helps to use a problem whose global minimum is already known. To this end, I have chosen a simple graph-bipartitioning problem to illustrate how genetic annealing works. A graph is a collection of vertices connected by edges. Bipartitioning is the process of assigning each vertex to one of two sets. The goal of the bipartitioning problem is to assign half of a graph's vertices to each set so that the number of edges connecting the two sets is minimized. In many ways, this problem resembles the real-world task of the circuit designer, who wants to divide a set of components equally between two chips so that the number of lines needed to connect the chips is minimal.
The particular graph used in this example is commonly called a "necklace" because of its obvious resemblance to neckwear; see Figure 1(a). An [M,2] necklace is a ring of M vertices ("beads") each of which is connected radially to a companion vertex for a total of 2M vertices. As Figure 1(b) illustrates, any diameter across the necklace constitutes a bipartition since it divides the necklace into two sets, each of which contains M vertices. The bipartition generated by a diameter is optimal because only two edges cross the diameter to connect opposing sides. Because of its rotational symmetry, an [M,2] necklace has M optimal bipartitions in all.
For the purposes of computation, you can represent an [M,2] necklace as a string of 2M bits. Bits at even positions are the beads on the ring itself, and bits at odd positions represent the dangling beads; see Figure 2. The actual binary value assigned to a bead indicates the set to which it belongs.
To count the number of edges that span sets, perform an XOR operation on every pair of bits that represent connected vertices and sum the results. When the vertices at the ends of an edge belong to the same set, the bits they contain will be identical, and the XOR operation will return a 0. Conversely, a pair of connected vertices with different binary values represents an edge with a vertex in each set. The XOR operation counts these spans by returning a 1.
To simulate a heat bath, you will need an entire population of bit strings. The number of bit strings depends on the problem: for this example a population of 20 strings is adequate. For convenience, you can arrange the bit strings into a two-dimensional array so that one coordinate locates a string within the population while the other coordinate gives you the position of a bit within a string.
Next, you initialize the bit strings by filling each one with an equal number of 1s and 0s. In a genetic-annealing experiment, you must randomly select the initial population to ensure that the system starts out like a "white-hot" thermodynamic ensemble. Starting from this condition makes it less likely that the population will prematurely cool to a suboptimal minimum. To randomize a bit string while keeping its populations of 1s and 0s equal, try swapping each bit with another bit from a randomly selected location within the same string.
The number of edges having a vertex in each set measures the quality of a bipartition as a solution. In the context of a genetic algorithm, this number reflects the "fitness" of a bit string to survive as a "gene" in a competitive selection procedure. By contrast, simulated annealing uses the fitness of a configuration-like energy in order to exploit the laws of statistical mechanics. Genetic annealing adopts the annealing metaphor, treating a configuration's fitness-like energy, even though the annealing process itself is driven by population dynamics.
Like its component algorithms, genetic annealing is a random-search technique. This class of algorithm tries to improve an existing configuration by subjecting it to trial mutations. A mutation is any procedure that alters either the structure of a configuration or the data that it holds. One of the keys to a successful random search is knowing when a mutation produces an acceptable improvement. For example, under the "greedy" criterion, a mutant is deemed acceptable whenever its energy is less than or equal to the energy of the configuration from which it was derived. If it is accepted, the mutant supplants its progenitor and becomes the target for subsequent mutations. This way, each improved mutant lowers the acceptability criterion until configurations with lower energies fail to turn up.
In the genetic-annealing approach, you assign an energy threshold to each bit string. Initially, each threshold equals the energy of the randomized bit string to which it is assigned. Unlike the greedy criterion, a bit string's threshold, not its energy, determines which trial mutations constitute acceptable improvements. If the energy of a mutant exceeds the threshold of the bit string that spawned it, you reject the mutant and move on to the next bit string. However, if its energy is less than or equal to the threshold, you accept the mutant as a replacement for its progenitor.
While the greedy algorithm accepts a better configuration without regard to how much better it is, the genetic-annealing algorithm uses this information to drive the annealing process. Genetic annealing uses an "energy bank," represented by the real variable DE, to keep track of the energy liberated by successful mutants. Whenever a mutant passes the threshold test, you add the difference between the threshold and the mutant's energy to DE for temporary storage. Once you account for this quantum of heat, you reset the threshold so that it equals the energy of the accepted mutant and then move on to the next bit string.
After each bit string has been subjected to a random mutation, you "reheat" the population by raising each threshold a little. The size of the increase depends both on the amount of energy accumulated in the energy bank and on the rate at which you want to cool the population. If you let N equal the number of bit strings in the population, then the average contribution to the energy bank is just DE/N. To fully reheat the population, all you need to do is add DE/N to each threshold. Annealing results from repeated cycles of collecting energy from successful mutants (spontaneous cooling) and then redistributing nearly all of it by raising the threshold energy of each population member equally (uniform reheating).
After they have been reheated, thresholds are higher than the energies of the bit strings to which they have been assigned. This means that sometimes you are forced to accept a mutant even though its energy is not as low as the energy of the bit string it replaces. Replacing a bit string with a worse one may seem counter-productive, but these occasional reversals of fortune provide floundering bit strings with a helpful energy boost. In essence, the entire population acts like a giant heat reservoir that exchanges energy among its members. Less successful bit strings can escape suboptimal configurations by borrowing the energy they need from the more successful versions.
To relax the bit strings into their optimal condition, they must be cooled very slowly. In a genetic-annealing program, you control the rate of cooling with the cooling constant, C--a real number in the closed interval [0,1] that represents the fraction of DE that is returned to the population. For example, C=1 holds the population at a constant "temperature" by using 100 percent of the energy stored in DE to reheat thresholds. By contrast, C=0 releases all of DE's stored energy from the system and leaves thresholds unaltered. In effect, C=0 sets up a suite of greedy algorithms since each threshold is always equal to the energy of the bit string to which it's assigned. Most problems of consequence require very slow cooling. Typically, C ranges from 0.9 to 0.99 and beyond, although for a given problem, the optimal choice of C depends on a variety of factors. Pseudo-code for the genetic-annealing algorithm is shown in Figure 4.
You can also use the genetic-annealing algorithm to "cool" a population of configurations to a condition of maximum energy. All you need to do is replace the "greater than" symbol with a "less than" symbol in the portion of code that determines whether or not a mutant is acceptable. This has the effect of reversing the sign of the spontaneous energy so that during the reheating cycle each threshold is lowered by the amount dE=C*DE/N. This feature lets you use a problem's natural measure of fitness without having to invert it. Figure 3 shows what the maximum energy solution to the necklace problem looks like for M=8.
This energy bank approach to simulated annealing can substantially reduce the time you need to design and execute difficult combinatorial optimizations. Traditional annealing methods invoke the venerable Metropolis algorithm to forge a link between the laws of statistical mechanics and combinatorial optimization. Despite its great utility, the Metropolis algorithm is computationally expensive because the decision of whether or not to accept a mutant usually requires that you generate a random number and compare it to an exponential term. Configurations that are not much worse than their progenitors may be rejected depending on the random number generated. The genetic-annealing algorithm accepts every configuration that is not much worse than its progenitor based on the result of a simple comparison that requires neither a random-number generator nor the use of acceptance probabilities.
Traditional annealing methods also rely on an empirically derived "annealing schedule" to control the rate at which the Metropolis temperature should be lowered. The advantage of using thresholds to track the time-averaged loss of spontaneous energy from individual configurations is that you can maintain equilibrium at any temperature without using an annealing schedule just by restoring energy to each threshold at the same average rate that the ensemble loses it. In most cases, this reduces the researcher's challenge to finding the smallest value of C that will allow the population to maintain equilibrium as it anneals.
Choosing the right mutation scheme is just as important to the success of a random search as determining which mutants are acceptable. A mutation can be an elementary operation like flipping a bit, or it can be a more complex procedure like a symmetry operation or crossbreeding. In general, each problem will have its own menu of mutations. Frequently, mutations are subject to constraints based on a problem's symmetries. For example, mutations used in the bipartitioning problem must preserve the equal number of 1s and 0s in each bit string.
Of all the ways you can alter a bit string without changing the number of 1s (or 0s) that it contains, swapping bits is perhaps the easiest. To swap a pair of bits, just randomly select two bits with different binary values and replace each with its one's complement.
While it may be the simplest form of mutation, swapping a single pair of bits provides you with only a limited search capability. You can explore remote regions of solution space more effectively if you occasionally swap more than one pair of bits at a time. For example, you can usually enhance the efficiency of a random search by drawing the number of elements involved in a mutation from an exponential distribution. You can characterize an exponential distribution by its decay constant, EX, where EX is a real number in the half-open interval [0,1). In the case of swapping bits, EX=.5 means that a single pair of bits is swapped half of the time, two pairs of bits are swapped one quarter of the time, three pairs of bits are swapped one eighth of the time, and so on. EX=0 means that you never swap more than one pair of bits. With the right decay constant, an exponential distribution of mutation sizes makes distant points in solution space more accessible without sacrificing the ability to efficiently fine tune a configuration.
In most problems, a well-chosen symmetry operation can make a valuable addition to your mutation scheme. When appropriate, operations like rotations and reflections can generate large-scale variations of a configuration without des-troying crucial local relationships. For example, a reflection operation in which the first four bits of the sequence: 0000111100001111 are exchanged with the second four bits, produces the optimal configuration: 0000000011111111. Despite altering the target string on a relatively large scale, the reflection procedure not only conserves the number of 1s, but also maintains the XOR relationships between the bits within the reflected substring. To accomplish the same transformation in a series of random swap operations would require considerable good fortune. Despite their utility, symmetry mutations reduce the novelty of a "random" search, so you should not rely on them exclusively.
Perhaps the greatest advantage that cooling a population in parallel affords you is the opportunity to employ crossbreeding as a form of mutation. Actually, instead of mating pairs of bit strings in a separate cross-breeding procedure (as you do in the genetic algorithm), the genetic-annealing algorithm transfers genetic information between bit strings by "splicing" it. Splicing tentatively replaces a portion of the target string with the corresponding section from another bit string chosen at random from the population. In the splicing scenario, the randomly chosen string remains unaltered while it donates a copy of a segment of its "genetic material" for the target string to use as a trial mutation; see Figure 5. By drawing upon the success of other configurations in the population, splicing brings to the genetic-annealing algorithm all of the problem-solving power that crossbreeding imparts to the genetic algorithm.
You must be careful when implementing a splicing procedure for the bipartitioning problem because the number of 1s in the substring being donated must be the same as the number of 1s in the target substring. To ensure that 1s are conserved, the size of the donated substring is allowed to grow until the number of 1s it contains equals the number of 1s in the corresponding target substring and until the two substrings differ in at least one bit position, or until the whole string is used.
The mutation scheme in GENNEAL.C (described later) includes all three forms of mutation: random bit swapping, a reflection operation, and splicing. PS, PR, and PX are the probabilities that a mutation will swap random bits, reflect a substring, or splice a substring, respectively. Of course, PS+PR+PX=1. The size of a swap mutation is controlled by EXS, while EXR determines the size of a reflection mutation.
Along with the mutation probabilities and their respective decay constants, other control variables include the population size, N, and the cooling constant, C. Given a necklace of size M (for a minimum energy of 2, make M even), what combination of these variables will repeatedly produce an optimal partition with a minimum of computational effort?
Since run times on a sequential computer are proportional to population size, you will usually want to reduce N until further reductions begin to jeopardize the population's ability to simulate a heat bath. A population containing anywhere from 10 to 40 configurations should be sufficient for most problems. Smaller populations hamper the annealing process by failing to provide energy when it is needed, while larger populations increase execution times while enhancing the thermal simulation only incrementally.
The robust flexibility of the genetic-annealing algorithm makes it easy to experiment with a variety of computational approaches to find the one that works best. You can transform the genetic-annealing algorithm into a greedy algorithm, a suite of annealing programs, or a genetic-style algorithm simply by changing a few control constants. When you approach a new problem, try starting with N=20 and run a suite of greedy algorithms in parallel by setting C=0 and PX=0 (no annealing or splicing). These results will provide you with a convenient performance benchmark.
Although it occasionally turns up an optimal configuration, the greedy algorithm performs poorly because it is not consistently successful. The true performance of the greedy algorithm, like any stochastic search, is more reliably gauged by the average and variance of an ensemble of results.
In a series of ten trials with N=20 and M=80, the greedy version of the genetic-annealing algorithm (in which C, EXS, PR, and PX are all zero), produced an average minimum energy of 6.62 and a variance of 4.63. In each trial, the program was allowed to run for 256,000 generations. When each mutation swapped only one pair of bits (EXS=0), just four out of the total of 200 configurations (10 trialsx20 strings per trial) reached the optimal energy of 2. Increasing EXS produced a modest improvement. A series of ten trials that used EXS=.35 turned up six optimal bit strings and dropped the average final energy to 6.37. For the sake of simplicity, the random-swap procedure uses a constant value of EXS=.35 throughout the remainder of this demonstration.
As expected, results improved significantly once the population was annealed. With N and C both greater than 0 and no splicing, this version of the genetic-annealing algorithm resembles a suite of annealing algorithms running in parallel. Initial trials, which used C=.84, drove the average energy down to 4.83, reduced the variance to 2.61, and produced 21 optimal bit strings. The next set of trials used C=.92, which cooled the population at about half the rate that C=.84 did, because annealing times are roughly proportional to 1/(1--C). With C=.92, the average final energy dropped to 4.51, and 28 bit strings found their way to an optimal state. Cutting the cooling rate in half again gave still better results, but the returns were clearly diminishing. C=.96 lowered the average final energy to 4.43 and produced a total of 38 optimal bit strings. When the cooling rate was halved yet again to C=.98, the average final energy rose to 4.52, indicating that cooling was so slow that even 256,000 generations were not enough to completely quench the population.
In the absence of the reflection mutation, schemes that used only a minuscule fraction of splicing performed best. If splicing is used too frequently, the population will misconverge. Annealing, however, can overcome misconvergence. For example, the "greedy-genetic" scheme: C=0, PX=.0002, PS=.9998 produced optimal results in only five of ten trials with convergence occurring at about 100,000 generations. Annealing with C=.84 took a little longer but misconvergence abated. By 120,000 generations, all 200 bit strings had found an optimal state.
The real surprise comes when you use a scheme that uses both long reflections and splicing. In particular, the scheme EXR=.95, PR=.3, PX=.7 not only produced perfect results in ten out of ten trials without annealing, but also converged on average in only 280 generations! This level of performance is somewhat atypical, due primarily to the reflection mutation's ability to speed equilibration across a bit string when large values of EXR are used. Some annealing is needed when you bipartition larger necklaces and/or use a less effective mutation scheme.
In the genetic-annealing approach, a synergy exists between splicing and annealing. Annealing controls the rate of convergence, tolerates error, and can endure high rates of random mutation. All these factors help alleviate the tendency of the genetic algorithm to misconverge. Furthermore, splicing genetic material empowers the annealing algorithm by providing the means to efficiently compare competing bit strings that are widely separated in solution space. Run times benefit from this synthesis, since you can exploit the superior search capabilities of splicing without having to maintain a large population to ensure genetic diversity. Add to this the increase in computational speed made possible by using thresholds instead of the Metropolis algorithm, and you can improve run times substantially when compared to traditional genetic and annealing techniques. Of course, you can also improve performance by running the genetic-annealing algorithm on a parallel computer, especially one using the SIMD architecture, since each configuration can then be assigned its own processor.
GENNEAL.C is a bipartitioning program available electronically; see "Availability," page 3. I've kept code simple for the sake of clarity and as a result, several routines have not been optimized. For instance, GENNEAL.C computes the energy of a mutant by performing an XOR operation on every pair of connected vertices, even though most mutations affect only a small part of the target string. It is faster to compute the energy of a mutant by reevaluating just those links of the target string that have been altered. Similarly, if a mutation fails, you need to restore only that part of the target string affected by the mutation. GENNEAL.C copies the whole bit string and then restores it in its entirety, if necessary.
You should now have enough information about how genetic annealing works to experiment on your own. If you currently use either a traditional genetic or simulated annealing algorithm, give genetic annealing a try. I would enjoy hearing from those of you who do the comparison.
I would also like to acknowledge Margaret E. Burwell for her valuable assistance in the preparation of this article.
Figure 1 (a) The "necklace" graph; (b) any diameter across the necklace constitutes an optimal bipartition.
Figure 2 Bits at even positions are the beads on the ring itself; bits at odd positions represent the dangling beads.
Figure 3 Maximum-energy solution to the necklace problem for M=8.
1. Randomly select an initial population of N configurations. 2. For i=1 to N: Initialize the i<I>th</I> threshold, Th[i], with the energy of the i<I>th</I> configuration. 3. DE=0 /* Empty the energy bank */. 4. For i=1 to N: /* Begin cooling loop */. 5. Splice or randomly mutate the i<I>th</I> configuration. 6. Compute the energy, E, of the resulting mutant. 7. If E>Th[i] then restore the old configuration. 8. If E <= Th[i] then: a.DE=DE+Th[i]--E /* Add energy difference to DE */. b.Th[i]=E /* Reset threshold */. c. Replace old configuration with successful mutant. 9. Mutate next configuration or end cooling loop. 10.dE=DE*C/N /* Compute reheating increment, dE. 0<C<1 */ 11.For i=1 to N: /* Begin reheating loop */. 12.Th[i]+dE /* Add dE to each threshold */. 13.Return to step 3 once all thresholds have been reheated.
Figure 5 The splicing scenario.
Copyright © 1994, Dr. Dobb's Journal