APR91: GENETIC ALGORITHMS

GENETIC ALGORITHMS

A new class of searching algorithms

This article contains the following executables: MORROW.ARC

Mike Morrow

Mike is a programmer at Applied Microsystems Corp. He can be contacted at 16541 Redmond Way #162, Redmond, WA 98052.


Genetic algorithms are a class of machine-learning techniques that gain their name from a similarity to certain processes that occur in the interactions of natural, biological genes. In this article, I'll outline the steps in a typical Genetic Algorithm (GA) and illustrate the concept by implementing a word-guessing application.

A GA is a method of finding a good answer to a problem, based on feedback received from its repeated attempts at a solution. The judge of the GA's attempts is called an objective function. GAs don't know how to derive a problem's solution, but they do know, from the objective function, how close they are to a better solution.

Each attempt a GA makes towards a solution is called a gene--a sequence of information that can somehow be interpreted in the problem space to yield a possible solution. A GA gene is analogous to a biological gene in that both are representations of alternative solutions to a problem. In the biological world, the problem is evolutionary survival, and a particular gene represents one possible solution to survival within a competitive environment. In the digital world, the stated problem will vary from one application program to another, as will the objective function. The coding of a GA's gene will also depend on the problem being addressed. We'll examine a couple of possible encodings in this article.

Figure 1(a) shows a gene that is a sequence of binary digits. This gene could represent one of a GA's attempts at answering the question, "What is the square root of 64?" This particular gene, with binary value 0101, represents the idea that 5 is the square root of 64. It happens to be wrong, of course. Typically, a GA will maintain a collection, or population, of genes. Each gene in the population may represent a different sequence and a different idea about how to solve the problem and thus satisfy the objective function.

Figure 1: Trying to find the square root of 64: (a) A genetic sequence, composed of 1s and 0s, represents the value 5; (b) an objective function: Gene values close to the answer get FITNESS close to 0.

  (a) 0 0 0 0 0 1 0 1
  (b) FITNESS=64-(gene_value*gene_value)

Deciding how to encode genes to represent possible solutions in a particular problem space leads to the next requirement: a suitable objective function. An objective function must be able to interpret the data contained within a gene and decide how good a solution it represents. Suppose that we are indeed trying to find the square root of 64. We need to set up an objective function that awards higher fitness to genes that are good solutions. We don't "know" the correct answer, so we can't just compare each gene to 8. Instead, we have to plug the value a gene advocates into an equation that tells us how close the gene is to the solution. Thus, our objective function should square a gene's solution and note the difference from 64. Figure 1(b) shows how a fitness value might be derived programmatically. Good genes evaluated by this fitness function will get fitness values close to zero.
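To make the idea concrete, here is a minimal sketch, in C, of how this objective function might be coded. The gene is assumed to be stored as an array of eight bits, most significant bit first; gene_value(), fitness(), and the eight-bit gene length are illustrative choices only and are not taken from the listings that accompany this article.

/* A sketch of the square-root objective.  The gene is an array of
 * eight bits, most significant bit first. */

#include <stdio.h>

#define GENE_BITS 8

/* Decode a bit string into an unsigned integer value. */
static unsigned int gene_value(const unsigned char *bits)
{
   unsigned int value = 0;
   int i;

   for (i = 0; i < GENE_BITS; i++)
      value = (value << 1) | (bits[i] & 1);
   return value;
}

/* The objective from Figure 1(b): genes whose value squares to something
 * near 64 get a fitness near zero; everything else drifts negative. */
static int fitness(const unsigned char *bits)
{
   unsigned int v = gene_value(bits);

   return 64 - (int) (v * v);
}

int main(void)
{
   unsigned char gene[GENE_BITS] = { 0, 0, 0, 0, 0, 1, 0, 1 };  /* value 5 */

   printf("value = %u, fitness = %d\n", gene_value(gene), fitness(gene));
   return 0;
}

Run on the gene from Figure 1(a), this prints a value of 5 and a fitness of 39; the gene 00001000 would score a perfect 0.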

Fitness values are associated with the gene that generated them until the data within the gene changes. After an encoding and an objective function have been formulated, the person running a GA may step back and let it do the work. Through a sequence of steps, the GA will work toward making its genes more fit. Again, we find a biological analogy in the optimization steps. Each pass through the set of optimization steps is called a generation. If all is going well, we expect the overall fitness of the genes to increase as we pass through generations.

We name the optimization steps in each generation reproduction, crossover, and mutation. In reproduction, the first part of a generation, genes from the previous generation are duplicated and form the new population. We use the fitness of a gene, conferred by the objective function, to decide how likely that gene is to reproduce. Genes that are more fit are more likely to be duplicated; less fit genes have a poorer chance. However, reproduction is ruled by chance, so genes may well be reproduced in proportions that don't exactly match their fitness. In our square root problem, we would expect that genes with fitness close to zero would be more likely to reproduce than genes with less optimal fitness. It is important that we maintain all sorts of genes in our population, both fit and unfit. The genes that are apparently unfit may contain important information that will be revealed in later generations. The fittest genes may be indicating only a locally good solution; such solutions may be less useful in future generations.
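One common way to implement this chance-weighted duplication is "roulette wheel" selection: each gene gets a slice of a wheel proportional to its fitness, and the wheel is spun once for each slot in the new population. The sketch below assumes the fitness values have already been shifted so that none are negative (our square-root fitness can dip well below zero); pick_parent() is an illustrative name, not a function from the article's listings.

#include <stdlib.h>

/* Return the index of a gene chosen with probability proportional to its
 * (non-negative) fitness. */
static int pick_parent(const long *fitness, int pop_size)
{
   long total = 0, spin;
   int i;

   for (i = 0; i < pop_size; i++)
      total += fitness[i];

   if (total == 0)                  /* every gene is equally (un)fit */
      return rand() % pop_size;

   spin = rand() % total;           /* spin the wheel */
   for (i = 0; i < pop_size; i++)
   {
      spin -= fitness[i];
      if (spin < 0)
         return i;
   }
   return pop_size - 1;             /* safety net; shouldn't be reached */
}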

After reproduction, the genes undergo crossover, where we choose pairs of genes at random. The gene pairs then exchange parts of their sequences. The result is two new genes. Figure 2 illustrates crossover of two genes. In this example, the two genes in Figure 2(a) are crossed at the indicated points to create the two genes in Figure 2(b).

Figure 2: Two genes (a) are crossed at the indicated points to create two new genes (b).

  (a) ^0 0 0 0 1^1 1 1  FITNESS= -161
      ^1 1 1 1 0^0 1 0  FITNESS= -58500

  (b) 1 1 1 1 0 1 1 1   FITNESS= -60945
      0 0 0 0 1 0 1 0   FITNESS= -36

In performing the crossover step, we want the genes to exchange subsequences that contain good information about solutions. Hopefully, exchanging material will result in genes that are better than their ancestors. The crossover of Figure 2 actually produced one gene with better fitness, and another with poorer fitness.
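The exchange in Figure 2 amounts to cutting both genes at the same point and swapping everything on one side of the cut. Here is a minimal sketch of such a single-point crossover; the crossover() function is illustrative and not part of the article's listings.

#include <stdlib.h>

/* Cut two equal-length genes at a random point and swap their tails,
 * producing two new genes in place. */
static void crossover(unsigned char *a, unsigned char *b, int len)
{
   int cut = rand() % len;          /* position of the cut */
   int i;

   for (i = cut; i < len; i++)
   {
      unsigned char tmp = a[i];     /* swap everything past the cut */

      a[i] = b[i];
      b[i] = tmp;
   }
}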

The final step in a generation is mutation. This optimization entails randomly altering a very small percentage of the genetic sequences present in the population. Mutation may introduce new concepts into the population.

If the second bit from the right of the good gene in Figure 2(b) (the 2s position) were mutated from 1 to 0, that gene would then represent the value 8, the solution to the square root problem. Note that if our entire population were made up of the two genes shown in Figure 2, crossover would never be able to change that particular bit from 1 to 0 -- neither gene has a 0 in that bit position. In cases such as these, mutation is the only operation that will produce genes with the optimal solution. We don't want to rely on mutation to save our GA, though; normally, we try to maintain a varied population, so mutation serves only to speed up the search.
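A bit-flip mutation step can be as simple as the following sketch. The mutation rate shown is an illustrative figure only; in practice it is kept very small so that mutation perturbs the search rather than turning it into a random walk.

#include <stdlib.h>

#define MUTATION_RATE 0.01          /* chance of flipping any one bit */

/* Walk a gene and flip each bit with a small probability. */
static void mutate(unsigned char *gene, int len)
{
   int i;

   for (i = 0; i < len; i++)
      if ((double) rand() / RAND_MAX < MUTATION_RATE)
         gene[i] ^= 1;              /* flip 0 <-> 1 */
}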

To put it all together then, a GA can be seen as a series of steps. Initially, a random population of genes is created. Then, we attempt to optimize the fitness of the genes by running through generations of optimization steps (reproduction, crossover, and mutation). An objective function is used throughout to judge the fitness of members of the population.
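Putting the pieces together, the top-level loop might look like the skeleton below. It assumes the fitness(), pick_parent(), crossover(), and mutate() helpers sketched earlier are in scope; the population size, gene length, and generation count are illustrative values, not settings taken from the article's programs. The raw square-root fitness is folded so that genes closer to the answer always receive the larger, non-negative score that roulette-wheel selection expects.

#include <stdlib.h>
#include <string.h>

#define POP_SIZE  20                /* illustrative settings only */
#define GENE_LEN   8
#define N_GENS    50

static unsigned char pop[POP_SIZE][GENE_LEN];
static unsigned char next_pop[POP_SIZE][GENE_LEN];
static long fit[POP_SIZE];

void run_ga(void)
{
   int g, i, j;

   for (i = 0; i < POP_SIZE; i++)               /* random initial population */
      for (j = 0; j < GENE_LEN; j++)
         pop[i][j] = rand() & 1;

   for (g = 0; g < N_GENS; g++)
   {
      /* Judge every gene; fold the raw fitness (64 - value^2) so that
       * genes closer to the answer always get the larger, positive score. */
      for (i = 0; i < POP_SIZE; i++)
         fit[i] = 65536L - labs((long) fitness(pop[i]));

      for (i = 0; i < POP_SIZE; i += 2)         /* reproduction and crossover */
      {
         memcpy(next_pop[i],     pop[pick_parent(fit, POP_SIZE)], GENE_LEN);
         memcpy(next_pop[i + 1], pop[pick_parent(fit, POP_SIZE)], GENE_LEN);
         crossover(next_pop[i], next_pop[i + 1], GENE_LEN);
      }

      for (i = 0; i < POP_SIZE; i++)            /* mutation */
         mutate(next_pop[i], GENE_LEN);

      memcpy(pop, next_pop, sizeof pop);        /* the next generation */
   }
}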

A Simple GA

In this section, I'll apply the genetic algorithm presented in the previous discussion to a simple problem -- we'll have the GA learn how to form a short phrase. Let's choose the phrase "Hello there," but to make things a little simpler, we'll reduce the problem so that the GA need only consider uppercase alphabetic characters, HELLOTHERE.
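For this problem a gene is simply a string of uppercase letters the same length as the target phrase, so seeding the initial population is straightforward. A minimal sketch, assuming a gene is held as a C string (random_gene() is an illustrative name, not one of the article's functions):

#include <stdlib.h>

#define PHRASE_LEN 10               /* length of "HELLOTHERE" */

/* Fill a gene with random uppercase letters. */
static void random_gene(char *gene)
{
   int i;

   for (i = 0; i < PHRASE_LEN; i++)
      gene[i] = 'A' + rand() % 26;  /* a random letter, A through Z */
   gene[PHRASE_LEN] = '\0';
}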

The objective function to drive the GA will return the number of correct letters in a gene. The best possible fitness is 10 -- the number of characters in the target phrase. Listing One shows an implementation of the objective function, written in C.

Figure 3(a) shows the five fittest genes in the initial, randomly generated population. It can be seen that a few of the desired letters were arrived at through chance. Though we're proud parents of our genes, we can't truthfully say that they look anything like the target phrase. The situation improves with a few generations of effort from the GA. Figure 3(b) shows the top five genes after five generations. They've started to build up subsequences with good fitness values. As time goes on, they will exchange these subsequences to arrive at still better results.

Figure 3(c) shows the five fittest genes after 15 generations. It can be seen that some of the genes have actually arrived at the solution.

Figure 3: The first (a), fifth (b), and 15th (c) generation of the word-guessing population. The first column of values is the genes' fitnesses.

  (a) 2    P M Y M Z T T E D G
      2    H R R H O S T O I Q
      2    O R T X U U H M Y E
      2    H U L M I D I L A S
      2    M K P Z S O V E W E

  (b) 6    H Y V L W T H E R B
      6    G E L L W T H H V E
      5    G A L L H T H V R B
      5    H Y V L W T H E G G
      5    G A L L W T X V R E

  (c) 10   H E L L O T H E R E
      10   H E L L O T H E R E
      9    H Y L L O T H E R E
      9    H Y L L O T H E R E
      9    H A L L O T H E R E

Actually, this form of word guessing is relatively easy for a GA. The feedback the genes get from the objective function is monotonic -- there aren't any false leads for the genes to pursue.

Other types of problems present a GA with a more difficult challenge. These problems have many "bumps": good, but not optimal, solutions that may attract a lot of genes. It is in these types of problems that the diversity of a GA's population is important. Rather than concentrating on a short-term, fairly good solution, a GA may look at a variety of possibilities in parallel. In the next section, we will ask our GA to find a solution to a maze.

The Maze Problem

Running a GA's genes through a maze to come up with a good solution is a harder problem than guessing a word because a maze has dead ends; a gene might be happily traversing a maze, only to turn a corner and smack into a wall. Unlike the word-guesser, the maze problem doesn't have a monotonic objective function.

As always, we need to decide on a way to encode the gene and an objective function. We can encode the gene as a set of directions: north, east, west, and south. The ideal gene would be a sequence of directions that indicated how to get from the starting point of the maze to the finish point. At each decision point in the maze, we would consult the next direction in this perfect gene to continue our trip. An example of such a gene, and its interpretation, is shown in Figure 4.

Figure 4: According to our encoding, this gene directs us to go north, east, west, south, west, and finally east again.

  N E W S W E

The choice of objective function is important for this problem. If we made the objective function the same as the word guessing objective function, then the GA would have an easy time of it. In fact, the word guessing objective function is not realistic in this setting; laboratory animals don't get to consult an oracle when they want to get to the end of the maze. Let us create an objective function that gives each gene a score based on its final distance from the end of the maze. This is something like a lab rat gauging the distance from a goal by the intensity of the appetizing aroma of the cheese there. The objective function will also count the number of direction changes made by a gene. If a gene makes the goal in fewer moves, it will get a bonus from the objective function. A C function to implement this objective function is shown in Listing Two. The complete GA system that solves the maze problem consists of a number of files. Due to space constraints, however, source code for the complete system is only available electronically (see "Availability").

The final player in this drama of directions is the maze. Figure 5 shows the maze without any inhabitants. Although we won't show them in future maze displays, coordinates have been assigned to each row and column in the maze. This will aid the objective function in evaluating genes, but is just another detail to us.

Notice that we've made the genetic sequences longer than necessary. We give the genes a little slack so they can make some dumb decisions at first and still have hope of reaching the goal. We do reward shorter paths, so as the genes get smarter, the extra length will not be necessary.

The population size is another important parameter in the maze-running GA. If we have too small a population, we'll lose genetic diversity and it is probable that the genes will settle into a comfortable spot close to the solution without thought of venturing out to get to the real solution. We live in the real world, however, so we can't have an excessively large population because of MIPS restrictions. I settled on a population of 100 genes; population size is an interesting parameter for experimentation.

So, what happened when we slipped the chain and let loose the genes into the maze? Figure 6 shows some of the initial, random population. In the graphical representation, genes that share a position in the maze overwrite each other, so only one shows up in any particular spot where genes have congregated. You can see what is actually at that spot by consulting the list of genes on top of the picture of the maze.

After 20 generations, the fittest genes are as shown in Figure 7. Note that many genes have stumbled onto a spot that is physically close to the end. Unfortunately, some are blocked from going further because of intervening walls. Other genes have lower scores, but are actually on the right path to get to the best solution. In future generations, their genetic material will be valuable as building blocks for really good genes.

After 40 generations, Figure 8 shows that the top genes are those that have arrived at the destination. Now the difference in fitness is decided by how long they took to get there. Genes that went on a more direct path have higher value in this application.

If we continued to watch generations go by, we would see genes that used fewer moves than those in the current generation.

Conclusion

Our GA managed to traverse the maze after a number of generations. There are computationally easier ways to traverse mazes, but traditional methods require the algorithm to know about mazes. The GA learned about the maze; it actually had no a priori knowledge of what it was doing. For this reason, the GA maze runner is an interesting variation.

This article has shown just a bit of the kinds of things people are doing with genetic algorithms. If you are interested in learning more about genetic algorithms, a good book to consult is Genetic Algorithms in Search, Optimization, and Machine Learning by David E. Goldberg (Addison-Wesley, 1989).


[LISTING ONE]

/*** GASystem -- Mike Morrow -- Objective function **/

#include "ga.h"

/* No per-run setup is needed for the word-guessing problem. */
void objinit()
{
}

/* Fitness is the number of positions at which the gene matches the target. */
FIT_TYPE objective(s, len)
SEQ s;
int len;
{
   FIT_TYPE n;
   unsigned int i;
   static char tgt[] = "HELLOTHERE";

   n = 0;
   for(i = 0; i < len && i < sizeof tgt - 1; i++)
      if(tgt[i] == s[i])
         n++;

   return n;
}

/* Display one gene: its fitness value followed by its characters. */
void objshow(s, len, fitness)
SEQ s;
int len;
FIT_TYPE fitness;
{
   printf("%d      ", fitness);
   while(len--) printf(" %c", *s++);
   puts("");
}

/* Nothing to flush after a display dump for this problem. */
void objdumpdone()
{
}




[LISTING TWO]

/*** GASystem -- Mike Morrow -- Objective function.  Evaluates a set of
* directions as applied to a maze. The closer the set of directions gets to
* the end of the maze, the higher the fitness of that set of directions.
**/

#include "ga.h"
#include <stdio.h>

/** A set of directions is made up of the following **/
#define NORTH   0
#define EAST    1
#define WEST    2
#define SOUTH   3

/** Define the maze **/
#if MSDOS
#define BLOCK   (char) 178      /* block char on PC */
#define SPACE   ' '
#else
#define BLOCK   '@'
#define SPACE   ' '
#endif

#define _       BLOCK,
#define A       SPACE,

#define MAZEX   17
#define MAZEY   13

typedef char MAZE[MAZEY][MAZEX];
typedef char DISPLINE[80];

static CONST MAZE maze =
{
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   _ A A A A A A A _ _ _ _ _ _ _ A _
   _ A _ _ A _ _ _ _ _ A A _ _ _ A _
   _ A _ _ A A A A A A A _ _ A A A _
   _ A _ _ A _ _ _ A _ _ _ _ A _ _ _
   _ A _ _ _ _ _ _ A A A _ _ A _ A _
   _ A _ _ A _ _ _ _ _ _ _ _ A _ A _
   _ A A A A A A _ _ _ A A A A _ A _
   _ _ _ _ A _ _ _ A _ A _ _ A _ A _
   _ A _ _ A _ A A A _ A A _ A A A _
   _ A A A A _ _ _ A _ _ _ _ _ _ A _
   _ A _ _ A A A A A A A A A A A A _
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
};

/** Maze runners start at this point in the maze **/
#define MAZESTARTX   10
#define MAZESTARTY   11

/** Maze runners' goal is this point in the maze **/
#define MAZEENDX   7
#define MAZEENDY   1

/** How far from the MAZEEND is this set of points (x, y)? **/
#define DIST(x, y) ((MAZEENDX - (x)) * (MAZEENDX - (x)) + \
                    (MAZEENDY - (y)) * (MAZEENDY - (y)))

/** What is the longest distance in a maze this size? **/
#define MAXDIST ((MAZEX * MAZEX) + (MAZEY * MAZEY))




#if __STDC__ || __ZTC__
static int mazerun(SEQ s, int len, unsigned int *xp, unsigned int *yp);
static int xroads(int x, int y);
#else
static int mazerun();
static int xroads();
#endif

void objinit()
{
   char exebuff[80];

       /** set parameters.  High allele should be SOUTH, low allele
       * should be NORTH. **/
   sprintf(exebuff, "HIA = %d", SOUTH);
   exec(exebuff);

   sprintf(exebuff, "LOWA = %d", NORTH);
   exec(exebuff);

        /** For starters, give a gene this many elements. User may want to
        * experiment with this parameter anyway. **/
   sprintf(exebuff, "LEN = 15");
   exec(exebuff);
}

/*** This function evaluates a gene's sequence of directions. It returns
* a fitness value. ***/
FIT_TYPE objective(s, len)
SEQ s;
int len;
{
   FIT_TYPE dist;
   unsigned int x, y;
   int n_moves;

   /** Run through maze using directions in s. x and y will get final position
   * we reach. n_moves will get number of moves it took to get there. **/
   n_moves = mazerun(s, len, &x, &y);

   /** The fitness of that path through maze is distance from MAZEEND.**/
   dist = DIST(x, y);

   /** Convert: lower distances imply higher fitness value. **/
   dist = (FIT_TYPE)MAXDIST - dist;

   /** Square to accentuate differences, then scale the result down. **/
   dist = (dist * dist) >> 12;

   /** Plus a bonus for brevity.    **/
   dist += 5 * (FIT_TYPE) (len - n_moves);

   return dist;
}

static CONST char i_to_c[] = "NEWS";

#define N_PER_DISP_BLOCK   5

static DISPLINE displines[N_PER_DISP_BLOCK];
static int n_lines = 0;
static MAZE disp_maze;

void objshow(s, len, fitness)
SEQ s;
int len;
FIT_TYPE fitness;
{
   unsigned int x, y;
   int n_moves;
   DISPLINE buff;

   if(! n_lines)
      memcpy(disp_maze, maze, sizeof maze);

   n_moves = mazerun(s, len, &x, &y);

   disp_maze[y][x] = '0' + n_lines;

   sprintf(displines[n_lines], "%6d    ", fitness);
   while(len--)
   {
      if(*s > SOUTH)
         sprintf(buff, " %d!", *s++);
      else
         sprintf(buff, " %c", i_to_c[*s++]);
      strcat(displines[n_lines], buff);
   }

   sprintf(buff, " : (%2d, %2d); %2d moves", x, y, n_moves);
   strcat(displines[n_lines], buff);

   n_lines++;
   if(n_lines == N_PER_DISP_BLOCK)
      objdumpdone();
}

void objdumpdone()
{
   unsigned int i, x, y;

   if(! n_lines)
      return ;

   for(i = 0; i < n_lines; i++)
   {
      printf("%d)   %s\n", i, displines[i]);
   }

   puts("");
   for(y = 0; y < MAZEY; y++)
   {
      printf("      ");
      for(x = 0; x < MAZEX; x++)
      {
            putchar(disp_maze[y][x]);
      }
      puts("");
   }
   n_lines = 0;
   puts("\n\n");
}

/** Run through maze with directions given in s.  *xp and *yp are set to final
* coords that we end up with. This function returns number of moves it took to
* run maze. It will terminate when moves in s are used up, or when we arrive
* at the end of maze. **/
static int mazerun(s, len, xp, yp)
SEQ s;
int len;
unsigned int *xp, *yp;
{
   register int x, y;
   register SEQ dirs;
   int n_moves;


   x = MAZESTARTX;
   y = MAZESTARTY;
   dirs = s;
   n_moves = 0;

   while(len-- && ! (x == MAZEENDX && y == MAZEENDY))
   {
      switch(*dirs++)
      {
         case NORTH:
            while(maze[y - 1][x] == SPACE)
            {
               y--;
               if(xroads(x, y))
                  break;
            }
            break;

         case EAST:
            while(maze[y][x + 1] == SPACE)
            {
               x++;
               if(xroads(x, y))
                  break;
            }
            break;

         case WEST:
            while(maze[y][x - 1] == SPACE)
            {
               x--;
               if(xroads(x, y))
                  break;
            }
            break;

         case SOUTH:
            while(maze[y + 1][x] == SPACE)
            {
               y++;
               if(xroads(x, y))
                  break;
            }
            break;

         default:
            printf("Error in mazerun(), got allele = %d!\n", *(dirs - 1));
            break;
      }
      n_moves++;
   }
   *xp = x;
   *yp = y;

   return n_moves;
}

/** If this is a crossroads in the maze, i.e. the current cell has more than
* two open exits (fewer than two of its four neighbors are walls), then
* return TRUE. **/
static int xroads(x, y)
int x, y;
{
   char walls;

   walls = (maze[y][x+1] != SPACE) + (maze[y][x-1] != SPACE) +
      (maze[y+1][x] != SPACE) + (maze[y-1][x] != SPACE);
   return walls < 2;
}


Copyright © 1991, Dr. Dobb's Journal

