Generalization of a Simple Genetic Algorithm (GA)

sadavoya

4.54/5 (8 votes)

Nov 14, 2006

10 min read

47143

1224

Taking a simple genetic algorithm and constructing a framework to allow easy creation of similar algorithms.

Download source files - 20.6 Kb (MD5 Checksum: 64217d1976d39dc3c05671397457e9d3)
Download demo project - 9.59 Kb (MD5 Checksum: e6858cb72b1286386dcbb6855c19db55)

Introduction

In response to Sacha Barber's article (please read his article before you continue), I couldn't help but think about generalizing the approach. After some discussion with Sacha, he suggested that I post a follow up article. So here we go.

Disclaimer: This is my first article, please bear that in mind. I appreciate any feedback. More importantly, as I mentioned to Sacha, I'm entirely new to genetic algorithms, so please do not think of me as any kind of expert. This article is more about generalization than GA specifically. But I think it poses some interesting questions, and I hope it will prompt some discussion.

Recall Sacha's sample problem: from 10 cards labeled 1-10, create two piles A and B, where the sum of the cards in A is as close as possible to 36, and the product of the cards in B is as close as possible to 360.

Generalization

The idea would be to try to create a simple framework that will allow a person wishing to implement a genetic algorithm to do so with as little work as possible, allowing them to focus on the specifics of their fitness tests and trusting that the framework will cycle through the population properly.

I first considered the general approach Sacha described:

Genetic Cycle Part I

Select two members of the population
Determine the "winner" 1 and the "loser" 1
Randomly recombine the winner with the loser
Randomly mutate the loser

I wondered if perhaps there was any use to more generalization, so just for fun, I took it one step further:

Genetic Cycle Part II

Select N members of the population, where N > 1
Determine the W "winners" and the L "losers", where W+L = N
Randomly recombine the winners with the losers
Randomly mutate the losers

Now, I don't know how relevant this second generalization is. I'm not even sure it will work properly for N > 2. Sacha suggested that it might end up with a population that lacks sufficient diversity to find a solution:

The only thing you would have to watch is that after some time (n-times round the loop), the population may end up being too similar if you keep a few losers in. The result of which could be that the dominant winner is not different enough, and recombination and mutation won't make enough difference in actually creating an organism which could solve the problem domain.

With that caveat in mind, let's continue.

Why generalize? Well, this - i.e., GA - seems like it's a problem that comes up a lot, so an approach that is easy to reuse would seem appropriate. My idea is to have a base class that works with any N, W, and L, and then make derived classes to suit the situation. Specifically, I made a derived class with N=2, W=1, and L=1, and I called it BinaryGeneticAlgorithm. Then, I made a derived class of BinaryGeneticAlgorithm called "CardProblem". It is hoped that CardProblem and similar classes are now significantly easier to create.

`IGenotype`

Before we get to the base classes, we need an interface to work with. IGenotype provides an interface for, you guessed it, genotypes. It just maintains a list of Gene objects that are either On or Off. (There is some additional code to make Gene objects "nice" to display in a property browser.)

public interface IGenotype
{
    [Editor(typeof(CollectionEditor), typeof(UITypeEditor))]
    List<Gene> Genes { get; }
}

Here are the On/Off flags:

[Flags]
public enum EGene
{
    Off = 0,
    On = 1,
}

The Gene class is really straightforward, so just look at the source if you want to see the details.

`GeneticAlgorithmBase`

The GeneticAlgorithmBase class defines a template for running tournaments on a population of IGenotype objects. You just run a tournament, specifying a population and a number of iterations. During the tournament, the overall winners are kept, and each successive winner is compared to the overall winner. I also check to see if the current overall winner satisfies the genetic requirements. If so, I stop the tournament. Here's the code:

public List<IGenotype> Tournament(List<IGenotype> Population, 
                                        int NumberOfIterations)
{
    List<IGenotype> returnVal = null;
    // This will be the overall winner for the tournament

    for (int i = 0; i < NumberOfIterations; i++)
    {
        // Run the cycle for the current genotypes
        List<IGenotype> currentWinners 
            = GeneticCycle(Population);

        // Keep track of the overall winner for the tournament
        if (returnVal == null)
        {
            // By default, the first winner is the starting overall winner
            returnVal = currentWinners;
        }
        else
        {
            // Compare the current winner to the overall winner
            // and update the overall winner
            List<IGenotype> chosen = new List<IGenotype>();
            chosen.AddRange(returnVal);
            chosen.AddRange(currentWinners);
            List<IGenotype> Winners, Losers;
            // NOTE: This could lead to winners from returnVal intermingled 
            // with winners from currentWinners, which may not be desirable?
            this.ChooseWinners(chosen, out Winners, out Losers);
            returnVal = Winners;
            
            // Check if our overall winner fulfills the genetic requirements
            if (this.Evaluate(returnVal))
            {
                break; // We have a winner so stop the tournament
            }
        }
    }
    return returnVal;
}

The genetic cycle is modeled in a template method. First, choose two or more members from the population, then choose the winner(s) and loser(s), recombine, and mutate. Finally, return the winner(s):

public List<IGenotype> GeneticCycle(List<IGenotype> Pop)
{
    // Choose two Genotypes from the population
    List<IGenotype> Chosen = this.PickGenotypes(Pop, 
                                      this.m_NumberOfSelectees);

    // Determine which of the two is the winning Genotype
    List<IGenotype> Winners, Losers;
    this.ChooseWinners(Chosen, out Winners, out Losers);

    // Infect the loser's genes with some portion of the winner's genes
    this.Recombine(Winners, Losers);

    // Mutate the loser's genes
    this.Mutate(Losers);

    return Winners;
}

The code to pick the genotypes to use is a little cumbersome, but basically we're choosing random positions from the population, but once we choose one, we need to make sure we don't choose it again.

private List<IGenotype> PickGenotypes(List<IGenotype> Population, 
                                            int NumberOfSelectees)
{
    List<IGenotype> returnVal = new List<IGenotype>();
    Random r = new Random();
    // We'll choose random numbers between 0 and
    // the total size of the population
    int max = Population.Count - 1;

    List<int> usedPositions = new List<int>();
    for (int i = 0; i < NumberOfSelectees; i++)
    {
        // We need to remove positions from the pool as we go along
        int newPosition = -1;
        do
        {
            newPosition = (int)Math.Round(r.NextDouble() * (double)max);
        } while (usedPositions.Contains(newPosition));
        usedPositions.Add(newPosition);

        // Select the entry from the population
        returnVal.Add(Population[newPosition]);
    }

    if (returnVal.Count < 2)
    {
        throw new Exception("Must have at least 2 chosen genotypes.");
    }
    return (List<IGenotype>)returnVal;
}

I provide a template method to quickly create a population:

public List<IGenotype> CreatePopulation(int PopulationSize)
{
    if (PopulationSize < 2)
    {
        throw new ArgumentException("PopulationSize must be greater than 2");
    }

    List<IGenotype> returnVal = 
            new List<IGenotype>(PopulationSize);
    for (int i = 0; i < PopulationSize; i++)
    {
        returnVal.Add(this.CreateGenotype());
    }

    return returnVal;
}

Finally, there's a bunch of abstract methods that inheritors will need to implement:

public abstract void ChooseWinners(List<IGenotype> Potentials, 
       out List<IGenotype> Winners, out List<IGenotype> Losers);
protected abstract void Recombine(List<IGenotype> Winners, 
       List<IGenotype> Losers);
protected abstract void Mutate(List<IGenotype> Losers);
protected abstract bool Evaluate(List<IGenotype> Genes);
protected abstract IGenotype CreateGenotype();

The first three - ChooseWinners, Recombine, and Mutate - should be obvious as they are just steps in the cycle.

Evaluate is the method which determines the genetic fitness. Note: This is not the same as Sacha's Evaluate method, which is used for comparing fitness between two genotypes. Rather, this just indicates whether the specified genotypes fulfill the solution requirements or not. For instance, in Sacha's example, the genetic requirements are as described above. More on this later.

CreateGenotype does just that - it lets the implementor create a genotype. It is used by the CreatePopulation template method.

That's it for GeneticAlgorithmBase. Let's move on to BinaryAlgorithmBase.

`BinaryAlgorithmBase`

We've already done a lot of the work in GeneticAlgorithmBase, so what do we need to do here? Well, first, we need to specify N - the number of selected members of the population. We also need to know the probability of recombination and mutation. We do this in the constructor.

public BinaryGeneticAlgorithm(double ChanceOfInfect, double ChanceOfMutate)
    : base(2)
{
    m_ChanceOfInfect = ChanceOfInfect;
    m_ChanceOfMutate = ChanceOfMutate;
    this.m_NumberOfSelectees = 2;
}

Next, we need to simplify some of the abstract methods to work with selection sizes of only two. First, we override the ChooseWinners method, selecting the first two entries from the list of chosen selected genotypes. Then, we pass them out to another abstract method, also called ChooseWinners, but this one only expects two selected genotypes, and only produces one winner and one looser:

public override void ChooseWinners(List<IGenotype> Potentials,
     out List<IGenotype> Winners, out List<IGenotype> Losers)
{
    Winners = (List<IGenotype>)new List<IGenotype>();
    Losers = (List<IGenotype>)new List<IGenotype>();
    IGenotype gWinner, gLoser;
    this.ChooseWinners(Potentials[0], Potentials[1], out gWinner, out gLoser);
    ((List<IGenotype>)Winners).Add(gWinner);
    ((List<IGenotype>)Losers).Add(gLoser);
}

Now, we override the Recombine and Mutate methods. In each case, I've pretty much used Sacha's code to randomly select which genes get recombined / mutated. Note that we use the class variables m_ChanceOfInfect and m_ChanceOfMutate to determine the random chance.

protected override void Recombine(List<IGenotype> Winners, 
             List<IGenotype> Losers)
{
    IGenotype Winner = Winners[0],
        Loser = Losers[0];
    Random r = new Random();
    for (int i = 0; i < Loser.Genes.Count; i++)
    {
        double rand = r.NextDouble();
        if (rand < this.m_ChanceOfInfect)
        {
            Loser.Genes[i].SetGene(Winner.Genes[i].Value);
        }
    }
}
protected override void Mutate(List<IGenotype> Losers)
{
    IGenotype Loser = Losers[0];
    Random r = new Random();
    for (int i = 0; i < Loser.Genes.Count; i++)
    {
        double rand = r.NextDouble();
        if (rand < this.m_ChanceOfMutate)
        {
            Loser.Genes[i].SetGene(Loser.Genes[i].Value == EGene.On
                ? EGene.Off
                : EGene.On);
        }
    }
}

The Evaluate method just delegates out to an overload that accepts a single IGenotype:

protected override bool Evaluate(List<IGenotype> Genes)
{
    return Evaluate(Genes[0]);
}

Well, that's pretty much it for BinaryAlgorithmBase. We just have a couple of abstract methods that inheritors will override, but these are pretty straightforward. We just want to make sure that our inheritors are working with two selections and no more.

protected abstract void ChooseWinners(IGenotype First, 
     IGenotype Second, out IGenotype gWinner, out IGenotype gLoser);
protected abstract bool Evaluate(IGenotype Genes);

That's it. Now we can move on to CardProblem.

`CardProblem`

So now, we come down to the actual problem we're trying to solve, and it only took 285 lines to get here. Let's hope it's worth the wait. To begin with, we'll pass some arguments to CardProblem's constructor. We need to know:

How many cards we're dealing with
The total sum we are trying to achieve (in our example, it's 36)
The total product we are trying to achieve (in our example, it's 360)
The chance of recombination
The chance of mutation

public CardProblem(int NumberOfCards, int SumTarget, int ProductTarget, 
       double ChanceOfInfection, double ChanceOfMutation)
     : base(ChanceOfInfection, ChanceOfMutation)
{
    this.m_NumberOfCards = NumberOfCards;
    this.m_SumTarget = SumTarget;
    this.m_ProdTarget = ProductTarget;
}

Next, we have some overrides to do. The first one is ChooseWinners. What are we doing? Just casting the arguments to a specific type of IGenotype called "CardGenes" (see the source code for details). Then, we're evaluating them and assigning the winner and loser. That's it. We could almost move this up to the BinaryGeneticAlgorithm level, but this evaluation looks at the double values returned by "double Evaluate(IGenotype Genes)" which may not always be desirable. BinaryGeneticAlgorithm needs to be flexible enough to work with different kinds of constraints. Since implementors are defining their own measure of fitness, it makes sense that they would define their own method of comparison as well.

protected override void ChooseWinners(IGenotype First, 
         IGenotype Second, out IGenotype Winner, out IGenotype Loser)
{
    CardGenes f = First as CardGenes,
        s = Second as CardGenes;
    double fFitness = Evaluate(f),
        sFitness = Evaluate(s);
    string result = null;
    if ( fFitness < sFitness)
    {
        Winner = f;
        Loser = s;
    }
    else
    {
        Winner = s;
        Loser = f;
    }
}

The Evaluate method is pretty straightforward. Remember how I talked about the Evaluate method earlier? Well, in this case, Evaluate would indicate whether a current genotype provides a solution where the error is zero (i.e., the sum of A is exactly X - e.g., 36, and the product of B is exactly Y - e.g., 360). Note that Evaluate delegates to a private method (see below):

protected override bool Evaluate(IGenotype Genes)
{
    CardGenes g = Genes as CardGenes;
    bool returnVal = Evaluate(g) == 0.0;
    return returnVal;
}

Here's the CreateGenotype method. Not much to it.

protected override IGenotype CreateGenotype()
{
    return new CardGenes(this.m_NumberOfCards);
}

Now, let's look at that private Evaluate method. It should be pretty familiar. It's just Sacha's evaluate code pretty much word for word, just doctored enough to fit the framework:

private double Evaluate(CardGenes f)
{
    double sumErr, prodErr, combinedErr;
    int sum = 0, prod = 1;

    for (int i = 0; i < f.Genes.Count; i++)
    {
        Gene gene = f.Genes[i];
        if (gene.Value == EGene.On)
        {
            // On means sum
            sum += (i + 1);
        }
        else
        {
            // Off means product
            prod *= (i + 1);
        }
    }
    sumErr = ((double)(sum - m_SumTarget)) / (double)m_SumTarget;
    prodErr = ((double)(prod - m_ProdTarget)) / (double)m_ProdTarget;
    combinedErr = Math.Abs(sumErr) + Math.Abs(prodErr);
    f.Sum = sum;
    f.Product = prod;
    return combinedErr;
}

So that's pretty much it. There's just the CardGenes class, which implements IGenotype. Check the source code out for details. There's also the UI...

UI

The UI in the sample code simply provides an easy way of passing in the necessary parameters to run tournaments, so you can play with different combinations, and have a way to view the results. Set the Sum Target, Product Target, and Number of Cards. These parameters are specific to the card problem. Set the population size to create a population of a specific size. Set the Chance of Recombination and Chance of Mutation each to a value between 0 and 1. Finally, set the Number of Iterations to specify how long the tournament should be. Once you've set all the values, click the "Go!" button. After a while, it will become enabled again, signaling the completion of the Tournament. Depending on the tournament results, the "View Results" button will be enabled. Here's the main form:

Form1

The main form keeps track of the overall winner of all tournaments, similar to the way the GeneticAlgorithmBase keeps track of the overall winner in a tournament. After each tournament, the main form compares the winner of the tournament with the overall winner. If the tournament winner is "better" than the overall winner, it becomes the new overall winner, and the View Results button is enabled. You can then click the View Results and see the results, which look like this:

Results

This is just a property browser in a form. The property browser is displaying the current overall winning IGenotype. You can keep the old Results forms open so you can compare the differences between winners and verify that you are approaching a solution. You can also look at the specific genes in the IGenotype by clicking the ellipses button "..." next to the "Genes" collection. Doing so displays the Gene collection editor.

Results

That's it for the UI.

Questions:

I look forward to your feedback. In particular, I'm really interested in:

How valuable you think GeneticAlgorithmBase is. Do you think it's useful, or should it just be collapsed into BinaryAlgorithmBase? Why?
How's my generalization? Do you think I applied some good OOP strategies to facilitate reuse? Anything I should work on?
Any other comments you might have ;)

Known Issues

Here are some things that I've noticed in this approach:

The code only deals with the GA that Sacha described. It's my understanding that there are others. I'm not familiar with those, so they are not included here.
GeneticAlgorithmBase works with collections of IGenotypes. There could be a problem when there is more than one winner and/or more than one loser. Specifically, with such "PolyGenenticAlgorithm" classes, recombining the winners with the losers might cause problems because it's not clear which winner should be recombined with which loser, or whether each winner should be recombined with each loser. Since recombination is left to the implementor, it would be up to the developer to determine somehow an appropriate way to recombine. For instance, in a TernaryGeneticAlgorithm, there would be either one winner and two losers, or one loser and two winners, and in both cases, the question is how do the winners affect the losers? The developer would need to answer such questions.
As Sacha mentioned, there's the possibility that multiple winners recombined with losers could result in a population that lacks sufficient diversity to find a solution.
This has more to do with design, but I think using IComparable to compare IGenotypes might be useful in simplifying things. I'll look into this more.