Introduction
Getting Started
Genetic Algorithms (GAs) are the nearest thing a software developer can get to magic. Watching a solution to a problem evolve is awesome.
This section is a simple description of the techniques used when implementing a Genetic Algorithm and is intended as a gentle introduction for those not familiar with the science. Later in the article, I will create GAs in C# using the Genetic Algorithm Framework for .NET (GAF). The GAF is a freely available GA framework that makes it easy to create and modify GAs. It was written from first principles in 2013 in order to fully understand the workings of GA systems, as part of a project run in conjunction with De Montfort University, Leicester, UK. The GAF brings all of that GA science and functionality to your project in an easy-to-use form. However, I will leave the details of the GAF until later. For now, we will look at a simple example in order to understand some of the concepts.
A Simple Example
If you find yourself saying, "I have no idea how to solve this, but I will recognize a good solution when I see one", then a GA could be the answer. Being able to identify a good solution is paramount to making them work.
Let's have a look at an example. GAs have been known to improve the design of aeroplane wings. Aeroplane wings have a lot of different parameters: leading edge shape, width, length and another thousand or so that make little sense to me. The great thing is that, for each design of aeroplane wing, we can, within a few milliseconds, model it in software and give it a score that represents how good it is. We can leave all the wind tunnel stuff until much later in the design process. The point here is that we, with the help of modeling software, will recognize a good aeroplane wing when we see one.
So, for our example, we can use a GA to create wings and software to measure how good they are.
How the GA Works
The genetic algorithm is an evolutionary approach to computing, inspired by Darwin’s theory of evolution and biological reproduction, that has the power to determine approximate solutions to optimization problems. Evolutionary computation has its roots in the 1960s. However, the genetic algorithm that is used as the basis for research today stems from the work of John Holland in the 1970s.
Holland believed that if the features of natural evolution could be incorporated into a computer algorithm, a new way to solve difficult problems could be created. Holland's work involved manipulating strings of '1's and '0's, which he called chromosomes.
The basic process adopted by GAs typically involves creating an initial set of random solutions (a population) and evaluating them. Following a process of selection, the better solutions (parents) are identified and used to generate new solutions (children), which can replace lesser members of the population. This new population (generation) is re-evaluated, and the process continues, producing new generations until either a final solution is determined or some other criterion is reached.
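The loop just described can be sketched in a few lines of C#. The sketch below is not the GAF; it is a toy GA that evolves 16-bit strings towards all 1s, and every name in it (ToyGa, Breed and so on) is invented for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy GA: evolve 16-bit chromosomes towards all 1s.
// Evaluate, select the fitter half as parents, breed children,
// replace the weaker half, repeat.
public static class ToyGa
{
    private static readonly Random Rnd = new Random(0);

    // Fitness: the proportion of bits set to 1.
    public static double EvaluateFitness(bool[] genes) =>
        genes.Count(g => g) / (double)genes.Length;

    private static bool[] RandomChromosome(int length) =>
        Enumerable.Range(0, length).Select(_ => Rnd.Next(2) == 1).ToArray();

    // Single-point crossover followed by an occasional single-bit mutation.
    private static bool[] Breed(bool[] mum, bool[] dad)
    {
        var point = Rnd.Next(mum.Length);
        var child = mum.Take(point).Concat(dad.Skip(point)).ToArray();
        if (Rnd.NextDouble() < 0.05)
            child[Rnd.Next(child.Length)] ^= true;
        return child;
    }

    public static (double InitialBest, double FinalBest) Run(int populationSize, int generations)
    {
        var population = Enumerable.Range(0, populationSize)
            .Select(_ => RandomChromosome(16)).ToList();
        var initialBest = population.Max(EvaluateFitness);

        for (var g = 0; g < generations; g++)
        {
            // The fitter half survive as parents; children fill the other half.
            var parents = population.OrderByDescending(EvaluateFitness)
                .Take(populationSize / 2).ToList();
            var children = parents
                .Select(p => Breed(p, parents[Rnd.Next(parents.Count)]))
                .ToList();
            population = parents.Concat(children).ToList();
        }
        return (initialBest, population.Max(EvaluateFitness));
    }
}
```

Because the best chromosome always survives into the next generation, the best fitness can never decrease from one generation to the next.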
GAs borrow their terms from the biological world. For example, GAs use a representation for potential solutions referred to as a chromosome, and the operators used to create new child solutions, such as crossover and mutation, are derived from nature.
In their original and most basic form, GAs were used mainly as single-objective search and optimization algorithms. Common to most GAs is the use of a chromosome, genetic operators, a selection mechanism and an evaluation mechanism.
In our case as aeroplane wing designers, all we have to do is determine the parameters we need in order to build a wing (for this, we ask an expert) and bundle them into a binary string; this will be the definition of our chromosome. Each chromosome will therefore fully describe an aeroplane wing. The GA will do the rest.
Given that we have a definition of a chromosome, and that this describes an aeroplane wing, we can make aeroplane wings until the cows come home. We can, for example, create lots of random chromosomes in the hope that one of them makes a useful wing. I know it sounds daft, but the truth is that this is kind of what we do.
A common approach when using GAs is to start by making a 'population' of random chromosomes (wings), perhaps 100. You may remember earlier that we said we can test each one and score it or, to use the correct terminology, 'evaluate its fitness'. Whilst it is very unlikely that any of our randomly created wings will actually be good enough to put on an aeroplane, some will be marginally better than others. Here is where the magic happens.
The Magic
Of our population of 100 chromosomes (aeroplane wings), we select some 'parents' and use them to make 'children'. We then test these children and add them to our population. Finally, we remove the worst chromosomes, leaving our population back at 100. To be honest, there are numerous ways to do what has just been described, but for now, we will keep it simple. The upshot is that our new 'generation' of aeroplane wings has become stronger. Do this a few hundred times and we end up with some pretty good aeroplane wings.
Making Children
There are many techniques for selecting parents and using them to make children. However, for the purposes of this description, we will assume we have a mechanism for selecting two suitable chromosomes as parents. We then use these parents to make two new chromosomes (children). Remember that each chromosome, in this example, defines an aeroplane wing. We are making child wings from parent wings.
As we have already stated, each wing is defined by a chromosome. A chromosome can take many forms, however, often a binary string is used. Creating a binary string chromosome can be as simple as converting all of the aeroplane wing parameters into binary numbers and joining them all together. One simple way to create two children from two parents is to pick a random position within the binary string and 'crossover' the chromosomes. This gives two new chromosomes, see Fig. 1.
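The crossover just described is easy to sketch in code. The helper below is purely illustrative (it is not part of the GAF):

```csharp
// Illustration only: single-point crossover on binary-string chromosomes.
public static class CrossoverSketch
{
    // Split both parents at the same point and swap the tails,
    // producing two children.
    public static (string Child1, string Child2) SinglePoint(
        string parent1, string parent2, int point)
    {
        var child1 = parent1.Substring(0, point) + parent2.Substring(point);
        var child2 = parent2.Substring(0, point) + parent1.Substring(point);
        return (child1, child2);
    }
}
```

For example, crossing 11111111 with 00000000 at point 3 gives 11100000 and 00011111.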
With a population of 100, we would do this 50 times and perhaps keep the best 100 chromosomes; the rest we delete, or 'kill'. We do not care if the chromosomes to be deleted are the new children; this is quite normal, as many new children will be worse than their parents. We simply keep the best chromosomes, whoever they are. It can be a cruel GA world.
If the ability to create an aeroplane wing by simply carving up a few binary strings seems far-fetched, then you are in for a treat. As I said before, GAs are the nearest thing to magic that a software developer can get.
If all of this is interesting, stay tuned. In the next section, I will start coding a simple GA in C# using the GAF. The GAF makes this a simple task. In addition, the structure of the GAF and the way in which it is used, helps in understanding the concepts further.
If you want to dive straight in, the GAF is available as a NuGet package.
PM> Install-Package GAF
Solving the Binary F6 Function
In order to create a simple GA and test it, we need a problem to solve. If we are testing the GA, we really need a problem to which we already know the answer. In line with many researchers, the problem we will solve in this example is the Binary F6 function. Details of the Binary F6 function can be found all over the internet; for our purposes, all we need to understand is that it is a difficult problem to solve using traditional approaches.
A 3D plot of the Binary F6 function is shown here. It can be seen that there are many peaks and troughs within the overall plot. The mission for this example is to identify the values for X and Y that produce the minimum value for f(x,y). If all of this X, Y, Z stuff doesn't make sense, don't worry; it is not that important really.
I can tell you that a value of 0 for X and 0 for Y gives the minimum value for Z. Therefore, this is a problem that we know the answer to, which will allow us to test our GA.
If you are still puzzled as to why this is hard to solve and are thinking that you could just test each value for X and Y between the limits of -100 and +100 (as used in this example), i.e., use a 'brute force' approach, consider a situation where the range of X and Y lies somewhere between 0 and 23 billion. The non-linear shape of the curve (i.e., its 'spikiness') means that non-GA approaches struggle to identify the minimum and tend to get stuck in one of the other troughs (minima). Trust me, Binary F6 was made for GAs.
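For the curious, the function itself is short. The sketch below is my own helper, not part of the GAF; note that it evaluates to exactly 0 at x = 0, y = 0:

```csharp
using System;

// Illustration only: Schaffer's Binary F6 function.
public static class BinaryF6Sketch
{
    // Many local minima; the global minimum is 0 at x = 0, y = 0.
    public static double F6(double x, double y)
    {
        var sumSquares = x * x + y * y;
        var sin = Math.Sin(Math.Sqrt(sumSquares));
        var denom = 1 + 0.001 * sumSquares;
        return 0.5 + (sin * sin - 0.5) / (denom * denom);
    }
}
```

Since the GA maximises fitness, the fitness function later in this section returns 1 minus this value, so a perfect solution scores 1.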
In order to test the GA, I will be using three genetic operators: Elitism, Crossover and Mutation. It is worth noting that many systems combine crossover and mutation into a single operator; the GAF treats them as two completely independent operators. In the next section, I will describe these operators in more detail.
When developing GAs, the fitness function is probably the single most important thing to get right. However, for now at least, we will use the Binary F6 function as it helps us concentrate on developing the algorithm.
To get started, use the NuGet package manager to add the GAF to a project. In this example, I used a Console Project.
PM> Install-Package GAF
The code below shows a Console Application that configures a GA to solve the Binary F6 function.
In the example, two delegates are used. The first, EvaluateFitness, is passed into the constructor of the GeneticAlgorithm class. The second, TerminateAlgorithm, is passed as an argument to the Run method. These delegate methods will be called as needed by the GAF.
In order to show the evolving solution, the example below subscribes to the GenerationComplete event. The code in this handler displays X, Y and the fitness of the best chromosome in each generation.
The results of the run show the final values for X and Y to be -0.00417 and -0.00069 respectively; the final fitness value is 0.999982095862215.
The values for X and Y are very close to zero. Near enough is good enough in GA world.
using System;
using GAF;
using GAF.Operators;

namespace BinaryF6
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            const double crossoverProbability = 0.65;
            const double mutationProbability = 0.08;
            const int elitismPercentage = 5;

            var population = new Population(100, 44, false, false);

            var elite = new Elite(elitismPercentage);
            var crossover = new Crossover(crossoverProbability, true)
            {
                CrossoverType = CrossoverType.SinglePoint
            };
            var mutation = new BinaryMutate(mutationProbability, true);

            var ga = new GeneticAlgorithm(population, EvaluateFitness);
            ga.OnGenerationComplete += ga_OnGenerationComplete;

            ga.Operators.Add(elite);
            ga.Operators.Add(crossover);
            ga.Operators.Add(mutation);

            ga.Run(TerminateAlgorithm);
        }
    }
}
The fitness function can be called anything, but it must have the signature shown below. This function simply decodes the chromosome, calculates the Binary F6 value and determines how good the respective chromosome is before returning its 'fitness'.
public static double EvaluateFitness(Chromosome chromosome)
{
    if (chromosome == null)
    {
        throw new ArgumentNullException("chromosome",
            "The specified Chromosome is null.");
    }

    // Decode the two halves of the chromosome into x and y within [-100, 100].
    var rangeConst = 200 / (System.Math.Pow(2, chromosome.Count / 2) - 1);
    var x1 = Convert.ToInt32(chromosome.ToBinaryString(0, chromosome.Count / 2), 2);
    var y1 = Convert.ToInt32(chromosome.ToBinaryString(chromosome.Count / 2, chromosome.Count / 2), 2);
    var x = (x1 * rangeConst) - 100;
    var y = (y1 * rangeConst) - 100;

    // Binary F6: fitness is 1 - f(x, y), so the maximum possible fitness is 1.
    var temp1 = System.Math.Sin(System.Math.Sqrt(x * x + y * y));
    var temp2 = 1 + 0.001 * (x * x + y * y);
    var result = 0.5 + (temp1 * temp1 - 0.5) / (temp2 * temp2);

    return 1 - result;
}
This terminate function simply ends the GA run after 1000 generations. Any criterion could be used, e.g., a fitness value combined with a maximum number of generations.
public static bool TerminateAlgorithm(Population population,
    int currentGeneration, long currentEvaluation)
{
    return currentGeneration > 1000;
}
Hooking up to the Generation Complete event provides a means to interrogate the population. All properties and methods of the Population class are thread safe and can be checked in any of the events.
private static void ga_OnGenerationComplete(object sender, GaEventArgs e)
{
    var chromosome = e.Population.GetTop(1)[0];

    // Decode the fittest chromosome back into x and y for display.
    var rangeConst = 200 / (System.Math.Pow(2, chromosome.Count / 2) - 1);
    var x1 = Convert.ToInt32(chromosome.ToBinaryString(0, chromosome.Count / 2), 2);
    var y1 = Convert.ToInt32(chromosome.ToBinaryString(chromosome.Count / 2, chromosome.Count / 2), 2);
    var x = (x1 * rangeConst) - 100;
    var y = (y1 * rangeConst) - 100;

    Console.WriteLine("x:{0} y:{1} Fitness:{2}", x, y, e.Population.MaximumFitness);
}
Genetic Operators
In the previous section's example, three Genetic Operators were used. In this section, I will look more closely at each of these.
In the previous section, I created an initial population of 100 chromosomes (solutions); this represented the first generation. These chromosomes were then subjected to three genetic operators in turn and, once complete, we were left with a new generation. This process was repeated for many generations until an answer to the problem emerged. The order in which each operator is added to the process pipeline determines the order in which each operator is applied; here, the order was Elite, Crossover, Binary Mutate.
The initial population was created with the following line of code:
var population = new Population(PopulationSize, ChromosomeLength, false, false);
As previously mentioned, the population size was 100 and the chromosome length (number of binary digits, or 'genes') was 44. The remaining parameters (set to false in the example) can be ignored for now; they relate to noisy environments and linear normalisation.
The Elite Operator
Elitism allows the best solutions to pass through to the next generation without being modified. The Binary F6 example used a value of 5 percent, which, with a population of 100, means that the top 5 chromosomes, based on fitness, form part of the next generation. This means that any future generation is at least as good as the current generation. Elitism helps protect good individuals. However, it can be shown that as the percentage of elites is increased, the number of duplicates within the population increases, thereby reducing the performance of the GA. This is normal behaviour for an approach that does not explicitly handle duplicates, and is simply due to the convergence of the algorithm. If you want to experiment with this value, I would recommend a value somewhere between 5 and 10 percent as a starting point.
To create an Elite operator and add it to the GA, the following code can be used:
var elite = new Elite(elitismPercentage);
ga.Operators.Add(elite);
The Elite operator takes a single parameter in its constructor: the percentage. Chromosomes identified as elite are marked as such. This means that operators later in the pipeline, such as Mutate or any that you create yourself, know not to modify these chromosomes. The IsElite property returns the elite status of a chromosome.
bool eliteStatus = myChromosome.IsElite;
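In outline, the elitism step amounts to 'keep the top N% by fitness'. A sketch of the idea (my own illustration, not the GAF's internals):

```csharp
using System.Collections.Generic;
using System.Linq;

// Illustration only: elitism as "keep the top N% of solutions by fitness".
public static class EliteSketch
{
    public static List<double> SelectElites(List<double> fitnesses, int elitismPercentage)
    {
        var eliteCount = fitnesses.Count * elitismPercentage / 100;
        return fitnesses.OrderByDescending(f => f).Take(eliteCount).ToList();
    }
}
```

With a population of 100 and a percentage of 5, this keeps exactly the five fittest solutions.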
The Crossover Operator
The crossover operator within the GAF can handle single or double point crossover. The example in the previous section crossed parent chromosomes at a single point. In many cases, selecting two points and swapping the center section between parents is more appropriate. Single or double point crossover can be selected via a property, as shown below.
var crossover = new Crossover(crossoverProbability, true)
{
    CrossoverType = CrossoverType.SinglePoint
};
ga.Operators.Add(crossover);
The crossover probability is simply a value between 0 and 1 that represents the probability of parents crossing over to produce children. A value of 1 means that all selected parents will cross over and create children. A value of 0.85 means that 85% of the selected parents will be crossed over and the remainder will pass through untouched to the new generation. The second parameter relates to duplicate handling and should be set to 'true' for now. When experimenting with crossover probability, a good starting point would be somewhere between 0.65 and 1.
The Binary Mutate Operator
This operator scans each binary digit of each chromosome and, based on a probability factor, will toggle the bit (from 1 to 0 or from 0 to 1). Typically the mutation probability is very low, for example 0.02, which means that mutation only occurs occasionally. The mutation operator is used to ensure there is some diversity in the population. Diversity will be covered later.
var mutation = new BinaryMutate(mutationProbability, true);
ga.Operators.Add(mutation);
The constructor of this operator accepts two parameters, the first is the mutation probability, the second relates to duplicate handling and should be set to 'true' for now.
The Binary Mutate operator will not mutate any chromosomes marked as 'elite'. This ensures that the best solutions from the previous generation remain, unmodified, in the following generation.
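The bit-flip behaviour described above can be sketched as follows (a hypothetical helper, not the GAF's implementation):

```csharp
using System;
using System.Linq;

// Illustration only: bit-flip mutation over a binary-string chromosome.
public static class MutateSketch
{
    // Each bit is toggled independently with the given probability.
    public static string Mutate(string bits, double mutationProbability, Random rnd)
    {
        return new string(bits.Select(b =>
            rnd.NextDouble() < mutationProbability
                ? (b == '0' ? '1' : '0')
                : b).ToArray());
    }
}
```

With a probability of 1.0 every bit flips; with 0.0 the chromosome is untouched; with a realistic value such as 0.02, the occasional bit toggles.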
Other Operators
The GAF contains operators other than those described above. These will be discussed in future articles.
Diversity
In order for a GA to find suitable solutions to the Binary F6 function, it has to search amongst all the possible solutions. To do this, some level of diversity within the population is required. Imagine life back in the times of the Cave Man: those who were strong and could hunt would feed their families and grow strong. Now imagine that all the animals were suddenly rounded up and Safeway built a supermarket. Who would be the strongest then? Probably someone who could trade. The point is that diversity in the population allows the population to adapt should the environment change.
In our example, diversity was primarily maintained through the use of the Binary Mutate operator. By toggling a binary digit (gene) of a chromosome every now and then, the operator has the potential to change the solution's X and Y values significantly. This allows completely different values to be tested. If these turn out to be good, the chromosome will be selected as a parent and more similar solutions will be created for the next generation. Every time a mutation happens, a new part of the search space is searched.
Maintaining a level of diversity is very important for a GA's performance. However, the level of diversity has to be balanced with the need to find suitable solutions through convergence.
Mutation is not the only way to maintain diversity. In a future article, I will discuss other replacement and complementary options.
The Traveling Salesman Problem
The Traveling Salesman problem is a well-known problem in AI circles. Given a list of cities, a salesman has to identify the most efficient route that visits every city once and only once. Any city can be the starting point.
It is possible to solve this problem using 'brute force', that is, calculating every combination of routes. However, with 16 cities, we would have to test 20,922,789,888,000 possible routes. With 20 cities, the number of routes increases to 2,432,902,008,176,640,000. A brute force approach is simply impractical.
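The route counts quoted above are simply factorials: n cities give n! ordered visiting sequences. A quick check (my own helper, not part of any example code):

```csharp
// Illustration only: the number of ordered routes through n distinct cities is n!.
public static class RouteCountSketch
{
    public static long RouteCount(int cities)
    {
        long count = 1;
        for (var i = 2; i <= cities; i++)
            count *= i;
        return count;
    }
}
```

RouteCount(16) returns 20,922,789,888,000 and RouteCount(20) returns 2,432,902,008,176,640,000, matching the figures above.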
Solving this with a GA is reasonably straightforward, and two examples are shown here. The first uses a chromosome that stores an index into a list of cities. The second example, presented later, shows how to leverage the object-based genes of the GAF and stores the city within the gene rather than in a separate list. For each of these examples, I simply created a console application and added the Genetic Algorithm Framework (GAF) using the NuGet command:
PM> Install-Package GAF
Example 1
With this example, I have created a list of 16 cities. Each city can be identified by an integer within the range 0-15. The approach taken was to create and store the list of 16 cities in a member variable; the chromosome stores a series of numbers in the range 0 to 15 and represents a potential route. However, the chromosome is a special case, as it needs to contain each city exactly once. Therefore, it is not possible just to create random chromosomes (routes) as was the case with the previous example, so the population has to be created manually. This is done in the Main function in this example.
As mentioned earlier, the chromosome must contain every city. Therefore, it is not possible to use a traditional crossover method; in most cases, a traditional single or double point crossover would corrupt the route, leaving some cities duplicated and others missing. The GAF supports a form of crossover called 'Double Point Ordered'. This type of crossover creates a child from a single parent. Two parents are selected and two random points along the chromosome are chosen. The genes between the points are passed to the child. The remaining genes are transferred from the same parent, but in the order in which they appear in the second parent. The result is that the child contains all of the values from a single parent but includes ordering, and therefore traits, from both parents.
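The Double Point Ordered operation just described can be sketched as follows. This is a simplified illustration of the idea, not the GAF's implementation, and the crossover points are passed in explicitly rather than chosen at random:

```csharp
using System.Collections.Generic;
using System.Linq;

// Illustration only: a simplified Double Point Ordered crossover.
public static class OrderedCrossoverSketch
{
    // The child keeps parent1's genes between the two points; the remaining
    // genes are parent1's other genes, re-ordered to match the order in which
    // they appear in parent2. Both parents must hold the same set of values.
    public static List<int> Create(List<int> parent1, List<int> parent2, int point1, int point2)
    {
        var middle = parent1.Skip(point1).Take(point2 - point1).ToList();
        var rest = parent2.Where(g => !middle.Contains(g)).ToList();
        return rest.Take(point1)
                   .Concat(middle)
                   .Concat(rest.Skip(point1))
                   .ToList();
    }
}
```

With parents 0,1,2,3,4,5 and 5,4,3,2,1,0 and points 2 and 4, the child is 5,4,2,3,1,0: the segment 2,3 comes straight from the first parent and the remaining genes follow the second parent's ordering.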
Similarly, the Binary Mutate operator used in previous examples is not suitable in this example as we have to maintain the complete set of values within the chromosome. Therefore, the 'Swap Mutate' operator has been used. This operator simply takes two genes at random and swaps their position within the chromosome.
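Swap mutation is even simpler to sketch. Again, this is an illustration rather than the GAF's code, with the two positions passed in explicitly rather than chosen at random:

```csharp
using System.Collections.Generic;

// Illustration only: swap mutation exchanges two genes, so the set of values
// in the chromosome is preserved (vital for route-style chromosomes).
public static class SwapMutateSketch
{
    public static void Swap<T>(List<T> genes, int positionA, int positionB)
    {
        (genes[positionA], genes[positionB]) = (genes[positionB], genes[positionA]);
    }
}
```

Because only positions change, the mutated chromosome is still a valid route through the same set of cities.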
In addition to the Main function, I created a method to populate the member variable _cities and implemented the two events to report progress.

For the fitness delegate, the CalculateFitness function has been created. This function simply calls CalculateDistance in order to calculate the total distance between each city in the order specified by the chromosome. The return value of this function needs to be between 0 and 1, with the better fitness (shorter route) being the higher number. The approach taken here is a little crude, but it works well.
The terminate delegate function simply stops the GA when 400 generations have taken place.
Results
Running the GA gave the following results. Most of the work was done within 154 generations, although leaving the GA running showed a very slight improvement in route distance after 368 generations. The shortest distance discovered by the GA was approximately 1629 km (the Haversine calculation uses the Earth's radius in kilometres). It is worth noting that this was the result from the first run; no optimization of the GA was carried out. Adjusting the parent selection method or the crossover/mutation probability could improve the performance of the GA. I will leave this for you to experiment with.
Generation: 10, Fitness: 0.751764339552472, Distance: 2482.35660447528
Generation: 11, Fitness: 0.751764339552472, Distance: 2482.35660447528
Generation: 12, Fitness: 0.776179966270441, Distance: 2238.20033729559
Generation: 16, Fitness: 0.778498506612512, Distance: 2215.01493387488
Generation: 20, Fitness: 0.778792498460299, Distance: 2212.07501539701
Generation: 21, Fitness: 0.779983453246518, Distance: 2200.16546753482
Generation: 22, Fitness: 0.791472961088518, Distance: 2085.27038911482
Generation: 26, Fitness: 0.806022227954872, Distance: 1939.77772045128
Generation: 27, Fitness: 0.806721914946973, Distance: 1932.78085053027
Generation: 28, Fitness: 0.80946570810232, Distance: 1905.3429189768
Generation: 36, Fitness: 0.81391858503094, Distance: 1860.8141496906
Generation: 39, Fitness: 0.820262019460363, Distance: 1797.37980539637
Generation: 46, Fitness: 0.825307565772963, Distance: 1746.92434227037
Generation: 85, Fitness: 0.825532111167778, Distance: 1744.67888832222
Generation: 93, Fitness: 0.832642679433738, Distance: 1673.57320566262
Generation: 154, Fitness: 0.836884868024645, Distance: 1631.15131975355
Generation: 368, Fitness: 0.83710941341946, Distance: 1628.9058658054
The route selected by the GA was as follows:
Canterbury London Bristol Cardiff Exeter Falmouth Swansea Birmingham Liverpool
Manchester Leeds Hull Newcastle Carlisle Glasgow Edinburgh
using System;
using System.Collections.Generic;
using System.Linq;
using GAF;
using GAF.Extensions;
using GAF.Operators;
namespace TravellingSalesman
{
internal class Program
{
private static List<City> _cities;
private static void Main(string[] args)
{
_cities = CreateCities().ToList();
var population = new Population();
for (var p = 0; p < 100; p++)
{
var chromosome = new Chromosome();
for (var g = 0; g < 16; g++)
{
chromosome.Genes.Add(new Gene(g));
}
chromosome.Genes.ShuffleFast();
population.Solutions.Add(chromosome);
}
var elite = new Elite(5);
var crossover = new Crossover(0.8)
{
CrossoverType = CrossoverType.DoublePointOrdered
};
var mutate = new SwapMutate(0.02);
var ga = new GeneticAlgorithm(population, CalculateFitness);
ga.OnGenerationComplete += ga_OnGenerationComplete;
ga.OnRunComplete += ga_OnRunComplete;
ga.Operators.Add(elite);
ga.Operators.Add(crossover);
ga.Operators.Add(mutate);
ga.Run(Terminate);
}
static void ga_OnRunComplete(object sender, GaEventArgs e)
{
var fittest = e.Population.GetTop(1)[0];
foreach (var gene in fittest.Genes)
{
Console.WriteLine(_cities[(int)gene.RealValue].Name);
}
}
private static void ga_OnGenerationComplete(object sender, GaEventArgs e)
{
var fittest = e.Population.GetTop(1)[0];
var distanceToTravel = CalculateDistance(fittest);
Console.WriteLine("Generation: {0}, Fitness: {1}, Distance: {2}",
e.Generation, fittest.Fitness, distanceToTravel);
}
private static IEnumerable<City> CreateCities()
{
var cities = new List<City>
{
new City("Birmingham", 52.486125, -1.890507),
new City("Bristol", 51.460852, -2.588139),
new City("London", 51.512161, -0.116215),
new City("Leeds", 53.803895, -1.549931),
new City("Manchester", 53.478239, -2.258549),
new City("Liverpool", 53.409532, -3.000126),
new City("Hull", 53.751959, -0.335941),
new City("Newcastle", 54.980766, -1.615849),
new City("Carlisle", 54.892406, -2.923222),
new City("Edinburgh", 55.958426, -3.186893),
new City("Glasgow", 55.862982, -4.263554),
new City("Cardiff", 51.488224, -3.186893),
new City("Swansea", 51.624837, -3.94495),
new City("Exeter", 50.726024, -3.543949),
new City("Falmouth", 50.152266, -5.065556),
new City("Canterbury", 51.289406, 1.075802)
};
return cities;
}
public static double CalculateFitness(Chromosome chromosome)
{
var distanceToTravel = CalculateDistance(chromosome);
return 1 - distanceToTravel / 10000;
}
private static double CalculateDistance(Chromosome chromosome)
{
var distanceToTravel = 0.0;
City previousCity = null;
foreach (var gene in chromosome.Genes)
{
var currentCity = _cities[(int) gene.RealValue];
if (previousCity != null)
{
var distance = previousCity.GetDistanceFromPosition(currentCity.Latitude,
currentCity.Longitude);
distanceToTravel += distance;
}
previousCity = currentCity;
}
return distanceToTravel;
}
public static bool Terminate(Population population,
int currentGeneration, long currentEvaluation)
{
return currentGeneration > 400;
}
}
}
using System;
namespace TravellingSalesman
{
public class City
{
public City(string name, double latitude, double longitude)
{
Name = name;
Latitude = latitude;
Longitude = longitude;
}
public string Name { set; get; }
public double Latitude { get; set; }
public double Longitude { get; set; }
public double GetDistanceFromPosition(double latitude, double longitude)
{
var R = 6371;
var dLat = DegreesToRadians(latitude - Latitude);
var dLon = DegreesToRadians(longitude - Longitude);
var a =
Math.Sin(dLat / 2) * Math.Sin(dLat / 2) +
Math.Cos(DegreesToRadians(Latitude)) * Math.Cos(DegreesToRadians(latitude)) *
Math.Sin(dLon / 2) * Math.Sin(dLon / 2)
;
var c = 2 * Math.Atan2(Math.Sqrt(a), Math.Sqrt(1 - a));
var d = R * c;
return d;
}
private static double DegreesToRadians(double deg)
{
return deg * (System.Math.PI / 180);
}
public byte[] ToBinaryString()
{
var result = new byte[6];
return result;
}
}
}
Example 2
The previous example relies on each gene being an integer in order to index elements in an externally defined list of cities. However, the GAF supports other gene types such as real numbers and objects. Therefore, instead of storing an integer that represents an element of an array, as in the TSP, the gene can now store the city object itself. In fact, each gene could store a list of objects, allowing multidimensional chromosomes if so desired. The gene's type is given by the Gene.GeneType property. This property returns an enumerated value of one of the following types: Binary, Integer, Real or Object.
To demonstrate this, I have reworked the Traveling Salesman problem to use an object-based gene. The code is shown below.
A couple of things to note. First, the City object has been modified to make it serializable with the [Serializable] attribute; this allows the framework to clone the gene as appropriate. Secondly, the Equals method has been overridden. This is simply to ensure that the Swap Mutate operator functions correctly; if you are not using Swap Mutate, this step can be ignored. For completeness, I have also implemented GetHashCode.
using System;
using System.Collections.Generic;
using System.Linq;
using GAF;
using GAF.Extensions;
using GAF.Operators;
namespace ObjectBasedGene
{
internal class Program
{
private static void Main(string[] args)
{
var cities = CreateCities().ToList();
var population = new Population();
for (var p = 0; p < 100; p++)
{
var chromosome = new Chromosome();
foreach (var city in cities)
{
chromosome.Genes.Add(new Gene(city));
}
chromosome.Genes.ShuffleFast();
population.Solutions.Add(chromosome);
}
var elite = new Elite(5);
var crossover = new Crossover(0.8)
{
CrossoverType = CrossoverType.DoublePointOrdered
};
var mutate = new SwapMutate(0.02);
var ga = new GeneticAlgorithm(population, CalculateFitness);
ga.OnGenerationComplete += ga_OnGenerationComplete;
ga.OnRunComplete += ga_OnRunComplete;
ga.Operators.Add(elite);
ga.Operators.Add(crossover);
ga.Operators.Add(mutate);
ga.Run(Terminate);
}
static void ga_OnRunComplete(object sender, GaEventArgs e)
{
var fittest = e.Population.GetTop(1)[0];
foreach (var gene in fittest.Genes)
{
Console.WriteLine(((City)gene.ObjectValue).Name);
}
}
private static void ga_OnGenerationComplete(object sender, GaEventArgs e)
{
var fittest = e.Population.GetTop(1)[0];
var distanceToTravel = CalculateDistance(fittest);
Console.WriteLine("Generation: {0}, Fitness: {1}, Distance: {2}",
e.Generation, fittest.Fitness, distanceToTravel);
}
private static IEnumerable<City> CreateCities()
{
var cities = new List<City>
{
new City("Birmingham", 52.486125, -1.890507),
new City("Bristol", 51.460852, -2.588139),
new City("London", 51.512161, -0.116215),
new City("Leeds", 53.803895, -1.549931),
new City("Manchester", 53.478239, -2.258549),
new City("Liverpool", 53.409532, -3.000126),
new City("Hull", 53.751959, -0.335941),
new City("Newcastle", 54.980766, -1.615849),
new City("Carlisle", 54.892406, -2.923222),
new City("Edinburgh", 55.958426, -3.186893),
new City("Glasgow", 55.862982, -4.263554),
new City("Cardiff", 51.488224, -3.186893),
new City("Swansea", 51.624837, -3.94495),
new City("Exeter", 50.726024, -3.543949),
new City("Falmouth", 50.152266, -5.065556),
new City("Canterbury", 51.289406, 1.075802)
};
return cities;
}
public static double CalculateFitness(Chromosome chromosome)
{
var distanceToTravel = CalculateDistance(chromosome);
return 1 - distanceToTravel / 10000;
}
private static double CalculateDistance(Chromosome chromosome)
{
var distanceToTravel = 0.0;
City previousCity = null;
foreach (var gene in chromosome.Genes)
{
var currentCity = (City)gene.ObjectValue;
if (previousCity != null)
{
var distance = previousCity.GetDistanceFromPosition(currentCity.Latitude,
currentCity.Longitude);
distanceToTravel += distance;
}
previousCity = currentCity;
}
return distanceToTravel;
}
public static bool Terminate(Population population,
int currentGeneration, long currentEvaluation)
{
return currentGeneration > 400;
}
}
}
using System;

namespace ObjectBasedGene
{
    [Serializable]
    public class City
    {
        public City(string name, double latitude, double longitude)
        {
            Name = name;
            Latitude = latitude;
            Longitude = longitude;
        }

        public string Name { get; set; }
        public double Latitude { get; set; }
        public double Longitude { get; set; }

        public double GetDistanceFromPosition(double latitude, double longitude)
        {
            // Haversine formula: great-circle distance in kilometres.
            const double R = 6371; // Earth's mean radius in km
            var dLat = DegreesToRadians(latitude - Latitude);
            var dLon = DegreesToRadians(longitude - Longitude);
            var a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2) +
                    Math.Cos(DegreesToRadians(Latitude)) *
                    Math.Cos(DegreesToRadians(latitude)) *
                    Math.Sin(dLon / 2) *
                    Math.Sin(dLon / 2);
            var c = 2 * Math.Atan2(Math.Sqrt(a), Math.Sqrt(1 - a));
            return R * c;
        }

        private static double DegreesToRadians(double deg)
        {
            return deg * (Math.PI / 180);
        }

        public byte[] ToBinaryString()
        {
            // Placeholder only; a binary encoding is not used in this example.
            var result = new byte[6];
            return result;
        }
        public override bool Equals(object obj)
        {
            // Guard against obj being null or not a City before comparing.
            var item = obj as City;
            return item != null && Equals(item);
        }

        protected bool Equals(City other)
        {
            return string.Equals(Name, other.Name) &&
                   Latitude.Equals(other.Latitude) &&
                   Longitude.Equals(other.Longitude);
        }
        public override int GetHashCode()
        {
            unchecked
            {
                int hashCode = (Name != null ? Name.GetHashCode() : 0);
                hashCode = (hashCode * 397) ^ Latitude.GetHashCode();
                hashCode = (hashCode * 397) ^ Longitude.GetHashCode();
                return hashCode;
            }
        }
    }
}
Additional Options
Linear Normalisation
Linear normalisation, as implemented within the GAF, simply orders the solutions by decreasing evaluation and creates a fitness based on this ordering. For example, in a population of 100, the fittest solution would have a fitness of 1.0, with each subsequent solution decreasing linearly down to the worst, which receives a fitness of 0. The starting point and decrement parameters of this approach are fixed within the GAF.
To add linear normalisation, simply change the useLinearlyNormalisedFitness parameter to true in the constructor of the Population object.
Linearly normalising fitness in this way can help maintain natural selection. Natural selection occurs when the fittest individuals are selected as parents for the next generation in proportion to their fitness. However, in some problem domains, where the difference in fitness between the fittest solution and the others in the population is very small, all solutions have a similar probability of being selected as parents. By normalising the fitness of the population between a minimum and a maximum value, natural selection can be improved.
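The idea can be sketched in a few lines of Python (a language-neutral illustration of the principle, not the GAF's internal code; the function name is my own). Raw evaluations are replaced by rank-based fitness values spread evenly from 1.0 down to 0.0:

```python
def linearly_normalise(evaluations):
    """Replace raw evaluations with rank-based fitness values.

    The best solution receives 1.0 and the worst 0.0, with the rest
    spaced evenly in between, regardless of how close the raw
    evaluations are to one another.
    """
    n = len(evaluations)
    # Rank the solutions from best (highest evaluation) to worst.
    order = sorted(range(n), key=lambda i: evaluations[i], reverse=True)
    fitness = [0.0] * n
    for rank, i in enumerate(order):
        fitness[i] = 1.0 - rank / (n - 1) if n > 1 else 1.0
    return fitness
```

Even if the raw evaluations differ only in the third decimal place, the normalised fitness values remain well separated, so fitness-proportionate selection still clearly favours the better solutions.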
Duplicate Handling
In the examples above, we did not worry about duplicate chromosomes in the population. This approach is fine in most cases; however, if you wish to ensure that all chromosomes in the population are unique, all that is required is to set the allowDuplicates constructor parameter to false for the appropriate genetic operators.
Noisy Environments
The Binary F6 evaluation executed as part of the first example is not noisy; that is, the fitness value does not change if the solution is re-evaluated with the same values of X and Y. In situations such as these, the evaluation process can be optimized so that each solution is not re-evaluated at each generation. The GAF handles this type of optimization internally.
When using the GA with noisy evaluations, that is, where re-evaluating a solution would return a different result from the previous evaluation, the optimization described above cannot be applied. To force a re-evaluation of each solution at each generation, set the reEvaluateAll parameter in the constructor of the Population object to true. Naturally, this can reduce the performance of the GA.
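The distinction can be pictured with a simple fitness cache (a Python sketch of the principle only; the class and parameter names here are illustrative and are not part of the GAF's API):

```python
class CachedEvaluator:
    """Illustrates why non-noisy evaluations need not be repeated.

    With re_evaluate_all=False, an unchanged solution is evaluated once
    and its fitness reused thereafter; with re_evaluate_all=True, every
    request triggers a fresh evaluation, as a noisy fitness function
    requires.
    """

    def __init__(self, fitness_function, re_evaluate_all=False):
        self.fitness_function = fitness_function
        self.re_evaluate_all = re_evaluate_all
        self._cache = {}

    def evaluate(self, chromosome):
        key = tuple(chromosome)
        if self.re_evaluate_all or key not in self._cache:
            self._cache[key] = self.fitness_function(chromosome)
        return self._cache[key]
```

The saving grows with the population size and the cost of the fitness function, which is why forcing re-evaluation should be reserved for genuinely noisy problems.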
Creating a Custom Genetic Operator
The GAF has several built-in genetic operators; however, from time to time, it is useful to be able to define new ones. There are two methods for producing a custom genetic operator for use with the GAF. The first involves creating a C# class that implements the GAF.IGeneticOperator interface. The second is to derive a new genetic operator from an existing one. This section takes a brief look at the first method before creating an 'Auto-Mutate' genetic operator using the second.
Implementing IGeneticOperator
This is probably the simplest and most flexible way to create a genetic operator. The SimpleOperator class in the code below illustrates the minimal implementation.
using System;
using GAF;

namespace Operators
{
    public class SimpleOperator : IGeneticOperator
    {
        public void Invoke(Population currentPopulation,
            ref Population newPopulation, FitnessFunction fitnessFunctionDelegate)
        {
            throw new NotImplementedException();
        }

        public int GetOperatorInvokedEvaluations()
        {
            throw new NotImplementedException();
        }

        public bool Enabled { get; set; }
    }
}
Once the operator has been created, it can be added to the Operators collection in the same way as the built-in operators, e.g.:
ga.Operators.Add(simpleOperator);
The Invoke method will be called by the GAF and is passed the current generation's population (currentPopulation) along with the next generation's population (newPopulation). Each operator is responsible for either taking some solutions from the current population and transferring them to the new population, or modifying solutions already in the new population. How this is done is left to the implementer of the operator. For example, a crossover operator could take two solutions from the current population, perform a single or double point crossover, and place the resulting 'children' into the new population.
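As a concrete illustration, the core of such a crossover step could be sketched as follows (plain Python lists stand in for the GAF's Chromosome objects; this is not the built-in operator's code):

```python
import random

def single_point_crossover(parent1, parent2, rng=random):
    """Swap the gene tails of two parents at a random cut point,
    producing two children destined for the new population."""
    point = rng.randrange(1, len(parent1))  # cut point, never at either end
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2
```

A real operator would repeat this over pairs of selected parents until the new population reaches the required size, leaving any elite solutions untouched.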
The GetOperatorInvokedEvaluations method is used by the GAF to determine the number of evaluations an operator invokes; it should return the number of evaluations the operator performs on each invocation. For example, the built-in crossover operator evaluates the population to determine which individuals to select, whereas the built-in mutation operator performs no evaluations. If an operator performed a single evaluation of each member of a population of, say, 100 individuals, GetOperatorInvokedEvaluations would return 100. For an operator that performs no evaluations, the method would return 0.
Things to Bear in Mind
- Ensure that the new population is not larger than the current population.
- The new population may already contain solutions, depending on which operators have already been invoked.
- The new population may already hold a full complement of solutions; this does not prevent them from being changed or modified.
- Elites passed into the new population by the Elite operator are marked with the IsElite property set to true. Subsequent operators should not normally interfere with these.
Creating an Auto-Mutation Operator
Watson, in his paper An Investigation into Cooperative Behaviour: Altruism and Evolutionary Computing (2003), discussed an approach that used a non-phenotype gene within the chromosome to increase the mutation probability. This additional gene was subject to the same genetic operations as the other genes; however, it was not used during the chromosome decoding process. The gene was simply used to modify the mutation probability.
The AutoMutate class in the code below shows an Auto-Mutate operator built by deriving from the GAF's existing BinaryMutate operator. This approach makes it very simple to create an operator that increases the mutation probability depending upon the non-phenotype gene's value.
using System.Linq;
using GAF;
using GAF.Operators;

namespace Operators
{
    public class AutoMutate : BinaryMutate
    {
        private AutoMutateFactor _autoMutationFactor = AutoMutateFactor.None;
        private readonly object _syncLock = new object();
        private int _geneCount;

        public AutoMutate(double mutationProbability, bool allowDuplicates)
            : base(mutationProbability, allowDuplicates)
        {
        }

        public override void Invoke(Population currentPopulation,
            ref Population newPopulation, FitnessFunction fitnessFunctionDelegate)
        {
            _geneCount = newPopulation.ChromosomeLength;
            base.Invoke(currentPopulation, ref newPopulation, fitnessFunctionDelegate);
        }

        protected override void Mutate(Chromosome chromosome, double mutationProbability)
        {
            // The last gene is the non-phenotype gene; when set, it scales
            // the mutation probability by the configured factor.
            var nonPhenotypeGene = chromosome.Genes.ElementAt(_geneCount - 1);
            var newMutationProbability = nonPhenotypeGene.BinaryValue == 1
                ? mutationProbability * (int)AutoMutationFactor
                : mutationProbability;

            base.Mutate(chromosome, newMutationProbability);
        }

        public AutoMutateFactor AutoMutationFactor
        {
            get
            {
                lock (_syncLock)
                {
                    return _autoMutationFactor;
                }
            }
            set
            {
                lock (_syncLock)
                {
                    _autoMutationFactor = value;
                }
            }
        }
    }
}
namespace Operators
{
    public enum AutoMutateFactor
    {
        None = 1,
        Factor5 = 5,
        Factor10 = 10,
        Factor20 = 20,
        Factor50 = 50
    }
}
In this implementation, a gene value of 1 increases the mutation probability by a factor of 5, 10, 20 or 50, depending upon the configuration. A value of 0 has no effect on the mutation probability. Because the non-phenotype gene is itself subject to this increased probability, there is a greater chance of the gene changing from 1 to 0 than from 0 to 1. The effect, therefore, is that as convergence occurs, fewer and fewer chromosomes are subject to the increased mutation rate. However, if the landscape changes significantly, chromosomes not yet affected by the genetic operators come into play; those with the non-phenotype gene set will raise the mutation probability once again, until the GA converges towards the new solution. The process continues in this manner at every significant change in the landscape. This auto-mutation process is explained in greater detail in Watson's paper.
In this example, the last gene is assumed to be the non-phenotype gene. To use the operator, simply create a chromosome with one extra gene, set that gene to a random value, and do not use it to store solution information.
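The principle can be sketched in Python (an illustration of the auto-mutation idea, not the AutoMutate class above; the function names are my own):

```python
import random

def effective_mutation_probability(chromosome, base_probability, factor=5):
    """The last gene is the non-phenotype gene: when it is 1, the
    mutation probability is scaled by the configured factor."""
    return base_probability * factor if chromosome[-1] == 1 else base_probability

def auto_mutate(chromosome, base_probability, factor=5, rng=random):
    """Flip each bit independently with the effective probability.
    The non-phenotype gene itself is also subject to mutation, which
    is what causes the raised rate to die away as the GA converges."""
    p = effective_mutation_probability(chromosome, base_probability, factor)
    return [1 - g if rng.random() < p else g for g in chromosome]
```

Because the flag gene is mutated at the elevated rate too, a population gradually sheds its high-mutation chromosomes as it converges, exactly as described above.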
Summary
This article has introduced the Genetic Algorithm and described how to implement a GA using the Genetic Algorithm Framework for .NET. The aim was to demonstrate how simple it can be to leverage the power of the GA.
The library is provided free via NuGet with the primary aim of encouraging investigation and research. If you wish to use the library for a commercial project and require technical support or advice, please feel free to get in touch.
Enjoy!
References
Watson, T. (2003) An Investigation into Cooperative Behaviour: Altruism and Evolutionary Computing. Submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy, De Montfort University, 2003.