|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Announcements
Want a new Job?
Chapters
Services
Feature Zones
|
IntroductionThe intention of this series of articles is to write an introduction into the world of genetic algorithms starting from the absolute beginnings and working towards more complex solutions as the series progresses. The code provided is intended to run on Windows XP with .NET version 1.1. This article will focus on the very basic elements of genetic algorithms focusing on what a genetic algorithm is and will provide the code for a demonstration application that despite having certain problems will be used to show how a genetic algorithm is intended to work with more advanced methods being the target of future articles. The secondary aim of the articles is to develop and provide a free genetic library that will allow any future development of genetic algorithms both for myself and others to be developed more easily by providing a set of generic tools that will allow future developers using the library to concentrate more on the application specific side of what they want their genetic algorithm to do rather than constantly trying to work out how to do it. What is a genetic algorithm?The theory of a genetic algorithm is based on the idea of reproduction and genetics in biology. This, to greatly simplify, is to take a pair of strings or lists of letters or chemical types and which are compatible and to mix and match them and see what the results are. Of course, there are limits in biology to what exactly can be mixed and matched and the same should go for our own genetic algorithms. To go right down to the genetic algorithms for dummies level for a moment if you have a cup of coffee and some sugar and some salt and try different combinations of two of the three then most people will agree that the coffee and sugar mix taste the best while the others pretty much suck in terms of providing a refreshing beverage. The biological idea of a genetics algorithm is the DNA strand with each section of the DNA strand being equivalent to a certain function or characteristic of the body. Although it should probably be mentioned here that this bodily characteristic is rarely as simple as tall or short or big nose or little nose but rather these things are more a combination of the resulting influences of the separate strands of chemicals within the genetic strands, influences which I'm pretty sure are not fully understood by anyone yet, so don't go expecting genetic face lifts to start appearing in the supermarket any time soon. But let's just for a second suppose that such a thing is actually possible and that we can nip off down to the supermarket and get our very own genetic face lift in pill form. How would it be made up, well at a guess it would go something like this, You start with a few basic types which you can choose from, starting with: SIZE: Fat
Skinny
Medium
Then you think about the length. LENGTH: Long
Short
Medium
Finally, you put a bit of characteristics on it. CHARACTERISTIC: Hooked
Stubby
Average
You take one pill from each section and wake up in the morning with the nose of your choice and if you don't like it then you can always try another combination the next night. Of course, though the only problem with this is when the decisions about your genetic make up are made you are just a chemical soup in your mothers womb, so your ability to affect the outcome is somewhat limited, along with the fact that you couldn't possibly have any idea about what type of noses are going to be "in" in twenty years time. So along with the rest of us you have to let nature do it's job and whine about it when you're a teenager. The idea behind genetic algorithms is to simulate the job done by nature when making this kind of choice and use it to find answers to problems. Being programmers we want to find the correct answers but we won't worry about that a great deal for now and just concentrate on how we are going to get an answer at all. To use a genetic algorithm for the above problem what we do is present the solution as an array and then select the answers from that so we would have: Array 0 :- SIZE
Array 1 :- LENGTH
Array 2 :- CHARACTERISTIC
Each item in the array would contain another array that holds the options so Array 0 would contain: array 0 :- Fat
array 1 :- Skinny
array 2 :- Medium
Now if in the code we use zeros and ones to turn an option on or off with a 1 being on and a 0 being off then a print out of the Array 0 for a fat nose would read 100, a print out for a skinny nose would be 010 and a print out for a medium nose would read 001. We could of course have mixes and variations such as a medium fat nose which would be 101 but we won't go there for now. So extending this throughout a print out for a completely average nose would read: Array 0 :- 001
Array 1 :- 001
Array 2 :- 001
which would give us a complete string ( probably should steer clear of the phrase nose string here ) of 001001001. Now if we say that this is how all noses are defined then all we need to do is manipulate this string and we can select any type of nose that is possible given our requirements by simply switching the values in the string between 1 and 0. This is what a genetic algorithm does of course the string doesn't have to be zeros and ones and it can represent anything you want it to be but at its most basic form a genetic algorithm will manipulate a string in order for the program to get whatever meaning is represented by the string from it. The meaning and representations of the string are of course up to the programmer and the requirements of the project. So the next question is how do we do it. The demonstration codeBefore we get into the details of how to implement the genetic algorithm, we should probably take a quick look at the demonstration code provided with the article. For this project all the code is provided in one project that builds slightly different executables designed to highlight the points in each article. The demonstration code for this project is the GeneticsDev project which can be built by left clicking on the project in Developer Studio and then right clicking and selecting "Set As StartUp Project" on the menu. The idea behind the demonstration code is to provide what appears to be a binary type of calculator, that is you enter a sum to be calculated in the boxes provided and the code tries to find the answer to the sum in binary. The amount of binary numbers that can be calculated is configurable by the code and we will look at the reason for this and the repercussions later. For now though the GeneticsDev demonstration uses ten bit binary numbers so anything that returns a one will return a string of 0000000001 and a two would be 0000000010 and a three would be 0000000011 etc. Note that there is no real reason to use binary strings here I just felt that it would give a simple visual representation of what is occurring in the code. The demonstration itself is a simple front end that uses three custom threads, one that is fired during start-up and is used to do the initialization for the genetic algorithm and two separate threads that are used to do the calculations. The reason for the separate threads is that the application is processor intensive and will become unresponsive if threads are not used. ReproductionGenetic algorithms are basically random data trying to find the answer or the solution to a problem through what is essentially guess work. The initial array is randomly initiated and then we hope to move it into the direction of the answer or solution. This of course requires that we have at least some idea of the solution we are looking for but it doesn't mean that the requirements for testing the fitness need to be as exact as I have shown them in the demonstration code to this project. Basically it is possible to write a fitness function that requires the code to achieve a goal say of finding a path through a map without requiring that the code finds a specific path through the map, as in the shortest or the longest. However, if we don't have any type of fitness function within the code at all we are just peeing into the wind and hoping to get it into a bucket that someone has randomly placed behind us. Without any attempt to guide the program to the solution we are looking for we may as well just make a lot of calls to the random function and see what the code does. This is what version one of the genetic algorithm does. So we take our randomly initialized array and check to see if it has the correct answer in it while all the time kind of hoping that it doesn't start randomly guessing it every time that we run it and for those who have run the demonstration code already you may have seen that sometimes this is the case. Now we have the situation where we have an array of random data and a solution that we want the code to find, the way genetic algorithms do this is to select data from the array and use it to create new data. This is done by using the biological technique of combining the data in some way so that we create an entirely new attempt at a solution from the data that we have selected. This is called reproduction. Reproduction is the cornerstone of any genetic algorithm and serves exactly the same function that it serves in biology. The idea being that you take two or more parents and derive offspring, in this case we are using binary string so say 0000000001 and 0000000010 are the parents there are only three noticeably different offspring that can be produced these being 0000000001, 0000000010 and 0000000011, which in decimal numbers would be 1, 2 and 3. The code that determines which offspring will be chosen is covered in the section on Crossover below for now we will concentrate on just how we are supposed to decide who the parents are supposed to be. The common term for selecting the parents in a genetic algorithm is called Fitness. Fitness is meant to be the way of selecting the parents from which the offspring for the genetic algorithm are chosen and as such is the hardest part of the genetic algorithm to discuss as it is always completely application specific, so we will discuss it in terms of the demonstration code on the understanding that the method chosen is completely optional and that there are other ways of choosing the parents even within the demonstration code. The only limits really when it comes to choosing the fitness for a genetic algorithm are really the developer's imagination and understanding of the project requirements. In the demonstration project we set up an array of hundred binary numbers which may or may not contain a binary string who's decimal value is equal to the answer. If there is no binary string who's decimal value is equal to the required answer then we create a new generation of binary strings from the current generation but before we can do this we need to select the items from the array of a hundred that we are going to use as parents to derive the offspring from. In the demonstration code for this project found in form one of the GeneticsDev project, I have written a function called SelectionBefore the genetic algorithm can make any decisions, it must choose who the parents are going to be and as with nature the kids may not always get the parents they deserve. There are a number of ways the parents for the next child in the algorithm can be generated which we will go into more detail later, in the first demonstration though a simple random algorithm known as Roulette Wheel Selection is used. Roulette Wheel Selection simply chooses two parents at random from those on offer in the array although the parents themselves have been through the fitness function first so it is not completely random in that sense. CrossoverCrossover is the mechanism by which genetic algorithm chooses the parts of the parents that it wants, this can be application specific but is kept general throughout these projects. To go back to the biological analogy, crossover is the part of the code where one parent has brown eyes and the other parent has blue eyes and the color of the child's eyes is determined. There are a number of ways in which this can be achieved some of which are discussed in part two of this project. For now though crossover goes something like this. If you have two parents, 0101010101 and 1010101010. You want to create a child that is a mixture of the two parents. The simplest way to do this and the way that is used in the demonstration code for this project is to split the parents strings and simply cross them, so you get a child that looks like: 0101001010. This gives you a unique offspring that is made up entirely from the parents genetic make-up. MutationMutation is the process by which random variations are introduced into the genetic string, to continue with the natural analogy say that you have a string for the eyes and that part of the string has a value to indicate if the child's eyes are squinty or not. If a random value is generated when the child is created and the value is less than a given value which is usually extremely low, it's set to 0.01 in the demonstration code then the value for squinty eyes will be changed from what ever its current value is to it's opposite. So if the child's value for squinty eyes is 0 for 'not squinty' when the value is fired, then it will be set to one and the child will have squinty eyes. The codeThe code for this demonstration is designed to give a simple view into the world of genetic algorithms, showing how the basic code works and concentrating on demonstrating the order of things rather than trying to create a fully functioning program from the start I figured it would be best to talk about the common pitfalls and problems that can run into it when using genetic algorithms for a couple of reasons, the first being that we aren't dealing with an exact science here, every project will be different and a solution that works for one project is likely to completely foul up another project, so understanding the reasons for this is paramount before starting any project. The demonstration codeThe demonstration project is a simple Windows application that uses a The code uses these genetic strings to get a number and then attempts to find the correct answer to the sum entered by checking the binary values in the array with the value given by the sum entered. When the application starts a thread is started that initializes the array that contains the binary numbers and the user interface is disabled until this is complete. The user then enters a sum using the numeric check boxes on the forms. The two values on the forms are restricted as the total value of the answer cannot exceed the maximum value that can be held in a ten character binary array. Once the user has entered the sum they can run it through one of two genetic algorithm implementations, the first implementation being a very basic implementation that relies on the generation of random numbers or the second version that uses a fitness function to try and guide the algorithm to the correct answer. Before we get down to the problems of these implementations as they stand, let's take a look at how they work. First of all the array is initialized when a variable named for( int i=0; i<PopulationSize; i++ )
{
this.GeneticsArray.Add(new BinaryGeneticsString(ChromosomeLength));
}
This adds a new Random rand = new Random();
for( int i=0; i<ChromosomeLength; i++ )
{
/// need to slow the computer down
///
System.Threading.Thread.Sleep( 2 );
GeneticsString.Add( rand.Next() % 2 );
}
which creates a random string of zeros and ones in an array of Starting with the When the code does nothing more than print information to the bool bStop = false;
int nRunCount = 0;
while( bStop == false )
{
/// print out
for( int i=0; i<binGenetics.PopulationSize; i++ )
{
binTest.BinaryString =
((BinaryGeneticsString)binGenetics.GeneticsArray[i]).ToString();
/// print out
if( binAnswer.Number == binTest.Number )
{
/// print out
bStop = true;
OnStopThreadOne( this, null );
break;
}
}
/// now update the gentics array
///
if( bStop == false )
{
binGenetics.Run();
nRunCount++;
}
}
The code cycles through the population stored in the The ArrayList holdingList = new ArrayList();
for( int i=0; i<PopulationSize; i++ )
{
BinaryGeneticsString bsTemp =
new BinaryGeneticsString( ChromosomeLength, false );
RouletteWheelSelection();
SinglePointCrossOver( bsTemp, nCrossOverPoint );
holdingList.Add( bsTemp );
}
GeneticsArray.Clear();
for( int i=0; i<PopulationSize; i++ )
{
GeneticsArray.Add( holdingList[ i ] );
}
Mutation();
The code shows the order in which the genetics algorithm functions are called. To begin with a temporary The libraryAs mentioned above the idea behind the library is to both provide a framework to use in building future genetic algorithm projects and as a template to hold some of the more generic functionality. Which in this case means holding some of the functionality for the selection and crossover so that when an algorithm is being tested it should be reasonably simple to be able to change the selection and crossover functions used within the project to find an implementation that will allow the best results. The main library classes are the Genetics Base class and Genetics String class and both are meant to be inherited by the current implementation to provide the basic functionality of the genetics algorithm so that the classes that inherit from them can concentrate on getting the implementation right. The main class is the Genetics Base class which contains the main blueprint of the functionality for the genetic algorithm, as well as the variables necessary for the implementation of the algorithm such as the population size and the length of the chromosome array. Its main characteristics though lie in the three abstract functions and how they are implemented by child classes. These are: ///
/// Initialise function for setting up the starting array
///
public abstract void Initialise();
///
/// Mutation function to add mutations where applicable
/// Note this will always be program specific
///
public abstract void Mutation();
///
/// Do a single run through the population of the genetics array
///
public abstract void Run();
The for( int i=0; i<PopulationSize; i++ )
{
this.GeneticsArray.Add(new BinaryGeneticsString(ChromosomeLength));
}
which adds a new Random rand = new Random();
for( int i=0; i<ChromosomeLength; i++ )
{
/// need to slow the computer down
///
System.Threading.Thread.Sleep( 2 );
GeneticsString.Add( rand.Next() % 2 );
}
The important line here is the FitnessAs mentioned the way Fitness is calculated for a genetic algorithm is highly application specific so I can only talk about it here in reference to the current project. The idea of the SelectionAs mentioned above the selection method used for the first demonstration project is the Roulette Wheel Selection method and the code looks like this: Random rand = new Random();
int nSelected = rand.Next( GeneticsArray.Count );
gsParentOne = ( GeneticsStrings )GeneticsArray[ nSelected ];
System.Threading.Thread.Sleep( 2 );
nSelected = rand.Next( GeneticsArray.Count );
while( gsParentOne ==
( GeneticsStrings )GeneticsArray[ nSelected ] )
{
nSelected = rand.Next( GeneticsArray.Count );
}
gsParentTwo = ( GeneticsStrings )GeneticsArray[ nSelected ];
The variables CrossoverOnce we have selected our parents that are going to form the child we need to get to the crossover function, whose function it is to take a part of one parent and a part of another parent and mix them together so that we get a new, in this case binary string: child.GeneticsString.Clear();
for( int i=0; i<SingleCrossOverPoint; i++ )
{
child.GeneticsString.Add( gsParentOne.GeneticsString[ i ] );
}
for( int i=SingleCrossOverPoint; i<gsParentTwo.GeneticsString.Count; i++ )
{
child.GeneticsString.Add( gsParentTwo.GeneticsString[ i ] );
}
As with the Selection functions there are several crossover functions available for us to use, with the one used here being the Single Point Crossover. The MutationThe final part of the process is the mutation part, at least it is the way the library is organized as it works once the complete array of children has been built rather than applying it individually at the time each child is created. It is also not applied within the base classes as once again it is application specific, you can't write a mutation function for a genetic algorithm until you know what it is that you are mutating. So the function can be found in the Random rand = new Random();
Random randLocation = new Random();
int nLocation = 0;
BinaryGeneticsString bsTemp = null;
for( int i=0; i<PopulationSize; i++ )
{
if( rand.NextDouble() < MutationRate )
{
nLocation = randLocation.Next( ChromosomeLength );
bsTemp = ( BinaryGeneticsString )GeneticsArray[ i ];
if( ( Int32 )bsTemp.GeneticsString[ nLocation ] == 0 )
{
bsTemp.GeneticsString[ nLocation ] = 1;
}
else
bsTemp.GeneticsString[ nLocation ] = 0;
}
}
The decision whether a child is to be mutated or not is entirely dependant on the ProblemsNow in a perfect world we would be able to say fine that's a wrap, pack our stuff and go home to fire up World of Warcraft and put our feet up. Unfortunately, there remains one minor setback and that is the program we have has one tiny but noticeable little flaw. And that little flaw is that it simply isn't very good at doing what we want it to do. In fact, if you run the second algorithm for long enough you will get something that looks like this: Fixing the problemsThe problem here is that although the algorithm got quite close to the answer, it is only 12 off after all, the algorithm isn't sophisticated enough to
One quick solution to this problem is the This however, is a common problem with genetic algorithms and while changing the crossover points or algorithm completely is one solution there is no guaranteed solution. There are a few tools that we can use. Here are two of them. In the Form1.cs file at line 49 there is a variable called GenerationsOnce the checkbox for generations is clicked the code will use the default settings or any changes made to the default settings. It works on the idea that there are fifty cycles of breeding to create a generation, if the answer to the sum ( in this case ) isn't found by the end of a generation then the genetics array is reinitialized and the process is started again until either the correct answer is found or the program runs through all its allotted generations. Adding the generations code to the program is fairly simple in that all we need to do is initialize the generation code and then interrupt the loop with the code to update it. To initialize generations we add the following code: if( bProblemSolving == true )
{
binGenetics.UseGenerations = generationsBox.Checked;
if( binGenetics.UseGenerations == true )
{
binGenetics.NumberOfGenerations =
( int )numberGenerationsBox.Value;
binGenetics.NumberOfCyclesPerGeneration =
( int )cyclesPerGenerationBox.Value;
}
}
which checks if the problem solving mode is on and then checks if the generations option is checked, if it is then the variables to control the generations code are set to the values in the numeric boxes. The following code is then added to the loop to control the flow of the program: if( binGenetics.UseGenerations == true )
{
if( nGenerationCount == binGenetics.NumberOfGenerations )
{
/// output text
bStop = true;
OnStopThreadOne( this, null );
}
if( nRunCount == binGenetics.NumberOfCyclesPerGeneration )
{
/// output text
binGenetics.Initialise();
nGenerationCount++;
nRunCount = 0;
}
}
As with the initialization code we first check that the problem solving option is on and then check to see if we've used all our allowed generations if we haven't we then check to see if the end of a cycle has been reached and if it has we reset everything and reinitialize the ElitismElitism is the process by which items in the Adding Elitism to a child class of the if( UseElitism == true )
{
AddEliteValues();
}
Simple check if elitism is being used and then call the function that you will have to write to get the elite values that you wish to save. Of course, you will have to set it up first which requires: binGenetics.UseElitism = elitismBox.Checked;
if( binGenetics.UseElitism == true )
{
binGenetics.ElitismValue = binAnswer.Number;
}
or similar code being added to the function that calls the run function. Additional codeAlong with the development of any library goes the need for functionality so I've also included a number of selection and crossover methods that can be used in any application developed using the library. In order to develop and test these methods I decided that rather than trying to shoehorn them into the GeneticsDev application I would include a second application developed specifically for the purpose. This is the GeneticsDevAdditional application provided with the sample code. In order to compile and run this sample right click on the GeneticsDevAdditional title and select Set As Startup Project. As you can see there have been a few cosmetic changes to accommodate the extra options on two tab pages one containing the options for Selection and one containing the options for CrossOver. The main changes made to the code are: binGenetics.CrossOverMethod = CROSSOVERMETHODS.SINGLEPOINTCROSSOVER;
binGenetics.SelectionMethod = SELECTIONMETHODS.ROULETTEWHEEL;
/// This section will hold the changable values
/// for the genetics selection
/// and crossover.
///
binGenetics.SingleCrossOverPoint = 5;
binGenetics.DualPointCrossOverPointOne = 4;
binGenetics.DualPointCrossOverPointTwo = 7;
binGenetics.PositionBasedCrossOverPoints = 5;
In Form1.cs for the GeneticsDevAdditional project as you can see from the first two lines the Selection();
CrossOver( bsTemp );
Selection methodsThere are only two options available for selection. The first one being the Roulette wheel option that we have seen previously and the second being the Fixed Point selection option. The problem from the point of view of this code is that once you start getting into the specifics of what selection methods you are going to use then you are venturing into the realm of application specific code, so the idea behind this option is that the person using the library can write a selection algorithm and then simply tell the base class which two array items it would like to use as parents. The code given here simply starts by taking the first two items in the array and passing the array values into the Crossover methodsThere are four crossover methods available. The first being the The The The The code for both the ConclusionThat concludes the beginners guide to genetic algorithms. In the next part, we'll take a look at using genetic algorithms in more of a hopefully realistic application, i.e. where it does something useful in a situation where the end we are looking for is known but not how to get there and I will be taking a look at adaptive programming. Primary references
References
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||