# AI : Neural Network for beginners (Part 2 of 3)

By Sacha Barber, 29 Jan 2007

## Introduction

1. Part 1: An introduction to Perceptron networks (single layer neural networks).
2. Part 2: This one. It is about multi-layer neural networks and the back propagation training method, used to solve a non-linear classification problem such as the logic of an XOR logic gate. This is something that a Perceptron can't do, as explained further within this article.
3. Part 3: Will be about how to use a genetic algorithm (GA) to train a multi-layer neural network to solve a logic problem.

### Summary

This article will show how to use a multi-layer neural network to solve the XOR logic problem.

## A Brief Recap (From part 1 of 3)

Before we commence with the nitty gritty of this new article, which deals with multi-layer Neural Networks, let's just revisit a few key concepts. If you haven't read Part 1, perhaps you should start there.

### Perceptron Configuration (Single layer network)

The inputs `(x1,x2,x3..xm)` and connection weights `(w1,w2,w3..wm)` of a perceptron are typically real values, both positive (+) and negative (-).

The perceptron itself consists of weights, the summation processor, an activation function, and an adjustable threshold processor (called bias hereafter).

For convenience, the normal practice is to treat the bias as just another input. The following diagram illustrates the revised configuration.

The bias can be thought of as the propensity (a tendency towards a particular way of behaving) of the perceptron to fire irrespective of its inputs. The perceptron configuration shown above fires if the weighted sum > 0 or, if you are into maths-type explanations:

$$\sum_{i=1}^{m} w_i x_i + b > 0$$

where $b$ is the bias.

So that's the basic operation of a perceptron. But we now want to build more layers of these, so let's carry on to the new stuff.
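As a minimal sketch of that firing rule (not code from the article's download), the following C# computes the weighted sum plus bias and compares it to zero. The class name `PerceptronSketch`, the method `Fires` and the AND-gate weights are hypothetical values chosen for illustration.

```csharp
using System;

public static class PerceptronSketch
{
    // Fires (returns true) when the weighted sum of the inputs plus the bias is > 0.
    public static bool Fires(double[] inputs, double[] weights, double bias)
    {
        double sum = bias;
        for (int i = 0; i < inputs.Length; i++)
        {
            sum += weights[i] * inputs[i];
        }
        return sum > 0.0;
    }

    public static void Main()
    {
        // Hand-picked, hypothetical values that make the perceptron behave as an AND gate.
        double[] weights = { 1.0, 1.0 };
        double bias = -1.5;

        double[][] patterns =
        {
            new double[] { 0, 0 }, new double[] { 0, 1 },
            new double[] { 1, 0 }, new double[] { 1, 1 }
        };
        foreach (double[] p in patterns)
        {
            Console.WriteLine("{0} AND {1} -> {2}", p[0], p[1], Fires(p, weights, bias));
        }
    }
}
```

With these values only the `1, 1` pattern gives a weighted sum above zero (1.0 + 1.0 - 1.5 = 0.5), so it is the only one that fires.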

## So Now The New Stuff (More layers)

From this point on, anything that is being discussed relates directly to this article's code.

As stated in the summary at the top, the problem we are trying to solve is how to use a multi-layer neural network to solve the XOR logic problem. So how is this done? Well, it's really an incremental build on what Part 1 already discussed. So let's march on.

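What does the XOR logic problem look like? Well, it looks like the following truth table:

| Input 1 | Input 2 | Output |
|---------|---------|--------|
| 0       | 0       | 0      |
| 0       | 1       | 1      |
| 1       | 0       | 1      |
| 1       | 1       | 0      |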

Remember with a single layer (perceptron) we can't actually achieve the XOR functionality, as it is not linearly separable. But with a multi-layer network, this is achievable.
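To make the "not linearly separable" point concrete, here is a rough sketch that tries every weight and bias combination on a coarse grid and counts how many make a single perceptron reproduce a truth table. The grid range and step, and all names (`XorSeparabilitySketch`, `CountSolutions`), are assumptions made for this illustration. A grid search is only a demonstration, not a proof.

```csharp
using System;

public static class XorSeparabilitySketch
{
    static readonly double[][] Patterns =
    {
        new double[] { 0, 0 }, new double[] { 0, 1 },
        new double[] { 1, 0 }, new double[] { 1, 1 }
    };

    // Counts the (w1, w2, bias) combinations on a coarse grid for which a single
    // perceptron (fires when the weighted sum + bias > 0) matches every row.
    public static int CountSolutions(bool[] targets)
    {
        int count = 0;
        // Grid from -2.0 to 2.0 in steps of 0.25 (an arbitrary choice).
        for (int a = -8; a <= 8; a++)
        {
            for (int b = -8; b <= 8; b++)
            {
                for (int c = -8; c <= 8; c++)
                {
                    double w1 = a * 0.25, w2 = b * 0.25, bias = c * 0.25;
                    bool allMatch = true;
                    for (int p = 0; p < Patterns.Length; p++)
                    {
                        bool fires = w1 * Patterns[p][0] + w2 * Patterns[p][1] + bias > 0.0;
                        if (fires != targets[p]) { allMatch = false; break; }
                    }
                    if (allMatch) count++;
                }
            }
        }
        return count;
    }

    public static void Main()
    {
        bool[] andGate = { false, false, false, true };
        bool[] xorGate = { false, true, true, false };
        Console.WriteLine("AND solutions on grid: " + CountSolutions(andGate));
        Console.WriteLine("XOR solutions on grid: " + CountSolutions(xorGate));
    }
}
```

The AND count is greater than zero (for example w1 = 1, w2 = 1, bias = -1.5 is on the grid), while the XOR count is 0. That matches the algebra: XOR needs bias <= 0, w1 + bias > 0 and w2 + bias > 0, which together force w1 + w2 + bias > 0, so the `1, 1` row can never be switched off.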

## What Does The New Network Look Like

The new network that will solve the XOR problem will look similar to a single layer network. We are still dealing with inputs / weights / outputs. What is new is the addition of the hidden layer.

The network has one input layer, one hidden layer and one output layer.

It is by using the inputs and weights that we are able to work out the activation for a given node. This is easily achieved for the hidden layer as it has direct links to the actual input layer.

The output layer, however, knows nothing about the input layer as it is not directly connected to it. So to work out the activation for an output node we need to make use of the output from the hidden layer nodes, which are used as inputs to the output layer nodes.

This entire process described above can be thought of as a pass forward from one layer to the next.

This still works like it did with a single layer network; the activation for any given node is still worked out as follows:

$$\text{activation} = \sum_{i} w_i I_i$$

where $w_i$ is the weight(i) and $I_i$ is the input(i) value.

You see, it's the same old stuff: no demons, smoke or magic here. It's stuff we've already covered.
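The forward pass described above can be sketched as follows. This is an illustration only: the class `ForwardPassSketch`, its method names and the weights are hypothetical, hand-picked so that a 2-2-1 network computes XOR. They are not the weights the article's trained network ends up with. Each node's weighted sum is passed through the sigmoid from Part 1, and the hidden outputs become the inputs of the output node.

```csharp
using System;

public static class ForwardPassSketch
{
    public static double Sigmoid(double x)
    {
        return 1.0 / (1.0 + Math.Exp(-x));
    }

    // Output of one node: weighted sum of its inputs plus a bias weight, squashed by the sigmoid.
    public static double Activate(double[] inputs, double[] weights, double biasWeight)
    {
        double sum = biasWeight;
        for (int i = 0; i < inputs.Length; i++)
        {
            sum += weights[i] * inputs[i];
        }
        return Sigmoid(sum);
    }

    // Pass forward: input layer -> hidden layer -> output layer.
    public static double Forward(double x1, double x2)
    {
        double[] inputs = { x1, x2 };

        // Hidden layer is fed directly by the inputs (hand-picked weights).
        double[] hidden =
        {
            Activate(inputs, new double[] { 20.0, 20.0 }, -10.0), // behaves like OR
            Activate(inputs, new double[] { 20.0, 20.0 }, -30.0)  // behaves like AND
        };

        // Output layer only sees the hidden layer's outputs, never the raw inputs.
        return Activate(hidden, new double[] { 20.0, -20.0 }, -10.0);
    }

    public static void Main()
    {
        double[][] patterns =
        {
            new double[] { 0, 0 }, new double[] { 0, 1 },
            new double[] { 1, 0 }, new double[] { 1, 1 }
        };
        foreach (double[] p in patterns)
        {
            Console.WriteLine("{0} XOR {1} -> {2:F4}", p[0], p[1], Forward(p[0], p[1]));
        }
    }
}
```

The outputs come out close to 0 for `0, 0` and `1, 1`, and close to 1 for the two mixed patterns. Because of the sigmoid they never reach exactly 0 or 1.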

So that's how the network looks/works. So now I guess you want to know how to go about training it.

## Types Of Learning

There are essentially 2 types of learning that may be applied to a Neural Network: "Reinforcement" and "Supervised".

### Reinforcement

In Reinforcement learning, during training, a set of inputs is presented to the Neural Network. Say the output is 0.75 when the target was 1.0.

The error (1.0 - 0.75) is used for training ('wrong by 0.25').

If there are 2 outputs, the total error is summed to give a single number (typically the sum of squared errors), e.g. "your total error on all outputs is 1.76".

Note that this just tells you how wrong you were, not in which direction you were wrong.

Using this method we may never get a result, or it could be a case of 'Hunt the needle'.

NOTE: Part 3 of this series will use a GA to train a Neural Network, which is Reinforcement learning. The GA simply does what a GA does, running through all the normal GA phases to select weights for the Neural Network. There is no back propagation of values. The Neural Network is simply judged as good or bad. As one can imagine, this process takes a lot more steps to get to the same result.

### Supervised

In Supervised learning, the network is told not just 'how wrong' it was, but also 'in what direction it was wrong'. It is like 'Hunt the needle', but where you are told 'North a bit', 'West a bit'.

So you get, and use, far more information in Supervised Learning, and this is the normal form of Neural Network learning algorithm. Back Propagation (what this article uses) is Supervised Learning.

## Learning Algorithm

In brief, to train a multi-layer Neural Network, the following steps are carried out:

• Start off with random weights (and biases) in the Neural Network
• Try one or more members of the training set, and see how bad the output(s) are compared to what they should be (the target output(s))
• Jiggle the weights a bit, aiming to improve the outputs
• Now try with a new lot of the training set, or repeat again, jiggling the weights each time
• Keep repeating until you get quite accurate outputs

This is what this article's code uses to solve the XOR problem. It is also called "Back Propagation" (normally shortened to BP or BackProp).
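The steps listed above can be sketched in a deliberately reduced form. The sketch below trains a single sigmoid neuron on the OR gate rather than the article's multi-layer network on XOR, so there is no hidden layer and nothing is propagated back through layers; it only shows the "random start, measure error, jiggle, repeat" loop. All names (`TrainingLoopSketch`, `Train`, the parameters) and the chosen learning rate and stopping error are assumptions for illustration. The weight jiggle uses the sigmoid delta formula that the article gives later in this section.

```csharp
using System;

public static class TrainingLoopSketch
{
    // Training set: the OR gate (a single neuron can learn this one).
    static readonly double[][] Inputs =
    {
        new double[] { 0, 0 }, new double[] { 0, 1 },
        new double[] { 1, 0 }, new double[] { 1, 1 }
    };
    static readonly double[] Targets = { 0, 1, 1, 1 };

    public static double Sigmoid(double x)
    {
        return 1.0 / (1.0 + Math.Exp(-x));
    }

    // w[0], w[1] are the input weights; w[2] is the bias weight.
    public static double Output(double[] w, double[] x)
    {
        return Sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]);
    }

    public static double TotalSquaredError(double[] w)
    {
        double sum = 0.0;
        for (int p = 0; p < Inputs.Length; p++)
        {
            double error = Targets[p] - Output(w, Inputs[p]);
            sum += error * error;
        }
        return sum;
    }

    public static double[] Train(int seed, int maxEpochs, double learningRate, double acceptableError)
    {
        // Step 1: start off with random weights (and bias), here in [-1, 1).
        Random rng = new Random(seed);
        double[] w = new double[3];
        for (int i = 0; i < w.Length; i++)
        {
            w[i] = rng.NextDouble() * 2.0 - 1.0;
        }

        // Steps 2-5: keep repeating until the outputs are quite accurate.
        for (int epoch = 0; epoch < maxEpochs && TotalSquaredError(w) > acceptableError; epoch++)
        {
            for (int p = 0; p < Inputs.Length; p++)
            {
                // See how bad the output is compared to the target...
                double o = Output(w, Inputs[p]);
                double delta = o * (1.0 - o) * (Targets[p] - o);

                // ...and jiggle the weights a bit in the direction that improves it.
                w[0] += learningRate * delta * Inputs[p][0];
                w[1] += learningRate * delta * Inputs[p][1];
                w[2] += learningRate * delta; // the bias input is always 1
            }
        }
        return w;
    }

    public static void Main()
    {
        double[] w = Train(1, 10000, 0.5, 0.05);
        for (int p = 0; p < Inputs.Length; p++)
        {
            Console.WriteLine("Output : {0:F2} / Target : {1}", Output(w, Inputs[p]), Targets[p]);
        }
    }
}
```

In the article's real network the same loop applies, but the error also has to be passed back to the hidden layer before its weights can be jiggled.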

Backprop allows you to use the error at the output to adjust the weights arriving at the output layer. It then also allows you to calculate the effective error 1 layer back, and use this to adjust the weights arriving there, and so on, back-propagating errors through any number of layers.

The trick is the use of a sigmoid as the non-linear transfer function (which was covered in Part 1). The sigmoid is used as it offers the ability to apply differentiation techniques.

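Because the sigmoid is nicely differentiable, it so happens that

$$f(x) = \frac{1}{1 + e^{-x}} \quad\Rightarrow\quad f'(x) = f(x)\,(1 - f(x))$$

which, in the context of the article's code (with the output error included), can be written as

`delta_outputs[i] = outputs[i] * (1.0 - outputs[i]) * (targets[i] - outputs[i])`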

It is by using this calculation that the weight changes can be applied back through the network.
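For readers who want to see where that expression comes from, here is a short worked derivation. It is standard calculus rather than anything taken from the article's code, and it assumes the error being minimised is the sum of squared errors $E = \tfrac{1}{2}\sum_i (t_i - o_i)^2$, where $t_i$ is `targets[i]`, $o_i = f(net_i)$ is `outputs[i]`, and $net_i$ is the weighted sum arriving at output node $i$.

First, the derivative of the sigmoid:

$$f(x) = \frac{1}{1 + e^{-x}}$$

$$f'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = f(x)\,\bigl(1 - f(x)\bigr)$$

Then, by the chain rule, the direction in which the weighted sum of output node $i$ should move to reduce the error is:

$$-\frac{\partial E}{\partial net_i} = (t_i - o_i)\, f'(net_i) = o_i\,(1 - o_i)\,(t_i - o_i)$$

which is the `delta_outputs[i]` expression term for term. The derivative never needs $e^{-x}$ again, only the output value the forward pass has already computed.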

### Things To Watch Out For

Valleys: Using the rolled ball metaphor, there may well be valleys with steep sides and a gently sloping floor. Gradient descent tends to waste time swooshing up and down each side of the valley (think ball!).

So what can we do about this? Well, we add a momentum term, which tends to cancel out the back-and-forth movements and emphasizes any consistent direction. Gradient descent will then go down such valleys with gentle bottom-slopes much more successfully (faster).
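One common way to express a momentum term is to add a fraction of the previous weight change to the current one. The sketch below assumes that form. The `momentum` parameter, the `previousChange` bookkeeping and the class `MomentumSketch` are hypothetical names for illustration, not taken from the article's code, and the delta values are fed in by hand rather than computed from a network.

```csharp
using System;

public static class MomentumSketch
{
    // Weight change for this step: the usual learning-rate term plus
    // a fraction (momentum) of the change made on the previous step.
    public static double NextChange(double learningRate, double delta, double input,
                                    double momentum, double previousChange)
    {
        return learningRate * delta * input + momentum * previousChange;
    }

    // Applies a sequence of delta values to one weight (input fixed at 1.0)
    // and returns how far the weight has moved in total.
    public static double Run(double[] deltas, double learningRate, double momentum)
    {
        double weight = 0.0;
        double previousChange = 0.0;
        foreach (double delta in deltas)
        {
            double change = NextChange(learningRate, delta, 1.0, momentum, previousChange);
            weight += change;
            previousChange = change; // remembered for the next step
        }
        return weight;
    }

    public static void Main()
    {
        double[] consistent = { 1, 1, 1, 1 };   // gentle valley floor: same direction each time
        double[] alternating = { 1, -1, 1, -1 }; // steep sides: direction flips each time

        Console.WriteLine("Consistent, no momentum  : " + Run(consistent, 0.1, 0.0));
        Console.WriteLine("Consistent, momentum 0.9 : " + Run(consistent, 0.1, 0.9));
        Console.WriteLine("Alternating, no momentum : " + Run(alternating, 0.1, 0.0));
        Console.WriteLine("Alternating, momentum 0.9: " + Run(alternating, 0.1, 0.9));
    }
}
```

With a consistent direction the remembered change keeps adding to each new step, so the weight travels further than it does without momentum. When the direction flips, the remembered change partly cancels the new one: here the second step shrinks from 0.1 to about 0.01 in size.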

## Starting The Training

This is probably best demonstrated with a code snippet from the article's actual code:

```csharp
/// <summary>
/// The main training. The expected target values are passed in to this
/// method as parameters, and the <see cref="NeuralNetwork">NeuralNetwork</see>
/// is then updated with small weight changes, for this training iteration.
/// Each weight change is LearningRate * delta * input, where the delta values
/// are the error terms computed on this pass. No weight change from a previous
/// pass is stored here, so this snippet has no separate momentum term.
/// If you don't know what valleys means, read the article's associated text
/// </summary>
/// <param name="target">A double[] array containing the target value(s)</param>
private void train_network(double[] target)
{
    // Delta (error) values for this pass; the hidden array has one extra slot for the bias node
    double[] delta_hidden = new double[nn.NumberOfHidden + 1];
    double[] delta_outputs = new double[nn.NumberOfOutputs];

    // Get the delta value for the output layer:
    // sigmoid derivative (output * (1 - output)) times the output error
    for (int i = 0; i < nn.NumberOfOutputs; i++)
    {
        delta_outputs[i] =
            nn.Outputs[i] * (1.0 - nn.Outputs[i]) * (target[i] - nn.Outputs[i]);
    }

    // Get the delta value for the hidden layer:
    // the output deltas are propagated back through the hidden-to-output weights
    for (int i = 0; i < nn.NumberOfHidden + 1; i++)
    {
        double error = 0.0;
        for (int j = 0; j < nn.NumberOfOutputs; j++)
        {
            error += nn.HiddenToOutputWeights[i, j] * delta_outputs[j];
        }
        delta_hidden[i] = nn.Hidden[i] * (1.0 - nn.Hidden[i]) * error;
    }

    // Now update the weights between hidden & output layer
    for (int i = 0; i < nn.NumberOfOutputs; i++)
    {
        for (int j = 0; j < nn.NumberOfHidden + 1; j++)
        {
            // weight change = learning rate * output delta * hidden node output
            nn.HiddenToOutputWeights[j, i] += nn.LearningRate * delta_outputs[i] * nn.Hidden[j];
        }
    }

    // Now update the weights between input & hidden layer
    for (int i = 0; i < nn.NumberOfHidden; i++)
    {
        for (int j = 0; j < nn.NumberOfInputs + 1; j++)
        {
            // weight change = learning rate * hidden delta * input value
            nn.InputToHiddenWeights[j, i] += nn.LearningRate * delta_hidden[i] * nn.Inputs[j];
        }
    }
}
```

## So Finally The Code

Well, the code for this article looks like the following class diagram (It's Visual Studio 2005 C#, .NET v2.0)

The main classes that people should take the time to look at are:

• `NN_Trainer_XOR`: Trains a Neural Network to solve the XOR problem
• `TrainerEventArgs`: Training event args, for use with a GUI
• `NeuralNetwork`: A configurable Neural Network
• `NeuralNetworkEventArgs`: Training event args, for use with a GUI
• `SigmoidActivationFunction`: A static method to provide the sigmoid activation function

The rest is a GUI I constructed simply to show how it all fits together.

## Code Demos

The DEMO application attached has 3 main areas which are described below:

### LIVE RESULTS Tab

It can be seen that this has very nearly solved the XOR problem (You will probably never get it 100% accurate)

### TRAINING RESULTS Tab

Viewing the training phase target/outputs together

Viewing the training phase errors

### TRAINED RESULTS Tab

Viewing the trained target/outputs together

Viewing the trained errors

It is also possible to view the Neural Network's final configuration using the "View Neural Network Config" button. If people are interested in what weights the Neural Network ended up with, this is the place to look.

## What Do You Think ?

That's it. I would just like to ask, if you liked the article, please vote for it.

## Points of Interest

I think AI is fairly interesting, that's why I am taking the time to publish these articles. So I hope someone else finds it interesting, and that it might help further someone's knowledge, as it has my own.

Anyone who wants to look further into AI-type stuff, and who finds the content of this article a bit basic, should check out Andrew Kirillov's articles on CodeProject, as his are more advanced, and very good. In fact, anything Andrew does seems to be very good.

## History

• v1.0 24/11/06



Sacha Barber, Software Developer (Senior), United Kingdom
