
# Back-propagation Neural Net

28 Mar 2006, GPL3
A C++ class implementing a back-propagation neural net that supports any number of layers and neurons.

## Introduction

The class `CBackProp` encapsulates a feed-forward neural network and a back-propagation algorithm to train it. This article is intended for those who already have some idea about neural networks and back-propagation algorithms. If you are not familiar with these, I suggest going through some material first.

## Background

This is part of an academic project which I worked on during my final semester back in college, for which I needed to find the optimal number and size of hidden layers and learning parameters for different data sets. It wasn't easy finalizing the data structure for the neural net and getting the back-propagation algorithm to work. The motivation for this article is to save someone else the same effort.

Here's a little disclaimer: this article describes a simplistic implementation of the algorithm and does not elaborate on it in depth. There is a lot of room to improve the included code (for example, by adding exception handling), and many steps call for more reasoning than I have included; e.g., the values I have chosen for the parameters, and the number of layers and neurons in each layer, are for demonstrating the usage and may not be optimal. To learn more about these, I suggest going through some additional reference material.

## Using the code

Typically, the usage involves the following steps:

• Create the net using `CBackProp::CBackProp(int nl, int *sz, double b, double a)`.
• Apply the back-propagation algorithm: train the net by passing an input and its desired output to `void CBackProp::bpgt(double *in, double *tgt)` in a loop, until the mean square error, obtained from `double CBackProp::mse(double *tgt)`, falls to an acceptable value.
• Use the trained net to make predictions by feed-forwarding input data with `void CBackProp::ffwd(double *in)`.

The following is a description of the sample program that I have included.

## One step at a time...

#### Setting the objective:

We will try to teach our net to crack the binary function A XOR B XOR C. XOR is an obvious choice: it is not linearly separable, hence requires hidden layers and cannot be learned by a single perceptron.

A training data set consists of multiple records, where each record contains fields which are input to the net, followed by fields consisting of the desired output. In this example, it's three inputs + one desired output.

```
// prepare XOR training data
double data[][4]={
//  I  XOR  I  XOR  I  =  O
//--------------------------
    0,  0,  0,  0,
    0,  0,  1,  1,
    0,  1,  0,  1,
    0,  1,  1,  0,
    1,  0,  0,  1,
    1,  0,  1,  0,
    1,  1,  0,  0,
    1,  1,  1,  1 };
```

#### Configuration:

Next, we need to specify a suitable structure for our neural network, i.e., the number of hidden layers it should have and the number of neurons in each layer. Then we specify values for the other parameters: the learning rate `beta`, optionally the momentum `alpha`, and the threshold `thresh` (the target mean square error; training stops once it is achieved, otherwise it continues for `num_iter` iterations).

Let's define a net with 4 layers having 3, 3, 3, and 1 neurons respectively. Since the first layer is the input layer, i.e., simply a placeholder for the input parameters, it has to be the same size as the number of inputs, and the last layer, being the output layer, must be the same size as the number of outputs; in our example, these are 3 and 1. The layers in between are the hidden layers.

```
int numLayers = 4, lSz[4] = {3,3,3,1};
double beta = 0.2, alpha = 0.1, thresh = 0.00001;
long num_iter = 500000;
```

#### Creating the net:

`CBackProp *bp = new CBackProp(numLayers, lSz, beta, alpha);`

#### Training:

```
for (long i = 0; i < num_iter; i++)
{
    bp->bpgt(data[i%8], &data[i%8][3]);

    if (bp->mse(&data[i%8][3]) < thresh)
        break; // mse < threshold - we are done training!!!
}
```

#### Let's test its wisdom:

We prepare test data, which here is the same as training data minus the desired output.

```
// prepare XOR test data
double testData[][3]={
//  I  XOR  I  XOR  I  =  ?
//-------------------------
    0,  0,  0,
    0,  0,  1,
    0,  1,  0,
    0,  1,  1,
    1,  0,  0,
    1,  0,  1,
    1,  1,  0,
    1,  1,  1 };
```

Now, using the trained network to make predictions on our test data....

```
for (int i = 0; i < 8; i++)
{
    bp->ffwd(testData[i]);
    cout << testData[i][0] << "  "
         << testData[i][1] << "  "
         << testData[i][2] << "  "
         << bp->Out(0) << endl;
}
```
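Since the sigmoid keeps every output strictly between 0 and 1, the trained net will print values like 0.02 or 0.98 rather than exact binary digits. If a hard 0/1 answer is wanted, one common approach (not part of the article's code; the helper name here is made up for illustration) is to cut at 0.5:

```cpp
#include <cassert>

// Hypothetical helper: map a sigmoid activation in (0,1) to a hard 0/1
// label by cutting at 0.5, the conventional threshold for a binary target.
int toBinary(double activation)
{
    return activation > 0.5 ? 1 : 0;
}
```

With this, the prediction loop could print `toBinary(bp->Out(0))` to get clean binary answers.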

## Now a peek inside:

#### Storage for the neural net

I think the following code has ample comments and is self-explanatory...

```
class CBackProp{

    // output of each neuron
    double **out;

    // delta error value for each neuron
    double **delta;

    // 3-D array to store weights for each neuron
    double ***weight;

    // no of layers in net, including input layer
    int numl;

    // array of numl elements to store size of each layer
    int *lsize;

    // learning rate
    double beta;

    // momentum
    double alpha;

    // storage for weight-change made in previous epoch
    double ***prevDwt;

    // sigmoid function
    double sigmoid(double in);

public:

    ~CBackProp();

    // initializes and allocates memory
    CBackProp(int nl, int *sz, double b, double a);

    // back-propagates error for one set of inputs
    void bpgt(double *in, double *tgt);

    // feeds forward activations for one set of inputs
    void ffwd(double *in);

    // returns mean square error of the net
    double mse(double *tgt);

    // returns i'th output of the net
    double Out(int i) const;
};
```
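The destructor's body isn't listed in the article. Since the constructor below allocates `out` for every layer but `delta`, `weight` and `prevDwt` only from layer 1 onward, the cleanup has to mirror that asymmetry. Here is a self-contained sketch of the pattern using a stand-in struct (the names `NetStorage`, `makeStorage` and `freeStorage` are mine, for illustration only):

```cpp
#include <cassert>

// Stand-in for the subset of CBackProp's members relevant to cleanup.
struct NetStorage {
    int numl;
    int *lsize;
    double **out;      // allocated for every layer
    double ***weight;  // allocated for layers 1..numl-1 only
};

// Allocate the same shapes the CBackProp constructor uses;
// weight[0] is deliberately left unallocated, as in the article.
NetStorage makeStorage(int nl, const int *sz)
{
    NetStorage s;
    s.numl = nl;
    s.lsize = new int[nl];
    for (int i = 0; i < nl; i++) s.lsize[i] = sz[i];

    s.out = new double*[nl];
    for (int i = 0; i < nl; i++) s.out[i] = new double[s.lsize[i]];

    s.weight = new double**[nl];
    for (int i = 1; i < nl; i++) {
        s.weight[i] = new double*[s.lsize[i]];
        for (int j = 0; j < s.lsize[i]; j++)
            s.weight[i][j] = new double[s.lsize[i-1] + 1]; // +1 for bias
    }
    return s;
}

// What ~CBackProp would have to do: free innermost arrays first,
// and skip layer 0 for the weight array.
void freeStorage(NetStorage &s)
{
    for (int i = 1; i < s.numl; i++) {
        for (int j = 0; j < s.lsize[i]; j++)
            delete[] s.weight[i][j];
        delete[] s.weight[i];
    }
    delete[] s.weight;

    for (int i = 0; i < s.numl; i++)
        delete[] s.out[i];
    delete[] s.out;

    delete[] s.lsize;
}
```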

Some alternative implementations define a separate class for layer / neuron / connection, and then put those together to form a neural network. Although that is definitely a cleaner approach, I decided to use `double ***` and `double **` to store the weights, outputs, etc., allocating exactly the memory required, for two reasons:

• The ease it provides while implementing the learning algorithm: for the weight on the connection between the (i-1)th layer's jth neuron and the ith layer's kth neuron, I personally prefer `w[i][k][j]` (rather than something like `net.layer[i].neuron[k].getWeight(j)`). Similarly, the output of the j`th` neuron of the i`th` layer is `out[i][j]`, and so on.
• Another advantage is the flexibility to choose any number and size of layers.
```
// initializes and allocates memory
CBackProp::CBackProp(int nl, int *sz, double b, double a):beta(b),alpha(a)
{
    // Note that delta[0], weight[0] and prevDwt[0] are unused.
    // This is intentional, to keep the numbering of the layers
    // consistent: for a net of n layers, the input layer is
    // referred to as the 0th layer, the first hidden layer as
    // the 1st layer, and the last layer as the output layer.
    // The 0th layer just stores the inputs, so it has no delta
    // or weight values associated with it.

    // set no of layers and their sizes
    numl = nl;
    lsize = new int[numl];

    for (int i = 0; i < numl; i++)
        lsize[i] = sz[i];

    // allocate memory for output of each neuron
    out = new double*[numl];
    for (int i = 0; i < numl; i++)
        out[i] = new double[lsize[i]];

    // allocate memory for delta
    delta = new double*[numl];
    for (int i = 1; i < numl; i++)
        delta[i] = new double[lsize[i]];

    // allocate memory for weights
    // (the extra +1 element per neuron holds the bias weight)
    weight = new double**[numl];
    for (int i = 1; i < numl; i++)
        weight[i] = new double*[lsize[i]];
    for (int i = 1; i < numl; i++)
        for (int j = 0; j < lsize[i]; j++)
            weight[i][j] = new double[lsize[i-1]+1];

    // allocate memory for previous weight changes
    prevDwt = new double**[numl];
    for (int i = 1; i < numl; i++)
        prevDwt[i] = new double*[lsize[i]];
    for (int i = 1; i < numl; i++)
        for (int j = 0; j < lsize[i]; j++)
            prevDwt[i][j] = new double[lsize[i-1]+1];

    // seed and assign random weights in roughly [-1, 1]
    srand((unsigned)(time(NULL)));
    for (int i = 1; i < numl; i++)
        for (int j = 0; j < lsize[i]; j++)
            for (int k = 0; k < lsize[i-1]+1; k++)
                weight[i][j][k] = (double)(rand())/(RAND_MAX/2) - 1;

    // initialize previous weight changes to 0 for the first iteration
    for (int i = 1; i < numl; i++)
        for (int j = 0; j < lsize[i]; j++)
            for (int k = 0; k < lsize[i-1]+1; k++)
                prevDwt[i][j][k] = 0.0;
}
```

#### Feed-Forward

This function updates the output value of each neuron. Starting with the first hidden layer, it computes each neuron's output `o` as the weighted sum of its inputs plus a bias, passed through the sigmoid function:

`o = sigmoid( sum(w*x) + bias )`, where `sigmoid(x) = 1/(1 + exp(-x))`

It then feeds the activations forward, layer by layer, until the output layer has been updated:

```
// feed forward one set of inputs
void CBackProp::ffwd(double *in)
{
    double sum;

    // assign contents to input layer
    for (int i = 0; i < lsize[0]; i++)
        out[0][i] = in[i];

    // assign output (activation) value
    // to each neuron using the sigmoid function

    // For each layer
    for (int i = 1; i < numl; i++){
        // For each neuron in current layer
        for (int j = 0; j < lsize[i]; j++){
            sum = 0.0;
            // For input from each neuron in preceding layer
            for (int k = 0; k < lsize[i-1]; k++){
                // Apply weight to inputs and add to sum
                sum += out[i-1][k]*weight[i][j][k];
            }
            // Apply bias
            sum += weight[i][j][lsize[i-1]];
            // Apply sigmoid function
            out[i][j] = sigmoid(sum);
        }
    }
}
```
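The body of `sigmoid()` isn't shown in the listing. Given that the class declares `double sigmoid(double in)` and the delta computations below use the term `out*(1-out)` (the derivative of the logistic function), it is presumably the standard logistic sigmoid:

```cpp
#include <cassert>
#include <cmath>

// The logistic sigmoid: squashes any real input into (0, 1).
// (The article doesn't list the member's body; this is the standard
// definition that its out*(1-out) derivative terms imply.)
double sigmoid(double in)
{
    return 1.0 / (1.0 + std::exp(-in));
}
```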

#### Back-propagating...

The algorithm is implemented in the function `void CBackProp::bpgt(double *in, double *tgt)`. The following are the steps involved in back-propagating the error from the output layer back to the first hidden layer.

```
void CBackProp::bpgt(double *in, double *tgt)
{
    double sum;
```

First, we call `void CBackProp::ffwd(double *in)` to update the output value of each neuron, exactly as described in the feed-forward section above:

`ffwd(in);`

The next step is to compute the delta for the output layer; each output neuron's delta is its error scaled by the derivative of the sigmoid, i.e., `out*(1-out)*(tgt-out)`:

```
for (int i = 0; i < lsize[numl-1]; i++){
    delta[numl-1][i] = out[numl-1][i]*
        (1-out[numl-1][i])*(tgt[i]-out[numl-1][i]);
}
```

then find the delta for the hidden layers...

```
for (int i = numl-2; i > 0; i--){
    for (int j = 0; j < lsize[i]; j++){
        sum = 0.0;
        for (int k = 0; k < lsize[i+1]; k++){
            sum += delta[i+1][k]*weight[i+1][k][j];
        }
        delta[i][j] = out[i][j]*(1-out[i][j])*sum;
    }
}
```

Apply momentum (this does nothing if `alpha` = 0):

```
for (int i = 1; i < numl; i++){
    for (int j = 0; j < lsize[i]; j++){
        for (int k = 0; k < lsize[i-1]; k++){
            weight[i][j][k] += alpha*prevDwt[i][j][k];
        }
        weight[i][j][lsize[i-1]] += alpha*prevDwt[i][j][lsize[i-1]];
    }
}
```

Finally, adjust the weights. The correction for each weight is `dw = beta * delta * out_prev` (for the bias weight, the input is implicitly 1, so `dw = beta * delta`); it is stored in `prevDwt` for the next iteration's momentum term, and then applied:

```
for (int i = 1; i < numl; i++){
    for (int j = 0; j < lsize[i]; j++){
        for (int k = 0; k < lsize[i-1]; k++){
            prevDwt[i][j][k] = beta*delta[i][j]*out[i-1][k];
            weight[i][j][k] += prevDwt[i][j][k];
        }
        prevDwt[i][j][lsize[i-1]] = beta*delta[i][j];
        weight[i][j][lsize[i-1]] += prevDwt[i][j][lsize[i-1]];
    }
}
```

#### How learned is the net?

Mean square error is used as the measure of how well the neural net has learned. As shown in the sample XOR program, we apply the above steps until a satisfactorily low error level is achieved; `double CBackProp::mse(double *tgt)` returns just that.
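The body of `mse()` isn't listed either. Here is a sketch of what it presumably computes, written as a free function over plain arrays (the member version would compare `tgt` against `out[numl-1]` for `lsize[numl-1]` outputs; the exact normalisation is my assumption, since the article doesn't show it):

```cpp
#include <cassert>
#include <cmath>

// Mean square error over n outputs: the average of the squared
// differences between target and actual output values.
double meanSquareError(const double *tgt, const double *actual, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        double e = tgt[i] - actual[i];
        sum += e * e;
    }
    return sum / n;
}
```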

## History

• Created: 25 Mar 2006.
