
Neural Network for Recognition of Handwritten Digits

5 Dec 2006
A convolutional neural network achieves 99.26% accuracy on a modified NIST database of hand-written digits.
Prize winner in Competition "MFC/C++ Nov 2006"

Graphical view of the neural network


Introduction

This article chronicles the development of an artificial neural network designed to recognize handwritten digits. Although some theory of neural networks is given here, it would be better if you already understood some neural network concepts, like neurons, layers, weights, and backpropagation.

The neural network described here is not a general-purpose neural network, and it's not some kind of a neural network workbench. Rather, we will focus on one very specific neural network (a five-layer convolutional neural network) built for one very specific purpose (to recognize handwritten digits).

The idea of using neural networks for the purpose of recognizing handwritten digits is not a new one. The inspiration for the architecture described here comes from articles written by two separate authors. The first is Dr. Yann LeCun, who was an independent discoverer of the basic backpropagation algorithm. Dr. LeCun hosts an excellent site on his research into neural networks. In particular, you should view his "Learning and Visual Perception" section, which uses animated GIFs to show results of his research. The MNIST database of handwritten digits was developed by him. I used two of his publications as primary source materials for much of my work, and I highly recommend reading his other publications too (they're posted at his site). Unlike many other publications on neural networks, Dr. LeCun's publications are not inordinately theoretical and math-intensive; rather, they are extremely readable, and provide practical insights and explanations. His articles and publications can be found here. Here are the two publications that I relied on:

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-Based Learning Applied to Document Recognition", Proceedings of the IEEE, vol. 86, no. 11, Nov. 1998.

Y. LeCun, L. Bottou, G. Orr, and K. Muller, "Efficient BackProp", in Neural Networks: Tricks of the Trade, Springer, 1998.

The second author is Dr. Patrice Simard, a former collaborator with Dr. LeCun when they both worked at AT&T Laboratories. Dr. Simard is now a researcher at Microsoft's "Document Processing and Understanding" group. His articles and publications can be found here, and the publication that I relied on is:

Patrice Y. Simard, Dave Steinkraus, and John C. Platt, "Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis", Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), 2003.

One of my goals here was to reproduce the accuracy achieved by Dr. LeCun, who was able to train his neural network to achieve 99.18% accuracy (i.e., an error rate of only 0.82%). This error rate served as a type of "benchmark", guiding my work.

As a final introductory note, I'm not overly proud of the source code, which is most definitely an engineering work-in-progress. I started out with good intentions, to make source code that was flexible and easy to understand and to change. As things progressed, the code started to turn ugly. I began to write code simply to get the job done, sometimes at the expense of clean code and comprehensibility. To add to the mix, I was also experimenting with different ideas, some of which worked and some of which did not. As I removed the failed ideas, I did not always back out all the changes and there are therefore some dangling stubs and dead ends. I contemplated the possibility of not releasing the code. But that was one of my criticisms of the articles I read: none of them included code. So, with trepidation and the recognition that the code is easy to criticize and could really use a re-write, here it is.


Some Neural Network Theory

This is not a neural network tutorial, but to understand the code and the names of the variables used in it, it helps to see some neural network basics.

The following discussion is not completely general. It considers only feed-forward neural networks, that is, neural networks composed of multiple layers, in which each layer of neurons feeds only the very next layer of neurons, and receives input only from the immediately preceding layer of neurons. In other words, the neurons don't skip layers.

Consider a neural network that is composed of multiple layers, with multiple neurons in each layer. Focus on one neuron in layer n, namely the i-th neuron. This neuron gets its inputs from the outputs of neurons in the previous layer, plus a bias whose value is one ("1"). I use the variable "x" to refer to outputs of neurons. The i-th neuron applies a weight to each of its inputs, and then adds the weighted inputs together so as to obtain something called the "activation value". I use the variable "y" to refer to activation values. The i-th neuron then calculates its output value "x" by applying an "activation function" to the activation value. I use "F()" to refer to the activation function. The activation function is sometimes referred to as a "Sigmoid" function, a "Squashing" function, and other names, since its primary purpose is to limit the output of the neuron to some reasonable range like -1 to +1, and thereby inject some degree of non-linearity into the network. Here's a diagram of a small part of the neural network; remember to focus on the i-th neuron in layer n:

General diagram of a neuron in a neural network

This is what each variable means:

$x_i^n$ is the output of the i-th neuron in layer n.

$x_j^{n-1}$ is the output of the j-th neuron in layer n-1.

$x_k^{n-1}$ is the output of the k-th neuron in layer n-1.

$w_{ij}^n$ is the weight that the i-th neuron in layer n applies to the output of the j-th neuron from layer n-1 (i.e., the previous layer). In other words, it's the weight from the output of the j-th neuron in the previous layer to the i-th neuron in the current (n-th) layer.

$w_{ik}^n$ is the weight that the i-th neuron in layer n applies to the output of the k-th neuron in layer n-1.

$x_i^n = F\left(y_i^n\right)$, where $y_i^n = \sum_j w_{ij}^n \, x_j^{n-1}$, is the general feed-forward equation, where F() is the activation function. We will discuss the activation function in more detail in a moment.
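
As a preview of that discussion, one common choice of activation function (and the one recommended in Dr. LeCun's "Efficient BackProp") is a scaled hyperbolic tangent; the constants are LeCun's suggested values, shown here only as an example:

$$F(y) = 1.7159 \tanh\left(\tfrac{2}{3}\,y\right)$$

This function is smooth and symmetric about the origin, and it limits a neuron's output to the range -1.7159 to +1.7159.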

How does this translate into code and C++ classes? The way I saw it, the above diagram suggested that a neural network is composed of objects of four different classes: layers, neurons in the layers, connections from neurons in one layer to those in another layer, and weights that are applied to connections. Those four classes are reflected in the code, together with a fifth class -- the neural network itself -- which acts as a container for all other objects and which serves as the main interface with the outside world. Here's a simplified view of the classes. Note that the code makes heavy use of std::vector, particularly std::vector< double >:

// simplified view: some members have been omitted,
// and some signatures have been altered

// helpful typedef's

typedef std::vector< NNLayer* >  VectorLayers;
typedef std::vector< NNWeight* >  VectorWeights;
typedef std::vector< NNNeuron* >  VectorNeurons;
typedef std::vector< NNConnection > VectorConnections;


// Neural Network class

class NeuralNetwork  
{
public:
    NeuralNetwork();
    virtual ~NeuralNetwork();
    
    void Calculate( double* inputVector, UINT iCount, 
        double* outputVector = NULL, UINT oCount = 0 );

    void Backpropagate( double *actualOutput, 
         double *desiredOutput, UINT count );

    VectorLayers m_Layers;
};


// Layer class

class NNLayer
{
public:
    NNLayer( LPCTSTR str, NNLayer* pPrev = NULL );
    virtual ~NNLayer();
    
    void Calculate();
    
    void Backpropagate( std::vector< double >& dErr_wrt_dXn /* in */, 
        std::vector< double >& dErr_wrt_dXnm1 /* out */, 
        double etaLearningRate );

    NNLayer* m_pPrevLayer;
    VectorNeurons m_Neurons;
    VectorWeights m_Weights;
};


// Neuron class

class NNNeuron
{
public:
    NNNeuron( LPCTSTR str );
    virtual ~NNNeuron();

    void AddConnection( UINT iNeuron, UINT iWeight );
    void AddConnection( NNConnection const & conn );

    double output;

    VectorConnections m_Connections;
};


// Connection class

class NNConnection
{
public: 
    NNConnection(UINT neuron = ULONG_MAX, UINT weight = ULONG_MAX);
    virtual ~NNConnection();

    UINT NeuronIndex;
    UINT WeightIndex;
};


// Weight class

class NNWeight
{
public:
    NNWeight( LPCTSTR str, double val = 0.0 );
    virtual ~NNWeight();

    double value;
};

As you can see from the above, class NeuralNetwork stores a vector of pointers to layers in the neural network, which are represented by class NNLayer. There is no special function to add a layer (there probably should be one); simply use the std::vector::push_back() function. The NeuralNetwork class also provides the two primary interfaces with the outside world, namely, a function to forward propagate the neural network (the Calculate() function) and a function to Backpropagate() the neural network so as to train it.
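
Since there's no dedicated function for adding a layer, construction looks roughly like the following sketch. The layer names are illustrative, and this is not code from the actual program; it simply shows the push_back() approach under those assumptions:

// hypothetical construction sketch (layer names are illustrative)
NeuralNetwork NN;

// input layer: there is no previous layer, so pPrev defaults to NULL
NNLayer* pLayer = new NNLayer( _T("Layer00") );
NN.m_Layers.push_back( pLayer );

// a subsequent layer is fed by the layer added just before it
pLayer = new NNLayer( _T("Layer01"), NN.m_Layers.back() );
NN.m_Layers.push_back( pLayer );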

Each NNLayer stores a pointer to the previous layer, so that it knows where to look for its input values. In addition, it stores a vector of pointers to the neurons in the layer, represented by class NNNeuron, and a vector of pointers to weights, represented by class NNWeight. Similar to the NeuralNetwork class, the pointers to the neurons and to the weights are added using the std::vector::push_back() function. Finally, the NNLayer class includes functions to Calculate() the output values of neurons in the layer, and to Backpropagate() them; in fact, the corresponding functions in the NeuralNetwork class simply iterate through all layers in the network and call these functions.

Each NNNeuron stores a vector of connections that tells the neuron where to get its inputs. Connections are added using the NNNeuron::AddConnection() function, which takes an index to a neuron and an index to a weight, constructs a NNConnection object, and push_back()'s the new connection onto the vector of connections. Each neuron also stores its own output value, even though it's the NNLayer class that is responsible for calculating the actual value of the output and storing it there. The NNConnection and NNWeight classes respectively store obviously-labeled information.

One legitimate question about the class structure is, why are there separate classes for the weights and the connections? According to the diagram above, each connection has a weight, so why not put them in the same class? The answer lies in the fact that weights are often shared between connections. In fact, the convolutional neural network of this program specifically shares weights amongst its connections. So, for example, even though there might be several hundred neurons in a layer, there might only be a few dozen weights due to sharing. By making the NNWeight class separate from the NNConnection class, this sharing is more readily accomplished.
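
To make that concrete, here's a hypothetical sketch (the indices are invented for illustration): two neurons in the same layer can share a single weight simply by storing the same weight index in their connections:

// hypothetical weight-sharing sketch (indices are invented for illustration)
// both connections refer to the weight at index 7 in the layer's m_Weights
// vector, so there is only one NNWeight object behind them; updating its
// value during training affects both connections at once
pLayer->m_Neurons[ 0 ]->AddConnection( 3 /* iNeuron */, 7 /* iWeight */ );
pLayer->m_Neurons[ 1 ]->AddConnection( 4 /* iNeuron */, 7 /* iWeight */ );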


Forward Propagation

Forward propagation is the process whereby each neuron calculates its output value, based on inputs provided by the output values of the neurons that feed it.

In the code, the process is initiated by calling NeuralNetwork::Calculate(). NeuralNetwork::Calculate() directly sets the values of neurons in the input layer, and then iterates through the remaining layers, calling each layer's NNLayer::Calculate() function. This results in a forward propagation that's completely sequential, starting from neurons in the input layer and progressing through to the neurons in the output layer. A sequential calculation is not the only way to forward propagate, but it's the most straightforward. Here's simplified code, which takes a pointer to a C-style array of doubles representing the input to the neural network, and stores the output of the neural network to another C-style array of doubles:

// simplified code

void NeuralNetwork::Calculate(double* inputVector, UINT iCount, 
               double* outputVector /* =NULL */, 
               UINT oCount /* =0 */)
                              
{
    VectorLayers::iterator lit = m_Layers.begin();
    VectorNeurons::iterator nit;
    
    // first layer is input layer: directly
    // set outputs of all of its neurons
    // to the given input vector
    
    if ( lit < m_Layers.end() )  
    {
        nit = (*lit)->m_Neurons.begin();
        int count = 0;
        
        ASSERT( iCount == (*lit)->m_Neurons.size() );
        // there should be exactly one neuron per input
        
        while( ( nit < (*lit)->m_Neurons.end() ) && ( count < iCount ) )
        {
            (*nit)->output = inputVector[ count ];
            nit++;
            count++;
        }
    }
    
    // iterate through remaining layers,
    // calling their Calculate() functions
    
    for( lit++; lit<m_Layers.end(); lit++ )
    {
        (*lit)->Calculate();
    }
    
    // load up output vector with results
    
    if ( outputVector != NULL )
    {
        lit = m_Layers.end();
        lit--;
        
        nit = (*lit)->m_Neurons.begin();
        
        for ( int ii=0; ii<oCount; ++ii )
        {
            outputVector[ ii ] = (*nit)->output;
            nit++;
        }
    }
}
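
A call site might look like the following sketch. The array sizes here are illustrative only (for example, ten outputs would correspond to the ten digits); in practice they must match the sizes of the input and output layers:

// hypothetical call sketch (array sizes are illustrative)
double inputVector[ 841 ];   // e.g., one value per pixel of an input image
double outputVector[ 10 ];   // one value per digit, 0 through 9

// ... fill inputVector with pixel values ...

NN.Calculate( inputVector, 841, outputVector, 10 );
// the network's "answer" is the index of the largest value in outputVector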

Inside the layer's Calculate() function, the layer iterates through all neurons in the layer, and for each neuron the output is calculated according to the feed-forward formula given above, namely

$x_i^n = F\left(y_i^n\right) = F\left(\sum_j w_{ij}^n \, x_j^{n-1}\right)$

This formula is applied by iterating through all connections for the neuron, and for each connection, obtaining the corresponding weight and the corresponding output value from a neuron in the previous layer:

// simplified code

void NNLayer::Calculate()
{
    ASSERT( m_pPrevLayer != NULL );
    
    VectorNeurons::iterator nit;
    VectorConnections::iterator cit;
    
    double dSum;
    
    for( nit=m_Neurons.begin(); nit<m_Neurons.end(); nit++ )
    {
        NNNeuron& n = *(*nit);  // to ease the terminology
        
        cit = n.m_Connections.begin();
        
        ASSERT( (*cit).WeightIndex < m_Weights.size() );