
Neural Network for Recognition of Handwritten Digits

5 Dec 2006
A convolutional neural network achieves 99.26% accuracy on a modified NIST database of hand-written digits.
Prize winner in Competition "MFC/C++ Nov 2006"

Graphical view of the neural network


Introduction

This article chronicles the development of an artificial neural network designed to recognize handwritten digits. Although some theory of neural networks is given here, it would be better if you already understood some neural network concepts, like neurons, layers, weights, and backpropagation.

The neural network described here is not a general-purpose neural network, and it's not some kind of a neural network workbench. Rather, we will focus on one very specific neural network (a five-layer convolutional neural network) built for one very specific purpose (to recognize handwritten digits).

The idea of using neural networks for the purpose of recognizing handwritten digits is not a new one. The inspiration for the architecture described here comes from articles written by two separate authors. The first is Dr. Yann LeCun, who was an independent discoverer of the basic backpropagation algorithm. Dr. LeCun hosts an excellent site on his research into neural networks. In particular, you should view his "Learning and Visual Perception" section, which uses animated GIFs to show results of his research. The MNIST database of handwritten digits was developed by him. I used two of his publications as primary source materials for much of my work, and I highly recommend reading his other publications too (they're posted at his site). Unlike many other publications on neural networks, Dr. LeCun's publications are not inordinately theoretical and math-intensive; rather, they are extremely readable, and provide practical insights and explanations. His articles and publications can be found here. Here are the two publications that I relied on:

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-Based Learning Applied to Document Recognition", Proceedings of the IEEE, vol. 86, no. 11, Nov. 1998.

Y. LeCun, L. Bottou, G. Orr, and K. Muller, "Efficient BackProp", in Neural Networks: Tricks of the Trade, Springer, 1998.

The second author is Dr. Patrice Simard, a former collaborator with Dr. LeCun when they both worked at AT&T Laboratories. Dr. Simard is now a researcher at Microsoft's "Document Processing and Understanding" group. His articles and publications can be found here, and the publication that I relied on is:

Patrice Y. Simard, Dave Steinkraus, and John C. Platt, "Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis", Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), 2003.

One of my goals here was to reproduce the accuracy achieved by Dr. LeCun, who was able to train his neural network to achieve 99.18% accuracy (i.e., an error rate of only 0.82%). This error rate served as a type of "benchmark", guiding my work.

As a final introductory note, I'm not overly proud of the source code, which is most definitely an engineering work-in-progress. I started out with good intentions, to make source code that was flexible and easy to understand and to change. As things progressed, the code started to turn ugly. I began to write code simply to get the job done, sometimes at the expense of clean code and comprehensibility. To add to the mix, I was also experimenting with different ideas, some of which worked and some of which did not. As I removed the failed ideas, I did not always back out all the changes and there are therefore some dangling stubs and dead ends. I contemplated the possibility of not releasing the code. But that was one of my criticisms of the articles I read: none of them included code. So, with trepidation and the recognition that the code is easy to criticize and could really use a re-write, here it is.


Some Neural Network Theory

This is not a neural network tutorial, but to understand the code and the names of the variables used in it, it helps to see some neural network basics.

The following discussion is not completely general. It considers only feed-forward neural networks, that is, neural networks composed of multiple layers, in which each layer of neurons feeds only the very next layer of neurons, and receives input only from the immediately preceding layer of neurons. In other words, the neurons don't skip layers.

Consider a neural network that is composed of multiple layers, with multiple neurons in each layer. Focus on one neuron in layer n, namely the i-th neuron. This neuron gets its inputs from the outputs of neurons in the previous layer, plus a bias whose value is one ("1"). I use the variable "x" to refer to outputs of neurons. The i-th neuron applies a weight to each of its inputs, and then adds the weighted inputs together so as to obtain something called the "activation value". I use the variable "y" to refer to activation values. The i-th neuron then calculates its output value "x" by applying an "activation function" to the activation value. I use "F()" to refer to the activation function. The activation function is sometimes referred to as a "Sigmoid" function, a "Squashing" function, and other names, since its primary purpose is to limit the output of the neuron to some reasonable range like -1 to +1, and thereby inject some degree of non-linearity into the network. Here's a diagram of a small part of the neural network; remember to focus on the i-th neuron in layer n:

General diagram of a neuron in a neural network

This is what each variable means:

$x_i^n$ is the output of the i-th neuron in layer n.

$x_j^{n-1}$ is the output of the j-th neuron in layer n-1.

$x_k^{n-1}$ is the output of the k-th neuron in layer n-1.

$w_{ij}^n$ is the weight that the i-th neuron in layer n applies to the output of the j-th neuron from layer n-1 (i.e., the previous layer). In other words, it's the weight from the output of the j-th neuron in the previous layer to the i-th neuron in the current (n-th) layer.

$w_{ik}^n$ is the weight that the i-th neuron in layer n applies to the output of the k-th neuron in layer n-1.

$x_i^n = F\left(y_i^n\right)$, where $y_i^n = \sum_j w_{ij}^n \, x_j^{n-1}$, is the general feed-forward equation, where F() is the activation function. We will discuss the activation function in more detail in a moment.
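
As a preview of that discussion, one common choice of activation function (and the one recommended in Dr. LeCun's "Efficient BackProp") is a scaled hyperbolic tangent; the constants are LeCun's suggested values, shown here only as an example:

$$F(y) = 1.7159 \tanh\left(\tfrac{2}{3}\,y\right)$$

This function is smooth and symmetric about the origin, and it limits a neuron's output to the range -1.7159 to +1.7159.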

How does this translate into code and C++ classes? The way I saw it, the above diagram suggested that a neural network is composed of objects of four different classes: layers, neurons in the layers, connections from neurons in one layer to those in another layer, and weights that are applied to connections. Those four classes are reflected in the code, together with a fifth class -- the neural network itself -- which acts as a container for all other objects and which serves as the main interface with the outside world. Here's a simplified view of the classes. Note that the code makes heavy use of std::vector, particularly std::vector< double >:

// simplified view: some members have been omitted,
// and some signatures have been altered

// helpful typedef's

typedef std::vector< NNLayer* >  VectorLayers;
typedef std::vector< NNWeight* >  VectorWeights;
typedef std::vector< NNNeuron* >  VectorNeurons;
typedef std::vector< NNConnection > VectorConnections;


// Neural Network class

class NeuralNetwork  
{
public:
    NeuralNetwork();
    virtual ~NeuralNetwork();
    
    void Calculate( double* inputVector, UINT iCount, 
        double* outputVector = NULL, UINT oCount = 0 );

    void Backpropagate( double *actualOutput, 
         double *desiredOutput, UINT count );

    VectorLayers m_Layers;
};


// Layer class

class NNLayer
{
public:
    NNLayer( LPCTSTR str, NNLayer* pPrev = NULL );
    virtual ~NNLayer();
    
    void Calculate();
    
    void Backpropagate( std::vector< double >& dErr_wrt_dXn /* in */, 
        std::vector< double >& dErr_wrt_dXnm1 /* out */, 
        double etaLearningRate );

    NNLayer* m_pPrevLayer;
    VectorNeurons m_Neurons;
    VectorWeights m_Weights;
};


// Neuron class

class NNNeuron
{
public:
    NNNeuron( LPCTSTR str );
    virtual ~NNNeuron();

    void AddConnection( UINT iNeuron, UINT iWeight );
    void AddConnection( NNConnection const & conn );

    double output;

    VectorConnections m_Connections;
};


// Connection class

class NNConnection
{
public: 
    NNConnection(UINT neuron = ULONG_MAX, UINT weight = ULONG_MAX);
    virtual ~NNConnection();

    UINT NeuronIndex;
    UINT WeightIndex;
};


// Weight class

class NNWeight
{
public:
    NNWeight( LPCTSTR str, double val = 0.0 );
    virtual ~NNWeight();

    double value;
};

As you can see from the above, class NeuralNetwork stores a vector of pointers to layers in the neural network, which are represented by class NNLayer. There is no special function to add a layer (there probably should be one); simply use the std::vector::push_back() function. The NeuralNetwork class also provides the two primary interfaces with the outside world, namely, a function to forward propagate the neural network (the Calculate() function) and a function to Backpropagate() the neural network so as to train it.
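
Since there's no dedicated function for adding a layer, construction looks roughly like the following sketch. The layer names are illustrative, and this is not code from the actual program; it simply shows the push_back() approach under those assumptions:

// hypothetical construction sketch (layer names are illustrative)
NeuralNetwork NN;

// input layer: there is no previous layer, so pPrev defaults to NULL
NNLayer* pLayer = new NNLayer( _T("Layer00") );
NN.m_Layers.push_back( pLayer );

// a subsequent layer is fed by the layer added just before it
pLayer = new NNLayer( _T("Layer01"), NN.m_Layers.back() );
NN.m_Layers.push_back( pLayer );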

Each NNLayer stores a pointer to the previous layer, so that it knows where to look for its input values. In addition, it stores a vector of pointers to the neurons in the layer, represented by class NNNeuron, and a vector of pointers to weights, represented by class NNWeight. Similar to the NeuralNetwork class, the pointers to the neurons and to the weights are added using the std::vector::push_back() function. Finally, the NNLayer class includes functions to Calculate() the output values of neurons in the layer, and to Backpropagate() them; in fact, the corresponding functions in the NeuralNetwork class simply iterate through all layers in the network and call these functions.

Each NNNeuron stores a vector of connections that tells the neuron where to get its inputs. Connections are added using the NNNeuron::AddConnection() function, which takes an index to a neuron and an index to a weight, constructs a NNConnection object, and push_back()'s the new connection onto the vector of connections. Each neuron also stores its own output value, even though it's the NNLayer class that is responsible for calculating the actual value of the output and storing it there. The NNConnection and NNWeight classes respectively store obviously-labeled information.

One legitimate question about the class structure is, why are there separate classes for the weights and the connections? According to the diagram above, each connection has a weight, so why not put them in the same class? The answer lies in the fact that weights are often shared between connections. In fact, the convolutional neural network of this program specifically shares weights amongst its connections. So, for example, even though there might be several hundred neurons in a layer, there might only be a few dozen weights due to sharing. By making the NNWeight class separate from the NNConnection class, this sharing is more readily accomplished.
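
To make that concrete, here's a hypothetical sketch (the indices are invented for illustration): two neurons in the same layer can share a single weight simply by storing the same weight index in their connections:

// hypothetical weight-sharing sketch (indices are invented for illustration)
// both connections refer to the weight at index 7 in the layer's m_Weights
// vector, so there is only one NNWeight object behind them; updating its
// value during training affects both connections at once
pLayer->m_Neurons[ 0 ]->AddConnection( 3 /* iNeuron */, 7 /* iWeight */ );
pLayer->m_Neurons[ 1 ]->AddConnection( 4 /* iNeuron */, 7 /* iWeight */ );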


Forward Propagation

Forward propagation is the process whereby each neuron calculates its output value, based on inputs provided by the output values of the neurons that feed it.

In the code, the process is initiated by calling NeuralNetwork::Calculate(). NeuralNetwork::Calculate() directly sets the values of neurons in the input layer, and then iterates through the remaining layers, calling each layer's NNLayer::Calculate() function. This results in a forward propagation that's completely sequential, starting from neurons in the input layer and progressing through to the neurons in the output layer. A sequential calculation is not the only way to forward propagate, but it's the most straightforward. Here's simplified code, which takes a pointer to a C-style array of doubles representing the input to the neural network, and stores the output of the neural network to another C-style array of doubles:

// simplified code

void NeuralNetwork::Calculate(double* inputVector, UINT iCount, 
               double* outputVector /* =NULL */, 
               UINT oCount /* =0 */)
                              
{
    VectorLayers::iterator lit = m_Layers.begin();
    VectorNeurons::iterator nit;
    
    // first layer is input layer: directly
    // set outputs of all of its neurons
    // to the given input vector
    
    if ( lit < m_Layers.end() )  
    {
        nit = (*lit)->m_Neurons.begin();
        int count = 0;
        
        ASSERT( iCount == (*lit)->m_Neurons.size() );
        // there should be exactly one neuron per input
        
        while( ( nit < (*lit)->m_Neurons.end() ) && ( count < iCount ) )
        {
            (*nit)->output = inputVector[ count ];
            nit++;
            count++;
        }
    }
    
    // iterate through remaining layers,
    // calling their Calculate() functions
    
    for( lit++; lit<m_Layers.end(); lit++ )
    {
        (*lit)->Calculate();
    }
    
    // load up output vector with results
    
    if ( outputVector != NULL )
    {
        lit = m_Layers.end();
        lit--;
        
        nit = (*lit)->m_Neurons.begin();
        
        for ( int ii=0; ii<oCount; ++ii )
        {
            outputVector[ ii ] = (*nit)->output;
            nit++;
        }
    }
}
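
A call site might look like the following sketch. The array sizes here are illustrative only (for example, ten outputs would correspond to the ten digits); in practice they must match the sizes of the input and output layers:

// hypothetical call sketch (array sizes are illustrative)
double inputVector[ 841 ];   // e.g., one value per pixel of an input image
double outputVector[ 10 ];   // one value per digit, 0 through 9

// ... fill inputVector with pixel values ...

NN.Calculate( inputVector, 841, outputVector, 10 );
// the network's "answer" is the index of the largest value in outputVector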

Inside the layer's Calculate() function, the layer iterates through all neurons in the layer, and for each neuron the output is calculated according to the feed-forward formula given above, namely

$x_i^n = F\left(y_i^n\right) = F\left(\sum_j w_{ij}^n \, x_j^{n-1}\right)$

This formula is applied by iterating through all connections for the neuron, and for each connection, obtaining the corresponding weight and the corresponding output value from a neuron in the previous layer:

// simplified code

void NNLayer::Calculate()
{
    ASSERT( m_pPrevLayer != NULL );
    
    VectorNeurons::iterator nit;
    VectorConnections::iterator cit;
    
    double dSum;
    
    for( nit=m_Neurons.begin(); nit<m_Neurons.end(); nit++ )
    {
        NNNeuron& n = *(*nit);  // to ease the terminology
        
        cit = n.m_Connections.begin();
        
        ASSERT( (*cit).WeightIndex < m_Weights.size() );