Creating Optical Character Recognition (OCR) applications using Neural Networks

By Alex Cherkasov, 1 Sep 2004
 

Introduction

Many people today are trying to write their own OCR (Optical Character Recognition) system or to improve the quality of an existing one.

This article shows how an artificial neural network can simplify the development of an optical character recognition application while still delivering high recognition quality and good performance.

Background

Developing a proprietary OCR system is a complicated task that requires a lot of effort. Such systems are usually complex and can hide a lot of logic behind the code. Using an artificial neural network in an OCR application can dramatically simplify the code and improve recognition quality while keeping good performance. Another benefit of a neural network is extensibility: the ability to recognize more character sets than were initially defined. Most traditional OCR systems are not extensible enough. Why? Because working with tens of thousands of Chinese characters, for example, is not as easy as working with a typed English character set of 68 characters, and it can easily bring a traditional system to its knees!

Well, the Artificial Neural Network (ANN) is a wonderful tool that can help to resolve this kind of problem. The ANN is an information-processing paradigm inspired by the way the human brain processes information. Artificial neural networks are collections of mathematical models that represent some of the observed properties of biological nervous systems and draw on the analogies of adaptive biological learning. The key element of an ANN is its topology. The ANN consists of a large number of highly interconnected processing elements (nodes) that are tied together with weighted connections (links). Learning in biological systems involves adjustments to the synaptic connections that exist between the neurons. This is true for ANNs as well. Learning typically occurs by example through training, or exposure to a set of input/output data (patterns), where the training algorithm adjusts the link weights. The link weights store the knowledge necessary to solve specific problems.

Neural networks originated in the late 1950s but didn't gain much popularity until the 1980s, the era of the computer boom. Today ANNs are mostly used to solve complex real-world problems. They are often good at solving problems that are too complex for conventional technologies (e.g., problems that do not have an algorithmic solution or for which an algorithmic solution is too complex to be found) and are often well suited to problems that people are good at solving, but for which traditional methods are not. They are good pattern recognition engines and robust classifiers, with the ability to generalize in making decisions based on imprecise input data. They are well suited to a variety of classification problems such as speech, character and signal recognition, as well as functional prediction and system modeling, where the physical processes are not understood or are highly complex. The advantage of ANNs lies in their resilience against distortions in the input data and their capability to learn.

Using the code

In this article I use a sample application from the Neuro.NET library to show how to use a Backpropagation neural network in a simple OCR application.

Let's assume that you have already gone through all image pre-processing routines (resampling, deskew, zoning, blocking, etc.) and already have images of the characters from your document. (In the example I simply generate those images.)

Creating the neural network.

Let’s construct the network first. In this example I use a Backpropagation neural network. The Backpropagation network is a multilayer perceptron model with an input layer, one or more hidden layers, and an output layer.

The nodes in the Backpropagation neural network are interconnected via weighted links, with each node usually connecting to the next layer up, until the output layer, which provides the output of the network. The input pattern values are presented and assigned to the nodes of the input layer. The input values are initialized to values between -1 and 1. The nodes in the next layer receive the input values through the links and compute output values of their own, which are then passed to the next layer. These values propagate forward through the layers until the output layer is reached or, put another way, until each output layer node has produced an output value for the network. The desired output for the input pattern is used to compute an error value for each node in the output layer, and this error is then propagated backwards through the network (this is where the network's name comes from) as the delta rule is used to adjust the link weights so that the network produces output closer to the desired one. Once the error produced by the patterns in the training set is below a given tolerance, the training is complete; the network can then be presented with new input patterns and produce an output based on the experience it gained during learning.
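The forward pass and the weight adjustment described above can be illustrated on a single output node. This is only a minimal sketch of the plain delta rule with a tanh activation; it is not the Neuro.NET implementation (the library class used below is an RPROP variant), and the names DeltaRuleSketch, Forward and TrainStep are made up for this illustration.

```csharp
using System;

// Illustrative only: one node, plain delta rule, tanh activation.
// None of these names belong to the Neuro.NET library.
public static class DeltaRuleSketch
{
    // Forward pass: weighted sum of the inputs squashed into (-1, 1) by tanh.
    public static double Forward(double[] inputs, double[] weights)
    {
        double sum = 0;
        for (int i = 0; i < inputs.Length; i++)
            sum += inputs[i] * weights[i];
        return Math.Tanh(sum);
    }

    // One training step: adjust each weight in proportion to the output error.
    // Returns the error (desired - actual) measured before the update.
    public static double TrainStep(double[] inputs, double[] weights,
                                   double desired, double learningRate)
    {
        double output = Forward(inputs, weights);
        double error = desired - output;
        // Derivative of tanh, expressed through the output value
        double delta = error * (1 - output * output);
        for (int i = 0; i < inputs.Length; i++)
            weights[i] += learningRate * delta * inputs[i];
        return error;
    }
}
```

Calling TrainStep repeatedly with the same input and desired value should shrink the returned error, which is the behavior the tolerance check in the text relies on.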

I will use a library class BackPropagationRPROPNetwork to construct my own OCRNetwork.

//Inherit from Backpropagation neural network
public class OCRNetwork: BackPropagationRPROPNetwork
{
    //Override method of the base class in order to implement our 
    //own training method
    public override void Train(PatternsCollection patterns) 
    {    
        ...
    }
}

I override the Train method of the base class to implement my own training method. Why? For one simple reason: the training progress of the network is measured by the quality of the produced result and the speed of training. You have to establish the criteria that tell you when the quality of the network output is acceptable and you can stop the training process. The implementation I provide here has proven (in my experience) to be fast and accurate. I decided that I can stop the training process when the network is able to recognize all of the patterns without a single error. So, here is the implementation of my training method.

public override void Train(PatternsCollection patterns) 
{
    if (patterns != null) 
    {
        //Number of patterns recognized correctly in the current pass
        int good = 0;
        //Train until all patterns are correct
        while (good < patterns.Count)
        {
            good = 0;
            for (int i = 0; i < patterns.Count; i++)
            {
                //Set the input values of the network 
                for (int k = 0; k < NodesInLayer(0); k++) 
                    nodes[k].Value = patterns[i].Input[k];
                //Run the network
                this.Run();
                //Set the expected result
                for (int k = 0; k < this.OutputNodesCount; k++) 
                    this.OutputNode(k).Error = patterns[i].Output[k];
                //Make the network remember the corresponding output 
                //values. (Teach the network)
                this.Learn();
                //See if the network produced the correct result during 
                //this iteration
                if (BestNodeIndex == OutputPatternIndex(patterns[i]))
                    good++;
            }
            //Adjust weights of the links in the network to their
            //average value. (An epoch training technique)
            foreach (NeuroLink link in links) 
                ((EpochBackPropagationLink)link).Epoch(patterns.Count);
        }
    }
}

I have also implemented a BestNodeIndex property that returns the index of the output node with the maximum value and, among nodes with equal values, the minimal error. The OutputPatternIndex method returns the index of the pattern output element that has a value of 1. If those indices match, the network has produced the correct result. Here is what the BestNodeIndex implementation looks like:

public int BestNodeIndex
{
    get {
        int result = -1;
        double aMaxNodeValue = 0;
        double aMinError = double.PositiveInfinity;
        for (int i = 0; i< this.OutputNodesCount;i++)
        {
            NeuroNode node = OutputNode(i);
            //Look for a node with maximum value or lesser error
            if ((node.Value > aMaxNodeValue)||
                  ((node.Value >= aMaxNodeValue)&&(node.Error <aMinError))) 
            {
                aMaxNodeValue = node.Value;
                aMinError = node.Error;
                result = i;
            }
        }
        return result;
     }
}

Creating the instance of the neural network is as simple as it gets. The network has one constructor parameter: an integer array describing the number of nodes in each layer of the network. The first layer in the network is the input layer. The number of elements in this layer corresponds to the number of elements in the input pattern and is equal to the number of elements in the digitized image matrix (we will talk about it later). The network may have multiple middle layers with a different number of nodes in each layer. In this example I use only one middle layer and apply an unofficial rule of thumb to determine the number of nodes in it:

NodesNumber = (InputsCount+OutputsCount) / 2

Note: You can experiment by adding more middle layers and using a different number of nodes in them, just to see how it affects the training speed and recognition quality of the network.

The last layer in the network is the output layer. This is the layer where we look for the results. I set the number of nodes in this layer equal to the number of characters that we are going to recognize.

//Create an instance of the network
backpropNetwork = new OCRNetwork(new int[3] {aMatrixDim * aMatrixDim, 
       (aMatrixDim * aMatrixDim + aCharsCount)/2, aCharsCount});
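As a worked example of the rule of thumb, the sketch below computes the three layer sizes for a given matrix dimension and character count. The class and method names (TopologySketch, LayerSizes) are hypothetical, and the 10x10 matrix with 26 characters is an assumed configuration chosen only for the arithmetic.

```csharp
using System;

// Illustrative helper, not part of Neuro.NET.
public static class TopologySketch
{
    // Returns {inputs, middle, outputs} using
    // NodesNumber = (InputsCount + OutputsCount) / 2.
    public static int[] LayerSizes(int matrixDim, int charsCount)
    {
        int inputs = matrixDim * matrixDim;
        int middle = (inputs + charsCount) / 2; // integer division truncates
        return new int[] { inputs, middle, charsCount };
    }
}
```

For a 10x10 matrix and 26 characters this gives 100 input nodes, (100 + 26) / 2 = 63 middle nodes and 26 output nodes. The resulting array has the same shape as the one passed to the OCRNetwork constructor above.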

Creating training patterns

Now let's talk about the training patterns. Those patterns will be used to teach the neural network to recognize the images. Basically, each training pattern consists of two single-dimensional arrays of floating-point (double) numbers: the Inputs and Outputs arrays.

/// <summary>
/// A class representing a single training pattern, used to train a 
/// neural network. Contains input data and expected results arrays.
/// </summary>
public class Pattern: NeuroObject
{
    private double[] inputs, outputs;
    ...
}

The Inputs array contains your input data. In our case it is a digitized representation of the character's image. By “digitizing” the image I mean the process of creating a brightness map (or a map of the absolute value of the color vector, whichever you choose) of the image. To create this map I split the image into squares and calculate the average value of each square. Then I store those values in the array.

I have implemented the CharToDoubleArray method of the network to digitize the image. There I use the absolute value of the color for each element of the matrix. (No doubt you can use other techniques there…) After the image is digitized, I have to scale the results down to the range 0..1 so that they comply with the -1..1 input value range of the network. To do this I wrote a Scale method, which looks for the maximum element value of the matrix and then divides all elements of the matrix by it. So, the implementation of CharToDoubleArray looks like this:

//aSrc – an image of the character
//aArrayDim – dimension of the pattern matrix
//calculate image quantization X step
double xStep = (double)aSrc.Width/(double)aArrayDim; 
//calculate image quantization Y step
double yStep = (double)aSrc.Height/(double)aArrayDim;
double[] result = new double[aArrayDim*aArrayDim];
for (int i=0; i<aSrc.Width; i++)
    for (int j=0;j<aSrc.Height;j++)
    {
        //calculate matrix address 
        int x = (int)(i/xStep);
        int y = (int)(j/yStep);
        //Get the color of the pixel 
        Color c = aSrc.GetPixel(i,j);
        //Absolute value of the color, but I guess it is possible to
        //use the B (brightness) component of the HSB color space too...
        //Accumulate into row y, column x of the flattened matrix
        result[y*aArrayDim+x]+=Math.Sqrt(c.R*c.R+c.B*c.B+c.G*c.G); 
    }
//Scale the matrix to fit values into a range from 0..1 (required by 
//ANN) In this method we look for a maximum value of the element 
//and then divide all elements of the matrix by this maximum value.
return Scale(result);
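The article does not list the Scale method itself. A minimal sketch of what the description implies (find the maximum element, divide everything by it) might look like the following. The class name ScaleSketch and the handling of an all-zero matrix are assumptions, not the author's code.

```csharp
using System;

// Illustrative stand-in for the Scale method described in the text.
public static class ScaleSketch
{
    // Divides every element by the largest one so that values fall in 0..1.
    // Assumes non-negative input, as produced by summing color magnitudes.
    public static double[] Scale(double[] src)
    {
        double max = 0;
        foreach (double v in src)
            if (v > max) max = v;

        double[] result = new double[src.Length];
        if (max == 0) return result; // blank image: avoid dividing by zero

        for (int i = 0; i < src.Length; i++)
            result[i] = src[i] / max;
        return result;
    }
}
```

Because every cell is divided by the same maximum, the relative brightness between cells is preserved, and the brightest cell always maps to 1.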

The Outputs array of the pattern represents the expected result: the result that the network will use during training. There are as many elements in this array as there are characters we are going to recognize. So, for instance, to teach the network to recognize English letters from “A” to “Z” we will need 26 elements in the Outputs array. Make it 52 if you decide to include lower case letters. Each element corresponds to a single letter. The Inputs of each pattern are set to the digitized image data and the corresponding element in the Outputs array is set to 1, so the network will know which output (letter) corresponds to the input data. The method CreateTrainingPatterns does this job for me.

public PatternsCollection CreateTrainingPatterns(Font font)
{
    //Create pattern collection
    //  As many inputs as there are elements in the digitized image matrix
    //  As many outputs as there are characters we are going to recognize
    PatternsCollection result = new PatternsCollection(aCharsCount,
                                  aMatrixDim * aMatrixDim, aCharsCount);
    //Generate one pattern for each character
    for (int i = 0; i < aCharsCount; i++)
    {
        //CharToDoubleArray creates an image of the character and digitizes it.
        //You can change this method to pass the actual image of the character
        double[] aBitMatrix = CharToDoubleArray(Convert.ToChar(aFirstChar + i),
                                                font, aMatrixDim, 0);
        //Assign matrix values as input to the pattern
        for (int j = 0; j < aMatrixDim * aMatrixDim; j++)
            result[i].Input[j] = aBitMatrix[j];
        //Output value set to 1 for the corresponding character.
        //Rest of the outputs are set to 0 by default.
        result[i].Output[i] = 1;
    }
    return result;
}
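The Outputs encoding used here (a single 1 at the position of the character, zeros elsewhere) can be shown in isolation. The sketch below is a standalone illustration; OneHotSketch and Encode are hypothetical names, and firstChar plays the role of aFirstChar in the code above.

```csharp
using System;

// Illustrative only: builds the expected-output array for one character.
public static class OneHotSketch
{
    public static double[] Encode(char c, char firstChar, int charsCount)
    {
        int index = c - firstChar;
        if (index < 0 || index >= charsCount)
            throw new ArgumentOutOfRangeException(nameof(c));

        double[] outputs = new double[charsCount]; // all zeros by default
        outputs[index] = 1;                        // mark the matching character
        return outputs;
    }
}
```

With firstChar set to 'A' and 26 characters, the letter 'C' yields an array of 26 elements whose third element (index 2) is 1.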

Now we have completed creation of patterns and we can use those to train the neural network.

Training of the network.

To start the training process, simply call the Train method and pass your training patterns to it.

//Train the network 
backpropNetwork.Train(trainingPatterns);

Normally, the execution flow will leave this method when training is complete, but in some cases it could stay there forever (!). The Train method is currently implemented relying on only one assumption: that the network training will be completed sooner or later. Well, I admit that this is a wrong assumption and network training may never complete. The most “popular” reasons for neural network training failure are:

Training never completes because (each reason is followed by a possible solution):

1. The network topology is too simple to handle the amount of training patterns you provide. You will have to create a bigger network.
   Possible solution: Add more nodes to the middle layer or add more middle layers to the network.

2. The training patterns are not clear enough, not precise, or too complicated for the network to differentiate them.
   Possible solution: Clean the patterns, or use a different type of network or training algorithm. Also, you cannot train the network to guess the next winning lottery numbers... :-)

3. Your training expectations are too high and/or not realistic.
   Possible solution: Lower your expectations. The network may never be 100% "sure".

4. No reason.
   Possible solution: Check the code!

Most of those problems are very easy to resolve, and they are a good subject for a future article. Meanwhile, we can enjoy the results.
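One simple guard against training that never completes is to cap the number of epochs. The sketch below is not taken from the article or from Neuro.NET; it assumes a hypothetical runEpoch callback that trains on every pattern once and returns how many patterns were recognized correctly, which is roughly what the body of the while loop in Train does.

```csharp
using System;

// Illustrative only: a training loop with an upper bound on epochs.
public static class BoundedTrainingSketch
{
    // runEpoch: hypothetical callback, returns the count of correct patterns.
    // Returns the number of epochs used, or -1 if maxEpochs was reached
    // before every pattern was recognized.
    public static int Train(Func<int> runEpoch, int patternCount, int maxEpochs)
    {
        for (int epoch = 1; epoch <= maxEpochs; epoch++)
        {
            if (runEpoch() >= patternCount)
                return epoch;
        }
        return -1;
    }
}
```

A caller that receives -1 can then try the remedies listed above, such as enlarging the middle layer, instead of waiting on a loop that may never exit.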

Enjoying the results

Now we can see what the network has learned. The following code fragment shows how to use the trained neural network in your OCR application.

//Get your input data
double[] aInput = ...; //your digitized image of the character
//Load the data into the network
for (int i = 0; i< backpropNetwork.InputNodesCount;i++)
    backpropNetwork.InputNode(i).Value = aInput[i];
//Run the network
backpropNetwork.Run();
//Get result from the network and convert it to a character
return Convert.ToChar(aFirstChar + backpropNetwork.BestNodeIndex).ToString();

In order to use the network you have to load your data into the input layer. Then use the Run method to let the network process your data. Finally, read the results from the output nodes of the network and analyze them (the BestNodeIndex property I created in the OCRNetwork class does this job for me).
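The last step, turning the winning output node into a character, can be sketched on a plain array of output values. This is a simplified illustration: it picks the largest value only and ignores the error-based tie-breaking of BestNodeIndex. DecodeSketch, ArgMax and Decode are hypothetical names.

```csharp
using System;

// Illustrative only: maps an array of output values back to a character.
public static class DecodeSketch
{
    // Returns the index of the largest output value (-1 for an empty array).
    public static int ArgMax(double[] outputs)
    {
        int best = -1;
        for (int i = 0; i < outputs.Length; i++)
            if (best < 0 || outputs[i] > outputs[best]) best = i;
        return best;
    }

    // Converts the winning index to a character, counting from firstChar.
    public static char Decode(double[] outputs, char firstChar)
    {
        int best = ArgMax(outputs);
        if (best < 0)
            throw new ArgumentException("outputs is empty");
        return (char)(firstChar + best);
    }
}
```

With outputs of 0.1, 0.9 and 0.3 and a first character of 'A', the second node wins and the decoded character is 'B'.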

License

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)

About the Author

Alex Cherkasov
Web Developer
United States

Senior System Architect with 11+ yrs experience. Masters degree in CS (PhD is
in progress). Experienced in design and implementation of enterprise wide computer systems,
n-Tier applications, application servers, computer vision, image analysis and
AI. Provide project design, development and management consulting services
through owned company http://www.xpidea.com



Comments and Discussions

 
GeneralHelp needed....memberdeepak777715 Dec '09 - 18:44 
Can u pls send me the code for reading multiple charactors from an image......
pls....
GeneralRe: Help needed....memberKayode Nubi24 May '10 - 20:46 
hello, did you get a response as regards this inquiry because I also have the same inquiry
 
Greetz
Nubi

QuestionGreat article but...membervbguyny11 May '09 - 10:43 
It is useless unless there is a way to extract the OCR string from an image. I see that it works great with a single character, but now the issue is how to grab multiple characters from a single image!
 
Has anyone else been able to do this?
 
-Mike
QuestionOcr with java........memberEti Roy13 Sep '08 - 0:43 
How do i make an Ocr software using java..?
AnswerRe: Ocr with java........memberscanreg28 Jan '09 - 3:57 
Same here....
QuestionWhat about MNIST?memberjuicy_emad18 Jul '08 - 14:09 
Hi 2 all!
MNIST is a base of handprinted characters: http://yann.lecun.com/exdb/mnist/index.html.
My project (Character Recognition by BRAIN-LAB) has 97.38%.
At whis project i use MLP neural net (ex. ~3000x50x10 neurons config at numbers-net ... just see Demo).
 
So... what about yours project at MNIST?
QuestionLiscence?memberBarbarrosa11 May '08 - 4:38 
What type of liscence is attached to this project?
Generalthanks :)member*Jori*22 Apr '08 - 12:46 
you are serve me,great service.. ;)

thank you very much :-D
 
*Jori*
GeneralDocument of Neuro.Net librarymemberToro_Sun16 Apr '08 - 17:00 
Hi ,
I'm finding document of Neuro.NET library to perfect my project about Vietnamese Character Recognition . If you have it , could you help me ?
Thanks,
QuestionCan you give me the right solution as the Demo?memberMember 356308815 Apr '08 - 22:12 
Hi , the code was different from the demo. Can you give me the right solution as the Demo?
 
Bach Khoa

GeneralHey guy, the code was different from the demomemberMr. Cencious27 Mar '08 - 15:50 
Hey guy, the code was different from the demo. Can you provide the right solution as the Demo?
 
www.socbay.com - Best data center - search engine in Vietnam

GeneralRe: Hey guy, the code was different from the demomemberibnkhaldun1 Jan '12 - 18:27 
hmmm
Questionhow to read text on a image through ocrmemberharivinod25 Jan '08 - 14:16 
hi i need some help
 
I am doing my b.tech project and i need to read text present on a image using ocr how can i do that a small code will help me a lot plz help me
 
with regards
 
harivinod

AnswerRe: how to read text on a image through ocrmemberPrasun Roygupta28 Feb '08 - 20:14 
Hi Harivinod,
 
I am doing some work on that you r saying, i.e. I need to read text present on a high resolution image for my project. so if you'll get any idea please reply me, it'll be helpful for me.
 
Thanks & Regards,
 
Prasun
GeneralPrinted chinese charactersmemberAmanda_Lau7 Jan '08 - 19:54 
I am final year student doing OCR by using chinese characters as my data input.I am using backpropagation method to test the input.I just want to know where i can get my data because it is kind of hard to get the data set.I need few chinese characters in binary representation.
 
I'm hoping for a reply soon.
It's urgent.
 
Thank you and regards from me.
Questionwhat are pros and cons in current ocr..membermurali48824 Aug '07 - 21:39 
hi i am murali,
i would like to know in detail about the pros and cons of ocr.
tell me how can i make better efficient than current one..
suggest me some ideas..
Questionhow to compile source code in Visual studio-6member74yrsold10 Jun '07 - 7:49 
I am newbie(74 yrs old) I have installed visual Studio 6.
Downloaded source code and created Exe using VC++6 - but failed to run.
Kindly guide me step by step how to create exe file and run.
If suceeded, I wanted to experiment with Indian language(Kannada).

AnswerRe: how to compile source code in Visual studio-6memberAlex Cherkasov10 Jun '07 - 10:34 
;-) Wow! … I always amazed and proud of people striving to learn new things, esp. at your age!
To compile this project you’d need to install Visual Studio 2003 (A.K.A Visual Studio 7 with .NET framework 1.1)- it will let you compile and run it. I’m not sure if VS 6.0 supports .NET framework.


GeneralSource Code is not perfect !memberAlireza . Shirazi14 May '07 - 20:26 
Hi
The source code doesn't have any button at the tabpage4 !!!!
 
Learning without thinking is wasting of time

GeneralZoning and BlockingmemberGuy00713 Mar '07 - 23:43 
Hi Alex,
 
Great article!
 
Do you know where i can find some information on how to implement the zoning and blocking parts for an OCR (before I get to this stage)?
 
thanks
GeneralRe: Zoning and Blockingmembermichael_121316 Jul '07 - 18:42 
I've purchased complete Delphi 5 source code of OCR library here:
http://www.xpidea.com/tabid/53/ProductID/10/Default.aspx.
 
The source code is "almost" free (I think it’s cheaper than any other book on Image analysis), although in my opinion it is very valuable since has implementation of page segmentation, feature extraction and many classification techniques.

 
Sincerely, Mike
GeneralLicense numbermemberGrimmsimon23 Jan '07 - 2:41 
Hello to everyone,
 
can I use this sample to create a program,
I want to load a picture in this porgram and this program scan the picture and write if there is a License number in a textbox or something else?
 
Greetz Simon
 
Sorry for my very bad english
GeneralRe: License numbermemberarorahere31 May '07 - 1:02 
hey i am also looking for the same program concept have you found something relevant to that, than please mail me at arorahere@yahoo.co.in
 
thanks in advance for your effort....
bye take care......
 
raman
QuestionOCR Project.memberPrasanna Vignesh1 Dec '06 - 20:00 
Hi Alex,
 
I am a final year Engineering student. I am doing OCR Based Project for my final year project work. I go thru your article about the OCR by Neural Networks.
 
I also go thru the complet sample source code.
 
I have a question. It is possible to get all the text present in a image. By modifing you code.
 
Here what i done is i create trainning patterns for all the number,Alphabets and Symbols and save it as a file.
 
I need to do the follow.
 
Load a image file from a physical drive to the application, then need to get all the text present in that image. It is possible to get by using the class library you given in the download.
 
Give me some hints to do my final Project work.
 
Thank you in advance,
Please Replay me ASAP....
 

With Regards,
 
R.Prasanna Vignesh
India.
 
R.Prasanna Vignesh

QuestionRe: OCR Project.membermythily1328 Aug '07 - 10:27 

Hi Prasanna,
 
I am a new user to this forum.I am a final year computing student.
 
I just happened to come across through ur mail.Incidentally I have also chosen to do Optical character recognition project for my final year which is same as u described in this thread.It is about detecting and recognising texts in images ..............and additionally translating the retrieved text to the target language(say from chinese to english)
 
I would like to know if this project is possible....or too difficult to do within 8 months......????
 
Please do reply....am in urgent need of some suggestion.....thought u could advice me better as u hv done a similar project.
 
Best Regards,
Vashini

Last Updated 2 Sep 2004
Article Copyright 2003 by Alex Cherkasov
Everything else Copyright © CodeProject, 1999-2013