Click here to Skip to main content
Click here to Skip to main content

Optical Character Recognition

By , 14 Oct 2012
 

Download demo project - 37.5 Kb 

Introduction   

The OCR (Optical Character Recognition) algorithm relies on a set of learned characters. It compares the characters in the scanned image file to the characters in this learned set. Generating the learned set is quite simple. Learned set requires an image file with the desired characters in the desired font be created, and a text file representing the characters in this image file.

In the below discussion the learned set is in xml format. This learned set is basically coordinates related information which will be explained in below article.

The below article describes the OCR recognition character example. Generating the learned set for different font style and sizes will be described in my next article. In this article we have already generated learned character set for font style verdana and font size 8px.      

Background

Four basic algorithms
• Image labelling.
• Finding boundary and Generating X, Y coordinate pixel array.
• Matching connected pixels with learned set (.xml).
• Forming words.

Image labeling algorithm:
It uses the Two-pass algorithm, which relatively simple to implement and understand, the two-pass algorithm iterates through 2-dimensional, binary data. The algorithm makes two passes over the image: one pass to record equivalences and assign temporary labels and the second to replace each temporary label by the label of its equivalence class.  

Connectivity checks are carried out by checking the labels of pixels that are North-East, North, North-West and West of the current pixel (assuming 8-connectivity). 4-connectivity uses only North and West neighbors of the current pixel. The following conditions are checked to determine the value of the label to be assigned to the current.   

The above enlarged image is a pixel representation which serves purpose for our discussion. Every pixel in the bitmap image is represented by its X and Y coordinates. The letter "B" in the above example shows how all the pixels are connected. The image labeling algorithm will label the entire connected pixel with the same label. The UML diagram below illustrates the flow of the algorithm.

On the first pass:

 

The below example illustrates how the image labeling algorithm perform as per the above flow chart.

  • The array from which connected regions are to be extracted is given below (8-connectivity based)
  • After the first pass, the following labels are generated. Total of 9 labels are generated in accordance with the conditions highlighted above. I have shown 8 labels below. The background of the "Basic" in the image is one label. But I have not shown it in the below image since it gets discarded as we only match connected component of max 10 x 10 dimensions.

 

The label equivalence relationships generated are

Set ID Equivalent Labels
1 1
2
3 3,7
4 4,8
5 5
6 6
7 3,7
8 4,8

On the second pass:
The UML diagram below shows how the connected pixel, whose labels are not same, is assigned the lowest label value from the Equivalence record. In the end all the connected component will have the same label. Character "B" will have one label i.e. 2 and character "a" will have label 3.Once we get the relabel the distinct labels with the available lowest label value from the equivalence record we get one complete connected component. Each character "B", "a" etc will have distinct connect component. The character "i" has extra dot above so the Second pass algorithm also looks for extra dot above and below connected component. So extra dot of "i" is also will be joined with label 5.

 

Finding boundary and Generating X, Y coordinate pixel array:

 

From the labels from the above algorithm, then its merely adding all the connected X, Y coordinates in the connect component list. The above image shows all the connected component boundary which marked in yellow. I have highlighted the boundary (X, Y) coordinates of the connected component "a".

  • LeftXCor: - Starting left X coordinate of the connected component. For the connected component "a" it is 9.
  • RightXCor: - Ending left X coordinate of the connected component. For the connected component "a" it is 13.
  • TopYIndex: - Starting or the lowest Y coordinate of the connected component. For the connected component "a" it is 4.
  • BottomYIndex: - Ending or the highest Y coordinate of the connected component. For the connected component "a" it is 9.
  • Width: - Width of the connect component will be RightXCor – LeftXCor.In case of "a" it will be 13 - 9 = 4.But add one since it start from zero so the width will be 5. Height:-Similarly height of the connected component will be BottonYCor – TopYCor.In this case for "a" the height will be
  • 6. PixelCoordinate [,]:- As per the height and width of the connected component initialize the two dimensional array. For "a" it will be [5, 6].For the appropriate connected pixel coordinate set the bit high. For e.g. for the connected component "a" (9, 4) coordinate there is no connected pixel so set [0,0] to false. Since (9, 4) is the starting X, Y coordinate, so it is (0, 0).Similarly for (13, 9) there is a connected coordinate so [4, 5] is set to true. Similarly for the entire connected component X and Y coordinates.

Explaining data in xml.

 

    <characterinfo>
     <ParamValue>a>⁄ParamValue>
     <PixelInfo>
      (0,3)(0,4)(1,0)(1,2)(1,5)(2,0)(2,2)(2,5)(3,0)(3,2)(3,5)(4,1)(4,2)(4,3)(4,4)(4,5)
     <⁄PixelInfo>
    <⁄characterinfo>
    

Let take the above bitmap image and pixel information in xml for character "a". As we see the boundary in yellow line. In above diagram the first pixel coordinate (0,0) where X and Y coordinates are zero. As explained earlier the boundary conditions. The properties and their values are listed below.

  • LeftXCor: - Starting left X coordinate of the connected component. For the connected component "a" it is 0.
  • RightXCor: - Ending left X coordinate of the connected component. For the connected component "a" it is 4.
  • TopYIndex: - Starting or the lowest Y coordinate of the connected component. For the connected component "a" it is 0.
  • BottomYIndex: - Ending or the highest Y coordinate of the connected component. For the connected component "a" it is 5.
  • Width: - Width of the connect component will be RightXCor – LeftXCor.In case of "a" it will be 4 - 0 = 4.But add one since it start from zero so the width will be 5.
  • Height:-Similarly height of the connected component will be BottonYCor – TopYCor.In this case for "a" the height will be 6.

Note the connected X, Y coordinate in the xml.For "a" (0,3)(0,4) etc pixels are high so they are noted down.X,Y coordinates whose pixels are not high they are not noted.The tag <pixelinfo> represent the pixel coordinates whose pixels are high. The <ParamValue> tag has the character value "a".

Matching character:
Finally this most easy task. We match the connect component bit array with the xml data. Each pixel are matched according the X, Y coordinates. The fully matched pixels coordinates is the matched character from the xml.

Forming words.:
As per the above example "Basic". We maintain the LeftXindex and RightXindex for each character. The LeftXindex represent the left most index of the character in the bitmap specified initially in the blog. The RightXindex represent the right most X coordinate of the character. When the difference coordinates of current character and previous character is less than 3 pixels then they are joined. This algorithm is quite simple. But you can extend to join words according to grammar in the dictionary.

I have attached the demo exe with the sample image. In the demo app just browse the image and click submit. The grid displays all the character with coordinates.

Reference
1. Artificial Intelligence and cognitive science © 2006, Nils J. Nilsson Stanford AI Lab http://ai.stanford.edu/~nilsson
2. Neural Networks and Fuzzy Logic.
3. http://www.cs.berkeley.edu/~fateman/kathey/char_recognition.html
4. http://en.wikipedia.org/wiki/Optical_character_recognition

5 .http://en.wikipedia.org/wiki/Connected-component_labeling

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Vijay Rajan Nadar
Technical Lead
India India
Member
Web Developer

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
GeneralVery useful. Thank you.memberdinradhika5 Dec '12 - 19:57 
Posting source code will be much useful.
GeneralMy vote of 5mvpKanasz Robert6 Nov '12 - 2:32 
good one, but it would be a great if you post source code!
GeneralMy vote of 5membersandeep_deshmukh18 Oct '12 - 17:29 
Well written article. Looking forward to get some more on this article.
GeneralMy vote of 3memberSayehboon14 Oct '12 - 22:17 
good
QuestionYour algorithm is good for Printed Characters of a specific FontgroupGrasshopper.iics14 Oct '12 - 12:57 
However by changing the type from Normal to Bold/Italic or to arial, connected neighbors changes. So Characters are not invariable in spectral domain no matter what you do. The invariancy of characters are found best in Polar domain. So best matching practice would be to first segment the characters vertically and then normalize them. You take polar transform or Curvelet transform and then run HMM. If you build that , it can recognize any character pattern of any language ( No idea about yuktakshara of Devnagari).
 
Do provide the code, you will get valuable insights from members here.
AnswerRe: Your algorithm is good for Printed Characters of a specific FontmemberMember 324023415 Oct '12 - 6:08 
Thanks for going through my article.Yes you are right the above example is based on the learned character set for font style verdana and font weight 8px.So when the font size changes we need characater set in xml for that font size.But need to explore the idea about polar transform and HMM.Haven't tried that.I will upload the source code soon probably this week.
QuestionSourcecode ?memberSerge Desmedt14 Oct '12 - 9:15 
Is there no sourcecode available?
 
As you reference artificial intelligence, neural networks and fuzzy logic, I'm curious to see where and how you applied these in this application.
AnswerRe: Sourcecode ?memberVijay Rajan Nadar14 Oct '12 - 12:50 
Thanks for your interest.Source will available soon probably next week.Artificial intelligence and fuzzy logic gave me the idea but the code is based on the Wikipedia link for connected component labeling.8 Pixel neighborhood search this idea i got from fuzzy logic.Later similar concept was mentioned in wiki.
GeneralRe: Sourcecode ?memberSergey Durnov14 Dec '12 - 21:24 
Hey, are you still going to public source?

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web04 | 2.6.130523.1 | Last Updated 14 Oct 2012
Article Copyright 2012 by Vijay Rajan Nadar
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid