A C# Project in Optical Character Recognition (OCR) Using Chain Code

22 Feb 2011, CPOL
An article that uses chain code to perform optical character recognition

Introduction: What is OCR?

OCR stands for optical character recognition, i.e. it is a method that helps computers recognize different textures or characters.

OCR is sometimes used in signature recognition, which is used in banks and other high-security buildings.

In addition, texture recognition can be used in fingerprint recognition.

OCR is also known to be used in radar systems for reading speeders' license plates, among many other things.

A Detailed Look at the OCR Implementation and Its Use in This Paper

The goal of Optical Character Recognition (OCR) is to classify optical patterns (often contained in a digital image) corresponding to alphanumeric or other characters. The process of OCR involves several steps, including segmentation, feature extraction, and classification. Each of these steps is a field unto itself and is described briefly here.

Implementation of OCR

One example of OCR is shown below. A portion of a scanned image of text, borrowed from the Web, is shown along with the corresponding (human-recognized) characters from that text.


Of descriptive bibliographies of authors and presses. His ubiquity in the broad field of bibliographical and textual study, his seemingly complete possession of it, distinguished him from his illustrious predecessors and made him the personification of bibliographical scholarship in his time.

Figure 1: Scanned image of text and its corresponding recognized representation

A few examples of OCR applications are listed here. The most common use of OCR is the first item: people often wish to convert text documents to some sort of digital representation.

  1. People wish to scan in a document and have the text of that document available in a word processor.
  2. Recognizing license plate numbers
  3. Post Office needs to recognize zip codes

Other Examples of Pattern Recognition

  1. Facial feature recognition (airport security) – Is this person a bad-guy?
  2. Speech recognition – Translate acoustic waveforms into text.
  3. A submarine wishes to classify underwater sounds – A whale? A Russian sub? A friendly ship?

The Classification Process

(Classification in general, for any type of classifier.) There are two steps in building a classifier: training and testing. These steps can be broken down further into sub-steps.

  1. Training
    1. Pre-processing – Processes the data so it is in a suitable form for…
    2. Feature extraction – Reduce the amount of data by extracting relevant information; this usually results in a vector of scalar values. (We also need to NORMALIZE the features for distance measurements!)
    3. Model estimation – From the finite set of feature vectors, we need to estimate a model (usually statistical) for each class of the training data.
  2. Testing
    1. Pre-processing
    2. Feature extraction – (both same as above)
    3. Classification – Compare feature vectors to the various models and find the closest match. One can use a distance measure.


OCR – Pre-processing

These are the pre-processing steps often performed in OCR:

  • Binarization – Usually presented with a grayscale image, binarization is then simply a matter of choosing a threshold value.
  • Morphological Operators – Remove isolated specks and holes in characters, can use the majority operator.
  • Segmentation – Check connectivity of shapes, label, and isolate.
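As a rough illustration of the binarization step, the sketch below thresholds a grayscale image held in a plain 2-D array. The class name `Binarizer`, the array layout (`[row, column]`), and the convention that dark pixels are "ink" are assumptions made for this example, not part of the original project.

```csharp
using System;

public static class Binarizer
{
    // Converts a grayscale image (0 = black, 255 = white) into a binary mask.
    // true marks an "ink" pixel, i.e. one strictly darker than the threshold.
    // The [row, column] layout and the dark-is-ink convention are assumptions.
    public static bool[,] Binarize(byte[,] gray, byte threshold)
    {
        int rows = gray.GetLength(0);
        int cols = gray.GetLength(1);
        var mask = new bool[rows, cols];
        for (int y = 0; y < rows; y++)
            for (int x = 0; x < cols; x++)
                mask[y, x] = gray[y, x] < threshold;
        return mask;
    }
}
```

Calling `Binarizer.Binarize(image, 128)` would treat every pixel darker than mid-gray as part of a character; choosing the threshold well is the real difficulty and is not addressed here.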

Segmentation is by far the most important aspect of the pre-processing stage. It allows the recognizer to extract features from each individual character. In the more complicated case of handwritten text, the segmentation problem becomes much more difficult, as letters tend to be connected to each other.

OCR – Feature Extraction

Given a segmented (isolated) character, what are useful features for recognition?

  1. Moment based features

Think of each character as a probability density function (PDF). The 2-D moments of the character are:
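For reference, the standard definition of the raw 2-D moment of order (p + q) is usually written as follows, where f(x, y) is the pixel value (1 for an ink pixel and 0 otherwise in a binarized character). The symbol names are the conventional ones and are assumed here rather than taken from the article.

```latex
m_{pq} = \sum_{x} \sum_{y} x^{p}\, y^{q}\, f(x, y), \qquad p, q = 0, 1, 2, \dots
```

With this notation, m_{00} is the total mass and the centroid is (m_{10}/m_{00}, m_{01}/m_{00}).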


From the moments, we can compute features like:

    1. Total mass (number of pixels in a binarized character)
    2. Centroid – center of mass
    3. Elliptical parameters
      1. Eccentricity (ratio of major to minor axis)
      2. Orientation (angle of major axis)
    4. Skewness
    5. Kurtosis
    6. Higher-order moments

  2. Hough and chain code transform
  3. Fourier transform and series

There are different methods for feature extraction, or finding an image descriptor. These methods fall into two categories:

  1. one that uses the whole area of the image
  2. another that uses the contour or edges of the object

All the above methods use the contour of the object to collect the object’s features.


The algorithm we needed for this OCR had to satisfy the following requirements:

  1. It must faithfully preserve the information of interest
  2. It must permit compact storage and convenient retrieval
  3. It must facilitate the required processing
  4. It must be scaling invariant
  5. It must be easy to implement

We therefore decided to implement the chain code, also known as Freeman’s chain code, for this OCR.

Freeman’s chain code is one of the best and easiest methods for texture recognition.

Freeman introduced the chain code in 1961. (The chain code is also known to be a good method for image encoding, but here we are using it as a method for feature extraction.)

Although the chain code is a compact way to represent the contour of an object, it has some serious drawbacks when used as a shape descriptor.

Chain-code Drawbacks

  1. It works only on contoured shapes (which in our case means characters should be input in a bold font).
  2. It is very sensitive to noise, as the errors are cumulative.
  3. The starting point of the chain code, and the orientation and scale of a contour, affect the chain code.

Therefore, the chain correlation scheme, which can be used to match two chains, suffers from these drawbacks.

N.B.: However, the scaling drawback was partially overcome.

The Freeman’s chain-code implementation process for character recognition

The first step, as discussed before, is pre-processing.

The only pre-processing step we will discuss in this paper is edge detection, or contouring.

First, what does a contoured object mean?

It means an object with edges only.

How do we obtain a contoured object from the BOLD character object?

The concept of obtaining the edges was applied in reverse, i.e. if we can detect the filling and remove it, then what remains is the edges.

So all we need is to find a unique property of the filling and apply it.

Filling property:

  • All eight surrounding pixels are black


	WHILE (not end of image)
		Search the original image for the next black pixel
		IF (all eight surrounding pixels are black)
			This pixel is filling and we should remove it
		Go to next pixel
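The filling-removal idea above can be sketched in C# as follows. This is only an illustration under stated assumptions: the image is a `bool[,]` mask where `true` means black, pixels outside the image count as white, and the names `ContourExtractor` and `RemoveFilling` are hypothetical. The result is written to a separate array so that removing one pixel does not affect the test for its neighbours.

```csharp
using System;

public static class ContourExtractor
{
    // Returns a copy of the mask in which "filling" pixels have been removed.
    // A pixel is filling when it is black and all eight neighbours are black.
    // Pixels outside the image are treated as white (an assumption), so ink
    // touching the image border is kept as contour.
    public static bool[,] RemoveFilling(bool[,] ink)
    {
        int rows = ink.GetLength(0), cols = ink.GetLength(1);
        var contour = new bool[rows, cols];
        for (int y = 0; y < rows; y++)
            for (int x = 0; x < cols; x++)
                contour[y, x] = ink[y, x] && !AllNeighboursBlack(ink, x, y);
        return contour;
    }

    private static bool AllNeighboursBlack(bool[,] ink, int x, int y)
    {
        int rows = ink.GetLength(0), cols = ink.GetLength(1);
        for (int dy = -1; dy <= 1; dy++)
            for (int dx = -1; dx <= 1; dx++)
            {
                if (dx == 0 && dy == 0) continue; // skip the pixel itself
                int nx = x + dx, ny = y + dy;
                bool inside = nx >= 0 && nx < cols && ny >= 0 && ny < rows;
                if (!inside || !ink[ny, nx]) return false;
            }
        return true;
    }
}
```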

So, by applying the above procedure to an image of a bold character, we obtain its contour.


The second step is the feature extraction process (using Freeman’s chain code).

There are many ways to implement the chain code, and all of them lead to the idea of partitioning the target object.

The one we used is based on partitioning the object into tracks and sectors, then applying the chain code in each sector to get the pixel relations and saving them to a file.

How do we achieve this level of partitioning on the contoured object and extract the feature vector from it?

Partitioning an object into sectors and tracks and extracting the feature vector from it involves the following steps:

  1. get the centroid (center of mass)
  2. find the longest radius
  3. get the track-step
  4. virtually divide the object into tracks using the track-step (which is based on the default number of tracks used, i.e. the same number of tracks must be used for both training and testing)
  5. get the sector-step
  6. divide those virtual tracks into equal sectors using the sector-step (which is based on the default number of sectors used, i.e. the same number of sectors must be used for both training and testing)
  7. find relations between adjacent pixels
  8. put all the features together

The above steps are discussed in greater detail below.

A Detailed Look at How to Achieve This Level of Partitioning of the Contoured Object


The center of mass of any texture (in our case, a character) is given by the following equations:

    Xc = ∑∑ x·f(x, y) / ∑∑ f(x, y)
    Yc = ∑∑ y·f(x, y) / ∑∑ f(x, y)

Here f(x, y) is 1 for a pixel that belongs to the object and 0 otherwise, so the denominator is the number of pixels in the object.

In English, this means that the X coordinate of the centroid is the sum of the x coordinates of all the pixels in the object, divided by the number of pixels in the object.


So the Xc for the above image = (0+1+2+2+3)/5 = 1.6

Similarly, the Y coordinate of the centroid is the sum of the y coordinates of all the pixels in the object, divided by the number of pixels in the object.


So the Yc for the above image = (3+2+2+2+1)/5 = 2
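The centroid rule can be sketched in a few lines of C#. The class name `CentroidCalculator` and the representation of the object as an array of (x, y) tuples are assumptions made for this example.

```csharp
using System;

public static class CentroidCalculator
{
    // pixels: the (x, y) coordinates of every pixel that belongs to the object.
    // Returns the mean x and mean y, i.e. the center of mass of the object.
    public static (double X, double Y) Centroid((int X, int Y)[] pixels)
    {
        if (pixels.Length == 0)
            throw new ArgumentException("The object has no pixels.", nameof(pixels));

        double sumX = 0, sumY = 0;
        foreach (var p in pixels)
        {
            sumX += p.X;
            sumY += p.Y;
        }
        return (sumX / pixels.Length, sumY / pixels.Length);
    }
}
```

For a hypothetical five-pixel object at (0,2), (1,2), (2,3), (2,1), (3,2), whose x and y coordinates sum to 8 and 10 like the worked example, the call returns approximately (1.6, 2).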

Applying the above rule to a character, we get:



To get the longest radius, we have to calculate the distance between the centroid (center of mass) and every pixel of the contoured object and find the maximum; this is the longest radius.

How do we get the distance between two pixels?

The Pythagorean theorem gives us the distance between any two points:

"The distance between the centroid (Xc, Yc) and a pixel (Xi, Yi) equals

Sqrt((Xc-Xi)² + (Yc-Yi)²)"
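A minimal sketch of the longest-radius search, assuming the contour is available as an array of (x, y) tuples; `RadiusCalculator`, `Distance`, and `LongestRadius` are hypothetical names chosen for illustration.

```csharp
using System;

public static class RadiusCalculator
{
    // Euclidean distance between the centroid (xc, yc) and a pixel (xi, yi).
    public static double Distance(double xc, double yc, double xi, double yi)
    {
        double dx = xc - xi, dy = yc - yi;
        return Math.Sqrt(dx * dx + dy * dy);
    }

    // Largest distance from the centroid to any contour pixel.
    // Returns 0 for an empty contour.
    public static double LongestRadius(double xc, double yc, (int X, int Y)[] contour)
    {
        double max = 0;
        foreach (var p in contour)
            max = Math.Max(max, Distance(xc, yc, p.X, p.Y));
        return max;
    }
}
```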



What is the track-step?

The track-step is the distance between any two adjacent tracks. It is used to identify a pixel's position, i.e. which track it lies in.

How do we get the track-step?

The track-step equals the maximum radius divided by the predefined number of tracks, i.e. (Track_Step = Max_Radius / No. of tracks).

In our case, the number of tracks we used was five.


(This is based on the default number of tracks used, i.e. the same number of tracks must be used for both training and testing.)

This way, we can identify which track a pixel lies in.
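One way to turn the track-step into a track index is sketched below. The zero-based numbering (track 0 is the innermost) and the clamping of a pixel at exactly the maximum radius into the outermost track are assumptions of this sketch; the article does not say how that boundary case is handled.

```csharp
using System;

public static class TrackLocator
{
    // Track_Step = Max_Radius / number of tracks.
    public static double TrackStep(double maxRadius, int trackCount)
    {
        return maxRadius / trackCount;
    }

    // Zero-based track index of a pixel at the given distance from the centroid.
    // A pixel lying exactly on the longest radius is clamped into the last track
    // (an assumption made for this sketch).
    public static int TrackOf(double distance, double trackStep, int trackCount)
    {
        if (trackStep <= 0) return 0; // degenerate object, e.g. a single pixel
        int track = (int)(distance / trackStep);
        return Math.Min(track, trackCount - 1);
    }
}
```

With a longest radius of 10 and five tracks, the step is 2, so a pixel at distance 3.9 falls in track 1 and a pixel at distance 10 falls in track 4.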


The sector-step is based on the number of sectors you decide to use:

(SECTOR_STEP = 360 / No. of sectors).

The sector-step is used to determine which sector a pixel lies in.


(This is based on the default number of sectors used, i.e. the same number of sectors must be used for both training and testing.)

In other words, using the sector-step and the track-step, we can identify the sector and the track in which a pixel lies.

A detailed look at how to identify which sector a pixel lies in:

First, we have to get Ө, which is the angle between the x-axis and the line from the centroid to the pixel.



Then the pixel lies in sector = Ө / sector-step
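The sector lookup can be sketched as follows. The use of `Math.Atan2` to obtain Ө, the mapping of Ө into the range [0, 360), and the zero-based sector numbering are assumptions of this example. Note that in image coordinates y usually grows downward, so angles increase clockwise on screen; this does not matter as long as training and testing use the same convention.

```csharp
using System;

public static class SectorLocator
{
    // SECTOR_STEP = 360 / number of sectors (in degrees).
    public static double SectorStep(int sectorCount)
    {
        return 360.0 / sectorCount;
    }

    // Angle in degrees, in [0, 360), between the x-axis and the line
    // from the centroid (xc, yc) to the pixel (x, y).
    public static double AngleDegrees(double xc, double yc, double x, double y)
    {
        double theta = Math.Atan2(y - yc, x - xc) * 180.0 / Math.PI;
        return theta < 0 ? theta + 360.0 : theta;
    }

    // Zero-based sector index: sector = theta / sector-step.
    public static int SectorOf(double thetaDegrees, int sectorCount)
    {
        int sector = (int)(thetaDegrees / SectorStep(sectorCount));
        return Math.Min(sector, sectorCount - 1); // guards against rounding up to 360
    }
}
```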


Finding the pixel relations is by far the most important and easiest part of feature extraction, since the relations describe the shape.

How do you extract relations?

To extract relations between pixels, we follow this algorithm:


For every target pixel
  Moving clockwise from the north direction
    If a pixel is present surrounding the target pixel
      Store its position
      Go to next target pixel
    End if
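The neighbour scan above might look like this in C#. The direction codes 0 to 7 (north = 0, proceeding clockwise), the image convention that north means y - 1, and the return value -1 for an isolated pixel are all assumptions of this sketch; the article does not state how the original project numbers the relations.

```csharp
using System;

public static class PixelRelation
{
    // Offsets clockwise starting at north: N, NE, E, SE, S, SW, W, NW.
    // Image coordinates are assumed: y grows downward, so north is y - 1.
    private static readonly int[] Dx = { 0, 1, 1, 1, 0, -1, -1, -1 };
    private static readonly int[] Dy = { -1, -1, 0, 1, 1, 1, 0, -1 };

    // Returns the direction code (0..7) of the first contour neighbour found
    // when scanning clockwise from north, or -1 if the pixel has no neighbour.
    public static int FirstNeighbour(bool[,] contour, int x, int y)
    {
        int rows = contour.GetLength(0), cols = contour.GetLength(1);
        for (int code = 0; code < 8; code++)
        {
            int nx = x + Dx[code], ny = y + Dy[code];
            bool inside = nx >= 0 && nx < cols && ny >= 0 && ny < rows;
            if (inside && contour[ny, nx])
                return code;
        }
        return -1;
    }
}
```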

By now, you should be able to identify, for any pixel, the track and sector it lies in and its relation to its neighboring pixels.

These are the image features.


The feature vector is simply the collection of the features of every pixel, sorted by track, i.e. we check the position of each pixel and add its properties to the feature vector. E.g. if two pixels lie in sector 1 and track 1 with the same relation of 4, then cell 4 of the relations table for track 1 and sector 1 is incremented twice, since there are TWO pixels having this property.
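The counting scheme described above can be sketched as a three-dimensional table indexed by track, sector, and relation. The class name `FeatureTable`, the tuple layout, and the choice to skip pixels with a negative relation code are assumptions made for illustration.

```csharp
using System;

public static class FeatureTable
{
    // Builds a [track, sector, relation] table of counts.
    // Each entry of pixels holds the zero-based track, sector, and relation
    // code (0..7) of one contour pixel. Pixels with a negative relation code
    // (no neighbour found) are skipped, which is an assumption of this sketch.
    public static double[,,] Build((int Track, int Sector, int Relation)[] pixels,
                                   int trackCount, int sectorCount)
    {
        var table = new double[trackCount, sectorCount, 8];
        foreach (var p in pixels)
        {
            if (p.Relation >= 0)
                table[p.Track, p.Sector, p.Relation] += 1;
        }
        return table;
    }
}
```

Two pixels in track 1 and sector 1 with relation 4 leave `table[1, 1, 4]` equal to 2, matching the example in the text.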


Freeman’s chain code drawbacks that we were able to solve:

Actually, the only drawback that was solved was the scaling problem, and it was only partially solved (i.e. for upscaling only).

How do we overcome the upscaling drawback of the chain code?

We divide the feature vector by the number of pixels in the object, so that the number of pixels (the size of the character) has no effect on the feature vector.
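The normalization can be sketched as below: every count is divided by the number of pixels in the object, so each cell holds a fraction rather than an absolute count. The name `FeatureNormalizer` and the decision to return a new array instead of modifying the input are assumptions of this example.

```csharp
using System;

public static class FeatureNormalizer
{
    // Divides every cell of the feature table by the object's pixel count.
    // The input table is left unchanged; a normalized copy is returned.
    public static double[,,] Normalize(double[,,] counts, int pixelCount)
    {
        if (pixelCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(pixelCount));

        var result = (double[,,])counts.Clone();
        for (int i = 0; i < result.GetLength(0); i++)
            for (int j = 0; j < result.GetLength(1); j++)
                for (int k = 0; k < result.GetLength(2); k++)
                    result[i, j, k] /= pixelCount;
        return result;
    }
}
```

The intuition is that an enlarged character has proportionally more contour pixels in each cell, so the fractions stay roughly the same even though the raw counts grow.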


Classification is the process of identifying the unknown object.

There are a number of classifiers available that can be used, such as:

  1. Neural networks
  2. Support vector machines
  3. K-nearest neighbor (KNN)
  4. Euclidean distance

In this article, we discuss the Euclidean distance, which is a variation of KNN.

Put simply, the Euclidean distance is the distance between the relations of two feature vectors.

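public double get_distence(cfeature_vector vec2)
{
	// Euclidean distance between this feature vector and vec2.
	// "relations" is a hypothetical name for the 6 x 4 x 8 table of relation
	// values; the original listing does not show how x and y are read.
	double x = 0, y = 0, z = 0, zf = 0;
	for (int i = 0; i < 6; i++)
		for (int j = 0; j < 4; j++)
			for (int k = 0; k < 8; k++)
			{
				x = this.relations[i, j, k];
				y = vec2.relations[i, j, k];
				z = x - y;
				zf += z * z;	// accumulate squared differences
			}
	return Math.Sqrt(zf);
}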
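To show how such a distance could drive recognition, here is a self-contained sketch of a nearest-template classifier: the unknown character is assigned the label of the trained feature table closest to it. The class `NearestTemplateClassifier`, the dictionary of templates, and the `'?'` result for an empty template set are assumptions for illustration and are not taken from the original project.

```csharp
using System;
using System.Collections.Generic;

public static class NearestTemplateClassifier
{
    // Euclidean distance between two feature tables of identical shape.
    public static double Distance(double[,,] a, double[,,] b)
    {
        for (int d = 0; d < 3; d++)
            if (a.GetLength(d) != b.GetLength(d))
                throw new ArgumentException("Feature tables differ in shape.");

        double sum = 0;
        for (int i = 0; i < a.GetLength(0); i++)
            for (int j = 0; j < a.GetLength(1); j++)
                for (int k = 0; k < a.GetLength(2); k++)
                {
                    double diff = a[i, j, k] - b[i, j, k];
                    sum += diff * diff;
                }
        return Math.Sqrt(sum);
    }

    // Returns the label of the template closest to the unknown table,
    // or '?' when no templates are supplied.
    public static char Classify(double[,,] unknown, IDictionary<char, double[,,]> templates)
    {
        char best = '?';
        double bestDistance = double.PositiveInfinity;
        foreach (var entry in templates)
        {
            double d = Distance(unknown, entry.Value);
            if (d < bestDistance)
            {
                bestDistance = d;
                best = entry.Key;
            }
        }
        return best;
    }
}
```

Both the templates and the unknown table must be built with the same number of tracks and sectors, as the article stresses for training and testing.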


For upscaled character images in the same font used for training (Arial), the recognition rate is almost 100%.

However, for downscaled images the recognition rate is very poor.

When we tested our OCR on handwriting, there were two very important observations that affect the recognition rate:

1st, people tend to use different letter shapes than the font it was trained on.

2nd, objects with curves, ESPECIALLY characters like “C”, “O”, and “Q”, were not recognized at all, perhaps because most people used very bad handwriting.

The handwritten test data we used were from two different sources:

  1. We asked people to draw the characters using Paintbrush, but we got very few volunteers: actually one, plus myself.
  2. The 2nd source was handwritten letters on a piece of paper; we had approximately 7 samples of 3 different handwritten letters.

Correctness Rate for the Different Sources

For the first source, the recognition rate reached its peak of 75% on well-written letters (“neat writing”), but on the other samples the recognition rate was approximately 62%.

As for source number two, the recognition rate was poorer than expected; it was approximately 57% for most of the samples.


The following is the training sample the engine was originally trained on:





This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Article Copyright 2011 by Hussein El Saadi