
OCR Line Detection

By mehran ghainian hasaruye, 10 Jul 2010
 

Introduction

One of the first steps in developing an OCR system is line detection. Farsi/Arabic text has some properties that make it difficult to recognize. For example, some Farsi characters, like "i" in English, consist of two separate parts but must be recognized as one character. The code below handles this problem.

Background

The reader is assumed to have basic GDI skills and knowledge of elementary concepts of image processing.

Using the code

Note first that this algorithm cannot separate lines of characters that are spanned vertically by a line, as in the image below, because the horizontal projection then contains no white gap between them:

NotRecognizable.png

The algorithm is simple:

  • Threshold the image
  • Treat the horizontal projection of each line of characters as a continuous vertical segment
  • Scan the image from top to bottom and find the top and bottom of each vertical segment from the previous step
  • Because two-part characters may be identified as two lines, merge those lines whose distance to the next line is less than a fraction of their height
  • Save the lines in the output directory
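
The projection and scanning steps can be sketched without any imaging library. The example below is only an illustration under stated assumptions: the thresholded image is represented as a hypothetical bool[,] array (true means black), and the names ProjectionSketch and FindLineRuns are not part of the article's source code.

```csharp
using System;
using System.Collections.Generic;

public static class ProjectionSketch
{
    // Returns the inclusive (top, bottom) row indices of every run of
    // consecutive rows that contain at least one black pixel.
    // Assumption: black[y, x] == true means the pixel at row y, column x is black.
    public static List<(int Top, int Bottom)> FindLineRuns(bool[,] black)
    {
        int height = black.GetLength(0);
        int width = black.GetLength(1);
        var runs = new List<(int Top, int Bottom)>();
        int top = -1; // -1 means "not currently inside a run"

        for (int y = 0; y < height; y++)
        {
            // Horizontal projection of this row: does it contain any black pixel?
            bool rowHasInk = false;
            for (int x = 0; x < width && !rowHasInk; x++)
                rowHasInk = black[y, x];

            if (rowHasInk && top < 0)
            {
                top = y;                  // a new run starts on this row
            }
            else if (!rowHasInk && top >= 0)
            {
                runs.Add((top, y - 1));   // the run ended on the previous row
                top = -1;
            }
        }

        if (top >= 0)
            runs.Add((top, height - 1));  // the last run touches the bottom edge

        return runs;
    }
}
```

Each returned pair corresponds to one candidate text line; the merging step described in the list then joins runs that belong to the same line.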

First, we threshold the image. I used a trivial fixed-value threshold, but an automatic method such as the well-known Otsu thresholding will produce a better binary image.

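public Bitmap Threshold(Bitmap bitmap, int thresholdValue)
{
     // The filter takes the threshold as a byte, so thresholdValue should be in 0..255.
     byte thrByte = (byte)(thresholdValue);
     // ApplyFilter and GetIndexedPixelFormat are helpers not shown in this article.
     bitmap = ApplyFilter(new Threshold(thrByte), bitmap);
     bitmap = GetIndexedPixelFormat(bitmap);
     return bitmap;
}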
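
For readers who want to try Otsu's method instead of a fixed value, here is a minimal sketch that picks a threshold from a 256-bin gray-level histogram. It is not the article's code: the names OtsuSketch and ComputeThreshold are hypothetical, and building the histogram from a Bitmap is left out. Whether the result can be passed directly as thresholdValue depends on how the filter above treats values equal to the threshold, which the article does not state.

```csharp
using System;

public static class OtsuSketch
{
    // histogram[v] = number of pixels with gray level v (0..255).
    // Returns the level t that maximizes the between-class variance when
    // pixels with level <= t form one class and all others form the second.
    public static int ComputeThreshold(int[] histogram)
    {
        if (histogram == null || histogram.Length != 256)
            throw new ArgumentException("Expected a 256-bin histogram.");

        long total = 0;
        double sumAll = 0;
        for (int v = 0; v < 256; v++)
        {
            total += histogram[v];
            sumAll += (double)v * histogram[v];
        }

        long weightLow = 0;       // pixel count of the lower class
        double sumLow = 0;        // sum of gray levels in the lower class
        double bestVariance = -1;
        int bestThreshold = 0;

        for (int t = 0; t < 256; t++)
        {
            weightLow += histogram[t];
            sumLow += (double)t * histogram[t];
            if (weightLow == 0) continue;     // lower class still empty

            long weightHigh = total - weightLow;
            if (weightHigh == 0) break;       // upper class empty from here on

            double meanLow = sumLow / weightLow;
            double meanHigh = (sumAll - sumLow) / weightHigh;
            double diff = meanLow - meanHigh;

            // Between-class variance, up to a constant factor of 1 / total^2.
            double variance = (double)weightLow * weightHigh * diff * diff;
            if (variance > bestVariance)
            {
                bestVariance = variance;
                bestThreshold = t;
            }
        }
        return bestThreshold;
    }
}
```

On a histogram with two well-separated peaks, the returned level falls at or between the peaks, which is the behavior a fixed threshold has to be tuned by hand to achieve.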

In the second step, we project all black pixels horizontally to obtain the horizontal projection of the image. The result is a set of disjoint runs of black points; we take the top and bottom of each run as the top and bottom of a text line:

LineDetection.png

public List<Belt> ExtractBeltsBasedonCoveredHeight(Bitmap mehrImage)
{
    int y = 0;
    List<int> line_top = new List<int>(1000);
    List<int> line_bottom = new List<int>(1000);
    List<Belt> lines = new List<Belt>();

    // Pass 1: record the top and bottom row of every text line.
    while (true)
    {
        int x = 0;
        // FindNextLine returns -1 once no black pixel remains at or below y.
        y = FindNextLine(mehrImage, y, ref x);
        if (y == -1)
            break;
        line_top.Add(y);
        // The stored bottom is the row just below the line's last black row.
        y = FindBottomOfLine(mehrImage, y) + 1;
        line_bottom.Add(y);
    }

    // Pass 2: crop each line into its own Belt.
    for (int line_number = 0; line_number < line_top.Count; line_number++)
    {
        int height = line_bottom[line_number] - line_top[line_number] + 1;
        // The crop starts one row above the line and is height + 2 rows tall.
        // GetSpecificAreaOfImage (not shown) must tolerate a rectangle that
        // extends past the image when a line touches the top or bottom edge.
        Bitmap bmp = GetSpecificAreaOfImage(
            new Rectangle(0, line_top[line_number] - 1,
                          mehrImage.Width, height + 2), mehrImage);
        Belt belt = new Belt(bmp);
        belt.RelativeTop = line_top[line_number];
        belt.RelativeBottom = line_bottom[line_number];
        lines.Add(belt);
    }

    // Merge the pieces of two-part characters that were detected as separate lines.
    lines = RemoveNoisyData(lines);
    return lines;
}

To find the top and bottom of each line, I developed two functions: FindNextLine, which finds the first black pixel at or below a given row (the top of the next run in the horizontal projection), and FindBottomOfLine, which scans down from the top of the line to the first row without any black pixel and returns the row just above it.

public int FindBottomOfLine(Bitmap bitmap, int topOfLine)
{
     bool no_black_pixel = false;
     while (!no_black_pixel)
     {
         topOfLine++;
         no_black_pixel = true;
         // Scan the row; rows past the bottom of the image count as white.
         for (int x = 0; x < bitmap.Width && topOfLine < bitmap.Height; x++)
         {
              if (Convert.ToString(bitmap.GetPixel(x, topOfLine)) == Shape.BlackPixel)
              {
                   no_black_pixel = false;
                   break; // one black pixel is enough to continue the line
              }
         }
     }
     // topOfLine is now the first all-white row; the line ended one row above it.
     return topOfLine - 1;
}

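public int FindNextLine(Bitmap bitmap, int y, ref int x)
{
      if (y >= bitmap.Height)
          return -1;
      // Walk the pixels row by row, starting at (x, y), until a non-white pixel is found.
      while (Convert.ToString(bitmap.GetPixel(x, y)) == Shape.WhitePixel)
      {
          x++;
          if (x == bitmap.Width)
          {
              // End of the row: continue at the start of the next row.
              x = 0;
              y++;
          }
          if (y >= bitmap.Height)
          {
              break;
          }
      }
      // Return the row of the pixel found, or -1 if the rest of the image is white.
      return y < bitmap.Height ? y : -1;
}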

Because two-part characters may be detected as two separate lines, we merge a line with the next one when the distance between them is less than a constant fraction of the line height:

private static List<Belt> RemoveNoisyData(List<Belt> belts)
{
   // Merged images are written to and reloaded from a scratch directory.
   if (!Directory.Exists("temp"))
   {
        Directory.CreateDirectory("temp");
   }
   for (int i = 1; i < belts.Count; i++)
   {
        // Merge belt i into belt i - 1 when the two are closer than a fixed
        // fraction of belt i's height.
        if (belts[i].RelativeTop - belts[i - 1].BaseHorizontalLine - 
            belts[i - 1].RelativeTop < 
            Belt.UpAndDownWhiteSpaceRatio * belts[i].Height)
        {
              // Stack the two line images vertically (upper belt first).
              Image<Gray, Byte> img1 = new Image<Gray, byte>(belts[i].Image);
              Image<Gray, Byte> img2 = new Image<Gray, byte>(belts[i - 1].Image);
              Image<Gray, Byte> img3 = img2.ConcateVertical(img1);
              string path = @".\temp\" + System.Guid.NewGuid().ToString();
              img3.Save(path);
              belts[i - 1].Image = (Bitmap)Bitmap.FromFile(path);
              belts[i - 1].RelativeBottom = belts[i].RelativeBottom;
              belts[i - 1].BaseHorizontalLine = -1;
              belts.RemoveAt(i);
              // The next belt has moved into index i; step back so it is not skipped.
              i--;
        }
   }
   return belts;
}
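
The merging idea can also be illustrated on plain row ranges, without Belt or image concatenation. The sketch below is a simplified variant, not the rule used in RemoveNoisyData: it ignores BaseHorizontalLine, measures the gap as the number of white rows between two runs, and compares it with a fraction of the taller run's height. The names MergeSketch and MergeCloseRuns and the ratio parameter are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class MergeSketch
{
    // Merges vertically adjacent runs (inclusive row ranges, sorted top to
    // bottom) when the white gap between them is smaller than
    // ratio * (height of the taller of the two runs).
    public static List<(int Top, int Bottom)> MergeCloseRuns(
        List<(int Top, int Bottom)> runs, double ratio)
    {
        var merged = new List<(int Top, int Bottom)>();
        foreach (var run in runs)
        {
            if (merged.Count > 0)
            {
                var prev = merged[merged.Count - 1];
                int gap = run.Top - prev.Bottom - 1;            // white rows in between
                int prevHeight = prev.Bottom - prev.Top + 1;
                int runHeight = run.Bottom - run.Top + 1;
                int reference = Math.Max(prevHeight, runHeight);

                if (gap < ratio * reference)
                {
                    // Close enough: extend the previous run to cover this one.
                    merged[merged.Count - 1] = (prev.Top, run.Bottom);
                    continue;
                }
            }
            merged.Add(run);
        }
        return merged;
    }
}
```

For example, with a ratio of 0.25, a 3-row run of dots at rows 0 to 2 followed by a 20-row body at rows 5 to 24 is merged into one line (gap of 2 rows against a limit of 5), while a following line at rows 40 to 59 stays separate.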

Finally, we save the images of the lines in the output directory.

Experimental results

I tested this algorithm with different fonts and sizes, including Mitra, Times New Roman, Arial, and Zar. On samples without noise, it is 96% accurate; on noisy samples, the results vary with the noise ratio and are not acceptable.

History

I have spent two years of my life developing an open source Farsi/Arabic OCR, and now I want to share some of my experience here. If you are interested in developing Farsi/Arabic OCR, you can join the following group: farsi_arabic_OCR@groups.yahoo.com.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).

About the Author

mehran ghainian hasaruye
Iran (Islamic Republic Of)
Hands-on .NET developer with 8 years of working experience as a C# developer, software designer, test developer, and architect, including 3 years of part-time and project-based work and nearly 5 years of full-time work, contributing to and leading all phases of the software development life cycle (SDLC) for a wide variety of enterprise systems and Web-based applications, particularly within the automation, data mining, and insurance sectors. Highly skilled in application design, architecture, and development, with strong expertise in server-side programming as well as the complete range of .NET technologies.
 
I ranked 301st among 500,000 in the math and physics university entrance exam of Iran in 2003 and was a member of the national elites of Iran for one year. I received my BS in Information Technology from Tehran University in 2009. I am now in the last semester of my Master's degree in Software Engineering at Shahid Beheshti University.


Comments and Discussions

Question: new project (other open source ocr project) [modified] (member reza161514 Oct '12 - 14:56)
Hi,
good job thank you for you codes
now some people work on other project that supports arabics and it's core is started by HP and Google. you can find more information in here and Persian Project is here

modified 5 Nov '12 - 14:38.

Question: Your ID? (member Hamid Reza Niroomand, 29 Sep '12 - 9:50)
Hello,
Mr. Mehran, what is your Gmail ID?
 
If possible, post it somewhere or email me; I need to get in touch with you.
 
Thanks
hr.niroomand [at] gmail
Answer: Re: Your ID? (member mehran ghainian hasaruye, 2 Oct '12 - 23:48)
I emailed you. My ID is ghainian@gmail.com.
Question: OCR (member programing, 7 Aug '12 - 19:15)
Hello dear friend. I'm glad I found you. I'm a C# programmer myself
and I'm working on an OCR project for the Balochi language.
Its script is like Farsi and Arabic. I'd be grateful if
you could help me.
 
Email me. Thanks.
 
shahinpendar@gmail.com
Answer: Re: OCR (member mehran ghainian hasaruye, 29 Aug '12 - 18:48)
hi
i sent you an email to make an online appointment
Question: Ocr line detection (member Raph Jojo, 3 Jul '12 - 6:24)
After the line detection using the threshold algorithm has been done, how do I segment the line to separate the text into words, punctuation, and spaces?
General: My vote of 5 (member manoj kumar choubey, 11 Apr '12 - 2:52)
Nice
General: My vote of 5 (member Hossein_Hadi, 23 Feb '12 - 23:18)
Could you send an OCR library if you have one?
pleaaaaaaseee...
Thanks By "Hossein Hadi"
hadi.hossein.128@gmail.com
Question: help me (member Member 8268466, 10 Nov '11 - 22:57)
i need to do segmentation character..which is vertically..how to do it?
i try change from yours but can't
tq
Answer: Re: help me (member mehran ghainian hasaruye, 12 Dec '11 - 18:42)
i have tried lots of algorithms on segmenting characters
which takes lots of time to explain this is my gmail id
you can send me an email to make an internet appointment
for sharing knowledge via googletalk
:-D

Last Updated 10 Jul 2010
Article Copyright 2010 by mehran ghainian hasaruye