I am trying to write an image preprocessing step to run before OCR.

OCR fails on the raw image, so I need preprocessing to get better accuracy.

Below are the sample code I used and the input and result images.

input image
result image

If you look closely, the dotted area is the problem.

Could you give me advice on how to improve the image processing? What kinds of steps are needed to improve the OCR output?

What I have tried:

Here is my solution

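#include <opencv2/opencv.hpp>
using namespace cv;

int main()
{
    // Load the source image (BGR) and stop if it could not be read.
    Mat image = imread("address2.jpg");
    if (image.empty()) return 1;

    Mat gray_image;
    cvtColor(image, gray_image, COLOR_BGR2GRAY);

    // Note: a 1x1 kernel has no smoothing effect; a larger odd size is needed to blur.
    Mat bluredImage;
    GaussianBlur(gray_image, bluredImage, Size(1, 1), 0);

    // Otsu's method picks a single global threshold for the whole image.
    Mat threshImage;
    threshold(bluredImage, threshImage, 0, 255, THRESH_BINARY | THRESH_OTSU);

    // Write the thresholded image.
    imwrite("result.jpg", threshImage);
    return 0;
}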
Updated 5-Jan-17 4:19am

Solution 1

It looks like the dotted area is shining through from the back side. One solution could be to change the lighting of the card so that it is not lit from behind.

If you can't change the lighting, I would try increasing the size of the Gaussian blur to about twice the distance between the dots in that pattern. You will also have to use dynamic thresholding, i.e. the threshold value should vary across your image (you can prepare a corresponding mask and use the subtract function for that). In general you are already on the right path. This is just a relatively difficult source image.
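To make "the threshold value should vary across your image" concrete, here is a minimal sketch of local-mean thresholding in plain C++. It is not the exact blur-and-subtract recipe described above: it swaps the Gaussian blur for a simple box mean, and the `GrayImage` type, the function name, and the `radius` and `offset` parameters are all made up for illustration. In OpenCV itself, `adaptiveThreshold` with `ADAPTIVE_THRESH_MEAN_C` does a similar job.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: 8-bit grayscale image, stored row by row.
struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;

    std::uint8_t at(int x, int y) const {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// A pixel becomes black (0) only if it is darker than the mean of its
// (2*radius+1) x (2*radius+1) neighbourhood by more than `offset`.
// Everything else becomes white (255).
GrayImage localMeanThreshold(const GrayImage& src, int radius, int offset) {
    const int w = src.width;
    const int h = src.height;
    const std::size_t stride = static_cast<std::size_t>(w) + 1;

    // Integral image with an extra zero row and column, so any window sum
    // can be read with four lookups.
    std::vector<long long> integral(stride * (static_cast<std::size_t>(h) + 1), 0);
    for (int y = 0; y < h; ++y) {
        long long rowSum = 0;
        for (int x = 0; x < w; ++x) {
            rowSum += src.at(x, y);
            integral[(y + 1) * stride + (x + 1)] = integral[y * stride + (x + 1)] + rowSum;
        }
    }

    GrayImage dst;
    dst.width = w;
    dst.height = h;
    dst.pixels.assign(static_cast<std::size_t>(w) * h, 255);

    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            // Clamp the window to the image borders.
            const int x0 = std::max(0, x - radius);
            const int y0 = std::max(0, y - radius);
            const int x1 = std::min(w - 1, x + radius);
            const int y1 = std::min(h - 1, y + radius);
            const long long area = static_cast<long long>(x1 - x0 + 1) * (y1 - y0 + 1);
            const long long sum = integral[(y1 + 1) * stride + (x1 + 1)]
                                - integral[y0 * stride + (x1 + 1)]
                                - integral[(y1 + 1) * stride + x0]
                                + integral[y0 * stride + x0];
            const long long mean = sum / area;
            if (src.at(x, y) < mean - offset) {
                dst.pixels[static_cast<std::size_t>(y) * w + x] = 0;
            }
        }
    }
    return dst;
}
```

Following the advice above, `radius` would probably be chosen so the window spans roughly twice the dot spacing, which lets the dots average into the local background while the thicker text strokes still stand out. Expect to tune both `radius` and `offset` on the real image.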

Solution 2

Years ago I worked for a company doing something similar to what you are describing. I used CorelDRAW and CorelTRACE and a script file to automate the process. I might be able to describe how to do this with C++ (not .NET, and not Visual C++).

For your application:

In C++:

For this, do not use blur of any kind. Your goal should be to remove everything extraneous from your image and then work with what remains. Blurring damages this process.

Remove every color that is not black or close to black (that is, a dark shade of some color). Do not do this via grey-scaling. Do not do this by lowering the bits per pixel or the color count.

Strip out the other colors one pixel at a time, converting all of the removed pixels to white RGB(255,255,255), not to transparent. For this process, write the code to include a slider that lets you adjust how much of the darker colors you keep.
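The pixel-by-pixel stripping could look roughly like this in plain C++. The "is this pixel dark?" test is my own assumption (brightest channel at or below a cutoff), since no exact rule is given above; the `Rgb` type and function names are hypothetical, and `cutoff` stands in for the slider.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical pixel type: one byte per channel, no alpha.
struct Rgb {
    std::uint8_t r;
    std::uint8_t g;
    std::uint8_t b;
};

// Assumed darkness test: a pixel counts as dark when its brightest channel
// is at or below `cutoff`. Unlike a grey-scale average, this never treats a
// saturated color such as pure blue (0,0,255) as dark.
bool isDark(const Rgb& p, int cutoff) {
    return std::max({p.r, p.g, p.b}) <= cutoff;
}

// Replace every pixel that is not dark with opaque white, in place.
// `cutoff` is the slider value: 0 keeps only pure black, 255 keeps everything.
void stripLightPixels(std::vector<Rgb>& pixels, int cutoff) {
    for (Rgb& p : pixels) {
        if (!isDark(p, cutoff)) {
            p = Rgb{255, 255, 255};
        }
    }
}
```

Calling `stripLightPixels(pixels, 64)` keeps near-black and dark-brown pixels and turns everything brighter white. Raising the cutoff keeps more of the darker colors, lowering it keeps fewer.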

Save the result as "cleaned.bmp" or something else that is obvious to you.

Copy the result to another new image and convert that to 2 color black and white.

Save the new image as "bw.bmp" or something that is obvious to you.

Use CorelTRACE (which is included with older versions of CorelDRAW, circa 2000), either directly or by subclassing it so that your own executable controls it, to "trace" bw.bmp.

Adjust the traced image inside CorelDRAW and clean up the edges. Combine nodes as much as you can without damaging the final image (I used to do this part via a script, on images that were literally damaged in multiple places).

Compare the traced bw.bmp to the cleaned.bmp to see how close you are getting. Use this comparison to adjust the "trace" in the CorelTRACE process.
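If you would rather have a number than compare the two by eye, one possible approach is to count how many pixels disagree. This is only a sketch under assumptions of my own: the traced result has been rendered back to a bitmap of the same size as cleaned.bmp, and both have been reduced to ink/no-ink masks. The function name is hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Fraction of pixels whose ink/no-ink state differs between two masks of the
// same size (true = ink). 0.0 means identical, 1.0 means every pixel differs.
double mismatchRatio(const std::vector<bool>& a, const std::vector<bool>& b) {
    if (a.size() != b.size()) {
        throw std::invalid_argument("masks must have the same size");
    }
    if (a.empty()) {
        return 0.0;
    }
    std::size_t differing = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] != b[i]) {
            ++differing;
        }
    }
    return static_cast<double>(differing) / static_cast<double>(a.size());
}
```

A ratio that falls as you adjust the trace settings suggests the trace is getting closer to the cleaned image. It says nothing about where the differences are, so a visual check is still worthwhile.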

The final result might shock you as to how extremely close to the original it is.

You should be able to use a common OCR on it easily.

If anywhere in this process the hardware or software that you are using or even your operating system is difficult for you to use, go back to pre-2004 hardware and/or software and/or operating system. Do not ask me why as I do not want to get flamed for telling the truth about what is being shoved onto the public today.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
