|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Announcements
Chapters
Services
Feature Zones
|
This article is in the Product Showcase section for our sponsors at The Code Project. These reviews are intended to provide you with information on products and services that we consider useful and of value to developers.
IntroductionWithout proper preparation, converting a scanned image to text via OCR can produce substandard results. Without the proper tools, preparing a scanned image for OCR can be a complex, if not an impossible task. LEADTOOLS Document Imaging Suite solves both of these problems by providing functions that get the image from the scanner, removes nonessential elements that could cause problems for OCR and finally converts the image to a text or document format with the industry leader OCR engine. In this article I’ll create an application that allows you to choose a scanner, adjust the scanner’s settings to produce images optimal for OCR, scan from the device, OCR the documents, and save recognized text out to disk as a searchable PDF. EnvironmentThe sample in this article was compiled with Visual Studio 2005 using C# and LEADTOOLS Document Imaging Suite version 15 with the OCR PDF plug-in or the LEADTOOLS v15 .NET evaluation with OCR runtime. The CodeI’ve created a Windows application with three buttons to keep this simple.
Under the hood, the LEADTOOLS .NET classes perform the bulk of the work. We’ll walk through the code in the order that it’s executed, starting with the form load event.
I first unlock the support for some of the Document Imaging Suite features. These functions only have to be called once (typically in a startup routine) and the features they unlock are then available for the life of the process. If you are using the LEADTOOLS evaluation, you do not have to call these functions, as all functionality is available. Next we create each object. The OCR object ( For both the scanning object and OCR object, you must call the I link a function to the In this sample, we are saving out the text that the OCR object has generated from the images (recognized text) as a searchable PDF (PDF Image with text underneath). We also set the Each document clean object is then initialized to values that are optimal for most scanned bitonal images.
In the
If you would like to select a scanning device without showing this dialog, simply pass the name of the device for the parameter in the Before we begin the scan, you'll want to set up the scanner to produce images that are optimal for OCR. We set the X and Y resolution to 300 and set the bits per pixel to one, which essentially tells the scanning device to scan in black and white. Next, Here is the code that does what was just described:
At this point, the images are being scanned and the The
Lastly, I save the results from the OCR to disk. As you remember, I set up the OCR to output the results as a PDF file. The OCR will take the data in the recognition data file and convert it to a searchable PDF file.
ResultsBelow are the results of an image before cleaning versus a cleaned image from the LEADTOOL’s OCR:
ConclusionThis is just one way you can implement a workflow of scanning, cleaning, OCRing and archiving. Should you have barcodes or patch codes on the scanned document, you can use LEADTOOL’s barcode functionality to detect them in the Required Software to Build this SampleIn order to build an application such as this, you will need the following products: In order to run this sample on your machine, you can download LEADTOOLS 60-day fully functional evaluation. SupportHave questions about this sample? Contact our expert support team for free evaluation support!
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||