This article is in the Product Showcase section for our sponsors at The Code Project. These reviews are intended to provide you with information on products and services that we consider useful and of value to developers.
Introduction
LEADTOOLS is the #1 imaging toolkit in the world and has earned its place on top by consistently delivering imaging components of the highest quality, performance and stability in a format that is “programmer friendly”. Developers are able to significantly reduce time-to-market for their applications, thereby maximizing productivity and ensuring the greatest possible return on investment.
LEADTOOLS V16 has an all-new design that greatly simplifies development without sacrificing control. One important enhancement is the set of high-level .NET classes available for enabling Optical Character Recognition (OCR) of scanned images. This new architecture is intuitive, flexible and easy to follow. A programmer can enable image OCR functionality in as little as three lines of code, while maintaining the level of control required by the specific application or workflow.
In this article, we will introduce you to the key features of the new .NET OCR classes, provide you with a step-by-step approach for creating an OCR application, and provide you with sample code. Feel free to try it out for yourself by downloading a fully functional evaluation SDK from the links provided below.
Key Features
LEADTOOLS provides methods to:
Environment
The LEADTOOLS OCR .NET class library comes in Win32 and x64 editions that can support development of software applications for any of the following environments:
Samples provided will work in Visual Studio 2005 or Visual Studio 2008.
How LEADTOOLS OCR Works
LEADTOOLS uses an OCR handle to interact with the OCR engine and the OCR document containing the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. This OCR handle is an internal structure that contains all the necessary information for recognition, getting and setting information, and text verification.
The following is an outline of the general steps involved in recognizing one or more pages. For a more detailed explanation, download the LEADTOOLS version 16 evaluation and refer to the “Programming with Leadtools .NET OCR” topic in the .NET help:
Steps 4, 5, 6 and 7 can be done in any order, as long as they are carried out after starting up the OCR engine and before recognizing a page.
You can start using LEADTOOLS for .NET OCR in your application by adding a reference to the Leadtools.Forms.Ocr.dll assembly to your .NET application. This assembly contains the various interfaces, classes, structures and delegates used to program with LEADTOOLS OCR. Since the toolkit supports multiple engines, the actual code that interfaces with the engine is stored in a separate assembly that is loaded dynamically once an instance of the IOcrEngine interface is created. Hence, you must make sure the engine assembly you are planning to use resides next to the Leadtools.Forms.Ocr.dll assembly. You can also add the engine assembly as a reference to your project so that dependencies are detected automatically, although LEADTOOLS does not require this.
The Code
The following example shows how to perform the above steps in code:
Visual Basic
' *** Step 1: Select the engine type and
' create an instance of the IOcrEngine interface.
' We will use the LEADTOOLS OCR Plus engine and use it in the same process
Dim ocrEngine As IOcrEngine = _
    OcrEngineManager.CreateEngine(OcrEngineType.Plus, False)
' *** Step 2: Start up the engine.
' Use the default parameters
ocrEngine.Startup(Nothing, Nothing, Nothing)
' *** Step 3: Create an OCR document with one or more pages.
Dim ocrDocument As IOcrDocument = _
    ocrEngine.DocumentManager.CreateDocument()
' Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages("C:\Images\Ocr.tif", 1, -1, Nothing)
' *** Step 4: Establish zones on the page(s), either manually or automatically
' Automatic zoning
ocrDocument.Pages.AutoZone(Nothing)
' *** Step 5: (Optional) Set the active languages to be used by the OCR engine
' Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"})
' *** Step 6: (Optional) Set the spell checking language
' Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.Enabled = True
ocrEngine.SpellCheckManager.SpellLanguage = "en"
' *** Step 7: (Optional) Set any special recognition module options
' Change the fill method for the first zone in the first page to be Omr
Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0)
ocrZone.FillMethod = OcrZoneFillMethod.Omr
ocrDocument.Pages(0).Zones(0) = ocrZone
' *** Step 8: Recognize
ocrDocument.Pages.Recognize(Nothing)
' *** Step 9: Save recognition results
' Save the results to a PDF file
ocrDocument.Save("C:\Images\Document.pdf", OcrDocumentFormat.PdfA, Nothing)
ocrDocument.Dispose()
' *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown()
ocrEngine.Dispose()
C#
// *** Step 1: Select the engine type and
// create an instance of the IOcrEngine interface.
// We will use the LEADTOOLS OCR Plus engine and use it in the same process
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Plus, false);
// *** Step 2: Start up the engine.
// Use the default parameters
ocrEngine.Startup(null, null, null);
// *** Step 3: Create an OCR document with one or more pages.
IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();
// Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages(@"C:\Images\Ocr.tif", 1, -1, null);
// *** Step 4: Establish zones on the page(s), either manually or automatically
// Automatic zoning
ocrDocument.Pages.AutoZone(null);
// *** Step 5: (Optional) Set the active languages to be used by the OCR engine
// Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de"});
// *** Step 6: (Optional) Set the spell checking language
// Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.Enabled = true;
ocrEngine.SpellCheckManager.SpellLanguage = "en";
// *** Step 7: (Optional) Set any special recognition module options
// Change the fill method for the first zone in the first page to be default
OcrZone ocrZone = ocrDocument.Pages[0].Zones[0];
ocrZone.FillMethod = OcrZoneFillMethod.Default;
ocrDocument.Pages[0].Zones[0] = ocrZone;
// *** Step 8: Recognize
ocrDocument.Pages.Recognize(null);
// *** Step 9: Save recognition results
// Save the results to a PDF file
ocrDocument.Save(@"C:\Images\Document.pdf", OcrDocumentFormat.PdfA, null);
ocrDocument.Dispose();
// *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown();
ocrEngine.Dispose();
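The listing above releases the document and the engine only if every step succeeds. A minimal sketch of the same flow with try/finally, so that cleanup also runs when a step throws, might look like the following. It uses only the calls shown above; the Leadtools.Forms.Ocr namespace is assumed from the assembly name, and OcrSketch.ConvertToPdf is a hypothetical wrapper, not part of the toolkit.

```csharp
// Assumption: the OCR types live in a namespace named after the assembly.
using Leadtools.Forms.Ocr;

public static class OcrSketch
{
    // Hypothetical helper: OCR a multi-page image and save the result as PDF/A.
    public static void ConvertToPdf(string inputPath, string outputPath)
    {
        IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Plus, false);
        try
        {
            ocrEngine.Startup(null, null, null);
            try
            {
                IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();
                try
                {
                    // Same calls as steps 3, 4, 8 and 9 of the listing above
                    ocrDocument.Pages.AddPages(inputPath, 1, -1, null);
                    ocrDocument.Pages.AutoZone(null);
                    ocrDocument.Pages.Recognize(null);
                    ocrDocument.Save(outputPath, OcrDocumentFormat.PdfA, null);
                }
                finally
                {
                    // Runs even if adding pages, zoning, recognition or saving throws
                    ocrDocument.Dispose();
                }
            }
            finally
            {
                // Only reached once Startup has succeeded
                ocrEngine.Shutdown();
            }
        }
        finally
        {
            ocrEngine.Dispose();
        }
    }
}
```

The nesting mirrors the order of the original listing: the document is disposed first, then the engine is shut down, then the engine is disposed.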
Finally, the following sample shows how to perform the same task using the one-shot "fire and forget" IOcrAutoRecognizeManager interface:
Visual Basic
' Create the engine instance
Using ocrEngine As IOcrEngine = _
        OcrEngineManager.CreateEngine(OcrEngineType.Plus, False)
    ' Start up the engine
    ocrEngine.Startup(Nothing, Nothing, Nothing)
    ' Convert the multi-page TIF image to a PDF document
    ocrEngine.AutoRecognizeManager.Run( _
        "C:\Images\Ocr.tif", _
        "C:\Images\Document.pdf", _
        Nothing, _
        OcrDocumentFormat.PdfA, _
        Nothing)
End Using
C#
// Create the engine instance
using (IOcrEngine ocrEngine =
    OcrEngineManager.CreateEngine(OcrEngineType.Plus, false))
{
    // Start up the engine
    ocrEngine.Startup(null, null, null);
    // Convert the multi-page TIF image to a PDF document
    ocrEngine.AutoRecognizeManager.Run(
        @"C:\Images\Ocr.tif",
        @"C:\Images\Document.pdf",
        null,
        OcrDocumentFormat.PdfA,
        null);
}
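Because a single AutoRecognizeManager.Run call converts one file, a folder of scans can be handled by looping over it. The sketch below is one possible way to organize such a loop, not something the toolkit provides: BatchOcr, OutputPathFor and ConvertAll are hypothetical names, and the conversion is passed in as a delegate so that the helper compiles without the LEADTOOLS assemblies.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BatchOcr
{
    // Hypothetical helper: "scan1.tif" in any folder becomes "<outputDir>/scan1.pdf".
    public static string OutputPathFor(string inputPath, string outputDir)
    {
        string pdfName = Path.GetFileNameWithoutExtension(inputPath) + ".pdf";
        return Path.Combine(outputDir, pdfName);
    }

    // Calls convert(input, output) once per input file and returns the inputs
    // whose conversion threw, so one bad image does not stop the whole batch.
    public static List<string> ConvertAll(
        IEnumerable<string> inputPaths,
        string outputDir,
        Action<string, string> convert)
    {
        List<string> failed = new List<string>();
        foreach (string inputPath in inputPaths)
        {
            try
            {
                convert(inputPath, OutputPathFor(inputPath, outputDir));
            }
            catch (Exception)
            {
                // Record the failure and continue with the remaining files.
                failed.Add(inputPath);
            }
        }
        return failed;
    }
}
```

Inside the using block of the sample above, the delegate could be (src, dst) => ocrEngine.AutoRecognizeManager.Run(src, dst, null, OcrDocumentFormat.PdfA, null), which reuses one started engine for every file instead of starting it once per image.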
Conclusion
LEADTOOLS provides developers with access to the world’s best performing and most stable imaging libraries in an easy-to-use, high-level programming interface enabling rapid development of business-critical applications. The new version 16 design will simplify the development effort, without sacrificing the level of control dictated by the specific application. As demonstrated by the samples above, LEAD’s new high-level OCR interface and design provide a logical and flexible approach to converting scanned images to editable and searchable documents. Classes are provided to allow you to control the entire process, or you can simply start the engine and convert any of the 150+ supported image formats to all common document formats with a single method call.
OCR is one of the many things LEADTOOLS has to offer. For more information, be sure to visit our home page and download a free, fully functional evaluation SDK.
Required Software to Build this Sample
Or if you want to try it before you make a purchasing decision, you can download the free 60-day fully functional evaluation for LEADTOOLS version 16.
Support
Need help getting this sample up and going? Contact our support team for free evaluation support!