OCR Documents in .NET

21 Jan 2013 | CPOL | 3 min read
This article is in the Product Showcase section for our sponsors at CodeProject. These articles are intended to provide you with information on products and services that we consider useful and of value to developers.

Overview

Dynamsoft’s OCR SDK is an add-on for Dynamic .NET TWAIN, an image acquisition SDK optimized for .NET applications. The OCR SDK allows you to convert scanned images to searchable PDF or text files. OCR is widely recognized as a useful feature, but it is not easy to implement. Getting good results depends on several complicated factors, such as recognition accuracy and image format. OCR performance is another important factor, because it affects the efficiency of the whole process.

Dynamsoft’s OCR SDK, which is built on and optimized from the mature open source Tesseract OCR engine, relieves you of these burdens. By integrating it with Dynamic .NET TWAIN, you can create a robust image acquisition and processing solution in a few lines of source code.

Key Features

  • Supports more than 40 languages, including Arabic and various Asian languages.
  • High OCR performance through multi-threaded processing.
  • Accurate recognition with font identification.
  • Easy integration with the image acquisition SDK – Dynamic .NET TWAIN.

The following sections show how to integrate the OCR add-on into your WinForms application and convert scanned images to searchable PDF or text files.

Source Code

1. Embed Dynamic .NET TWAIN in your WinForms or WPF app.

We will use WinForms as an example.

This step assumes you have already downloaded and installed the .NET component on your development machine. If not, please download the 30-day free trial from Dynamsoft’s website.

Open your WinForms app or create a new one in Visual Studio. From the Tools menu, select Choose Toolbox Items. In the dialog box that appears, click Browse and select DynamicDotNetTWAIN.dll, which can be found in the installation folder of Dynamic .NET TWAIN. Click OK to close the dialog box.

Drag and drop the component to the form.

2. Acquire images from scanners or webcams, or load them from local folders.

Dynamic .NET TWAIN supports getting images from various sources, including scanners, webcams and other TWAIN/WIA/UVC compatible devices. In this article, I’ll show you how to load an existing image from your local disk.

SetViewMode: defines the view mode of the control.
LoadImage: loads existing local images. Supported image formats include BMP, PNG, JPEG, TIFF (both single and multi-page) and PDF (both single and multi-page).

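// Set the view mode of the control to 1 x 1.
this.dynamicDotNetTwain1.SetViewMode(1, 1);

using (OpenFileDialog filedlg = new OpenFileDialog())
{
    // Allow several files to be selected so the loop below can load each one.
    filedlg.Multiselect = true;

    if (filedlg.ShowDialog() == DialogResult.OK)
    {
        foreach (string strfilename in filedlg.FileNames)
        {
            this.dynamicDotNetTwain1.LoadImage(strfilename);
        }
    }
}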
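Because LoadImage accepts only the formats listed above, it can help to check a file's extension before handing it to the control. The following is a minimal sketch of such a check. The class name `ImageFileFilter`, the method `IsSupported`, and the exact set of extension spellings are assumptions made for illustration and are not part of the Dynamsoft API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ImageFileFilter
{
    // Extensions for the formats this article lists for LoadImage:
    // BMP, PNG, JPEG, TIFF and PDF. The spellings are an assumption.
    private static readonly HashSet<string> Supported =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            ".bmp", ".png", ".jpg", ".jpeg", ".tif", ".tiff", ".pdf"
        };

    // Returns true when the file name ends in one of the extensions above.
    public static bool IsSupported(string fileName)
    {
        if (string.IsNullOrEmpty(fileName))
        {
            return false;
        }
        return Supported.Contains(Path.GetExtension(fileName));
    }
}
```

Inside the foreach loop, a guard such as `if (ImageFileFilter.IsSupported(strfilename))` could then skip files the control is unlikely to load.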

3. Initialize the OCR add-on and choose the language package.

1) Choose the language package and define the path of the package by using the OCRTessDataPath property.

Dynamsoft’s OCR SDK supports more than 40 languages, including English, Spanish, Arabic and more. The sample code below chooses English as the default language. Other language packages can be downloaded from Dynamsoft’s website: OCR SDK Language Packages

string languageFolder = Application.StartupPath;

this.dynamicDotNetTwain1.OCRTessDataPath = languageFolder;
this.dynamicDotNetTwain1.OCRLanguage = "eng";

2) Set the path of DynamicOCR.dll or DynamicOCRx64.dll to initialize the OCR add-on.

this.dynamicDotNetTwain1.OCRDllPath = "";
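The two DLL names suggest that the 32-bit and 64-bit builds are separate files. As a rough sketch, the helper below picks the file name for the current process and checks whether a folder contains it. The class `OcrDllLocator` is hypothetical, and two things are assumptions rather than documented behavior: that the native DLL must match the bitness of the running process, and that OCRDllPath refers to a folder.

```csharp
using System;
using System.IO;

public static class OcrDllLocator
{
    // File names as given in this article.
    public const string Dll32 = "DynamicOCR.dll";
    public const string Dll64 = "DynamicOCRx64.dll";

    // Picks the DLL name for a 32-bit or 64-bit process.
    public static string DllNameFor(bool is64BitProcess)
    {
        return is64BitProcess ? Dll64 : Dll32;
    }

    // Returns the folder if it contains the DLL for the current process,
    // otherwise null. Treating the path as a folder is an assumption.
    public static string FindDllFolder(string folder)
    {
        string name = DllNameFor(Environment.Is64BitProcess);
        return File.Exists(Path.Combine(folder, name)) ? folder : null;
    }
}
```

A caller could pass `Application.StartupPath` to `FindDllFolder` and report a clear error when the result is null, before starting OCR.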

3) Choose the OCR result file format and save the result. Supported file formats include Text, PDF Plain Text and PDF Image over Text. When the format is set to PDF Image over Text, the position and formatting of images and text, such as font names, font sizes and line widths, are preserved as in the original.

// Map the selected item of the format drop-down list to the OCR result format.
this.dynamicDotNetTwain1.OCRResultFormat = (Dynamsoft.DotNet.TWAIN.OCR.ResultFormat)this.ddlResultFormat.SelectedIndex;

// Run OCR on the images currently selected in the buffer.
byte[] sbytes = this.dynamicDotNetTwain1.OCR(this.dynamicDotNetTwain1.CurrentSelectedImageIndicesInBuffer);

if (sbytes != null)
{
    SaveFileDialog filedlg = new SaveFileDialog();
    // Index 0 is saved as text; every other index is saved as PDF.
    if (this.ddlResultFormat.SelectedIndex != 0)
    {
        filedlg.Filter = "PDF File (*.pdf)|*.pdf";
    }
    else
    {
        filedlg.Filter = "Text File (*.txt)|*.txt";
    }
    if (filedlg.ShowDialog() == DialogResult.OK)
    {
        // WriteAllBytes replaces an existing file; File.OpenWrite would not truncate it.
        File.WriteAllBytes(filedlg.FileName, sbytes);
    }
}
else
{
    // OCR returned no data; show the error reported by the control.
    MessageBox.Show(this.dynamicDotNetTwain1.ErrorString);
}
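One detail is worth keeping in mind when the OCR result is saved over a file that already exists: File.OpenWrite opens the file without truncating it, while File.WriteAllBytes replaces its content. The sketch below illustrates the difference using only standard .NET file APIs. The class `OverwriteDemo` and its method names are made up for this illustration.

```csharp
using System;
using System.IO;

public static class OverwriteDemo
{
    // Writes data with File.OpenWrite, which keeps any bytes of an
    // existing file that lie beyond the newly written range.
    public static long WriteWithOpenWrite(string path, byte[] data)
    {
        using (FileStream fs = File.OpenWrite(path))
        {
            fs.Write(data, 0, data.Length);
        }
        return new FileInfo(path).Length;
    }

    // Writes data with File.WriteAllBytes, which replaces the whole file.
    public static long WriteWithWriteAllBytes(string path, byte[] data)
    {
        File.WriteAllBytes(path, data);
        return new FileInfo(path).Length;
    }
}
```

If a 10-byte result is written over an existing 100-byte file, the first method leaves a 100-byte file whose tail still holds old data, and the second leaves a 10-byte file. For OCR output, stale trailing bytes could produce a corrupt PDF or leftover text.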

Distribution

To distribute the application to the end users, please copy the following files to the client machine along with the EXE file.

  • The language package
  • DynamicOCR.dll (for 32-bit Windows OS) and/or DynamicOCRx64.dll (for 64-bit Windows OS)
  • DynamicDotNetTWAIN.dll

Xcopy deployment is also supported.
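Since the application depends on files copied next to the EXE, a startup check can report a missing file before the user reaches the OCR step. The following is a minimal sketch of such a check. `DeploymentCheck` and `FindMissingFiles` are hypothetical names, and the list of required files is supplied by the caller rather than taken from the SDK.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class DeploymentCheck
{
    // Returns the names from requiredFiles that are not present in appFolder.
    public static List<string> FindMissingFiles(string appFolder, IEnumerable<string> requiredFiles)
    {
        var missing = new List<string>();
        foreach (string name in requiredFiles)
        {
            if (!File.Exists(Path.Combine(appFolder, name)))
            {
                missing.Add(name);
            }
        }
        return missing;
    }
}
```

In a WinForms app, this could be called with `Application.StartupPath` and names such as "DynamicDotNetTWAIN.dll" and "DynamicOCR.dll", showing a message box when the returned list is not empty.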

Resources 

The complete source code of the OCR sample can be downloaded from this article. To test or customize the code, you can download the trial version of Dynamic .NET TWAIN from Dynamsoft’s website.

Download Dynamic .NET TWAIN 30-Day Free Trial

Other demos/samples of .NET image acquisition and processing can be found here:

Dynamic .NET TWAIN Demos

If you have any questions, you can contact our support team at nettwain@dynamsoft.com.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Canada
Dynamsoft has more than 15 years of experience in TWAIN SDKs, imaging SDKs and version control solutions.

Our products include:

TWAIN SDK
- Dynamic Web TWAIN: a TWAIN scanning SDK optimized for web document management applications.
- Dynamic .NET TWAIN: a .NET TWAIN and Directshow Image Capture SDK for WinForms/WPF applications.

Imaging SDKs
- Barcode Reader for Windows, Linux, macOS, iOS, Android and Raspberry Pi.
- OCR addon for both web and .NET TWAIN SDKs

Version Control
- SourceAnywhere: a SQL server-based source control solution. Both on-premise and hosting options are provided.

http://www.dynamsoft.com/
