Hi

I am trying to extract text from an image using Tesseract with C#. Below is the code:

Bitmap img = (Bitmap)Bitmap.FromFile("/Users/prkotagi/Desktop/Test.bmp"); 
TesseractEngine engine = new TesseractEngine("/Volumes/Macintosh HD - Data/Csharpcode/eng.traineddata","eng",EngineMode.Default);
Page page = engine.Process(img, PageSegMode.Auto);
string result = page.GetText();
Console.WriteLine(result);


But we are getting the error
cannot convert from 'system.drawing.bitmap' to 'tesseract.pix'
at the line
Page page = engine.Process(img, PageSegMode.Auto);


What I have tried:

I tried with OCR, but nothing works. Please let me know how to fix the error above.
Updated 9-Nov-20 22:44pm
Comments
Richard MacCutchan 10-Nov-20 4:52am
   
That just means that the Process method expects a Tesseract.Pix object, not a Bitmap.

You need to convert it to a format that the engine understands. See the first example here: TesseractEngine.Process C# (CSharp) Code Examples - HotExamples
   
Quote:
cannot convert from 'system.drawing.bitmap' to 'tesseract.pix'

Based on the error, it seems Process() expects a parameter of type tesseract.pix, not a Bitmap.

I believe there is a PixConverter class that can be used:
C#
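// imgsource is the System.Drawing.Bitmap to read; engine is the TesseractEngine.
// PixConverter.ToPix converts the Bitmap into the Pix that Process() expects.
using (var img = PixConverter.ToPix(imgsource))
{
    using (var page = engine.Process(img))
    {
        // ocrtext is a string declared by the caller.
        ocrtext = page.GetText();
    }
}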

Example reference: tesseract-samples/Program.cs

Alternatively, looking at the demo sample provided, it seems you need to do the following:
C#
var tesseractPath = solutionDirectory + @"\tesseract-master.1153";
var imageFile = File.ReadAllBytes(fileName);
var text = ParseText(tesseractPath, imageFile, "eng", "fra");

//results in
Console.WriteLine("File:" + fileName + "\n" + text + "\n");

Reference: GitHub - doxakis/How-to-use-tesseract-ocr-4.0-with-csharp: How to use Tesseract OCR 4.0 with C#
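The snippet above calls a ParseText helper without showing it. What follows is only a rough sketch of how such a helper could work by shelling out to the tesseract command-line program; it is not the code from the linked repository. The class name TesseractCli, the BuildArguments helper and the tesseractExe parameter (the full path to the executable, rather than the folder passed in the demo) are assumptions made for illustration. It relies on the documented command-line form `tesseract imagename outputbase -l lang1+lang2`, which writes the recognized text to outputbase.txt.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class TesseractCli
{
    // Builds the arguments for: tesseract <input> <outputBase> -l lang1+lang2
    // (hypothetical helper, introduced here for illustration only)
    public static string BuildArguments(string inputPath, string outputBase, params string[] languages)
    {
        var args = "\"" + inputPath + "\" \"" + outputBase + "\"";
        if (languages.Length > 0)
        {
            args += " -l " + string.Join("+", languages);
        }
        return args;
    }

    // Hypothetical stand-in for the demo's ParseText.
    // Assumption: tesseractExe is the full path to the tesseract executable.
    public static string ParseText(string tesseractExe, byte[] imageBytes, params string[] languages)
    {
        var inputPath = Path.GetTempFileName();
        var outputBase = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        try
        {
            // The command-line tool reads its input from a file.
            File.WriteAllBytes(inputPath, imageBytes);

            var info = new ProcessStartInfo
            {
                FileName = tesseractExe,
                Arguments = BuildArguments(inputPath, outputBase, languages),
                UseShellExecute = false,
                CreateNoWindow = true,
                RedirectStandardError = true
            };

            using (var process = Process.Start(info))
            {
                var errors = process.StandardError.ReadToEnd();
                process.WaitForExit();
                if (process.ExitCode != 0)
                {
                    throw new InvalidOperationException("tesseract failed: " + errors);
                }
            }

            // tesseract appends ".txt" to the output base name.
            return File.ReadAllText(outputBase + ".txt");
        }
        finally
        {
            // Clean up the temporary files; File.Delete ignores missing files.
            File.Delete(inputPath);
            File.Delete(outputBase + ".txt");
        }
    }
}
```

With this sketch, a call such as TesseractCli.ParseText(pathToTesseractExe, imageFile, "eng", "fra") would take the place of the ParseText call above. Running the executable as a separate process also avoids loading the native leptonica and tesseract libraries into the .NET process, at the cost of requiring the command-line tool to be installed.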
   
Comments
Member 14988829 10-Nov-20 9:34am
   
Hi,
I have tried the code below:

using (var engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.Default))
{
    using (var img = Pix.LoadFromFile("/Users/prkotagi/Desktop/Test.bmp"))
    {
        using (var page = engine.Process(img))
        {
            var text = page.GetText();
        }
    }
}

But I am getting the error below.

System.DllNotFoundException: Failed to find library "liblept1753.so" for platform x64.
Sandeep Mewara 10-Nov-20 9:37am
   
https://github.com/charlesw/tesseract/issues/290

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



