Hi

I am trying to get text from an image using Tesseract with C#. Below is the code:

Bitmap img = (Bitmap)Bitmap.FromFile("/Users/prkotagi/Desktop/Test.bmp"); 
TesseractEngine engine = new TesseractEngine("/Volumes/Macintosh HD - Data/Csharpcode/eng.traineddata","eng",EngineMode.Default);
Page page = engine.Process(img, PageSegMode.Auto);
string result = page.GetText();
Console.WriteLine(result);


But we are getting the following error:
cannot convert from 'system.drawing.bitmap' to 'tesseract.pix'
at this line:
Page page = engine.Process(img, PageSegMode.Auto);
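A separate issue in the code above: the first TesseractEngine argument is the path to eng.traineddata itself, while the wrapper's samples (and the later attempt in this thread) pass the folder that contains it (./tessdata). Below is a minimal sketch of a check you could run before creating the engine. ResolveTessdataDir is a hypothetical helper, not part of the Tesseract API, and it only uses System.IO:

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the Tesseract API): returns the folder to pass
// as TesseractEngine's data path. Accepts either that folder or the
// "<language>.traineddata" file inside it, and checks the file is really there.
static string ResolveTessdataDir(string path, string language)
{
    string dir = path;
    if (path.EndsWith(".traineddata", StringComparison.OrdinalIgnoreCase))
    {
        // A file was given, e.g. ".../eng.traineddata": use the folder that contains it.
        var parent = Path.GetDirectoryName(path);
        dir = string.IsNullOrEmpty(parent) ? "." : parent;
    }

    // Assumption: the engine expects "<language>.traineddata" directly inside this folder.
    string modelPath = Path.Combine(dir, language + ".traineddata");
    if (!File.Exists(modelPath))
        throw new FileNotFoundException($"No trained data for '{language}' at {modelPath}", modelPath);

    return dir;
}
```

For the path in the question, ResolveTessdataDir("/Volumes/Macintosh HD - Data/Csharpcode/eng.traineddata", "eng") would return "/Volumes/Macintosh HD - Data/Csharpcode" (provided the file exists), which is the value to hand to new TesseractEngine(...).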


What I have tried:

I have tried OCR, but nothing works. Please let me know how to solve the above.
Updated 9-Nov-20 22:44pm
Comments
Richard MacCutchan 10-Nov-20 4:52am
   
That just means that the Process method expects a Tesseract.Pix object, not a Bitmap.

Quote:
cannot convert from 'system.drawing.bitmap' to 'tesseract.pix'

Based on the error, it seems Process() expects a parameter of type Tesseract.Pix, not a Bitmap.

I believe there is something like PixConverter that can be used:
C#
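// Needs "using Tesseract;" and an existing TesseractEngine called engine.
// imgsource is the System.Drawing.Bitmap you already loaded.
string ocrtext;
using (var img = PixConverter.ToPix(imgsource))   // Bitmap -> Pix
{
    using (var page = engine.Process(img))           // Process() accepts a Pix
    {
        ocrtext = page.GetText();
    }
}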

Example reference: tesseract-samples/Program.cs

Alternatively, looking at the demo sample provided, it seems you need to do the following:
C#
// solutionDirectory, fileName and ParseText() come from the linked sample project
var tesseractPath = solutionDirectory + @"\tesseract-master.1153";
var imageFile = File.ReadAllBytes(fileName);
var text = ParseText(tesseractPath, imageFile, "eng", "fra");

// print the recognized text
Console.WriteLine("File:" + fileName + "\n" + text + "\n");

Reference: GitHub - doxakis/How-to-use-tesseract-ocr-4.0-with-csharp: How to use Tesseract OCR 4.0 with C#
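The ParseText helper in that sample appears to run the tesseract command-line program from the tesseract-master.1153 folder rather than going through the .NET wrapper. A rough sketch of the same idea, assuming the tesseract executable is installed and either on PATH or given by full path; OcrWithCli and BuildTesseractArgs are hypothetical names, not the sample's code. Passing stdout as the output base makes tesseract print the text instead of writing a .txt file, and several languages are joined with "+":

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper: command line for the tesseract CLI. Using "stdout" as the
// output base prints the text instead of writing <outputbase>.txt, and several
// languages are combined with '+', e.g. "-l eng+fra".
static string BuildTesseractArgs(string imagePath, params string[] languages) =>
    $"\"{imagePath}\" stdout -l {string.Join("+", languages)}";

// Hypothetical helper: writes the image bytes to a temp file, runs tesseract on it
// and returns what it printed. tesseractExe is "tesseract" if it is on PATH,
// otherwise the full path to the executable.
static string OcrWithCli(string tesseractExe, byte[] imageBytes, params string[] languages)
{
    string tempImage = Path.GetTempFileName();
    try
    {
        File.WriteAllBytes(tempImage, imageBytes);
        var startInfo = new ProcessStartInfo(tesseractExe, BuildTesseractArgs(tempImage, languages))
        {
            RedirectStandardOutput = true, // capture the recognized text
            UseShellExecute = false,       // required for redirection
        };
        using var process = Process.Start(startInfo)
            ?? throw new InvalidOperationException("Could not start " + tesseractExe);
        string text = process.StandardOutput.ReadToEnd();
        process.WaitForExit();
        return text;
    }
    finally
    {
        File.Delete(tempImage); // clean up the temporary image
    }
}
```

Usage would look like var text = OcrWithCli("tesseract", File.ReadAllBytes(fileName), "eng", "fra"); which avoids the Bitmap/Pix conversion and the native-library loading of the wrapper entirely, at the cost of starting an external process per image.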
   
Comments
Member 14988829 10-Nov-20 9:34am
   
Hi,
I have tried the following:

using (var engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.Default))
{
    // Pix.LoadFromFile reads the image straight into the Pix type Process() expects
    using (var img = Pix.LoadFromFile("/Users/prkotagi/Desktop/Test.bmp"))
    {
        using (var page = engine.Process(img))
        {
            var text = page.GetText();
            Console.WriteLine(text);
        }
    }
}

But I am getting the error below:

System.DllNotFoundException: Failed to find library "liblept1753.so" for platform x64.
Sandeep Mewara 10-Nov-20 9:37am
   
https://github.com/charlesw/tesseract/issues/290
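For the liblept1753.so DllNotFoundException, a quick way to see whether the native Leptonica library can be loaded at all from your process. This is a diagnostic sketch, not a fix: NativeLibrary.TryLoad needs .NET Core 3.0 or later, CanLoadNative is a hypothetical helper, and the x64 folder next to the executable is an assumption about where the wrapper looks for its native binaries:

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Returns true if the OS loader can open the native library, given either a bare
// file name (searched on the normal library path) or a full path.
static bool CanLoadNative(string nameOrPath)
{
    if (!NativeLibrary.TryLoad(nameOrPath, out IntPtr handle))
        return false;
    NativeLibrary.Free(handle); // only testing whether it loads
    return true;
}

Console.WriteLine($"OS: {RuntimeInformation.OSDescription}, process: {RuntimeInformation.ProcessArchitecture}");

// Name taken from the exception message; the "x64" folder next to the executable
// is an assumption about where the wrapper looks for its native binaries.
const string LeptonicaName = "liblept1753.so";
string[] candidates =
{
    LeptonicaName,
    Path.Combine(AppContext.BaseDirectory, "x64", LeptonicaName),
};
foreach (var candidate in candidates)
    Console.WriteLine($"{candidate}: {(CanLoadNative(candidate) ? "loads" : "cannot be loaded")}");
```

If neither candidate loads, the process cannot see a native Leptonica library under that name, so the problem is on the native-library side rather than in the C# code.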
You need to convert it to a format that the engine understands. See the first example here: TesseractEngine.Process C# (CSharp) Code Examples - HotExamples
   

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)




CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900