Hi

I am trying to extract text from an image using Tesseract with C#. The code is below:

Bitmap img = (Bitmap)Bitmap.FromFile("/Users/prkotagi/Desktop/Test.bmp"); 
TesseractEngine engine = new TesseractEngine("/Volumes/Macintosh HD - Data/Csharpcode/eng.traineddata","eng",EngineMode.Default);
Page page = engine.Process(img, PageSegMode.Auto);
string result = page.GetText();
Console.WriteLine(result);


But we are getting this error:
cannot convert from 'system.drawing.bitmap' to 'tesseract.pix'
at this line:
Page page = engine.Process(img, PageSegMode.Auto);


What I have tried:

I tried OCR, but nothing works. Please let me know how to fix the error above.
Updated 9-Nov-20 22:44pm
Comments
Richard MacCutchan 10-Nov-20 4:52am    
That just means that the Process method expects a tesseract.pix object not a Bitmap.

Quote:
cannot convert from 'system.drawing.bitmap' to 'tesseract.pix'

Based on the error, it seems Process() expects a parameter of type tesseract.pix, not a Bitmap.

I believe there is something like PixConverter[^] that can be used:
C#
using (var img = PixConverter.ToPix(imgsource))
{
    using (var page = engine.Process(img))
    {
        ocrtext = page.GetText();
    }
}

Example reference: tesseract-samples/Program.cs [^]
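
For a complete picture, here is a minimal sketch of a whole console program that avoids the Bitmap-to-Pix conversion by loading the image straight into a Pix with Pix.LoadFromFile. It assumes the charlesw Tesseract NuGet package is installed and that its native Leptonica and Tesseract libraries can be loaded on your platform. The class name OcrSketch and the default ./tessdata folder are illustrative assumptions. Note that the first constructor argument is assumed here to be the folder containing eng.traineddata, not the path of the .traineddata file itself.

```csharp
using System;
using Tesseract; // assumption: the charlesw "Tesseract" NuGet package is referenced

class OcrSketch // hypothetical class name, for illustration only
{
    static void Main(string[] args)
    {
        // Assumption: a folder that contains eng.traineddata.
        // The engine is given the folder, not the .traineddata file.
        string tessdataDir = args.Length > 0 ? args[0] : "./tessdata";

        // Image path from the question; override it with the second argument.
        string imagePath = args.Length > 1 ? args[1] : "/Users/prkotagi/Desktop/Test.bmp";

        // Engine, image and page all wrap native resources, so dispose each one.
        using (var engine = new TesseractEngine(tessdataDir, "eng", EngineMode.Default))
        using (var img = Pix.LoadFromFile(imagePath))          // loads directly as Pix, no Bitmap involved
        using (var page = engine.Process(img, PageSegMode.Auto))
        {
            Console.WriteLine(page.GetText());
        }
    }
}
```

If you already hold a Bitmap in memory, PixConverter.ToPix as shown above is the alternative; loading from file simply sidesteps System.Drawing.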

Alternatively, looking at the demo sample provided, it seems you need to do the following:
C#
var tesseractPath = solutionDirectory + @"\tesseract-master.1153";
var imageFile = File.ReadAllBytes(fileName);
var text = ParseText(tesseractPath, imageFile, "eng", "fra");

//results in
Console.WriteLine("File:" + fileName + "\n" + text + "\n");

Reference: GitHub - doxakis/How-to-use-tesseract-ocr-4.0-with-csharp: How to use Tesseract OCR 4.0 with C#[^]
 
Comments
Member 14988829 10-Nov-20 9:34am    
Hi,
I have tried the code below:

using (var engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.Default))
{
using (var img = Pix.LoadFromFile("/Users/prkotagi/Desktop/Test.bmp"))
{
using (var page = engine.Process(img))
{
var text = page.GetText();

But I am getting the error below.

System.DllNotFoundException: Failed to find library "liblept1753.so" for platform x64.
Sandeep Mewara 10-Nov-20 9:37am    
https://github.com/charlesw/tesseract/issues/290
You need to convert it to a format that the engine understands. See the first example here: TesseractEngine.Process C# (CSharp) Code Examples - HotExamples[^]
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


