I copied some code from the internet to OCR an image. I can get the text of the whole document, but when I use the code below I only get the first line.
The inner loop runs forever. Example image content: "This is a line in image"
Once the items in the image have been read, this loop does not exit: Loop Until items.Next(Tesseract.PageIteratorLevel.TextLine, Tesseract.PageIteratorLevel.Word)
VB
Dim engine As New Tesseract.TesseractEngine("tessdata", "eng", Tesseract.EngineMode.TesseractOnly)
Dim page As Tesseract.Page = engine.Process(New Bitmap("imagepath"))
Dim items As Tesseract.ResultIterator = page.GetIterator()
items.Begin()
MsgBox(page.GetMeanConfidence)
Console.WriteLine("Mean Confidence: " & page.GetMeanConfidence)
Console.WriteLine("")
Dim i As Integer = 1
Do
    'If i Mod 2 = 0 Then
    Console.Write("Line " & i & " ")
    Do
        If items.IsAtBeginningOf(Tesseract.PageIteratorLevel.Block) Then
            Console.WriteLine("New Block" & vbNewLine)
        End If
        If items.IsAtBeginningOf(Tesseract.PageIteratorLevel.Para) Then
            Console.WriteLine("New Paragraph" & vbNewLine)
        End If
        If items.IsAtBeginningOf(Tesseract.PageIteratorLevel.TextLine) Then
            Console.WriteLine("New TextLine" & vbNewLine)
        End If
        Console.WriteLine(items.GetText(Tesseract.PageIteratorLevel.Word))
    Loop Until items.Next(Tesseract.PageIteratorLevel.TextLine, Tesseract.PageIteratorLevel.Word)
    'End If
    i = i + 1
Loop Until items.Next(Tesseract.PageIteratorLevel.Para, Tesseract.PageIteratorLevel.TextLine)
'RichTextBox1.Text = page.GetText()
engine.Dispose()
MsgBox("done")

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


