When we process TIFF files of 600 to 700 pages with the Tesseract OCR engine using the hocr option, each file takes around 40 to 50 minutes.

That is too long for files of this size.

Is there any way to speed up the process?

We use the following command:
tesseract.exe "Source_Tiff_File" "Destination_File" hocr
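
For reference, the figures above can be turned into a rough per-page cost, which makes it easier to judge any later speed-up. This is only arithmetic on the reported ranges; the function name is made up for illustration.

```python
def seconds_per_page(total_minutes, pages):
    """Average OCR time per page, given total runtime and page count."""
    return total_minutes * 60.0 / pages

# Best case: 40 minutes for 700 pages. Worst case: 50 minutes for 600 pages.
best = seconds_per_page(40, 700)
worst = seconds_per_page(50, 600)
print(f"{best:.1f} to {worst:.1f} seconds per page")  # 3.4 to 5.0 seconds per page
```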
Updated 18-Apr-15 19:28pm
Comments
Sergey Alexandrovich Kryukov 19-Apr-15 1:56am    
No, there is no such way, and here is why: you can write your own OCR engine, or modify Tesseract to work better, but it won't be Tesseract anymore. :-)
—SA
Mehdi Gholam 19-Apr-15 2:48am    
How much CPU is being used?
shivmymail 2-May-15 1:07am    
Sometimes the extraction process reaches 100% CPU for large files.

1) Contact the library authors for performance tweaks.
2) Run on faster hardware.
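
One way to read "faster hardware" is to use more of the cores a machine already has. A single tesseract process works through a multi-page TIFF page by page, so a possible approach (not something confirmed in this thread) is to split the TIFF into single-page images first and run several tesseract processes side by side. The sketch below assumes the pages have already been split by some other tool; `build_command`, `ocr_pages`, and the folder names are hypothetical, and only the `tesseract <image> <output base> hocr` invocation comes from the question.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(page, out_dir, exe="tesseract"):
    """Same invocation as in the question, applied to one single-page image."""
    out_base = Path(out_dir) / Path(page).stem  # Tesseract adds the extension itself
    return [exe, str(page), str(out_base), "hocr"]


def ocr_pages(pages, out_dir, workers=None, run=subprocess.run):
    """Run one Tesseract process per page, at most `workers` at a time.

    Threads are sufficient here because each thread only waits on an
    external process. Returns the process return codes in page order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    workers = workers or os.cpu_count() or 1
    commands = [build_command(p, out_dir) for p in pages]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(run, commands)
        return [r.returncode for r in results]
```

It could be called as `ocr_pages(sorted(Path("pages").glob("*.tif")), "hocr_out")`, where `pages` is an assumed folder of single-page images. Each page then produces its own hOCR file, so the per-page outputs still have to be combined afterwards, and the actual gain depends on the number of cores and on disk speed.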
 
 
Comments
kiredw 20-May-20 4:30am    
:(
I had a quick look at search results for Tesseract, and it appears that the OCR engine supports AMD GPUs.

So, in theory, it may be possible to speed up processing by:
a) investing in a higher-specification GPU, coupled with
b) a higher-performance CPU.

Do some research and ask the developers.
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


