I am not able to search for key information because Google only supports exact matches. Because of OCR errors, the OCR text does not exactly match the text in the PDF documents. Are there any techniques or software that can search accurately in spite of poor-quality scans (and the resulting OCR errors)?


What I have tried:

The scanned PDF documents I am handling have poor scan quality. In spite of searching on Google Drive after enabling OCR,
Updated 2-Nov-21 5:09am

1 solution

"Bad quality" is not a precise measurement, so the question is really "are there any OCR tools that are better at text conversion with poorer quality scanned documents?"

Well, that all depends on how bad the quality of the scanned document is. Nobody can tell you with any accuracy what is going to work with your documents. You simply have to try various libraries until you find something that works with them.

The second part of this is searching against words that are poorly spelled. When searching content like that, there is no such thing as "accurately". There are only matches that may be close, each with a "confidence" value indicating how likely the match is to be the word you're looking for. That comes down to the search engine you're using, or going to use. Such engines use various "fuzzy match" techniques to generate results.
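To give a feel for what a "fuzzy match" with a confidence value looks like, here is a minimal sketch using Python's standard `difflib` module. It is only an illustration, not what any particular search engine does internally. The function name `fuzzy_find`, the 0.75 threshold, and the sample OCR text are all made up for the example, and the score is a plain string-similarity ratio between 0 and 1 rather than a real probability.

```python
from difflib import SequenceMatcher


def fuzzy_find(query, text, threshold=0.75):
    """Return (score, snippet) pairs for word windows that resemble the query.

    Hypothetical helper: slides a window with as many words as the query
    over the text and keeps windows whose similarity ratio meets the threshold.
    """
    target = " ".join(query.lower().split())
    n = len(target.split())
    words = text.split()
    hits = []
    for i in range(len(words) - n + 1):
        snippet = " ".join(words[i:i + n])
        # ratio() is 1.0 for identical strings and drops as they diverge
        score = SequenceMatcher(None, target, snippet.lower()).ratio()
        if score >= threshold:
            hits.append((round(score, 2), snippet))
    # Best matches first
    return sorted(hits, reverse=True)


# Made-up OCR output with typical confusions: "l" read as "1", "m" read as "rn"
ocr_text = "Tota1 arnount due: 450 USD. Payrnent is expected within 30 days."

for score, snippet in fuzzy_find("total amount", ocr_text):
    print(score, snippet)
```

An exact search for "total amount" finds nothing in that text, while the fuzzy version still surfaces "Tota1 arnount" as a candidate. The threshold is something you would have to tune against your own documents: set it too low and you get a lot of false matches, too high and you miss the badly garbled words.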


This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



