I am trying to extract data programmatically from unstructured documents that contain both printed and handwritten text. Is there a good tool or approach for extracting data from unstructured documents and converting it to XML, so that we can map fields to their data?
These unstructured documents vary in layout; for example, the old PAN card format differs from the new one.
Also, some document formats change every financial year or from employee to employee.
For example, payslips differ for each employee, and a bank's application form changes every financial year.

Thanks in advance

What I have tried:

For this I have used various tools, such as ABBYY, IRDS, Nicomsoft, Accusoft, Aspose and LEADTOOLS (trial versions; these tools provide OCR and ICR technology), but none of them gives me correct results. The documents may contain both handwritten and printed text. I am now reading about IWR (intelligent word recognition) technology, but which tools use it? In my experience ABBYY gives very poor results, and I need much better accuracy.
CHill60 10-Mar-16 7:11am    
SAS Text Miner has been around for a while and has received some good reports. I couldn't find out how much it costs, though, so I suspect it is not cheap!
Text Mining Software, SAS Text Miner | SAS
[Disclaimer: I have nothing to do with this company, nor is this necessarily a recommendation]
sp_suresh 10-Mar-16 7:15am    
Thanks a lot. I will check it out.
LEADTOOLS Support 11-May-16 7:18am    
Since the locations of fields keep changing, you could perform OCR on the whole image and search the result for keywords you expect to appear in the document (such as "Employee name" or "Salary"). Once you find a keyword, look for the associated field (such as the salary amount) at a location relative to the keyword's location. Keep in mind that ICR generally does not give good results unless the handwritten text is clear and close to print quality.
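The keyword-anchored approach described in the comment above can be sketched in Python. This is a minimal sketch, not LEADTOOLS code: it assumes the OCR engine has already returned each recognized word with a pixel bounding box, in reading order. The `Word` class, the field-to-keyword mapping and the `max_gap`/`max_dist` thresholds are hypothetical and would need tuning per document type. It locates the keyword, takes the text to its right on the same line, falls back to the line directly below, and writes the results to XML so that fields can be mapped to data, as the question asks.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Word:
    """One OCR-recognized word with its bounding box (hypothetical input format)."""
    text: str
    left: int
    top: int
    width: int
    height: int

    @property
    def right(self) -> int:
        return self.left + self.width

    @property
    def bottom(self) -> int:
        return self.top + self.height

    @property
    def center_y(self) -> float:
        return self.top + self.height / 2


def find_phrase(words, phrase):
    """Return the words of the first case-insensitive match of phrase, or None.
    Assumes words are in reading order; trailing colons are ignored."""
    tokens = phrase.lower().split()
    n = len(tokens)
    for i in range(len(words) - n + 1):
        if [w.text.lower().rstrip(":") for w in words[i:i + n]] == tokens:
            return words[i:i + n]
    return None


def value_right_of(words, anchor, max_gap=200):
    """Join words on the anchor's text line that lie to its right,
    stopping at the first horizontal gap wider than max_gap pixels."""
    last = anchor[-1]
    same_line = sorted(
        (w for w in words
         if w.left >= last.right and abs(w.center_y - last.center_y) <= last.height / 2),
        key=lambda w: w.left)
    value, edge = [], last.right
    for w in same_line:
        if w.left - edge > max_gap:
            break
        value.append(w.text)
        edge = w.right
    return " ".join(value) or None


def value_below(words, anchor, max_dist=60):
    """Join words on the nearest line below the anchor that overlap it horizontally."""
    first, last = anchor[0], anchor[-1]
    below = [w for w in words
             if 0 <= w.top - last.bottom <= max_dist
             and w.right >= first.left and w.left <= last.right]
    if not below:
        return None
    nearest = min(below, key=lambda w: w.top)
    line = [w for w in below if abs(w.center_y - nearest.center_y) <= nearest.height / 2]
    return " ".join(w.text for w in sorted(line, key=lambda w: w.left))


def extract_fields(words, field_keywords):
    """Build an XML string with one element per field; empty if the keyword is not found."""
    root = ET.Element("document")
    for field, keyword in field_keywords.items():
        anchor = find_phrase(words, keyword)
        value = None
        if anchor:
            value = value_right_of(words, anchor) or value_below(words, anchor)
        element = ET.SubElement(root, field)
        if value is not None:
            element.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up OCR output for a payslip-like page.
    sample = [
        Word("Employee", 50, 100, 90, 20), Word("Name:", 145, 100, 60, 20),
        Word("A.", 230, 102, 20, 20), Word("Kumar", 255, 101, 70, 20),
        Word("Salary", 50, 160, 70, 20), Word("45,000", 50, 190, 70, 20),
    ]
    print(extract_fields(sample, {"employee_name": "Employee Name",
                                  "salary": "Salary", "pan_number": "PAN"}))
```

Searching relative to a keyword rather than at fixed coordinates is what lets the same mapping survive layout changes between form versions, as long as the label text itself stays the same. Fields whose label is missing or misread simply come out empty, so they can be flagged for manual review.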

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
