Is there an open source OCR library for .NET that can extract text from scanned PDFs, even when the text is in different fonts, and that can output the result as HTML, XML, or plain text?
Updated 14-Jun-12 4:30am
Comments
Richard MacCutchan 14-Jun-12 11:33am    
No idea, have you asked Google?

Use these links:
OCR
OCR Source code
 
Comments
Sandeep Mewara 14-Jun-12 16:32pm    
Nice links 5!
Maciej Los 14-Jun-12 17:46pm    
Thank you, Sandeep ;)
Don't limit yourself to .NET

OCR has been a solved problem for years -- well before .NET came out, and open source projects tend to use non-proprietary languages.

I was part of the team that produced one of the first commercially successful OCR products for the PC in 1988. I would expect that most open source OCR projects were started in the early 1990s.

There are probably very good open source solutions out there -- most likely in C++.

You are going to be a lot happier if you select the best quality OCR available and then do the work to interface to it -- rather than settling for inferior OCR that's easy to incorporate in your project.

A quick search turns up this project:

http://code.google.com/p/tesseract-ocr/

Apparently it was pretty accurate back in 1995 and Google has adopted it and done a lot of work on it since 2006.

It's already ported to Windows and VS2008/2010 -- so all you have to do is interface your .NET code with it.
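One low-effort way to do that interfacing, without wrapping the C++ library at all, is to launch the Tesseract command-line program from .NET and read back the text file it writes. The following is only a rough sketch under some assumptions: the class and method names (TesseractCli, RecognizeText) are made up for illustration, the tesseract executable is assumed to be installed and on the PATH, and the input is assumed to be an image file such as a TIFF or PNG (pages of a scanned PDF would have to be converted to images first with a separate tool).

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper, not part of Tesseract or any wrapper library.
static class TesseractCli
{
    // Runs "tesseract <image> <outputBase> -l <language>" and returns the recognized text.
    // exePath defaults to "tesseract", which assumes the executable is on the PATH.
    public static string RecognizeText(string imagePath, string language = "eng", string exePath = "tesseract")
    {
        // Tesseract appends ".txt" to the output base name on its own.
        string outputBase = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));

        var startInfo = new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = string.Format("\"{0}\" \"{1}\" -l {2}", imagePath, outputBase, language),
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            if (process.ExitCode != 0)
                throw new InvalidOperationException("tesseract exited with code " + process.ExitCode);
        }

        string outputFile = outputBase + ".txt";
        try
        {
            return File.ReadAllText(outputFile);
        }
        finally
        {
            // Clean up the temporary output; File.Delete does nothing if the file is missing.
            File.Delete(outputFile);
        }
    }
}
```

Usage would look something like `string text = TesseractCli.RecognizeText(@"C:\scans\page1.tif");` (the path is just an example). Starting a process per page is slower than calling the library in-process, but it avoids any P/Invoke or C++/CLI work and is easy to swap out later.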
 
Comments
Thiago Silva 14-Aug-12 11:10am    
If you are looking at Tesseract, there's a project called Tessnet2 which does the cumbersome work of wrapping the C++ lib with .NET CLI (better than you would have to do with P/Invokes). www.pixel-technology.com/freeware/tessnet2/

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


