Is it possible to extract text from scanned PDF documents and images in ASP.NET? If so, please help me.

To get text from a PDF, look into iTextSharp; have a look at Converting PDF to Text in C#[^].

To extract text from an image you need OCR; have a look at Google Tesseract[^].
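
As a rough sketch of the first suggestion, the snippet below reads the text layer of each page with the iTextSharp 5.x parser classes (`PdfReader`, `PdfTextExtractor`, `SimpleTextExtractionStrategy`). The names `PdfTextProbe`, `ExtractPages` and `NeedsOcr` are made up for illustration, and treating an empty page as "scanned" is only a heuristic assumed here, not something iTextSharp reports.

```csharp
using System;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper class, not part of iTextSharp.
public static class PdfTextProbe
{
    // Assumed heuristic: a page with no non-whitespace text is probably
    // a scanned image and has to go through OCR instead.
    public static bool NeedsOcr(string pageText)
    {
        return string.IsNullOrWhiteSpace(pageText); // requires .NET 4.0 or later
    }

    // Returns the text layer of every page, in page order.
    public static string[] ExtractPages(string pdfPath)
    {
        PdfReader reader = new PdfReader(pdfPath);
        try
        {
            string[] pages = new string[reader.NumberOfPages];
            for (int i = 1; i <= reader.NumberOfPages; i++) // iTextSharp page numbers start at 1
            {
                pages[i - 1] = PdfTextExtractor.GetTextFromPage(
                    reader, i, new SimpleTextExtractionStrategy());
            }
            return pages;
        }
        finally
        {
            reader.Close(); // release the file handle even if parsing fails
        }
    }
}
```

Pages for which `NeedsOcr` returns true would then have to be handed to an OCR engine such as Tesseract; that step is not shown here.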
 
 
Comments
anjali2 20-Aug-11 7:07am    
Thanks for your reply, but iTextSharp only extracts text from text-based PDF documents, and I have a scanned PDF document.
Simon Bang Terkildsen 20-Aug-11 7:22am    
What's the difference? Well, I suppose your scanned PDF document contains images. In that case you need to extract the images, and then you're back to OCR.
RaisKazi 20-Aug-11 8:46am    
My 5 for accurate direction.
Sergey Alexandrovich Kryukov 21-Aug-11 22:58pm    
Agree, my 5.
Please see my solution; some useful advice on OCR.
--SA
thatraja 23-Aug-11 0:31am    
5!
Yes, I agree with Simon's comments in Solution 1. This requirement is related to OCR (Optical Character Recognition).

You can use the Microsoft SDK below for this.

Microsoft Office Document Imaging -
http://social.technet.microsoft.com/Forums/en-US/officeappcompat/thread/93d6f285-dc98-46e2-b7e0-872bba9c4e35/[^]

I evaluated several third-party OCR SDKs in one of my assignments. If you are open to a third-party OCR SDK, search for the SDKs below on Google.

1) Nuance OmniPage OCR
2) Accusoft SmartZone OCR
 
Comments
Simon Bang Terkildsen 20-Aug-11 10:03am    
I'm not a big fan of the MS Office OCR.
If the OP wants a good, versatile OCR API, then third party is the way to go.
+5
RaisKazi 20-Aug-11 10:09am    
I do agree. MS Office OCR has lower accuracy in comparison. I found Nuance and Accusoft the most accurate among the SDKs I evaluated. By the way, "Transym" is also a good OCR SDK.
thatraja 23-Aug-11 0:31am    
5!
I agree with Simon: if the PDF is a scanned image, you will need OCR, which is not easy.

Please see my advice on OCR in my past solution OCR Software[^].

Good luck,
—SA
 
 
Comments
thatraja 23-Aug-11 0:31am    
5!
Sergey Alexandrovich Kryukov 23-Aug-11 7:57am    
Thank you, Raja.
--SA

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


