Is it possible to extract text from scanned PDF documents and images in ASP.NET? If so, please help me.

To get text from a PDF, look into iTextSharp; have a look at the article Converting PDF to Text in C#.

To extract text from an image you need OCR; have a look at Google Tesseract and Traceract.
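As a rough illustration of the OCR route, here is a minimal sketch that calls the Tesseract command-line tool on a single image. It assumes the `tesseract` executable is installed and on the PATH, and that the scanned page has already been saved as an image file; the function names are hypothetical. Python is used only for brevity; from ASP.NET the same command line could be launched with `System.Diagnostics.Process`.

```python
import shutil
import subprocess


def build_tesseract_command(image_path: str, lang: str = "eng") -> list[str]:
    """Build the argument list for the Tesseract CLI (hypothetical helper).

    Passing "stdout" as the output base makes Tesseract print the recognized
    text instead of writing it to a .txt file; -l selects the language data.
    """
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path: str, lang: str = "eng") -> str:
    """Run Tesseract on one image file and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as text
        check=True,           # raise CalledProcessError if Tesseract fails
    )
    return result.stdout
```

Usage would be something like `print(ocr_image("page1.png"))`, where `page1.png` is a placeholder file name. Shelling out to the CLI keeps the example free of wrapper libraries; a .NET wrapper around Tesseract would avoid the extra process per page.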
   
Comments
anjali2 20-Aug-11 7:07am
   
Thanks for your reply, but iTextSharp only extracts text from text-based PDF documents, and I have a scanned PDF document.
Simon Bang Terkildsen 20-Aug-11 7:22am
   
What's the difference? Well, I suppose your scanned PDF document contains images; in that case you need to extract the images, and then you're back to OCR.
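The workflow described here (use the PDF's own text layer when there is one, otherwise extract the page images and OCR them) can be sketched as follows. `Page`, `page_to_text` and `document_to_text` are hypothetical names: the sketch assumes some PDF library has already given you each page's text layer and embedded images, and the `ocr` callable stands in for whatever OCR engine you choose. Treating an empty text layer as "this page is a scan" is a heuristic assumption, not a rule.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Page:
    """Hypothetical stand-in for one PDF page as returned by a PDF library."""
    text: str                                           # text layer; empty for a pure scan
    images: List[bytes] = field(default_factory=list)  # embedded image data


def page_to_text(page: Page, ocr: Callable[[bytes], str]) -> str:
    """Prefer the page's text layer; fall back to OCR for scanned pages."""
    if page.text.strip():
        return page.text
    # No usable text layer: assume the page is a scan and OCR each image on it.
    return "\n".join(ocr(image) for image in page.images)


def document_to_text(pages: List[Page], ocr: Callable[[bytes], str]) -> str:
    """Convert every page and separate pages with a blank line."""
    return "\n\n".join(page_to_text(p, ocr) for p in pages)
```

Passing the OCR engine in as a callable keeps the routing logic independent of any particular engine, so the same code works whether `ocr` wraps Tesseract, MODI or a commercial SDK.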
[no name] 20-Aug-11 8:46am
   
My 5 for accurate direction.
Sergey Alexandrovich Kryukov 21-Aug-11 22:58pm
   
Agree, my 5.
Please see my solution for some useful advice on OCR.
--SA
thatraja 23-Aug-11 0:31am
   
5!
Yes, I agree with Simon's comments in Solution 1. This requirement calls for OCR (Optical Character Recognition).

You can use the following Microsoft SDK for this:

Microsoft Office Document Imaging -
http://social.technet.microsoft.com/Forums/en-US/officeappcompat/thread/93d6f285-dc98-46e2-b7e0-872bba9c4e35/

I evaluated several third-party OCR SDKs in one of my assignments. If you are open to a third-party OCR SDK, search Google for the following SDKs:

1) Nuance OmniPage OCR
2) Accusoft SmartZone OCR
   
Comments
Simon Bang Terkildsen 20-Aug-11 10:03am
   
I'm not a big fan of the MS Office OCR.
If the OP wants a good, versatile OCR API, then a third-party SDK is the way to go.
+5
[no name] 20-Aug-11 10:09am
   
I agree. MS Office OCR is less accurate in comparison. I found Nuance and Accusoft the most accurate among the SDKs I evaluated. By the way, "Transym" is also a good OCR SDK.
thatraja 23-Aug-11 0:31am
   
5!
I agree with Simon: if the PDF is a scanned image, you will need OCR, which is not easy.

Please see my advice on OCR in my past solution OCR Software.

Good luck,
--SA
   
Comments
thatraja 23-Aug-11 0:31am
   
5!
   
Thank you, Raja.
--SA

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900