I want to read a PDF file in ASP.NET.
Development language: C#

I need to extract:
1) The text, paragraph by paragraph

2) Images

3) All fonts and styles

4) Annotations

5) Table content


Please advise which library I should use.

After extraction, I am going to do some processing on this content.

If the library is a paid one, that is fine; paying for it is not a problem.

Thanks in advance

Regards

Sachin
Updated 13-Sep-13 22:40pm
Comments
Thanks7872 14-Sep-13 4:48am    
And what have you tried so far?
landesachin14 14-Sep-13 5:30am    
PDFBox and PDFsharp, but I didn't get a solution.
Thanks7872 14-Sep-13 6:06am    
Post the code and explain what you mean by "didn't get a solution".

1 solution

What you need is an OCR engine. The best one I know is the ABBYY API.
 
Comments
landesachin14 14-Sep-13 10:03am    
Thanks, we already use ABBYY. However, it works on the principle of glyph recognition:
it converts the PDF to images and then extracts the text from them, and the accuracy of the extracted content is a big problem.
Zoltán Zörgő 14-Sep-13 16:23pm    
All OCR engines work like that. I am using ABBYY myself and have never had problems. PDF is not a document format (despite its name); it is a specialized format derived from PostScript, and so it is actually meant to support rendering. It is a "write-only" format in the sense that there is no guarantee that two consecutive characters of the same word are stored in the same order in the file, not to mention tables and the like. Editing a PDF is, in general, not like editing a Word document. Not even Adobe Acrobat Professional is able to edit all PDF files, including ones that look perfectly ordinary.
You have no other real option but to treat your PDF as an image and recognize its content with one OCR engine or another.
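To illustrate the point about character order, here is a minimal sketch in Python of why naive extraction can scramble text and how sorting by position can partly recover it. The logic is library-free and ports directly to C#. Everything in it is an assumption made for illustration: the `Fragment` type, the `reconstruct_lines` helper, the coordinates, and the `y_tolerance` value are hypothetical, and a real PDF library would have to supply the positioned fragments. It handles only simple single-column text, not tables or multi-column layouts.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical positioned text run, as a PDF parser might report it."""
    x: float   # left edge, in points
    y: float   # baseline, in points (PDF y grows upward)
    text: str


def reconstruct_lines(fragments, y_tolerance=2.0):
    """Group fragments into lines by baseline, then order each line left to right."""
    lines = []  # each entry: [baseline_y, [fragments on that line]]
    # Visit the highest fragments first so lines come out top to bottom.
    for frag in sorted(fragments, key=lambda f: -f.y):
        if lines and abs(lines[-1][0] - frag.y) <= y_tolerance:
            lines[-1][1].append(frag)
        else:
            lines.append([frag.y, [frag]])
    return [
        "".join(f.text for f in sorted(frags, key=lambda f: f.x))
        for _, frags in lines
    ]


# Fragments in the (made-up) order they appear in the file, not in reading order.
fragments = [
    Fragment(130.0, 700.0, "world"),
    Fragment(72.0, 686.0, "Second "),
    Fragment(72.0, 700.0, "Hello "),
    Fragment(115.0, 686.0, "line"),
]

naive = "".join(f.text for f in fragments)
ordered = reconstruct_lines(fragments)
print(naive)
print(ordered)
```

Joining the fragments in file order gives scrambled text, while grouping by baseline and sorting by x recovers the two lines. This kind of heuristic breaks down quickly on tables and complex layouts.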

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


