When I try to extract plain text from a PDF, I get garbled data instead of the actual text. The fonts in that PDF are listed as something like TT222FO00 (embedded subset), and the encoding is custom.

Can anybody help me with this?

Thanks in advance.

[moved up from comment]

This is how I'm doing it:
Updated 30-Jul-12 4:52am
Comments
Joan M 7-Jul-11 6:06am    
Just in case Manfred R. Bihy's answer does not work for you (though I think it will), you should post a small sample of what you are doing so that we can help. Good luck.
Nagy Vilmos 7-Jul-11 9:46am    
I've removed the duplicate comment and placed its content in the question. If you want to add something, click the green "Improve question" link.

Maybe you'd want to try one of these free libraries: http://java-source.net/open-source/pdf-libraries

Hope you'll find something appropriate there :).

Cheers!

—MRB
 
 
Comments
ajaad 8-Jul-11 7:27am    
I'm presently using that same library.
Is there any better solution for this?
I can recommend PDF Clown.

It is well documented and works fine.
 
 
iText is the third-party library that most developers use. For extraction, please see this discussion: http://stackoverflow.com/questions/4026614/extract-text-from-pdf-files
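
As background on why the output can look garbled whichever library is used: with an embedded subset font and a custom encoding, the bytes in the page content are font-specific character codes, not Unicode. An extractor can generally only turn them back into text if the PDF also carries a code-to-Unicode mapping (a /ToUnicode CMap) for that font. The sketch below is not taken from any of the libraries mentioned here; the class name `ToUnicodeSketch`, the sample codes, and the sample mapping are all made up for illustration. It only shows the lookup idea using the Java standard library.

```java
import java.util.HashMap;
import java.util.Map;

public class ToUnicodeSketch {

    // Hypothetical mapping from a subset font's custom character codes to
    // Unicode, standing in for the /ToUnicode CMap a PDF may or may not embed.
    static Map<Integer, String> sampleToUnicode() {
        Map<Integer, String> map = new HashMap<>();
        map.put(0x01, "H");
        map.put(0x02, "e");
        map.put(0x03, "l");
        map.put(0x04, "o");
        return map;
    }

    // Decode raw character codes through the mapping. Codes with no entry
    // become U+FFFD (the replacement character), which is roughly what
    // "unclear data" from an extractor amounts to.
    static String decode(int[] codes, Map<Integer, String> toUnicode) {
        StringBuilder sb = new StringBuilder();
        for (int code : codes) {
            sb.append(toUnicode.getOrDefault(code, "\uFFFD"));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up codes as they might appear in a page's content stream.
        int[] codes = {0x01, 0x02, 0x03, 0x03, 0x04};

        // With a mapping available, the text is recoverable.
        System.out.println(decode(codes, sampleToUnicode()));

        // With no mapping, every code is unresolvable.
        System.out.println(decode(codes, new HashMap<>()));
    }
}
```

If the PDF in question has no such mapping for its fonts, switching libraries may not help, and rendering the pages and running OCR on them is a commonly used fallback.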
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


