How can I convert PDF files to HTML? I have tried pdftohtml and pdf2text. I read somewhere that pdftohtml generates a better HTML layout, so I used this command:

pdftohtml -c 1.pdf 2.htm


This generates an HTML file, but the layout does not match the original PDF. Can anyone suggest a better solution for PDF to HTML conversion?

Note: The HTML file may contain images.
Updated 17-Nov-11 0:18am

There is a basic problem here: PDF is essentially an output format, and all the semantic information you would want in your HTML (this is a header, that is a table, this element is a child of that one) is simply missing. PDF is also a page-based format and HTML isn't, so any layout-preserving translation won't feel right as an HTML page.
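
To make the missing-semantics point concrete, here is a minimal Python sketch. The `Fragment` type and the sample data are hypothetical, invented for illustration; they only approximate what a converter has to work with on a PDF page, which is runs of text at fixed coordinates. With nothing else to go on, about all a layout-preserving converter can do is pin each run in place:

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Fragment:
    """Hypothetical positioned text run, as a converter might see it on a page."""
    x: float      # distance from the left edge of the page, in points
    y: float      # distance from the top edge of the page, in points
    size: float   # font size, in points
    text: str


def fragments_to_html(fragments):
    """Render fragments as absolutely positioned paragraphs.

    Nothing in the input says "heading" or "table cell", so every run of
    text becomes an anonymous <p> fixed at its page coordinates.
    """
    lines = ['<div style="position:relative">']
    for f in fragments:
        style = (
            f"position:absolute;left:{f.x:g}pt;top:{f.y:g}pt;"
            f"font-size:{f.size:g}pt"
        )
        lines.append(f'  <p style="{style}">{escape(f.text)}</p>')
    lines.append("</div>")
    return "\n".join(lines)


# Made-up sample page: a human sees a heading and two table header cells,
# but the data carries only coordinates, sizes, and text.
page = [
    Fragment(72, 72, 18, "Quarterly Report"),
    Fragment(72, 110, 10, "Region"),
    Fragment(200, 110, 10, "Sales"),
]
print(fragments_to_html(page))
```

The result looks right on screen but contains no headings or tables, and it will not reflow when the browser window changes size.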

The short answer to this one is: don't do it. Either serve up the PDF (modern browsers generally know how to deal with one), or start from something earlier in the production chain, i.e. whatever you used to generate the PDF in the first place.
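
If you go the "serve up the PDF" route, one possible approach is to wrap the file in a small HTML page and let the browser's own viewer render it. The following Python sketch assumes the PDF is reachable at a URL relative to the page; the function name `pdf_viewer_page` and the width and height values are illustrative choices, not anything prescribed:

```python
from html import escape


def pdf_viewer_page(pdf_url, title):
    """Return a small HTML page that shows a PDF inline with a download fallback."""
    url = escape(pdf_url, quote=True)   # safe for use inside attribute values
    heading = escape(title)
    return f"""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>{heading}</title></head>
<body>
  <object data="{url}" type="application/pdf" width="100%" height="800">
    <p>Your browser cannot display this PDF. <a href="{url}">Download it instead</a>.</p>
  </object>
</body>
</html>
"""


print(pdf_viewer_page("1.pdf", "Report"))
```

Browsers that cannot render the PDF inline fall back to the content inside the `<object>` element, so the reader still gets a download link.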
   

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



