In my project I have a requirement to convert doc, docx, xls, xlsx, tiff, and txt files to PDF format.

The txt, tiff, xls, and doc conversions are already working using Java code.
But when converting docx/xlsx files, data is missing from the output.

Can anyone please help me achieve this?
ZurdoDev 12-Jul-13 12:15pm
How can we know why your data is missing? You haven't posted any code.

Solution 3

The formats you have listed (doc, docx, xls, xlsx, tiff, and txt) are data formats. PDF is a printer format, in that PDF pages are an image or projection of a printer page. There is even some work in the Linux world to 'print' to PDF format as a step in communicating with a real printer (as a way to reduce the ridiculous number of printer drivers).

If you want to 'convert' something to PDF, you merely print it, with PDF as output type. If there is something missing in the PDF that you are looking at, the fault lies in how the document was 'printed' to PDF. This is no different from printing to an actual printer device.
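As a rough illustration of 'printing' to PDF from Java, one option is to hand the file to an application that already knows how to render it and ask for PDF output. The sketch below assumes LibreOffice is installed and its `soffice` binary is on the PATH; the class name `PdfPrintSketch` and its methods are made up for this example, and it is not the code the original poster used. It also assumes that the output file is named after the input with a `.pdf` extension.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: "prints" an office document to PDF by running
// LibreOffice in headless mode. Assumes `soffice` is on the PATH.
final class PdfPrintSketch {

    // Builds the command line for a headless conversion to PDF.
    static List<String> buildCommand(Path input, Path outDir) {
        return List.of(
                "soffice",
                "--headless",
                "--convert-to", "pdf",
                "--outdir", outDir.toString(),
                input.toString());
    }

    // Assumed output location: <outDir>/<input name without extension>.pdf
    static Path expectedOutput(Path input, Path outDir) {
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outDir.resolve(base + ".pdf");
    }

    // Runs the conversion and returns the path where the PDF is expected.
    static Path convert(Path input, Path outDir)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, outDir))
                .inheritIO()
                .start();
        // Arbitrary two-minute limit so a hung renderer does not block forever.
        if (!process.waitFor(120, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            throw new IOException("Conversion timed out: " + input);
        }
        if (process.exitValue() != 0) {
            throw new IOException("soffice exited with code " + process.exitValue());
        }
        return expectedOutput(input, outDir);
    }
}
```

Calling `PdfPrintSketch.convert(Paths.get("report.docx"), Paths.get("out"))` would then be expected to leave `out/report.pdf` behind. If content is missing from that PDF, the place to look is how the rendering application lays out the document, exactly as with a physical printer.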
Best answer so far, my 5. It explains the different nature of PDF as opposed to Office (and many other) types of document.
That said: strictly speaking, "conversion" without data loss is impossible.

Nevertheless, I added one more, Solution 4, where I explain this further and give a couple of useful references to my past answers on using different APIs (notably the Open XML SDK) to actually do the conversion, albeit with some data loss.


Solution 1

Files of any of these types (doc, docx, xls, xlsx, tiff, txt) can be converted to PDF using a Java program.
The code for document conversion is at the link below.
Convert to PDF
Shubhashish_Mandal 23-Jul-13 7:28am

Solution 4

Strictly speaking, it cannot be done entirely without data loss, due to the nature of PDF as opposed to Office documents. Excel and other Office documents carry much more information: they are flexible and fluid, and can be rendered to different paper sizes. They also have inner structure: for example, if you have auto-numbered sections in your Word document, you can always add a new one, and the content will be automatically renumbered, the TOC updated, and a lot more like that, notably styles. Excel documents contain formulas. Such documents are structured: they are executed, not merely rendered on paper.

With PDF, nearly all such information is lost.

Please see my past answers:
Convert Office-Documents to PDF without interop,
Question Convert word to PDF without offce or openoffice.


This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
