I want to convert a PDF file to a Word format (.rtf, .doc) without changing the structure of the document.
Comments
ridoy 6-Jan-16 1:18am    
Not a question!

1 solution

To work with PDF, you can use the Java library iText:
https://en.wikipedia.org/wiki/IText,
iText.

I have no idea why you would want to create RTF or DOC with Java (especially proprietary DOC; I could only understand it if it were DOCX). I would suggest converting to HTML or one of the XML-based document formats. The fact that you don't know exactly which document format you want, RTF or DOC, and the fact that you did not mention DOCX, strongly suggest that you don't really need any of them, and that HTML would be your best choice.
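To illustrate why HTML is the easy target, here is a minimal sketch using only the Java standard library. It assumes the text has already been extracted from the PDF into a list of paragraphs (that step is not shown), and the class and method names (HtmlExport, toHtml, escape) are hypothetical, made up for this example. It only handles plain paragraphs, not the layout of the original document.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: wraps already-extracted paragraphs in a minimal HTML page. */
final class HtmlExport {

    /** Escapes the characters that have a special meaning in HTML text. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds a complete HTML document with one <p> element per paragraph. */
    static String toHtml(String title, List<String> paragraphs) {
        StringBuilder sb = new StringBuilder();
        sb.append("<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"utf-8\">\n<title>")
          .append(escape(title))
          .append("</title>\n</head>\n<body>\n");
        for (String p : paragraphs) {
            sb.append("<p>").append(escape(p)).append("</p>\n");
        }
        sb.append("</body>\n</html>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: these strings stand in for text extracted from the PDF.
        System.out.print(toHtml("Sample", Arrays.asList("First paragraph.", "a < b & c")));
    }
}
```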

However, if you really want RTF (again, I doubt it), it's not too bad: the description of the format is publicly available; use it. Or use a third-party library. One of them is jRTF:
jRTF = a new library for building RTF documents | Java Blog.

Another option is Apache RTFLib: Apache(tm) FOP Development: RTFLib (jfor).
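If you go the "use the public format description" route without a library, a rough sketch could look like the following. It does not use jRTF or RTFLib; it writes the RTF markup by hand with the standard library only. The class and method names (RtfExport, toRtf, escape) are hypothetical, the font and size are arbitrary choices, and only plain paragraphs are covered (no tables, images, or styles).

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: writes plain paragraphs as a minimal RTF document. */
final class RtfExport {

    /** Escapes RTF special characters and encodes non-ASCII characters as \\uN? */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' || c == '{' || c == '}') {
                // Backslash and braces are RTF syntax and need a leading backslash.
                sb.append('\\').append(c);
            } else if (c > 127) {
                // Unicode escape: signed 16-bit decimal value plus one fallback character.
                sb.append("\\u").append((int) (short) c).append('?');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds an RTF document with a one-entry font table and one \\par per paragraph. */
    static String toRtf(List<String> paragraphs) {
        StringBuilder sb = new StringBuilder(
                "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\\f0\\fs24\n");
        for (String p : paragraphs) {
            sb.append(escape(p)).append("\\par\n");
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        // Assumption: these strings stand in for text extracted from the PDF.
        System.out.println(toRtf(Arrays.asList("First paragraph.", "Braces {like these} are escaped.")));
    }
}
```

The output is plain ASCII, so it can be saved to a file with the .rtf extension as is.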

You can do your own search and find something else.

The Microsoft DOCX format is much more complicated. And I don't even want to discuss DOC, which is obsolete and messy; there is no official public standard for it.

With DOCX, you have the ECMA standard, which is publicly available:
Office Open XML - Wikipedia, the free encyclopedia,
Microsoft Office XML formats - Wikipedia, the free encyclopedia,
Standard ECMA-376.

You can use the open-source docx4j: docx4j.
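To give an idea of what "more complicated" means, here is a sketch of about the smallest DOCX package I would expect Word to open: a ZIP archive with a content-types part, a package relationships part, and the main document part. It deliberately does not use docx4j (which would handle all of this for you), only java.util.zip. The class and method names (MinimalDocx, toDocx, documentXml) and the output file name are hypothetical; real documents also need styles, numbering, sections, and so on.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: packs plain paragraphs into a minimal DOCX (Office Open XML) file. */
final class MinimalDocx {
    private static final String DECL = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n";

    // Declares the content type of every part in the package.
    private static final String CONTENT_TYPES = DECL
        + "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">"
        + "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>"
        + "<Default Extension=\"xml\" ContentType=\"application/xml\"/>"
        + "<Override PartName=\"/word/document.xml\" ContentType="
        + "\"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml\"/>"
        + "</Types>";

    // Points from the package root to the main document part.
    private static final String RELS = DECL
        + "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">"
        + "<Relationship Id=\"rId1\" Type="
        + "\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument\""
        + " Target=\"word/document.xml\"/>"
        + "</Relationships>";

    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds word/document.xml with one paragraph (w:p) and one run (w:r) per input string. */
    static String documentXml(List<String> paragraphs) {
        StringBuilder sb = new StringBuilder(DECL);
        sb.append("<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\"><w:body>");
        for (String p : paragraphs) {
            sb.append("<w:p><w:r><w:t xml:space=\"preserve\">").append(escapeXml(p)).append("</w:t></w:r></w:p>");
        }
        return sb.append("</w:body></w:document>").toString();
    }

    /** Returns the bytes of the ZIP package; I/O errors are rethrown unchecked. */
    static byte[] toDocx(List<String> paragraphs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            put(zip, "[Content_Types].xml", CONTENT_TYPES);
            put(zip, "_rels/.rels", RELS);
            put(zip, "word/document.xml", documentXml(paragraphs));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    private static void put(ZipOutputStream zip, String name, String content) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(content.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the strings stand in for text extracted from the PDF.
        Files.write(Paths.get("sample.docx"), toDocx(Arrays.asList("First paragraph.", "Second & last.")));
    }
}
```

Compare this with the HTML or RTF case: three parts and two namespaces just for plain text, which is a good reason to reach for a library once you need anything beyond that.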

That's all. But you would do better to listen to good advice and create HTML.

—SA
 
Comments
ridoy 6-Jan-16 1:18am    
5ed!
Sergey Alexandrovich Kryukov 6-Jan-16 9:47am    
Thank you.
—SA

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


