How to convert a .doc or .docx file to .html

First and foremost, you should understand that the problem is ambiguous: there is no single one-to-one correspondence between the two formats, their data models and their rendering models. So, if you develop such a function, it should accept two parameters, not one. Besides the document itself, it should receive a set of mapping rules, and different rules will produce different results.
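To illustrate the two-parameter idea, here is a minimal Python sketch. The document model (a list of `(style, text)` pairs), the function name `to_html` and both rule sets are hypothetical simplifications made up for this example; a real Word parser would give you a much richer structure.

```python
from html import escape

def to_html(document, rules, default_tag="p"):
    """Render paragraphs using a mapping from Word style name to HTML tag.

    `document` is a hypothetical, simplified model: a list of (style, text) pairs.
    `rules` is the set of mapping rules; unknown styles fall back to `default_tag`.
    """
    parts = []
    for style, text in document:
        tag = rules.get(style, default_tag)
        parts.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join(parts)

doc = [("Heading 1", "Report"), ("Normal", "Totals & notes")]

semantic_rules = {"Heading 1": "h1", "Normal": "p"}  # keep the heading structure
flat_rules = {}                                      # everything becomes <p>

print(to_html(doc, semantic_rules))
print(to_html(doc, flat_rules))
```

The same document gives two different HTML results depending only on the rules passed in, which is the ambiguity described above.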

To read/parse Word documents, you can use Microsoft Office Interop for Word. The interop assembly is placed in the GAC when you install Office, so you can reference it using the ".NET" tab of the "Add Reference" window. Please see:
http://en.wikipedia.org/wiki/Visual_Studio_Tools_for_Office
http://msdn.microsoft.com/en-us/library/ff601860.aspx
http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word.aspx

This article can also be useful:
http://www.dotnetperls.com/word

If you want to work with Word formats without installing Office, you can still do it. After all, OpenOffice, LibreOffice and other products read and write these formats, please see:
http://en.wikipedia.org/wiki/OpenOffice.org
http://en.wikipedia.org/wiki/LibreOffice

These products are open-source, so you can always download the source code and see the code behind the conversion.
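Besides reading the source, you can also let an installed LibreOffice do the conversion for you from the command line. The sketch below is in Python and only builds and runs that command; the executable name `soffice` (it may be a full path on your system), the helper names and the expected output file name are assumptions of this example, not something taken from the products' documentation quoted above.

```python
import subprocess
from pathlib import Path

def build_convert_command(source, out_dir, soffice="soffice"):
    """Build the LibreOffice command line for a headless conversion to HTML.

    `soffice` is an assumed executable name; pass the full path if it is not on PATH.
    """
    return [
        soffice,
        "--headless",             # run without a user interface
        "--convert-to", "html",   # target format
        "--outdir", str(out_dir),
        str(source),
    ]

def convert_to_html(source, out_dir, soffice="soffice"):
    """Run the conversion; raises CalledProcessError if LibreOffice reports failure."""
    subprocess.run(build_convert_command(source, out_dir, soffice), check=True)
    # Assumption: the output keeps the input's base name with an .html extension.
    return Path(out_dir) / (Path(source).stem + ".html")

if __name__ == "__main__":
    # Only prints the command; call convert_to_html() on a machine with LibreOffice.
    print(build_convert_command("report.docx", "out"))
```

Here the "mapping rules" are whatever LibreOffice's own HTML export filter decides, so you get less control than with your own parser.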

If you would like to support only the newer Office Open XML (.docx), the format specification is publicly available and is standardized as ECMA-376 and ISO/IEC 29500:2008:
http://en.wikipedia.org/wiki/Office_Open_XML
http://en.wikipedia.org/wiki/Office_Open_XML_software
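To give an idea of what working with the format directly looks like: a .docx file is a ZIP archive, and the main text lives in `word/document.xml`. The Python sketch below pulls out the paragraph text and wraps each paragraph in `<p>`. It is only a starting point under stated assumptions: it uses the Transitional WordprocessingML namespace, ignores styles, tables, images and lists, and the helper names and the in-memory demo archive (which is not a complete .docx) are made up for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from html import escape

# Namespace of WordprocessingML elements (Transitional conformance assumed).
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_paragraphs(source):
    """Return the plain text of each w:p paragraph; `source` is a path or file object."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    # Join the w:t text runs inside every paragraph. Nested paragraphs
    # (for example inside text boxes) are not handled specially.
    return ["".join(t.text or "" for t in p.iter(W + "t")) for p in root.iter(W + "p")]

def paragraphs_to_html(paragraphs):
    """Wrap each paragraph in <p>, escaping HTML special characters."""
    return "\n".join(f"<p>{escape(text)}</p>" for text in paragraphs)

def demo_archive():
    """Build a tiny in-memory archive holding only word/document.xml."""
    xml = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        "<w:body>"
        "<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t> world</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>a &lt; b</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer

print(paragraphs_to_html(docx_paragraphs(demo_archive())))
```

For a real file, pass its path: `docx_paragraphs("report.docx")`. The old binary .doc format is not a ZIP archive, so this approach does not apply to it.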

Please see the comparison chart on Office Open XML software:
http://en.wikipedia.org/wiki/Comparison_of_Office_Open_XML_software

As some of this software is open source, you can study its code or reuse it, subject to its license.

—SA
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


