Does anybody know an algorithm for reading *.doc (Word 97 - 2003) files without using (opening) Word?

You need to use the Microsoft Word Interop assemblies.
 
 
Comments
Andy1828 2-Mar-12 11:04am    
That works slowly. I wrote code that analyzes the characters directly, but it handles the style descriptions incorrectly.
Richard MacCutchan 2-Mar-12 11:25am    
Sorry, I don't understand what you are saying.

fjdiewornncalwe 2-Mar-12 11:43am    
[Op's answer]: Well, a Word document contains text characters and formatting characters. I am trying to separate the former from the latter without using the standard Office functions (it is faster). Really, Word does the same thing: it processes the text characters and formats them.
I will give you an idea:
If you really want to read only *.doc files, then my idea is no good.
But if you are undecided and want to read something formatted in a doc style, you can re-save the doc file as a *.docx, which is basically an archive that you can unzip. Then look in the unzipped folder for the file containing the formatted text. If I remember correctly, it is called [document.xml]. From that XML you can extract the unformatted text or whatever you like.
Good luck.
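The unzip-and-read idea above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation. It assumes .NET 4.5 or later (for `System.IO.Compression.ZipFile`), assumes the main document part sits at `word/document.xml` as it does in a standard *.docx package, and uses hypothetical names (`DocxText`, `Extract`). It collects the text runs of each paragraph and ignores headers, footers, footnotes, and nested text boxes.

```csharp
using System;
using System.IO.Compression; // ZipFile and ZipArchive (.NET 4.5+)
using System.Linq;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
static class DocxText
{
    // WordprocessingML namespace used by elements such as <w:p> and <w:t>.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static string Extract(string docxPath)
    {
        // A *.docx file is a ZIP archive, so it can be opened without Word.
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidOperationException("word/document.xml not found");

            using (var stream = entry.Open())
            {
                XDocument xml = XDocument.Load(stream);
                var sb = new StringBuilder();

                // Each <w:p> is a paragraph; its text lives in <w:t> runs.
                foreach (XElement p in xml.Descendants(W + "p"))
                {
                    sb.AppendLine(string.Concat(
                        p.Descendants(W + "t").Select(t => t.Value)));
                }
                return sb.ToString();
            }
        }
    }
}
```

A call might look like `string text = DocxText.Extract("report.docx");`, where the file name is only a placeholder. Note that this does nothing for the re-save step itself: converting *.doc to *.docx still needs Word or another converter.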
 
 
Comments
Andy1828 3-Mar-12 2:00am    
Thanks for the answer. The goal is FAST reading of a large incoming stream. Your proposal is good but slow.
C#
oWord.Visible = false; // should be sufficient.
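For context, here is a rough sketch of how that line might fit into an Interop-based reader. It assumes Word is installed, the project references `Microsoft.Office.Interop.Word`, and C# 4 or later is used (for named arguments on COM calls). The class and method names (`DocReader`, `ReadAll`) are hypothetical. It keeps one hidden Word instance alive for all files instead of starting Word per file, which may reduce the per-file overhead, though it is unlikely to match a direct binary parser for speed.

```csharp
using System.Collections.Generic;
using System.Runtime.InteropServices;
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper, not part of the Interop assemblies.
static class DocReader
{
    public static List<string> ReadAll(IEnumerable<string> paths)
    {
        var texts = new List<string>();
        Word.Application oWord = new Word.Application();
        oWord.Visible = false; // keep the Word window hidden

        try
        {
            foreach (string path in paths)
            {
                // Open read-only so the source files are never modified.
                Word.Document doc = oWord.Documents.Open(path, ReadOnly: true);
                try
                {
                    texts.Add(doc.Content.Text); // plain text of the main body
                }
                finally
                {
                    doc.Close(SaveChanges: false);
                }
            }
        }
        finally
        {
            // Shut Word down and release the COM object so no
            // hidden WINWORD.EXE process is left behind.
            oWord.Quit();
            Marshal.ReleaseComObject(oWord);
        }
        return texts;
    }
}
```

The compiler may warn that `Close` and `Quit` are ambiguous between a method and an event of the same name. The calls still resolve to the methods.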
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900