Hi,
Is there any way to read the data in a .doc file without using Interop? I have tried StreamReader, but I'm getting the data in a non-human-readable format.

Thanks in advance.
Comments
Maciej Los 3-Jun-13 4:00am    
What version of MS Word?

You have to parse the Word file format yourself (Microsoft kindly provides its specification: "Microsoft Office File Formats"). Feasible, but hard, I guess.
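For the .docx (Office Open XML) variant, parsing the format yourself is fairly approachable: a .docx is a ZIP archive whose body text lives in word/document.xml, which is also why reading it with StreamReader shows unreadable bytes. The legacy binary .doc format (an OLE compound file) is much harder and is not handled here. The following is a minimal sketch, assuming .NET Framework 4.5+ (with a reference to System.IO.Compression) or .NET Core; the names DocxText, ExtractText and HasPrefix are hypothetical, and it only concatenates the w:t text of each paragraph, ignoring tabs, line breaks, headers, footnotes and all formatting.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper (not a library API): plain text from a .docx without Interop.
public static class DocxText
{
    // WordprocessingML namespace used by word/document.xml
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Legacy .doc files start with the OLE compound file signature;
    // .docx files start with the ZIP local file header "PK\x03\x04".
    static readonly byte[] OleSignature = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };
    static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    public static bool HasPrefix(byte[] data, byte[] prefix) =>
        data.Length >= prefix.Length && prefix.SequenceEqual(data.Take(prefix.Length));

    // Reads word/document.xml from an already-open, seekable .docx stream.
    public static string ExtractText(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
                ?? throw new InvalidDataException("word/document.xml not found");
            using (Stream xml = entry.Open())
            {
                XDocument doc = XDocument.Load(xml);
                var sb = new StringBuilder();
                // Each w:p is a paragraph; its visible text is split across w:t runs.
                foreach (XElement p in doc.Descendants(W + "p"))
                {
                    // Skip w:t elements that belong to a nested paragraph (e.g. a text box).
                    var runs = p.Descendants(W + "t")
                                .Where(x => x.Ancestors(W + "p").First() == p);
                    foreach (XElement t in runs)
                        sb.Append(t.Value);
                    sb.AppendLine();
                }
                return sb.ToString();
            }
        }
    }

    // Reads the whole file into memory first, so no file handle stays open afterwards.
    public static string ExtractText(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        if (HasPrefix(bytes, OleSignature))
            throw new NotSupportedException("Binary .doc (OLE compound file) is not handled here.");
        if (!HasPrefix(bytes, ZipSignature))
            throw new InvalidDataException("Not a ZIP-based .docx file.");
        using (var ms = new MemoryStream(bytes))
            return ExtractText(ms);
    }
}
```

Usage would be something like Console.WriteLine(DocxText.ExtractText(@"D:\Test\testdoc1.docx"));. Reading the bytes up front trades memory for not holding the file open, which matters if the same file is overwritten (e.g. re-uploaded) right after it is read.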
 
Comments
Prasaad SJ 11-Jun-13 4:41am    
So, as per the first link, going with a third-party tool like Aspose is best, I suppose. And in the second link you are once again using Interop in between. Do you know of any other third-party tool that is free? I tried iTextSharp, but it is not compatible with .doc files.
Prasaad SJ 25-Jun-13 1:24am    
OK, thank you.
Add a reference via Add Reference --> Browse --> Code7248.word_reader.dll, downloaded from the following URL:
Code7248.word_reader / Wiki / Home
(A simple .NET Library compatible with .NET 2.0, 3.0, 3.5 and 4.0 for C#. It can currently extract only the raw text from a .doc or .docx file.)

The sample code is a simple C# console application:

C#
using System;
using System.Collections.Generic;
using System.Text;
// namespace provided by Code7248.word_reader.dll
using Code7248.word_reader;


namespace testWordRead
{
    class Program
    {
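        // Extracts the raw text of a .doc or .docx file and prints it to the console
        private void readFileContent(string path)
        {
            TextExtractor extractor = new TextExtractor(path);
            string text = extractor.ExtractText();
            Console.WriteLine(text);
        }
        static void Main(string[] args)
        {
            Program cs = new Program();
            // Verbatim string (@"...") so "\T" is not parsed as an invalid escape sequence
            string path = @"D:\Test\testdoc1.docx";
            cs.readFileContent(path);
            Console.ReadLine(); // keep the console window open
        }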
    }
}


It is working fine.
Comments
gladsong 30-Nov-16 1:52am    
Hi,

Using Code7248.word_reader.dll works well, but there is still an issue: how do I close the object once the file has been read? When I upload the same file again and try to read it, it throws an "object is in use" error.
modi.sagar4u 23-Jan-17 7:05am    
An exception of type 'System.IO.IOException' occurred in mscorlib.dll but was not handled in user code. Additional information: The process cannot access the file '' because it is being used by another process. How do I solve this error? I am using the same DLL to convert .doc and .docx files to .txt and face this issue, more specifically with .docx files.
Member 3795248 30-Nov-18 1:59am    
It's working, but how do I get the content with its formatting?

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900