I want to read a Word file without using the Interop.Word DLL, because I do not want to install Word on the IIS server. Currently I run a keyword search by converting the Word file to a .txt file and reading from that. I tried the Open XML SDK, but it does not read old .doc files correctly. I also found Spire.Doc, but it is a paid product. Please provide complete code for a solution.
Code as follows:
C#
private void SearchWord(string[] str1)
      {
          string filename1 = "";
          string randomName = "";
          string fname = "";
          Session["cids"] = "";
          object missingType = Type.Missing;
          object readOnly = true;
          object isVisible = false;
          object documentFormat = 8;
          string s12 = "select id,docfilename from Uploadeddocsmaster";
          dt = cn.viewdatatable(s12);
          int dtcount = dt.Rows.Count * 2;
          string[] ids = new string[dtcount];

          for (int k = 0; k < dt.Rows.Count; k++)
          {
              string id = dt.Rows[k]["id"].ToString();
              filename1 = dt.Rows[k]["docfilename"].ToString();
              string fileName = Server.MapPath("~/UploadedFiles/") + filename1;
              // Normalize case so that ".DOC" or ".Pdf" are matched as well.
              string ext = Path.GetExtension(fileName).ToLowerInvariant();
              if (ext == ".doc" || ext == ".docx")
              {
                  RichEditDocumentServer server = new RichEditDocumentServer();
                  server.LoadDocument("document.doc", DocumentFormat.Doc);
                  server.ExportToPdf(memoryStream);

                  Application applicationclass = new Application();
                  string[] crefids = filename1.Split('.');
                  // Use the part of the file name before the first dot as the temp file name.
                  randomName = crefids[0];

                  object Source = fileName;
                  object Target = Server.MapPath("~/Temp/" + randomName + ".txt");
                  fname = Target.ToString();
                  // object Target = @"D:\Alex\ResumeManager Dec 6,2012\ResumeManager\Uploaddocs\test1.txt";

                  //Upload the word document and save to Temp folder
                  // FileUpload1.SaveAs(Server.MapPath("~/Temp/") + Path.GetFileName(FileUpload1.PostedFile.FileName));


                  applicationclass.Documents.Open(ref Source,
                                                  ref readOnly,
                                                  ref missingType, ref missingType, ref missingType,
                                                  ref missingType, ref missingType, ref missingType,
                                                  ref missingType, ref missingType, ref isVisible,
                                                  ref missingType, ref missingType, ref missingType,
                                                  ref missingType, ref missingType);
                  applicationclass.Visible = false;
                  Document document = applicationclass.ActiveDocument;
                  object format = Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatUnicodeText;

                  //Save the word document as a Unicode text file
                  document.SaveAs(ref Target, ref format, ref missingType,
                                  ref missingType, ref missingType, ref missingType,
                                  ref missingType, ref missingType, ref missingType,
                                  ref missingType, ref missingType, ref missingType,
                                  ref missingType, ref missingType, ref missingType,
                                  ref missingType);

                  //Close the word document
                  document.Close(ref missingType, ref missingType, ref missingType);


                  foreach (string str in str1)
                  {

                      using (StreamReader sr = new StreamReader(fname))
                      {

                          if (string.IsNullOrEmpty(str) == false)
                          {
                              string szReadAll = sr.ReadToEnd().ToLower();
                              if (Regex.IsMatch(szReadAll, str.ToLower()))
                              {
                                  if (!ids.Contains(id))
                                  {
                                      ids[mn] = id;
                                  }
                                  Session["ids"] = ids;
                              }
                          }
                      }

                  }
              }

              else if (ext == ".pdf")
              {
                  string randomName1 = DateTime.Now.Ticks.ToString();
                  string fname1 = "";



                  object Target1 = Server.MapPath("~/Temp/" + randomName1 + ".txt");
                  fname1 = Target1.ToString();

                  PDDocument doc = PDDocument.load(fileName);
                  PDFTextStripper stripper = new PDFTextStripper();
                  string s = stripper.getText(doc).ToLower();
                  System.IO.StreamWriter LogFile = new System.IO.StreamWriter(fname1, true);
                  LogFile.WriteLine(s);
                  LogFile.Close();
                  foreach (string str in str1)
                  {
                      using (StreamReader sr = new StreamReader(fname1))
                      {

                          if (string.IsNullOrEmpty(str) == false)
                          {
                              string szReadAll = sr.ReadToEnd().ToLower();
                              if (Regex.IsMatch(szReadAll, str.ToLower()))
                              {
                                  if (!ids.Contains(id))
                                  {
                                      ids[mn] = id;
                                  }
                                  Session["ids"] = ids;
                              }
                          }
                      }

                  }
              }
              mn++;

          }




          //Upload the word document and save to Temp folder
          // FileUpload1.SaveAs(Server.MapPath("~/Temp/") + Path.GetFileName(FileUpload1.PostedFile.FileName));

      }

Updated 9-Jan-13 6:55am
Comments
Philippe Mori 11-Jan-16 12:37pm    
Well, it is up to you to try products and see which one corresponds to your needs (performance, ease of use, handling of various formats and variations, accuracy of the result, concurrency, support, price...).

The only thing I know is that you usually don't want to use Office Interop on a server. Many sites explain why.

I perfectly understand if you don't want to deal with a Microsoft Office installation and Office interop, but first of all, consider why you need to deal with Microsoft Office documents at all: a proprietary product is proprietary. These days, there are a number of other options.

Nevertheless, the latest versions of the Office document formats are not so proprietary. You can always learn them, as they are now standardized. Please see:
http://en.wikipedia.org/wiki/Office_Open_XML[^],
http://en.wikipedia.org/wiki/Microsoft_Office_XML_formats[^],
http://en.wikipedia.org/wiki/Office_Open_XML_file_formats[^].

(Don't mix them up with OpenDocument, http://en.wikipedia.org/wiki/OpenDocument[^].)
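
To illustrate why the standardized formats can be read without Word: a .docx file is a ZIP package, and the body text normally sits in the part word/document.xml inside w:t elements. The following is only a minimal sketch under those assumptions, not a full reader. The class name DocxTextReader is made up for this example, it needs .NET 4.5 or later (System.IO.Compression.FileSystem), it ignores headers, footers and footnotes, text inside text boxes may appear twice, and it does nothing for the old binary .doc format.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any library mentioned in this thread.
public static class DocxTextReader
{
    // Namespace used by WordprocessingML elements such as w:p and w:t.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the body text of a .docx file, one line per paragraph.
    public static string ExtractText(string docxPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            // Assumption: the main part is stored at the conventional location.
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
            {
                throw new InvalidDataException(
                    "word/document.xml not found; is this really a .docx file?");
            }

            using (Stream stream = entry.Open())
            {
                XDocument xml = XDocument.Load(stream, LoadOptions.PreserveWhitespace);

                // Concatenate the text runs (w:t) of each paragraph (w:p).
                var paragraphs = xml.Descendants(W + "p")
                    .Select(p => string.Concat(
                        p.Descendants(W + "t").Select(t => t.Value)));

                return string.Join(Environment.NewLine, paragraphs);
            }
        }
    }
}
```

Calling DocxTextReader.ExtractText(fileName) would give a string that can be searched directly, so no temporary .txt file is needed for .docx input.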

Now, there is another approach to it. There are third-party products that work with Microsoft Office documents. If they can do it, you can, too. You just need to download the source code of some open-source product and find out how it works. The only open-source code I know of is OpenOffice itself (where .odt came from) and its fork LibreOffice. Please see:
http://en.wikipedia.org/wiki/OpenOffice.org[^],
http://www.openoffice.org/[^],
http://en.wikipedia.org/wiki/LibreOffice[^],
http://www.libreoffice.org/[^].

You can download the source and find the code that works with nearly all versions of Office documents and, of course, with .odt and all other OpenOffice/LibreOffice documents.

Please also see my past answers:
Convert Office-Documents to PDF without interop[^],
Hi how can i display word file in windows application using c#.net[^].

—SA
 
 
Comments
Maciej Los 9-Jan-13 16:51pm    
Agree, +5!
Sergey Alexandrovich Kryukov 9-Jan-13 17:03pm    
Thank you, Maciej.
—SA
See my comment on Sergey's answer, and read the section "Adding support for uploading DOC, PPT, XLS etc using OpenOffice" here: http://a.nnotate.com/server-installation-windows.html[^]
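
As a rough sketch of that idea in C#: LibreOffice can be started without a user interface and asked to convert a document to plain text, which also covers old .doc files. This assumes LibreOffice is installed on the server and that the account running the web application may start it. The class name OfficeToText, the method names and the 60-second timeout are invented for this example; the --headless, --convert-to and --outdir switches are LibreOffice command-line options.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper; adjust paths and timeout to your server.
public static class OfficeToText
{
    // Builds the command line for a headless conversion to plain text.
    public static string BuildArguments(string inputPath, string outputDir)
    {
        // A trailing backslash before the closing quote would escape it.
        string dir = outputDir.TrimEnd('\\', '/');
        return "--headless --convert-to txt:Text --outdir \"" + dir + "\" \"" + inputPath + "\"";
    }

    // Runs soffice and returns the path of the generated .txt file.
    public static string ConvertToText(string sofficePath, string inputPath,
                                       string outputDir, int timeoutMs = 60000)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = sofficePath,   // e.g. the full path to soffice.exe (assumption)
            Arguments = BuildArguments(inputPath, outputDir),
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            if (!process.WaitForExit(timeoutMs))
            {
                process.Kill();
                throw new TimeoutException("Conversion did not finish in time.");
            }
        }

        // LibreOffice names the output after the input file.
        string outputPath = Path.Combine(
            outputDir, Path.GetFileNameWithoutExtension(inputPath) + ".txt");
        if (!File.Exists(outputPath))
        {
            throw new FileNotFoundException("Conversion produced no output.", outputPath);
        }
        return outputPath;
    }
}
```

Starting a process per document is slow, so for many files it may be better to convert each upload once and keep the text, rather than converting on every search.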
 
 
Add a reference to the library via Add Reference --> Browse --> Code7248.word_reader.dll

Download the DLL from this URL:

http://sourceforge.net/p/word-reader/wiki/Home/

(A simple .NET library compatible with .NET 2.0, 3.0, 3.5 and 4.0 for C#. It can currently extract only the raw text from a .doc or .docx file.)

Sample code for a simple C# console application:

using System;
using System.Collections.Generic;
using System.Text;
// add extra namespaces
using Code7248.word_reader;

namespace testWordRead
{
    class Program
    {
        // Extracts the raw text from a .doc or .docx file and prints it.
        private void readFileContent(string path)
        {
            TextExtractor extractor = new TextExtractor(path);
            string text = extractor.ExtractText();
            Console.WriteLine(text);
        }

        static void Main(string[] args)
        {
            Program cs = new Program();
            // Verbatim string, so the backslashes are not treated as escape sequences.
            string path = @"D:\Test\testdoc1.docx";
            cs.readFileContent(path);
            Console.ReadLine();
        }
    }
}


It is working fine.
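
To connect this to the keyword search in the question: once the text is available as a string, it can be searched in memory instead of being written to a temp file and read back for every keyword. The helper below is only a sketch; the class name KeywordSearch is made up, and it matches keywords literally and case-insensitively. That is an assumption about the intent, since the question passes each keyword to Regex.IsMatch, where a keyword such as "C++" would be interpreted as a pattern rather than as plain text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for literal, case-insensitive keyword matching.
public static class KeywordSearch
{
    // Returns true if the text contains at least one non-empty keyword.
    public static bool ContainsAnyKeyword(string text, IEnumerable<string> keywords)
    {
        if (string.IsNullOrEmpty(text) || keywords == null)
        {
            return false;
        }

        return keywords
            .Where(k => !string.IsNullOrEmpty(k))
            .Any(k => text.IndexOf(k, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}
```

With the extractor above this would be used as KeywordSearch.ContainsAnyKeyword(extractor.ExtractText(), str1), adding the document id to the result list when it returns true.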
 
Comments
modi.sagar4u 5-Jan-17 7:23am    
Thanks @kutty. Where can I find source for Code7248.word_reader?

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)