Tags: C#, .NET, C# 4.0, .NET 4
I want to read a Word file without using the Interop.Word DLL; I do not want to install Word on the IIS server. Currently I implement a keyword search by converting each Word file into a .txt file and reading from it. I tried the Open XML SDK, but it does not read old .doc files correctly. I also found Spire.Doc, but it is a paid product. Please provide a complete code solution as soon as possible.
Code as follows:
private void SearchWord(string[] str1)
      {
          string filename1 = "";
          string randomName = "";
          string fname = "";
          Session["cids"] = "";
          object missingType = Type.Missing;
          object readOnly = true;
          object isVisible = false;
          object documentFormat = 8;
          string s12 = "select id,docfilename from Uploadeddocsmaster";
          dt = cn.viewdatatable(s12);
          int dtcount = dt.Rows.Count * 2;
          string[] ids = new string[dtcount];
          int mn = 0; // position in ids for the current row
 
          for (int k = 0; k < dt.Rows.Count; k++)
          {
              string id = dt.Rows[k]["id"].ToString();
              filename1 = dt.Rows[k]["docfilename"].ToString();
              string fileName = Server.MapPath("~/UploadedFiles/") + filename1;
              string ext = Path.GetExtension(fileName).ToLowerInvariant();
              if (ext == ".doc" || ext == ".docx")
              {
                  using (MemoryStream memoryStream = new MemoryStream())
                  {
                      RichEditDocumentServer server = new RichEditDocumentServer();
                      server.LoadDocument(fileName, ext == ".doc" ? DocumentFormat.Doc : DocumentFormat.OpenXml);
                      server.ExportToPdf(memoryStream);
                  }
 
                  Application applicationclass = new Application();
                  // Name the .txt file after the uploaded file, without its extension
                  randomName = Path.GetFileNameWithoutExtension(filename1);
 
                  object Source = fileName;
                  object Target = Server.MapPath("~/Temp/" + randomName + ".txt");
                  fname = Target.ToString();
                  // object Target = @"D:\Alex\ResumeManager Dec 6,2012\ResumeManager\Uploaddocs\test1.txt";

                  //Upload the word document and save to Temp folder
                  // FileUpload1.SaveAs(Server.MapPath("~/Temp/") + Path.GetFileName(FileUpload1.PostedFile.FileName));

 
                  applicationclass.Documents.Open(ref Source,
                                                  ref readOnly,
                                                  ref missingType, ref missingType, ref missingType,
                                                  ref missingType, ref missingType, ref missingType,
                                                  ref missingType, ref missingType, ref isVisible,
                                                  ref missingType, ref missingType, ref missingType,
                                                  ref missingType, ref missingType);
                  applicationclass.Visible = false;
                  Document document = applicationclass.ActiveDocument;
                  object format = Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatUnicodeText;
 
                  //Save the word document as a Unicode text file
                  document.SaveAs(ref Target, ref format, ref missingType,
                                  ref missingType, ref missingType, ref missingType,
                                  ref missingType, ref missingType, ref missingType,
                                  ref missingType, ref missingType, ref missingType,
                                  ref missingType, ref missingType, ref missingType,
                                  ref missingType);
 
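                  //Close the word document and quit Word, so WINWORD.EXE processes do not pile up on the server
                  document.Close(ref missingType, ref missingType, ref missingType);
                  ((_Application)applicationclass).Quit(ref missingType, ref missingType, ref missingType);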
 

                  // Read the converted text once instead of once per keyword
                  string szReadAll = File.ReadAllText(fname).ToLower();
                  foreach (string str in str1)
                  {
                      // Regex.Escape makes keywords such as "c++" match literally
                      if (!string.IsNullOrEmpty(str) && Regex.IsMatch(szReadAll, Regex.Escape(str.ToLower())))
                      {
                          if (!ids.Contains(id))
                          {
                              ids[mn] = id;
                          }
                          Session["ids"] = ids;
                      }
                  }
              }
 
              else if (ext == ".pdf")
              {
                  string randomName1 = DateTime.Now.Ticks.ToString();
                  string fname1 = "";
 

 
                  object Target1 = Server.MapPath("~/Temp/" + randomName1 + ".txt");
                  fname1 = Target1.ToString();
 
                  PDDocument doc = PDDocument.load(fileName);
                  PDFTextStripper stripper = new PDFTextStripper();
                  string s = stripper.getText(doc).ToLower();
                  doc.close();
                  using (System.IO.StreamWriter LogFile = new System.IO.StreamWriter(fname1, true))
                  {
                      LogFile.WriteLine(s);
                  }
                  foreach (string str in str1)
                  {
                      // s already holds the lower-cased PDF text; Regex.Escape makes keywords such as "c++" match literally
                      if (!string.IsNullOrEmpty(str) && Regex.IsMatch(s, Regex.Escape(str.ToLower())))
                      {
                          if (!ids.Contains(id))
                          {
                              ids[mn] = id;
                          }
                          Session["ids"] = ids;
                      }
                  }
              }
              mn++;
 
          }
 

 


      }
Posted 9-Jan-13 6:53am
Updated 9-Jan-13 6:55am
ProgramFOX
Comments
Philippe Mori 11-Jan-16 12:37pm
   
Well, it is up to you to try the products and see which one corresponds to your needs (performance, ease of use, handling of various formats and variations, accuracy of the result, concurrency, support, price...).

The only thing I know is that you usually don't want to use Office Interop on a server; many sites explain why.

Solution 1

I perfectly understand if you don't want to mess with a Microsoft Office installation and Office Interop, but first of all, think about why you are working with Microsoft Office documents at all: a proprietary product is proprietary. These days, there are a number of other options.

Nevertheless, the latest versions of Office documents are not so proprietary. You can always learn the formats, as they are now standardized (Office Open XML is published as ECMA-376 and ISO/IEC 29500). Please see:
http://en.wikipedia.org/wiki/Office_Open_XML,
http://en.wikipedia.org/wiki/Microsoft_Office_XML_formats,
http://en.wikipedia.org/wiki/Office_Open_XML_file_formats.
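As a rough illustration of what the standardized format means in practice: a .docx file is a ZIP package, and the body text sits in WordprocessingML w:t elements inside w:p paragraphs. The sketch below reads it with only System.IO.Compression and LINQ to XML, with no Interop and no third-party DLL. Assumptions: .NET 4.5 or later (ZipArchive does not exist in .NET 4.0, where System.IO.Packaging from WindowsBase would be the alternative), and the main part stored at word/document.xml, which is where Word saves it; a fully general reader would resolve it through _rels/.rels. DocxText is a hypothetical helper name. It does not help with legacy binary .doc files, which are not ZIP packages.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class DocxText
{
    // WordprocessingML namespace used by word/document.xml (ECMA-376)
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the plain text of a .docx package, one line per paragraph.
    // Rough sketch: tabs, breaks, headers/footers and text boxes get no special handling.
    public static string Extract(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Assumption: main document part at the usual location
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("Not a .docx package: word/document.xml is missing.");

            XDocument xml;
            using (Stream s = entry.Open())
                xml = XDocument.Load(s);

            var sb = new StringBuilder();
            foreach (XElement p in xml.Descendants(W + "p"))
            {
                // Concatenate the text runs (<w:t>) of each paragraph (<w:p>)
                sb.AppendLine(string.Concat(p.Descendants(W + "t").Select(t => t.Value)));
            }
            return sb.ToString();
        }
    }

    // Convenience overload for a file on disk
    public static string Extract(string path)
    {
        using (FileStream fs = File.OpenRead(path))
            return Extract(fs);
    }
}
```

In the question's code, string text = DocxText.Extract(fileName) could then replace the Word/Interop round trip and the temporary .txt file for .docx uploads.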

(Don't mix them up with OpenDocument, http://en.wikipedia.org/wiki/OpenDocument.)

Now, there is another approach. There are third-party products that work with Microsoft Office documents. If they can do it, you can, too. You just need to download the source code of an open-source product and find out how it works. The only open-source code I know of is OpenOffice itself (where .odt came from) and its fork LibreOffice. Please see:
http://en.wikipedia.org/wiki/OpenOffice.org,
http://www.openoffice.org/,
http://en.wikipedia.org/wiki/LibreOffice,
http://www.libreoffice.org/.

You can download the source and find the code that works with nearly all versions of Office documents and, of course, with .odt and all other OpenOffice/LibreOffice documents.

Please also see my past answers:
Convert Office-Documents to PDF without interop,
Hi how can i display word file in windows application using c#.net.

—SA
Comments
Maciej Los 9-Jan-13 16:51pm
   
Agree, +5!
   
Thank you, Maciej.
—SA

Solution 4

See my comment on Sergey's answer, and read this: http://a.nnotate.com/server-installation-windows.html, section "Adding support for uploading DOC, PPT, XLS etc. using OpenOffice".
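A minimal sketch of driving such a conversion from C#, assuming LibreOffice is installed on the server, its soffice executable is on the PATH, and its --headless, --convert-to and --outdir command-line options are used. It converts a .doc (or .docx, .odt) file to plain text, and the resulting .txt can be searched just like the one Interop produces. OfficeConverter and its methods are hypothetical names, not part of any library.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class OfficeConverter
{
    // Builds the command line for a headless LibreOffice conversion to plain text.
    // Assumption: "soffice" is on PATH; otherwise pass the full path to the executable.
    public static ProcessStartInfo BuildConvertToText(string inputFile, string outDir, string soffice = "soffice")
    {
        return new ProcessStartInfo
        {
            FileName = soffice,
            Arguments = "--headless --convert-to txt:Text --outdir \"" + outDir + "\" \"" + inputFile + "\"",
            UseShellExecute = false,
            CreateNoWindow = true
        };
    }

    // Runs the conversion and returns the path of the produced .txt file.
    public static string ConvertToText(string inputFile, string outDir, int timeoutMs = 60000)
    {
        using (Process p = Process.Start(BuildConvertToText(inputFile, outDir)))
        {
            if (!p.WaitForExit(timeoutMs))
            {
                p.Kill();
                throw new TimeoutException("LibreOffice did not finish in time.");
            }
        }

        // LibreOffice names the output after the input file, with the new extension
        string output = Path.Combine(outDir, Path.GetFileNameWithoutExtension(inputFile) + ".txt");
        if (!File.Exists(output))
            throw new IOException("Conversion produced no output: " + output);
        return output;
    }
}
```

One caveat, hedged: a LibreOffice instance that is already using the same user profile may cause the call to produce nothing, so concurrent requests might need to be serialized; the IIS application pool identity also needs permission to run soffice and write to outDir.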

Solution 6

Add a reference to the library via Add Reference > Browse > Code7248.word_reader.dll.

Download the DLL from this URL:

http://sourceforge.net/p/word-reader/wiki/Home/

(A simple .NET library for C#, compatible with .NET 2.0, 3.0, 3.5 and 4.0. It can currently extract only the raw text from a .doc or .docx file.)

Here is sample code for a simple C# console application:

using System;
using System.Collections.Generic;
using System.Text;
//add extra namespaces
using Code7248.word_reader;

namespace testWordRead
{
    class Program
    {
        private void readFileContent(string path)
        {
            // Extract the raw text of the .doc/.docx file and print it
            TextExtractor extractor = new TextExtractor(path);
            string text = extractor.ExtractText();
            Console.WriteLine(text);
        }

        static void Main(string[] args)
        {
            Program cs = new Program();
            // @ makes this a verbatim string, so the backslashes are not treated as escape sequences
            string path = @"D:\Test\testdoc1.docx";
            cs.readFileContent(path);
            Console.ReadLine();
        }
    }
}


It is working fine.
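Once the raw text is available (for example from ExtractText() above), the keyword check from the question can be done in memory, without writing a temporary .txt file. The sketch below compares case-insensitively with IndexOf instead of Regex.IsMatch, so keywords such as "c++" are treated as literal text rather than as regular expressions. KeywordSearch.FindKeywords is a hypothetical helper, not part of word_reader.

```csharp
using System;
using System.Collections.Generic;

public static class KeywordSearch
{
    // Returns the keywords that occur anywhere in the text, ignoring case.
    // IndexOf with OrdinalIgnoreCase treats each keyword as literal text, so "c++" or "a.b"
    // cannot be misread as a regular expression (as Regex.IsMatch(text, keyword) would).
    public static List<string> FindKeywords(string text, IEnumerable<string> keywords)
    {
        var found = new List<string>();
        if (string.IsNullOrEmpty(text)) return found;

        foreach (string k in keywords)
        {
            // Skip blank entries, as the question's code does with string.IsNullOrEmpty
            if (string.IsNullOrEmpty(k)) continue;
            if (text.IndexOf(k, StringComparison.OrdinalIgnoreCase) >= 0)
                found.Add(k);
        }
        return found;
    }
}
```

In the question's loop, a document's id would then be recorded once whenever FindKeywords(text, str1) returns a non-empty list; collecting ids in a List<string> or HashSet<string> would also remove the need for the fixed-size ids array and its separate mn counter.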

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 11 Jan 2016
Copyright © CodeProject, 1999-2016
All Rights Reserved. Terms of Service

CodeProject, 503-250 Ferrand Drive Toronto Ontario, M3C 3G8 Canada +1 416-849-8900 x 100