Actually, I want to read some particular content from a Word document using Open XML. I tried the code below just to read the document first, but I'm getting "File contains corrupted data".
Has anyone come across a similar Open XML Word document requirement? Kindly share some sample code. Thanks in advance.

What I have tried:

C#
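using System;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

class Program
{
    static void Main()
    {
        // The Open XML SDK reads only Office Open XML (.docx) packages.
        // If the file at this path is a legacy Word 97-2003 .doc file,
        // opening it fails with "File contains corrupted data".
        string path = @"D:\OpenXml_Word\Test_Docx.doc";
        SearchWordIsMatched(path);
    }

    static void SearchWordIsMatched(string path)
    {
        // Open read-only (isEditable: false); only the text is needed.
        // Do not open a second FileStream on the same path while the
        // package is open: the file is already in use by the SDK.
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(path, false))
        {
            Body body = wordDoc.MainDocumentPart.Document.Body;
            string content = body.InnerText;
            Console.WriteLine(content);
        }
        // Exceptions propagate to the caller with their original stack trace.
    }
}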
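The path in the question ends in .doc. The Open XML SDK only understands ZIP-based packages, and a legacy binary Word file is typically rejected with "File contains corrupted data". Because an extension can be misleading (renaming a file does not change its contents), one way to tell the two formats apart is to look at the first bytes of the file. A minimal sketch; `FileKindDetector`, `DetectWordFileKind` and the `WordFileKind` enum are hypothetical names, and the check relies only on the ZIP local-header signature `50 4B 03 04` ("PK") and the OLE compound-file signature `D0 CF 11 E0 A1 B1 1A E1` used by .doc files:

```csharp
using System;
using System.IO;
using System.Linq;

public enum WordFileKind { ZipPackage, OleCompoundFile, Unknown }

public static class FileKindDetector
{
    // ZIP local file header "PK\x03\x04": .docx files are ZIP packages.
    static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    // OLE2 compound file header used by legacy .doc files.
    static readonly byte[] OleSignature = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Hypothetical helper: classifies a file by its leading bytes, ignoring the extension.
    public static WordFileKind DetectWordFileKind(string path)
    {
        byte[] header = new byte[8];
        int read = 0;
        using (FileStream fs = File.OpenRead(path))
        {
            // Read may return fewer bytes than requested, so loop until full or EOF.
            while (read < header.Length)
            {
                int n = fs.Read(header, read, header.Length - read);
                if (n == 0) break;
                read += n;
            }
        }

        if (read >= ZipSignature.Length && header.Take(ZipSignature.Length).SequenceEqual(ZipSignature))
            return WordFileKind.ZipPackage;
        if (read == OleSignature.Length && header.SequenceEqual(OleSignature))
            return WordFileKind.OleCompoundFile;
        return WordFileKind.Unknown;
    }
}
```

If this returns `OleCompoundFile`, the file has to be converted to .docx before the SDK can read it; `ZipPackage` only means the file is a ZIP, not necessarily a valid Word package.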
Updated 5-Aug-21 2:38am

Hello,

Check this tutorial from Microsoft:
> How to: Open a word processing document for read-only access (Open XML SDK)
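Along the lines of that article (opening the package with `isEditable` set to `false`), a minimal sketch for reading the text of each paragraph might look like this. It assumes the DocumentFormat.OpenXml NuGet package is referenced and that the input is a real .docx file; `ReadOnlyDocReader` and `ReadParagraphs` are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ReadOnlyDocReader
{
    // Hypothetical helper: returns the plain text of every paragraph in the main document body.
    public static List<string> ReadParagraphs(string path)
    {
        // isEditable = false opens the package read-only; nothing is written back on dispose.
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(path, false))
        {
            Body body = wordDoc.MainDocumentPart.Document.Body;

            // Descendants<Paragraph>() also finds paragraphs nested inside tables.
            return body.Descendants<Paragraph>()
                       .Select(p => p.InnerText)
                       .ToList();
        }
    }
}
```

Calling `ReadOnlyDocReader.ReadParagraphs(path)` with the path of a .docx file returns one string per paragraph, so the paragraph breaks that `body.InnerText` loses are kept.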

@JAFC
   
Comments
rajah rajah 13-Oct-16 8:15am
   
Hi Jose,
Thanks for the response.

I'm getting the error "The specified package is invalid. The main part is missing." in this part of the code:

using (WordprocessingDocument wordDocument =
WordprocessingDocument.Open(filepath, false))

My input file format is .docx.

Kindly suggest.
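This error suggests the file did open as a package, but the SDK could not locate the main document part. A Word package declares that part through a relationship in `_rels/.rels` whose type ends in `/officeDocument` (the part itself is conventionally `word/document.xml`). A stdlib-only diagnostic sketch, assuming .NET with `System.IO.Compression` and `System.Xml.Linq` available; `PackageInspector` and `FindMainPart` are hypothetical names:

```csharp
using System;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class PackageInspector
{
    // Namespace of package-level relationship files (_rels/.rels).
    static readonly XNamespace Rels = "http://schemas.openxmlformats.org/package/2006/relationships";

    // Hypothetical helper: returns the Target of the package's officeDocument relationship,
    // or null when the package does not declare one (the "main part is missing" situation).
    // ZipFile.OpenRead throws InvalidDataException if the file is not a ZIP at all.
    public static string FindMainPart(string path)
    {
        using (ZipArchive zip = ZipFile.OpenRead(path))
        {
            ZipArchiveEntry relsEntry = zip.GetEntry("_rels/.rels");
            if (relsEntry == null)
                return null; // no package-level relationships at all

            using (var stream = relsEntry.Open())
            {
                XDocument rels = XDocument.Load(stream);
                XElement main = rels.Root
                    .Elements(Rels + "Relationship")
                    .FirstOrDefault(r => ((string)r.Attribute("Type") ?? "")
                        .EndsWith("/officeDocument", StringComparison.Ordinal));
                return (string)main?.Attribute("Target");
            }
        }
    }
}
```

A null result, or a target that does not point at a Word document part, is a hint that the file was not produced as a normal Word .docx even though it has that extension.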
Open XML Formats and file name extensions - Office Support

OpenXML does not work with the legacy .doc format, only with .docx. If you are working with a .doc file, first convert it to .docx (just renaming the extension is not enough), for example with the method below, which uses Word Interop:

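using System;
using System.IO;
using Microsoft.Office.Interop.Word;

public void ConvertDocToDocx(string path)
{
    // Only legacy .doc files need converting.
    if (!path.EndsWith(".doc", StringComparison.OrdinalIgnoreCase))
        return;

    // Word Interop requires Microsoft Word to be installed on this machine.
    Application word = new Application();
    try
    {
        var sourceFile = new FileInfo(path);
        Document document = word.Documents.Open(sourceFile.FullName);

        // Path.ChangeExtension swaps only the extension; string.Replace(".doc", ".docx")
        // would also rewrite any ".doc" appearing earlier in the path.
        string newFileName = Path.ChangeExtension(sourceFile.FullName, ".docx");
        document.SaveAs2(newFileName, WdSaveFormat.wdFormatXMLDocument,
            CompatibilityMode: WdCompatibilityMode.wdWord2010);
        document.Close();
    }
    finally
    {
        // Always shut down the Word process, even if the conversion fails.
        word.Quit();
    }

    // Delete the original .doc once the .docx copy has been saved.
    File.Delete(path);
}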

After the conversion, OpenXML can parse your data, although the output includes some junk numbers that we could not get rid of. It is still enough to verify that our data exists.
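For the "verify that our data exists" step, one sketch, assuming the DocumentFormat.OpenXml NuGet package and a .docx input, is to search each paragraph's combined text rather than individual `Text` elements, because Word often splits a single word across several runs (for example after formatting or proofing changes). `DocSearch` and `ContainsPhrase` are hypothetical names, and the case-insensitive comparison is an arbitrary choice:

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class DocSearch
{
    // Hypothetical helper: true if any paragraph's combined text contains the phrase.
    public static bool ContainsPhrase(string docxPath, string phrase)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docxPath, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;

            // Paragraph.InnerText concatenates the text of all runs in the paragraph,
            // so a phrase split across several runs is still found.
            return body.Descendants<Paragraph>()
                       .Any(p => p.InnerText.IndexOf(phrase, StringComparison.OrdinalIgnoreCase) >= 0);
        }
    }
}
```

Matching per paragraph also avoids false hits where the end of one paragraph and the start of the next happen to form the phrase when `body.InnerText` is searched as a whole.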
   
Comments
Dave Kreskowiak 5-Aug-21 9:41am
   
So, you have to use code using Word Interop to resave the file in a new format just so you avoid using Word Interop to process the new file?

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)




CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900