Is there a sample that splits a .docx file into multiple files by searching for keywords?

For example, we have a big Word file that contains repeating content, each block with a different ID and different information. I would like to split these blocks into separate files.

How can we achieve this in C#?

Separating based on sections or paragraphs is not suitable for our scenario.

It goes without saying that the formatting should be preserved.

Thanks.

PS: My current code is as follows. It doesn't work properly. The count value that I compute inside the loop and pass to the split seems to be incorrect.

C#
int count = 0;

OpenXmlPowerToolsDocument doc = WmlDocument.FromFileName(TxtSource.Text);
using (OpenXmlMemoryStreamDocument streamDoc = new OpenXmlMemoryStreamDocument(doc))
using (WordprocessingDocument document = streamDoc.GetWordprocessingDocument())
{
    XDocument mainDocument = document.MainDocumentPart.GetXDocument();

    for (int i = 1; i < 1000; i++)
    {
        IEnumerable<XElement> content = mainDocument.Document.Descendants(W.p).Skip(i).Take(1);

        Regex regex = new Regex("Delimiter");
        count = OpenXmlRegex.Match(content, regex);
        if (count > 0)
        {
            count = i;
            break;
        }
    }
}

List<Source> documentSource = new List<Source> {
    new Source(new WmlDocument(TxtSource.Text), 0, count, true)
};

int filenumber = 2;
string filename = string.Format("{0}test_{1}.docx", Txtdest.Text, filenumber);
DocumentBuilder.BuildDocument(documentSource, filename);
Updated 4-Jan-15 22:08pm
v3
Comments
CHill60 5-Jan-15 4:32am    
You can't use a count to define a location - you might be better off using Eric White's suggestion of breaking it down into paragraphs. You could always check for the presence of the delimiter within a paragraph to decide whether that paragraph ends up in the document or not.
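
The "check each block for the delimiter" idea can be sketched with plain LINQ to XML. This is only a sketch under assumptions: `SplitPoints` and `FindRanges` are hypothetical names, the delimiter is matched as a plain substring rather than a regex, and the ranges are expressed as (start, count) over the top-level children of `w:body`, on the assumption that this is how the `Source` constructor in Open XML PowerTools interprets its `start` and `count` arguments (worth verifying against the library version in use).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SplitPoints
{
    // WordprocessingML main namespace (the one behind W.p and W.body in Open XML PowerTools).
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns (start, count) ranges over the TOP-LEVEL children of w:body.
    // A new range begins at every top-level element whose text contains the delimiter.
    public static List<(int Start, int Count)> FindRanges(XElement body, string delimiter)
    {
        // Leave out the trailing w:sectPr: it carries page setup, not content.
        List<XElement> blocks = body.Elements().Where(e => e.Name != W + "sectPr").ToList();

        var starts = new List<int>();
        for (int i = 0; i < blocks.Count; i++)
        {
            // Join all w:t runs so a delimiter spread over several runs is still found.
            string text = string.Concat(blocks[i].Descendants(W + "t").Select(t => t.Value));
            if (text.Contains(delimiter)) starts.Add(i);
        }

        // Any content before the first delimiter becomes its own leading range.
        if (blocks.Count > 0 && (starts.Count == 0 || starts[0] != 0)) starts.Insert(0, 0);

        var ranges = new List<(int Start, int Count)>();
        for (int k = 0; k < starts.Count; k++)
        {
            int end = k + 1 < starts.Count ? starts[k + 1] : blocks.Count;
            ranges.Add((starts[k], end - starts[k]));
        }
        return ranges;
    }
}
```

With the body taken from `mainDocument.Root.Element(W.body)`, each returned range could then be passed to `new Source(new WmlDocument(TxtSource.Text), start, count, true)` and built into its own output file with `DocumentBuilder.BuildDocument`, as in the question's code.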

1 solution

It's not clear whether you are asking how to manipulate Word files with C# (so why have you tagged your question with VB.NET?) or how to search Word files with C#.

These CodeProject links should help ...

Search and highlight text in MS Word through C#

C#: Create and Manipulate Word Documents Programmatically Using DocX

and here is an example of a splitter, from SO, although rumour has it that it is not fully working. An exercise for yourself, perhaps.
 
Comments
Maciej Los 5-Jan-15 2:55am    
+5!
CHill60 5-Jan-15 2:57am    
Thank you!
JShayan 5-Jan-15 3:33am    
Basically, what I meant is that I don't mind if the solution is in VB.NET.

The first link is completely different: it uses the Word object model, which I don't want. I am looking for a solution that uses Open XML.

The splitter sample is also implemented using the Word object model.

The only relevant link is the DocX article, which doesn't contain a splitting sample.
If it did, that would meet my requirement.

Thanks.
CHill60 5-Jan-15 3:40am    
Ah - that is the beauty of being specific in the question and/or the tags, you get answers that are more appropriate!
However there should be enough on that article to get you started - find the split point and copy everything from prior to that point to a new docx...
JShayan 5-Jan-15 3:53am    
That is the point: I have tried that already. My problem is finding the index at which to split. I looped through the paragraphs, searched each paragraph's text for my delimiter, and once I found it I used that paragraph number as the split index. But the document was split in the wrong place.
I don't know what the problem is.
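
One possible cause, offered as a hypothesis rather than a confirmed diagnosis: `Descendants(W.p)` counts every paragraph in the document, including paragraphs nested inside table cells, whereas the `start` and `count` arguments of `Source` are assumed here to index only the top-level children of `w:body` (a whole table counts as one element). The sketch below uses plain LINQ to XML to show how the two indexes can diverge; `IndexCheck`, `DescendantIndex`, and `TopLevelIndex` are hypothetical names introduced for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class IndexCheck
{
    // WordprocessingML main namespace (the one behind W.p and W.body in Open XML PowerTools).
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Index of the first paragraph containing the delimiter when EVERY w:p is counted,
    // including those nested in tables (the numbering Descendants(W.p) produces).
    public static int DescendantIndex(XElement body, string delimiter) =>
        body.Descendants(W + "p").ToList().FindIndex(p => TextOf(p).Contains(delimiter));

    // Index of the first TOP-LEVEL child of w:body containing the delimiter.
    public static int TopLevelIndex(XElement body, string delimiter) =>
        body.Elements().ToList().FindIndex(e => TextOf(e).Contains(delimiter));

    // Concatenated text of all w:t runs below the element.
    static string TextOf(XElement e) =>
        string.Concat(e.Descendants(W + "t").Select(t => t.Value));
}
```

For a body made of one paragraph, one table with two single-paragraph cells, and then the delimiter paragraph, `DescendantIndex` gives 3 while `TopLevelIndex` gives 2. If the two values differ for your document, looping over `mainDocument.Root.Element(W.body).Elements()` instead of `Descendants(W.p)` may produce the index that the split expects.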

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)