Merging Word Documents with C#


Jul 1, 2004


This article describes how to generate a single summary document containing all the changes, given an original file and a set of modified versions of that file.

Introduction

In my current project, the customer wants a set of features that allow the administrator to publish a document on a website, let users work on that document (either changing it or adding comments), and submit the modified version back to the site. The administrator would then be able to see a summarized document with the changes introduced by all the users and approve or reject them on a change-by-change basis.

Although this looked quite difficult at first glance, after digging deeper into the documentation I found that, besides the "Track Changes" feature, Word can also merge several documents into a final one, so that all changes are available in a single document. Such a feature can be used through VBA automation; hence, it is also available in the .NET Framework via COM interop.

The solution

In order to implement a solution, I built a C# ASP.NET project in Visual Studio .NET 2003 with the following structure:

In a nutshell, the Default.aspx page is the front end, the DocMerger class performs the actual merging, and the documents are located in the "files" folder ("OriginalDoc" contains the original version of the document, "Copies" is where all the files uploaded by the users reside, and "Output" is the folder in which the summarized document is generated).

The web form is fairly simple and allows downloading the original document, uploading a modified version and creating the output document.

The initial version is published on the server with the "Track Changes" option turned on, so every change a user makes is easily recognized.

Let's suppose this is the document:

The upload logic in the web page is straightforward: it creates a unique name for the changed document (using Guid.NewGuid()) and then stores the file in the "Copies" folder.

private void btnGo_Click(object sender, System.EventArgs e)
{
 // Physical path of the folder that collects the users' copies
 string strBaseDir = Server.MapPath("files/copies");
 // Guid.ToString() already omits the braces, so no Replace calls are needed
 string strFileName = Guid.NewGuid().ToString();
 upload.PostedFile.SaveAs(Path.Combine(strBaseDir, strFileName + ".doc"));
}
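
The naming step can also be combined with a basic check on the uploaded file's extension. The following is only a minimal sketch under my own assumptions: the UploadNaming class and its BuildCopyPath method are hypothetical names that are not part of the downloadable project, and rejecting anything that is not a .doc file is a policy I chose for illustration.

```csharp
using System;
using System.IO;

public static class UploadNaming
{
    // Hypothetical helper (not in the original project): returns a unique
    // destination path inside baseDir, or null when the client's file name
    // does not end in ".doc".
    public static string BuildCopyPath(string baseDir, string clientFileName)
    {
        string ext = Path.GetExtension(clientFileName);
        if (!string.Equals(ext, ".doc", StringComparison.OrdinalIgnoreCase))
        {
            return null;
        }

        // "N" formats the GUID as 32 hex digits, with no braces or hyphens.
        string name = Guid.NewGuid().ToString("N") + ".doc";
        return Path.Combine(baseDir, name);
    }
}
```

In btnGo_Click, the value of upload.PostedFile.FileName could be passed as clientFileName, and the upload skipped when the method returns null.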

Following our previous example, let's say three users changed the document separately.

User 1

User 2

User 3

The "Copies" folder holds all the copies that the different users submitted:

The DocMerger class has a Merge method that combines the documents. I also wrote an overload that takes a folder name instead of a file list.

    
        
    /// <summary>
    /// Merge a document with a set of copies
    /// </summary>
    /// <param name="strOrgDoc">
    /// Original file name
    /// </param>
    /// <param name="arrCopies">
    /// File names of the modified files
    /// </param>
    /// <param name="strOutDoc">
    /// The result filename
    /// </param>
    public void Merge(string strOrgDoc, string[] arrCopies, string strOutDoc)
    {
      ApplicationClass objApp = null;

      //boxing of default values for COM interop purposes
      object objMissing = Missing.Value;
      object objFalse = false;
      object objTarget = WdMergeTarget.wdMergeTargetSelected;
      object objUseFormatFrom = WdUseFormattingFrom.wdFormattingFromSelected;

      try
      {
        objApp = new ApplicationClass();
        object objOrgDoc = strOrgDoc;
        
        Document objDocLast = null;
        Document objDocBeforeLast = null;

        objDocLast = objApp.Documents.Open(
          ref objOrgDoc,    //FileName
          ref objMissing,   //ConfirmVersions
          ref objMissing,   //ReadOnly
          ref objMissing,   //AddToRecentFiles
          ref objMissing,   //PasswordDocument
          ref objMissing,   //PasswordTemplate
          ref objMissing,   //Revert
          ref objMissing,   //WritePasswordDocument
          ref objMissing,   //WritePasswordTemplate
          ref objMissing,   //Format
          ref objMissing,   //Encoding
          ref objMissing,   //Visible
          ref objMissing,   //OpenAndRepair
          ref objMissing,   //DocumentDirection
          ref objMissing,   //NoEncodingDialog
          ref objMissing    //XMLTransform
          );

        foreach(string strCopy in arrCopies)
        {
          Debug.WriteLine("Merging file " + strCopy);
          objDocLast.Merge(
            strCopy,                //FileName    
            ref objTarget,          //MergeTarget
            ref objMissing,         //DetectFormatChanges
            ref objUseFormatFrom,   //UseFormattingFrom
            ref objMissing          //AddToRecentFiles
            ); 
          objDocBeforeLast = objDocLast;
          objDocLast = objApp.ActiveDocument;
          Debug.WriteLine("The active document is " + objDocLast.Name);

          if (objDocBeforeLast != null)
          {
            Debug.WriteLine("Closing " + objDocBeforeLast.Name);
            objDocBeforeLast.Close(
              ref objFalse,     //SaveChanges
              ref objMissing,   //OriginalFormat
              ref objMissing    //RouteDocument
              );
          }
            
          
        }

        object objOutDoc = strOutDoc;
      
        objDocLast.SaveAs(    
          ref objOutDoc,      //FileName
          ref objMissing,     //FileFormat
          ref objMissing,     //LockComments
          ref objMissing,     //PassWord     
          ref objMissing,     //AddToRecentFiles
          ref objMissing,     //WritePassword
          ref objMissing,     //ReadOnlyRecommended
          ref objMissing,     //EmbedTrueTypeFonts
          ref objMissing,     //SaveNativePictureFormat
          ref objMissing,     //SaveFormsData
          ref objMissing,     //SaveAsAOCELetter,
          ref objMissing,     //Encoding
          ref objMissing,     //InsertLineBreaks
          ref objMissing,     //AllowSubstitutions
          ref objMissing,     //LineEnding
          ref objMissing      //AddBiDiMarks
          );

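        // Close every document that is still open, without saving. Closing
        // them one by one inside a foreach would modify the collection
        // while it is being enumerated.
        objApp.Documents.Close(
          ref objFalse,     //SaveChanges
          ref objMissing,   //OriginalFormat
          ref objMissing    //RouteDocument
          );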
        
      }
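      finally
      {
        // objApp is still null if Word could not be started
        if (objApp != null)
        {
          // Quit without saving, so that a failed merge cannot leave Word
          // waiting on a save prompt
          objApp.Quit(
            ref objFalse,       //SaveChanges
            ref objMissing,     //OriginalFormat
            ref objMissing      //RouteDocument
            );
          objApp = null;
        }
      }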
    }

    /// <summary>
    /// Merge a document with a set of copies
    /// </summary>
    /// <param name="strOrgDoc">
    /// Original file name
    /// </param>
    /// <param name="strCopyFolder">
    /// Folder in which the copies are located
    /// </param>
    /// <param name="strOutDoc">
    /// The result filename
    /// </param>
    public void Merge(string strOrgDoc, string strCopyFolder, string strOutDoc)
    {
      string[] arrFiles = Directory.GetFiles(strCopyFolder);
      Merge(strOrgDoc, arrFiles, strOutDoc);
    }
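
One thing to watch with the folder overload is that Directory.GetFiles returns every file in the folder. While Word has a document open, it keeps a hidden owner file whose name starts with "~$" next to it, and such files can be left behind if Word exits abnormally. The sketch below shows one way to filter the list before merging; the CopyFinder class and GetCopies method are hypothetical names of mine, not part of the original project, and sorting by name is just an assumption to make the merge order predictable.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CopyFinder
{
    // Hypothetical helper (not in the original project): lists the .doc
    // files in a folder, skipping Word's "~$" owner files, sorted by name.
    public static string[] GetCopies(string folder)
    {
        List<string> result = new List<string>();
        foreach (string path in Directory.GetFiles(folder, "*.doc"))
        {
            string name = Path.GetFileName(path);

            // Check the extension explicitly: on the .NET Framework a
            // three-character pattern such as "*.doc" can also match
            // longer extensions that start with it.
            bool isDoc = string.Equals(
                Path.GetExtension(name), ".doc",
                StringComparison.OrdinalIgnoreCase);

            if (isDoc && !name.StartsWith("~$", StringComparison.Ordinal))
            {
                result.Add(path);
            }
        }
        result.Sort(StringComparer.OrdinalIgnoreCase);
        return result.ToArray();
    }
}
```

The folder overload could then call Merge(strOrgDoc, CopyFinder.GetCopies(strCopyFolder), strOutDoc) instead of passing the raw result of Directory.GetFiles.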
    

The WebForm invokes this method:

    private void btnMerge_Click(object sender, System.EventArgs e)
    {
      string strOrigFile = Server.MapPath("files/originaldoc/thedocument.doc");
      string strCopiesDir = Server.MapPath("files/copies");
      string strOutputDir = Server.MapPath("files/output/output.doc");
      DocMerger objMerger = new DocMerger();
      objMerger.Merge(strOrigFile, strCopiesDir, strOutputDir);
      lnkResult.NavigateUrl = "files/output/output.doc";
      lnkResult.Visible = true;
    }
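
Because every click on the merge button starts a new instance of Word on the server, it matters that each instance really exits. Setting objApp to null only drops the managed reference; the underlying COM object is released when the garbage collector gets to it. A common way to release it deterministically is Marshal.ReleaseComObject. The following is a minimal sketch; ComCleanup and Release are hypothetical names of mine, not part of the original project.

```csharp
using System;
using System.Runtime.InteropServices;

public static class ComCleanup
{
    // Hypothetical helper (not in the original project): releases the
    // runtime callable wrapper behind a COM reference. Returns true when
    // something was released, false for null or ordinary managed objects.
    public static bool Release(object comObject)
    {
        if (comObject == null || !Marshal.IsComObject(comObject))
        {
            return false;
        }

        // ReleaseComObject returns the remaining reference count of the
        // wrapper; loop until the wrapper is fully released.
        while (Marshal.ReleaseComObject(comObject) > 0)
        {
        }
        return true;
    }
}
```

In the finally block of Merge, ComCleanup.Release(objApp) could be called right after objApp.Quit. The object must not be used again once it has been released.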

    

The final outcome is a document with the changes proposed by all the users:

This way, the administrator can see all the changes in one place and approve or reject each of them.

The code of this simple application can be found with this article, but there are a couple of things you must keep in mind:

  • I generated the interop assemblies for Office 2003. If you have a different version installed, you have to re-import Office's type libraries and recompile the project (I think it will work with Office XP, 2000 and even 97).
  • Microsoft discourages the use of Office Automation on a web server. Nonetheless, all the tests that I did were fine. If needed, the DocMerger class could be moved to another kind of project, and the front end replaced with a console or WinForms application.
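
As a rough illustration of the second point, a console front end could look like the sketch below. It is not part of the downloadable project: MergeCli, MergeAction and the "docmerge" command name are hypothetical, and the merge itself is passed in as a delegate so that the sketch compiles without a reference to the Word interop assemblies.

```csharp
using System;
using System.IO;

// Hypothetical delegate with the same shape as
// DocMerger.Merge(string, string, string).
public delegate void MergeAction(
    string originalDoc, string copiesFolder, string outputDoc);

public static class MergeCli
{
    // Validates the arguments, runs the merge, and returns an exit code:
    // 0 = merged, 1 = wrong argument count, 2 = input not found.
    public static int Run(string[] args, MergeAction merge)
    {
        if (args == null || args.Length != 3)
        {
            Console.Error.WriteLine(
                "Usage: docmerge <original.doc> <copies-folder> <output.doc>");
            return 1;
        }
        if (!File.Exists(args[0]) || !Directory.Exists(args[1]))
        {
            Console.Error.WriteLine(
                "Original file or copies folder not found.");
            return 2;
        }

        // Pass absolute paths, on the assumption that Word may not share
        // the caller's working directory.
        merge(
            Path.GetFullPath(args[0]),
            Path.GetFullPath(args[1]),
            Path.GetFullPath(args[2]));
        return 0;
    }
}
```

A Main method could then simply return MergeCli.Run(args, new MergeAction(new DocMerger().Merge)).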

Conclusion

Automation through the .NET Framework makes it simple to take advantage of Office's features and to deliver quick-to-implement, feature-rich solutions.

History

  • 2004/06/29 - Initial upload