Merging Word Documents with C#






Jul 1, 2004
This article describes how to generate a single summarized document containing all the changes, given an initial file and a set of modified versions of that file.
Introduction
In my current project, the customer wants a set of features that let the administrator publish a document on a website, let users work on that document (either changing it or adding comments), and submit the modified version back to the site. The administrator can then view a summarized document with the changes introduced by all the users and approve or reject them on a change-by-change basis.
Although it looked quite difficult at first glance, digging deeper into the documentation I found that, besides the "track changes" feature, Word can also merge several documents into a final one, so that all changes are available in a single document. Such a feature can be used through VBA automation; hence, it is also available in the .NET Framework via COM interop.
The solution
In order to implement a solution, I built a C# ASP.NET project in Visual Studio .NET 2003 with the following structure:
In a nutshell, the Default.aspx page is the front end, the DocMerger class performs the actual merging, and the documents are located in the "files" folder ("OriginalDoc" contains the original version of the document, "Copies" is where all the files uploaded by the users reside, and "Output" is the folder in which the summarized document is generated).
The web form is fairly simple and allows downloading the original document, uploading a modified version and creating the output document.
The initial version is published on the server with the "Track Changes" option turned on, so every change a user makes is easily recognized.
Let's suppose this is the document:
The upload logic in the web page is quite straightforward: it creates a unique name for the changed document (using Guid.NewGuid()) and then stores the file in the "Copies" folder.
private void btnGo_Click(object sender, System.EventArgs e)
{
    string strBaseDir = Server.MapPath("files/copies");
    // Guid.ToString() already returns the hyphenated form without braces,
    // so no further clean-up of the name is needed
    string strFileName = Guid.NewGuid().ToString();
    upload.PostedFile.SaveAs(Path.Combine(strBaseDir, strFileName + ".doc"));
}
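As a possible refinement of the naming step, the sketch below builds the target path in a small helper and rejects anything that is not a .doc file. The UploadNaming class and BuildCopyPath method are hypothetical names that are not part of the article's download, and the code assumes .NET 2.0 or later for StringComparison.

```csharp
using System;
using System.IO;

public static class UploadNaming
{
    // Hypothetical helper: returns the path under which an uploaded copy
    // would be stored inside strBaseDir.
    public static string BuildCopyPath(string strBaseDir, string strClientFileName)
    {
        // Only the extension of the client-side name is used, never the name itself
        string strExt = Path.GetExtension(strClientFileName);
        if (!string.Equals(strExt, ".doc", StringComparison.OrdinalIgnoreCase))
        {
            throw new ArgumentException("Only .doc files are accepted.", "strClientFileName");
        }

        // The default Guid format is 32 hex digits separated by hyphens, without braces
        string strFileName = Guid.NewGuid().ToString();
        return Path.Combine(strBaseDir, strFileName + ".doc");
    }
}
```

In the click handler, this could be called as UploadNaming.BuildCopyPath(Server.MapPath("files/copies"), upload.PostedFile.FileName) and the result passed to SaveAs.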
Following our previous example, let's say three users changed the document separately.
User 1
User 2
User 3
The "copies" folder holds all the copies submitted by the different users:
The DocMerger class has a Merge method that performs the document combination. I also wrote an overload that takes a folder name instead of a file list.
// Requires using directives for System.Diagnostics, System.Reflection
// and the namespace of the Word interop assembly.

/// <summary>
/// Merge a document with a set of copies
/// </summary>
/// <param name="strOrgDoc">
/// Original file name
/// </param>
/// <param name="arrCopies">
/// File names of the modified files
/// </param>
/// <param name="strOutDoc">
/// The result filename
/// </param>
public void Merge(string strOrgDoc, string[] arrCopies, string strOutDoc)
{
    ApplicationClass objApp = null;

    //boxing of default values for COM interop purposes
    object objMissing = Missing.Value;
    object objFalse = false;
    object objTarget = WdMergeTarget.wdMergeTargetSelected;
    object objUseFormatFrom = WdUseFormattingFrom.wdFormattingFromSelected;

    try
    {
        objApp = new ApplicationClass();
        object objOrgDoc = strOrgDoc;

        Document objDocLast = null;
        Document objDocBeforeLast = null;

        objDocLast = objApp.Documents.Open(
            ref objOrgDoc,  //FileName
            ref objMissing, //ConfirmVersions
            ref objMissing, //ReadOnly
            ref objMissing, //AddToRecentFiles
            ref objMissing, //PasswordDocument
            ref objMissing, //PasswordTemplate
            ref objMissing, //Revert
            ref objMissing, //WritePasswordDocument
            ref objMissing, //WritePasswordTemplate
            ref objMissing, //Format
            ref objMissing, //Encoding
            ref objMissing, //Visible
            ref objMissing, //OpenAndRepair
            ref objMissing, //DocumentDirection
            ref objMissing, //NoEncodingDialog
            ref objMissing  //XMLTransform
        );

        foreach (string strCopy in arrCopies)
        {
            Debug.WriteLine("Merging file " + strCopy);
            objDocLast.Merge(
                strCopy,              //FileName
                ref objTarget,        //MergeTarget
                ref objMissing,       //DetectFormatChanges
                ref objUseFormatFrom, //UseFormattingFrom
                ref objMissing        //AddToRecentFiles
            );

            //the merge result becomes the active document;
            //the document it was merged from is no longer needed
            objDocBeforeLast = objDocLast;
            objDocLast = objApp.ActiveDocument;
            Debug.WriteLine("The active document is " + objDocLast.Name);

            if (objDocBeforeLast != null)
            {
                Debug.WriteLine("Closing " + objDocBeforeLast.Name);
                objDocBeforeLast.Close(
                    ref objFalse,   //SaveChanges
                    ref objMissing, //OriginalFormat
                    ref objMissing  //RouteDocument
                );
            }
        }

        object objOutDoc = strOutDoc;
        objDocLast.SaveAs(
            ref objOutDoc,  //FileName
            ref objMissing, //FileFormat
            ref objMissing, //LockComments
            ref objMissing, //Password
            ref objMissing, //AddToRecentFiles
            ref objMissing, //WritePassword
            ref objMissing, //ReadOnlyRecommended
            ref objMissing, //EmbedTrueTypeFonts
            ref objMissing, //SaveNativePictureFormat
            ref objMissing, //SaveFormsData
            ref objMissing, //SaveAsAOCELetter
            ref objMissing, //Encoding
            ref objMissing, //InsertLineBreaks
            ref objMissing, //AllowSubstitutions
            ref objMissing, //LineEnding
            ref objMissing  //AddBiDiMarks
        );

        foreach (Document objDocument in objApp.Documents)
        {
            objDocument.Close(
                ref objFalse,   //SaveChanges
                ref objMissing, //OriginalFormat
                ref objMissing  //RouteDocument
            );
        }
    }
    finally
    {
        //objApp is still null if Word could not be started
        if (objApp != null)
        {
            objApp.Quit(
                ref objFalse,   //SaveChanges (discard anything still open)
                ref objMissing, //OriginalFormat
                ref objMissing  //RouteDocument
            );
            objApp = null;
        }
    }
}
/// <summary>
/// Merge a document with a set of copies
/// </summary>
/// <param name="strOrgDoc">
/// Original file name
/// </param>
/// <param name="strCopyFolder">
/// Folder in which the copies are located
/// </param>
/// <param name="strOutDoc">
/// The result filename
/// </param>
public void Merge(string strOrgDoc, string strCopyFolder, string strOutDoc)
{
    string[] arrFiles = Directory.GetFiles(strCopyFolder);
    Merge(strOrgDoc, arrFiles, strOutDoc);
}
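The folder overload passes every file in the folder to the merge, including any non-Word files or the "~$" owner files that Word leaves next to documents it has open. A minimal sketch of a stricter file selection follows; the CopyFinder class and FindCopies method are hypothetical names, not part of the article's code, and the sketch assumes .NET 2.0 or later (generics, StringComparison).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CopyFinder
{
    // Hypothetical helper: returns the .doc files in strCopyFolder,
    // skipping Word's "~$" owner files, in a stable order.
    public static string[] FindCopies(string strCopyFolder)
    {
        List<string> lstCopies = new List<string>();
        foreach (string strFile in Directory.GetFiles(strCopyFolder))
        {
            string strName = Path.GetFileName(strFile);

            // Compare the extension explicitly so .docx, .txt, etc. are left out
            if (!string.Equals(Path.GetExtension(strName), ".doc",
                               StringComparison.OrdinalIgnoreCase))
            {
                continue;
            }

            // Owner (lock) files created by Word start with "~$"
            if (strName.StartsWith("~$", StringComparison.Ordinal))
            {
                continue;
            }

            lstCopies.Add(strFile);
        }

        string[] arrCopies = lstCopies.ToArray();
        // Sort so the merge order does not depend on the file system
        Array.Sort(arrCopies, StringComparer.Ordinal);
        return arrCopies;
    }
}
```

With such a helper, the folder overload could call Merge(strOrgDoc, CopyFinder.FindCopies(strCopyFolder), strOutDoc) instead of passing the raw Directory.GetFiles result.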
The web form invokes this method:
private void btnMerge_Click(object sender, System.EventArgs e)
{
    string strOrigFile = Server.MapPath("files/originaldoc/thedocument.doc");
    string strCopiesDir = Server.MapPath("files/copies");
    // full path of the output file (not a folder)
    string strOutputFile = Server.MapPath("files/output/output.doc");

    DocMerger objMerger = new DocMerger();
    objMerger.Merge(strOrigFile, strCopiesDir, strOutputFile);

    lnkResult.NavigateUrl = "files/output/output.doc";
    lnkResult.Visible = true;
}
The final outcome is a document with the changes proposed by all the users:
This way, the administrator can see all the changes in one place and either approve or reject each one of them.
The code of this simple application can be found with this article, but there are a couple of things you must keep in mind:
- I generated the interop assemblies for Office 2003. For other versions, you have to re-import the Office type libraries and recompile the project (I think it will work with Office XP, 2000 and even 97).
- Microsoft discourages the use of Office Automation on a web server. Nonetheless, all the tests I ran were fine. If needed, the DocMerger class can be moved to another kind of project and the front end replaced with a console or WinForms application.
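For the console alternative mentioned in the last point, a front end could look roughly like the sketch below. MergeConsole, MergeAction and the exit codes are hypothetical choices made here for illustration; the delegate stands in for the folder overload of DocMerger.Merge so that the argument handling can be shown without a reference to Word.

```csharp
using System;
using System.IO;

public static class MergeConsole
{
    // Hypothetical delegate with the same shape as
    // DocMerger.Merge(string strOrgDoc, string strCopyFolder, string strOutDoc)
    public delegate void MergeAction(string strOrgDoc, string strCopyFolder, string strOutDoc);

    // Returns 0 on success, 1 on bad usage, 2 on a missing input
    public static int Run(string[] args, MergeAction merge)
    {
        if (args.Length != 3)
        {
            Console.Error.WriteLine("Usage: merge <original.doc> <copies folder> <output.doc>");
            return 1;
        }

        // Absolute paths, as Server.MapPath produces in the web version
        string strOrgDoc = Path.GetFullPath(args[0]);
        string strCopyFolder = Path.GetFullPath(args[1]);
        string strOutDoc = Path.GetFullPath(args[2]);

        if (!File.Exists(strOrgDoc))
        {
            Console.Error.WriteLine("Original document not found: " + strOrgDoc);
            return 2;
        }
        if (!Directory.Exists(strCopyFolder))
        {
            Console.Error.WriteLine("Copies folder not found: " + strCopyFolder);
            return 2;
        }

        merge(strOrgDoc, strCopyFolder, strOutDoc);
        Console.WriteLine("Merged document written to " + strOutDoc);
        return 0;
    }
}
```

A Main method would then only need to return MergeConsole.Run(args, new MergeConsole.MergeAction(new DocMerger().Merge)), which binds the delegate to the three-string overload.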
Conclusion
By using automation from the .NET Framework, taking advantage of Office's features is simple, and it allows the delivery of quick-to-implement, feature-rich solutions.
History
- 2004/06/29 - Initial upload