Introduction & Background
With the advent of the Microsoft Office 2007 Open XML formats, Office report generation changed fundamentally: it no longer depends on Office itself, and it is open to any programming language capable of reading compressed archives and manipulating XML.
In this article, I'm going to illustrate an SOA approach to generating docx reports in a distributed environment, where MS Office 2007 needs to be installed only on the developer machine (not on the production server). The application is composed of the following parts:
- An ASP.NET web application
- An IIS-hosted WCF service
- A business tier
- A data access tier
- A database
The scope of this article is limited to the top two tiers. With the Open XML SDK (now at version 2.0), it's possible to programmatically read and write Office Open XML packages, that is, to read and write Office files without using Office COM objects. Compared with COM automation, this approach is fast, light on resources, and stable, because Word never has to run on the server. The WCF service in this application must be able to create docx reports from an existing docx template and some database data serialized as XML.
Docx files are constructed in a modular way: a docx file is a ZIP archive of XML parts. To see this for yourself, just rename a docx file, changing its extension to ".zip", and open it. The part that we're interested in is called Custom XML. The best approach to manipulating data within a docx file is to bind content controls to custom XML parts.
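To make the "a docx file is a ZIP archive" point concrete, here is a minimal sketch that lists the parts of a package using only the .NET base class library (System.IO.Compression, available from .NET Framework 4.5). The names PackageInspector, ListParts, and ListCustomXmlParts are illustrative and not part of the application described in this article.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class PackageInspector
{
    // Returns the full names of all entries (parts) found in a docx package.
    public static List<string> ListParts(Stream docxStream)
    {
        // leaveOpen: true, so the caller keeps ownership of the stream.
        using (var archive = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            return archive.Entries.Select(entry => entry.FullName).ToList();
        }
    }

    // Custom XML parts are conventionally stored under the customXml/ folder.
    public static List<string> ListCustomXmlParts(Stream docxStream)
    {
        return ListParts(docxStream)
            .Where(name => name.StartsWith("customXml/", StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```

Called on a stream obtained with File.OpenRead on a docx file, ListParts typically returns entries such as [Content_Types].xml and word/document.xml; entries under customXml/ show up only once the document contains a custom XML part.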
The need for such a system arose in my company when the head office requested a data report, accessible through the web, formatted exactly the way they wanted. They supplied a sample docx document containing sample data, and they expected those reports to be generated automatically. With this system, there's no need to painfully replicate a Word format in HTML, because the system's input is the docx template itself. The future of this application involves PDF conversion of the docx reports, which eliminates the need to have MS Office installed anywhere in the system.
1. Generating a docx Template Document
The first thing to do is to build, with Word 2007 or above, a docx document which defines the layout of the reports. This document will contain static parts (text blocks, images, and so on) and dynamic parts which depend on the data. At first, we build and format the docx file as we expect it to look with the dynamic data in it. Then, when we're happy with the way it looks, it's time to add the content controls. On the Word ribbon, we need to go to the Developer tab (if you don't see it, in Word 2007 it can be activated from Word Options > Popular > "Show Developer tab in the Ribbon"). In this tab, we can find several content controls, such as rich text, plain text, image, etc. We now need to replace the sample static data that we've put into the document with the appropriate content controls.
2. Creating Custom XML Parts
Using Word 2007, we're able to put content controls into a docx document, but we're not able to bind those controls to custom data. In order to do this, we either need to modify the XML files inside the docx archive "manually", or follow the much simpler approach of using a tool like the Word 2007 Content Control Toolkit (WCCT). At this point, our docx document doesn't contain any custom XML parts. We can create them with WCCT.
Open the docx document in WCCT. On the right panel, click on "Create a new Custom XML part". The custom XML part will be created, and we'll be able to see it in the "Bind view" tab. On the left side of the window, we can see references to the content controls that we've inserted in the file. By clicking on the "Edit view" tab of the right panel, it's possible to edit the XML. The XML that we create has to be well-formed, and it needs to correspond to the content controls in the document. For example:
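<root> <!-- single wrapper element required for well-formed XML; the name "root" is an assumption -->
  <title alias="Title">document title</title>
  <body alias="Body">document body</body>
</root>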
When we've finished creating the XML, it's always good to have WCCT check the syntax by clicking on the "Check Syntax" button. We're now ready to go back to the "Bind view" tab, where the XML nodes we've just inserted appear in a tree-like structure, and the fun part is about to begin. Binding the XML nodes to the content controls is as easy as drag-and-drop: select one of the nodes on the right panel, and drag it onto the reference to one of the content controls of the document. Repeat this operation until all the content controls have been bound to data. When you're done, save the file and click on the Preview button to open the document in Word. Notice how the custom XML data has replaced the text inside the content controls.
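The bindings created by drag-and-drop are stored in the document as data-binding settings on each content control, so they can also be checked from code. The following is a minimal sketch, assuming the Open XML SDK is referenced; BindingInspector and ListBindings are hypothetical names, and the output format ("alias -> XPath") is just one convenient choice.

```csharp
using System.Collections.Generic;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class BindingInspector
{
    // Returns one "alias -> XPath" line per data-bound content control
    // found in the main document part. The file is opened read-only.
    public static List<string> ListBindings(string docxPath)
    {
        var result = new List<string>();
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docxPath, false))
        {
            foreach (SdtElement sdt in doc.MainDocumentPart.Document.Descendants<SdtElement>())
            {
                SdtProperties props = sdt.SdtProperties;
                if (props == null) continue;

                // Content controls that were never bound have no DataBinding child.
                DataBinding binding = props.GetFirstChild<DataBinding>();
                if (binding == null || binding.XPath == null) continue;

                SdtAlias alias = props.GetFirstChild<SdtAlias>();
                string name = (alias != null && alias.Val != null) ? alias.Val.Value : "(no alias)";
                result.Add(name + " -> " + binding.XPath.Value);
            }
        }
        return result;
    }
}
```

A content control that is missing from the returned list was not bound in WCCT and will keep its static text when the custom XML is replaced.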
3. Building the WCF Service
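The WCF service will replace the custom XML inside the docx template with the XML data produced by the business logic. Using the Open XML SDK, this is actually very easy. Here's the code, which needs the System, System.IO, System.ServiceModel, and DocumentFormat.OpenXml.Packaging namespaces:
private void replaceCustomXML(string docxTemplate, string customXML)
{
    try {
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(docxTemplate, true)) {
            MainDocumentPart mainPart = wordDoc.MainDocumentPart;
            mainPart.DeleteParts<CustomXmlPart>(mainPart.CustomXmlParts); // assumption: drop the template's sample data first
            CustomXmlPart customXmlPart = mainPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
            using (StreamWriter ts = new StreamWriter(customXmlPart.GetStream()))
                ts.Write(customXML); // copy the business XML into the new part
        }
    } catch (Exception ex) {
        throw new FaultException("WCF error!\r\n" + ex.Message);
    }
}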
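Since the service has to return the bytes of the generated report, the template on disk should preferably stay untouched between requests. One possible way to achieve this, sketched here under assumptions, is to copy the template into a MemoryStream and edit the copy. ReportBuilder, BuildReport, and templatePath are hypothetical names; the SDK calls mirror those of replaceCustomXML, using the stream overload of WordprocessingDocument.Open. Removing the existing custom XML parts before adding the new one is also an assumption about how the replacement should behave.

```csharp
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;

public static class ReportBuilder
{
    // Returns a new docx, as bytes, whose custom XML part contains customXml.
    // The template file is only read, never modified.
    public static byte[] BuildReport(string templatePath, string customXml)
    {
        byte[] template = File.ReadAllBytes(templatePath);
        using (var buffer = new MemoryStream())
        {
            // Write into an expandable stream so that the package can grow.
            buffer.Write(template, 0, template.Length);
            buffer.Position = 0;

            using (WordprocessingDocument doc = WordprocessingDocument.Open(buffer, true))
            {
                MainDocumentPart mainPart = doc.MainDocumentPart;

                // Assumption: remove the sample data bound in the template.
                mainPart.DeleteParts(mainPart.CustomXmlParts.ToList());

                CustomXmlPart part = mainPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
                using (var writer = new StreamWriter(part.GetStream()))
                {
                    writer.Write(customXml);
                }
            }

            // The document has been disposed, so the package is fully written.
            return buffer.ToArray();
        }
    }
}
```

A WCF operation could then simply return the result of BuildReport, wrapping any exception in a FaultException as replaceCustomXML does.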
4. Building the ASP.NET Client
The ASP.NET client will have a template.xml file which replicates the structure of the custom XML part in the server's docx template. Ideally, there would be a web page which automatically generates web controls for entering data, mirroring the structure of the XML template file. After the data has been entered, the web client must compose an XML document which follows the structure of the existing template.xml but replaces the sample data with the values entered by the user. The XML string is then sent to the WCF service, which returns the bytes of the docx file. These bytes can then either be saved as a docx file on the server, or sent directly to the client through HTTP.
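The "compose an XML document from template.xml" step could look like the following sketch, which uses LINQ to XML from the .NET base class library. It assumes that the template has a single root element whose child elements are the bound fields (such as title and body), and that the user's input arrives as a dictionary keyed by element name. ReportXmlComposer and Compose are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

public static class ReportXmlComposer
{
    // Copies the template and replaces the text of each child element of the
    // root whose name appears in 'values'. Elements without a supplied value
    // keep the template's sample text; attributes such as alias are untouched.
    public static string Compose(string templateXml, IDictionary<string, string> values)
    {
        XDocument doc = XDocument.Parse(templateXml);
        foreach (XElement field in doc.Root.Elements())
        {
            string value;
            if (values.TryGetValue(field.Name.LocalName, out value))
            {
                // Assigning Value lets LINQ to XML escape characters like < and &.
                field.Value = value;
            }
        }
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

Building the XML through XDocument rather than string concatenation keeps the result well-formed even when the user types markup characters, which matters because the string ends up inside the docx package as a custom XML part.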
5. Points of Interest
Using the Open XML SDK 2.0 is a piece of cake, and it changes the way MS Office based reports are generated: no Office installation or COM automation is involved on the server.
The best approach to putting custom data into a docx document is to bind content controls to XML. Microsoft initially offered an alternative approach which permitted more flexibility (associating an XML schema with documents), but it was discontinued due to patent infringement issues. The Word Content Control Toolkit makes life a lot easier when it comes to binding custom XML to content controls.
The Office Open XML format makes it possible to generate MS Office documents without interfacing with MS Office components, which in turn makes it possible to build distributed applications.
More to come: The future of this demo application includes adding PDF conversion. With that in place, the need to have MS Office installed anywhere in the system is eliminated, because PDF becomes the document exchange format.
Part 2: A SOA Approach to Dynamic DOCX-PDF Report Generation