
A SOA Approach to Dynamic DOCX-PDF Report Generation - Part 1

Erion Pici, 23 Aug 2010 CDDL
Generating docx reports in a client-server architecture, without using MS Office.

Introduction & Background

With the advent of the MS Office 2007 Open XML formats, Office report generation changed deeply: it became detached from Office itself and open to any programming language capable of reading compressed archives and manipulating XML.

In this article, I'm going to illustrate an SOA approach to generating docx reports in a distributed environment, where MS Office 2007 needs to be installed only on the developer machine (not on the production server). The application is composed of the following parts:

  1. An ASP.NET web application
  2. An IIS-hosted WCF service
  3. A business tier
  4. A data access tier
  5. A database

The scope of this article is limited to the top two tiers. With the Open XML SDK (now at version 2.0), it's possible to programmatically read and write Office Open XML packages, that is, to read and write Office files without using Office COM objects. This approach is fast, easy, light on resources, and stable. The WCF service in this application must be able to create docx reports on the basis of an existing docx template and some database data serialized as XML.

Docx files are constructed in a modular way: a docx file is a compressed archive of separate parts. To see this for yourself, just rename a docx file, changing its extension to ".zip", and open it. To learn more about how this archive is organized, visit http://msdn.microsoft.com/en-us/library/bb266220%28office.12%29.aspx. The part that we're interested in is called Custom XML (read http://msdn.microsoft.com/en-us/library/bb608618.aspx). The best approach for manipulating data within a docx file is to bind content controls to custom XML parts.
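The "docx is a zip archive" point can also be checked from code rather than by renaming the file. The following is a minimal sketch, assuming .NET 4.5 or later (for `System.IO.Compression.ZipArchive`); the class `DocxInspector` and its methods are hypothetical names introduced for illustration, and the `customXml/item` prefix is only the naming convention Word normally uses for custom XML parts, not something the format guarantees.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper, not part of the article's download.
public static class DocxInspector
{
    // Returns the names of all parts (zip entries) inside a docx package.
    public static List<string> ListParts(Stream docx)
    {
        using (var archive = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            return archive.Entries.Select(e => e.FullName).ToList();
        }
    }

    // Keeps only the custom XML data parts, assuming Word's usual
    // "customXml/itemN.xml" naming; the "itemPropsN.xml" entries hold
    // properties of those parts and are skipped.
    public static List<string> ListCustomXmlParts(Stream docx)
    {
        return ListParts(docx)
            .Where(n => n.StartsWith("customXml/item", StringComparison.OrdinalIgnoreCase)
                     && !n.StartsWith("customXml/itemProps", StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```

Calling `DocxInspector.ListParts(File.OpenRead("template.docx"))` on a real template (the file name is a placeholder) should list entries such as `word/document.xml` next to any custom XML parts.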

The need for such a system arose in my company when the head office requested a data report, accessible through the web, that had to be formatted exactly the way they wanted. They supplied a sample docx document containing sample data, and they expected those reports to be generated automatically. With this system, there's no need to painfully replicate a Word format in HTML, because the system input is the docx template itself. The future of this application involves PDF conversion of the docx reports, which totally eliminates the need to have MS Office installed anywhere in the system.

1. Generating a docx template document

The first thing to do is to build, using Word 2007 or above, a docx document which defines the layout of the reports. This document will contain static parts (text blocks, images, and so on) and dynamic parts which depend on the data. At first, we build and format the docx file as we expect it to look with the dynamic data in it. Then, when we're happy with the way it looks, it's time to add the content controls. On the Word ribbon, we need to go to the Developer tab (if you don't see it, it has to be activated in the Word options first). In this tab, we can find several content controls, such as rich text, plain text, image, etc. We now need to replace the sample static data that we've put into the document with the appropriate content controls.

Word ribbon

Word template

2. Creating custom XML parts

Using Word 2007, we're able to put content controls into a docx document, but we're not able to bind those controls to custom data. To do this, we either need to modify the XML files inside the docx archive "manually", or follow the much simpler approach of using a tool like the Word 2007 Content Control Toolkit (WCCT).

At this point, our docx document doesn't contain any custom XML parts. We can create one with WCCT: open the docx document inside WCCT and, on the right panel, click on "Create a new Custom XML part". The custom XML part will be created and will be visible in the "Bind view" tab. On the left side of the window, we can see references to the content controls that we've inserted in the file. In the "Edit view" tab of the right panel, it's possible to edit the XML. The XML that we create has to be well-formed, and its structure needs to correspond to the content controls in the document. For example:

<documentData>
    <title alias="Title">document title</title>
    <body alias="Body">document body</body>
</documentData>

WCCT 2

When we've finished creating the XML, it's always good to have WCCT check the XML syntax by clicking on the "Check Syntax" button. We're now ready to go back to the "Bind view" tab, where the XML nodes we've just inserted appear in a tree-like structure, and the fun part is about to begin. Binding the XML nodes to the content controls is as easy as drag-and-drop: select one of the nodes on the right panel, and drag it onto the reference to one of the content controls of the document. Repeat this operation until all the content controls have been bound to data. When you're done, save the file and click on the Preview button to open the document in Word. Notice how the custom XML data has replaced the text inside the content controls.
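Under the hood, each binding is stored in the document as an XPath expression that points into the custom XML part. To get a feel for what Word will display in a bound control, the XPath can be evaluated against the XML directly. This is only a sketch: `BindingPreview` is a hypothetical helper, and the XPath strings used with it (for example `/documentData[1]/title[1]`) are assumed to resemble what the toolkit writes; check the actual values in your own template.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper for previewing what a bound content control would show.
public static class BindingPreview
{
    // Evaluates a binding XPath against the custom XML and returns the text
    // of the selected element, or null when the XPath selects nothing.
    public static string Resolve(string customXml, string xpath)
    {
        XDocument doc = XDocument.Parse(customXml);
        XElement node = doc.XPathSelectElement(xpath);
        return node == null ? null : node.Value;
    }
}
```

With the sample XML above, `BindingPreview.Resolve(xml, "/documentData[1]/title[1]")` selects the `title` element, while an XPath that matches no node returns null, which is a quick way to spot a control whose binding no longer matches the XML structure.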

WCCT 3

WCCT 4

3. Building the WCF service

The WCF service will replace the custom XML inside the docx template with business logic XML data. Using the Open XML SDK, this is actually very easy. Here's the replaceCustomXML method:

// Namespaces required at the top of the service class file
using System;
using System.IO;
using System.ServiceModel;                 // FaultException
using DocumentFormat.OpenXml.Packaging;    // Open XML SDK 2.0

/// <summary>
/// Replaces the custom XML part inside a docx file with the specified customXML
/// </summary>
/// <param name="docxTemplate">Docx file to modify (it is edited in place)</param>
/// <param name="customXML">Custom XML part with the 
/// data to insert in the docx document</param>
private void replaceCustomXML(string docxTemplate, string customXML)
{
    try
    {
        // Open the package for editing (second argument: isEditable = true)
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(
            docxTemplate, true))
        {
            MainDocumentPart mainPart = wordDoc.MainDocumentPart;

            // Remove the custom XML parts that were created with the template
            mainPart.DeleteParts<CustomXmlPart>(mainPart.CustomXmlParts);
            // Add a new custom XML part and then add content
            CustomXmlPart customXmlPart = mainPart.AddCustomXmlPart(
                 CustomXmlPartType.CustomXml);
            // Copy the XML into the new part
            using (StreamWriter ts = new StreamWriter(
                customXmlPart.GetStream(FileMode.Create, FileAccess.Write)))
            {
                ts.Write(customXML);
            }
        }
    }
    catch (Exception ex)
    {
        // Surface the failure to the WCF client as a SOAP fault
        throw new FaultException("WCF error!\r\n" + ex.Message);
    }
}
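Because replaceCustomXML edits the file it is given, a service that handles many requests would probably not want to run it against the template itself. One possible arrangement, sketched below under that assumption, is to copy the template to a temporary file for each request, fill the copy, and return its bytes. `ReportBuilder`, `BuildReport` and the `fillTemplate` delegate are hypothetical names; the delegate stands in for the replaceCustomXML method shown above, so the sketch itself has no dependency on the Open XML SDK.

```csharp
using System;
using System.IO;

// Hypothetical wrapper, not taken from the article's source code.
public static class ReportBuilder
{
    // Copies the template to a temporary file, lets fillTemplate modify the
    // copy in place, and returns the resulting docx bytes. The original
    // template is never touched.
    public static byte[] BuildReport(string templatePath, string customXml,
                                     Action<string, string> fillTemplate)
    {
        string workingCopy = Path.Combine(
            Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".docx");
        File.Copy(templatePath, workingCopy);
        try
        {
            fillTemplate(workingCopy, customXml);
            return File.ReadAllBytes(workingCopy);
        }
        finally
        {
            // Clean up the per-request copy even when filling fails.
            File.Delete(workingCopy);
        }
    }
}
```

Inside the service class, the call could look like `ReportBuilder.BuildReport(templatePath, customXML, replaceCustomXML)`, which keeps concurrent requests from overwriting each other's data in a shared file.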

4. Building the ASP.NET client

The ASP.NET client will have a template.xml file which replicates the structure of the custom XML part in the server's docx template. Ideally, a web page would automatically generate the input controls by mirroring the structure of this XML template file. After the data has been entered, the web client must compose an XML document which follows the structure of the existing template.xml but replaces the sample data with the values entered by the user. The XML string is then sent to the WCF service, which returns the bytes of the docx file. These bytes can then either be saved as a docx file on the server, or sent directly to the client through HTTP.
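The step of composing the XML from template.xml and the user's input could be sketched as follows, using LINQ to XML. `TemplateFiller` and `Fill` are hypothetical names, and the sketch assumes that each input value is keyed by the local name of the leaf element it replaces (for example "title" and "body" in the sample XML); a real page would need its own mapping between controls and nodes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for the web client.
public static class TemplateFiller
{
    // Returns a copy of the template XML in which the text of every leaf
    // element that has an entry in 'values' is replaced by that entry.
    // Attributes and elements without a matching entry are left unchanged.
    public static string Fill(string templateXml, IDictionary<string, string> values)
    {
        XDocument doc = XDocument.Parse(templateXml);
        // Materialize the list first so the tree is not modified while it
        // is being enumerated.
        List<XElement> leaves = doc.Root.Descendants()
                                   .Where(e => !e.HasElements)
                                   .ToList();
        foreach (XElement leaf in leaves)
        {
            string value;
            if (values.TryGetValue(leaf.Name.LocalName, out value))
            {
                leaf.Value = value; // special characters are escaped on output
            }
        }
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

Going through `XDocument` rather than string concatenation means that characters such as `&` or `<` typed by the user are escaped, so the string sent to the service stays well-formed.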

ASP.NET client 1

ASP.NET client 2

5. Points of interest

Using the Open XML SDK 2.0 is straightforward, and it fundamentally changes the way MS Office based reports can be generated.

The best approach to inputting custom data in a docx document is to bind content controls to XML. In the beginning, Microsoft took an alternative approach which permitted more flexibility (associating an XML schema to documents), but it was stopped due to patent infringement issues. The Word Content Control Toolkit makes life a lot easier when it comes to binding custom XML to content controls.

The Office Open XML format makes it possible to generate MS Office documents without interfacing with MS Office components, which in turn makes it possible to build distributed applications.

More to come: the future of this demo application includes adding PDF conversion. With it, the need to have MS Office installed somewhere in the system is totally eliminated, because PDF becomes the document exchange format.

Part 2: A SOA Approach to Dynamic DOCX-PDF Report Generation

License

This article, along with any associated source code and files, is licensed under The Common Development and Distribution License (CDDL)


About the Author

Erion Pici
Software Developer (Senior)
Italy
I've been involved in object-oriented software development since 2006, when I graduated in Information and TLC Engineering at the Università degli Studi di Perugia, in Italy. I've been working for several software companies / departments, mainly on Microsoft and Sun technologies. My favourite programming language is C#, next comes Java.
I love design patterns and when I need to resolve a problem, I try to get the best solution, which is often not the quickest one.
 
"On the best teams, different individuals provide occasional leadership, taking charge in areas where they have particular strengths. No one is the permanent leader, because that person would then cease to be a peer and the team interaction would begin to break down. The structure of a team is a network, not a hierarchy ..."
My favourite team work quotation by DeMarco - Lister in Peopleware

Comments and Discussions

 
handling images (@vish, 12-Aug-10 11:20)
Re: handling images (Erion Pici, 16-Aug-10 8:53)
I need to do some research on that, because unlike text, images get compressed and packed into the docx archive. The reference to the image ends up in the custom xml part.
Re: handling images (@vish, 16-Aug-10 9:04)
Re: handling images (Erion Pici, 16-Aug-10 9:10)

Last Updated 23 Aug 2010
Article Copyright 2010 by Erion Pici