Click here to Skip to main content
Click here to Skip to main content

table2xml - Unflatten XML tabular data

, 22 Jan 2007 LGPL3
Rate this:
Please Sign up or sign in to vote.
Generic conversion of tabular data to XML by reverting Excel's flattener methodology.

Introduction

table2xml is an algorithm having two functions that allow the conversion of tabular data to XML without using XSLT. This is achieved by reverting the "Flattener" methodology used by Excel to convert the XML tree format to a two-dimensional table (see Opening XML Files in Excel and INFO: Microsoft Excel 2002 and XML).

This reversion is achieved by:

  1. (possibly) modifying the flattened table a bit to enable a simpler processing of the data,
  2. sequentially processing the data column- and row wise.

The whole algorithm is done without the aid of any XML library (e.g., MSXML), so it lends itself to easy translation into other environments and languages (I'm currently working on a Java version, a Perl version is available here, a VB/VBScript solution should be straightforward).

Background

A solid understanding of the Excel Flattener is helpful to understand the requirements for arranging the columns respecting the restrictions on the data. Good introductory material on this are the two MSDN articles above, but the best way to understand the Flattener is to import sample XML files (be aware that this XML directive need to be present: "<?xml version="1.0"?>") into Excel and see the effects.

Using the code

In the Excel add-in table2xml.xla, there are two functions:

Function parseHeader(givenRootNodeName As String, header As Variant) _<BR>                                                                 As Boolean

which is used to read the header information into the four "assisting" global arrays:

  1. colPaths (the path information in the header, excluding the attribute name)

  2. attrNames (the attribute name if the column path specifies an attribute (../@name))

  3. isIDCol (whether the current column is an "id" column (../#id))

  4. specialCommonSibling (specifies the columns marked with "//" at the beginning)

and

Function writeLine(lineData As Variant) As String

which is used to return the XML for the current row (called for each row, XML has to be collected), the row data being contained in lineData.

To actually use the code, add a reference to table2xml.xla (Testsheet.xls demonstrates in the Workbook.open procedure how to do that programmatically), prepare the range/table as described next, and use the functions as demonstrated in Producing the XML.

Preparing the Range/Table

  • Open the target XML file in Excel (don't forget the XML directive there: "<?xml version="1.0"?>")

  • remove all #agg node columns (will produce an error in parseHeader)

  • move the common root (or the common subnode) siblings leftmost of the root (or resp. Subnode)

Examples:

For nested common sibling nodes (e.g., <root><a><b>test</b></a><otherData>…<root> or <root><a><b>test1</b><c>test2</c></a><otherData>…<root>), write a double slash ("//") at the beginning of the last node within the nested sibling.

Example (also includes column moving as in the examples above):

 

For a first column of a subnode list that is not being a "primary key" column (i.e., having empty cells or continuous equal values), introduce an artificial #id column.

Examples:

 

Producing the XML

First, we need to invoke parseHeader, using the (header) line with the path information.

After parsing the header info, the table data can be processed row by row by calling writeLine, providing the "assisting" arrays as references. The current data row is provided in lineData, returning the XML for the current row (needs to be collected)

A final call to writeLine restores the static variables and finalizes the (collected) XML string (closes any open tags).

A complete procedure example to "unflatten" an Excel range (into a string variable testXML) produced by the XML-flattener is as follows:

Function testXML() As String

  ' retrieve the root node and select the proper
  ' header and data region (below root node)
  ActiveCell.CurrentRegion.Select
  rootNodeName = Replace(Selection.Cells(1, 1).Value, "/", "")
  ActiveSheet.Range(Selection.Cells(2, 1), _
        Selection.Cells(Selection.Rows.Count, _
        Selection.Columns.Count)).Select
  ' parse column path headers for attributenames,
  ' id columns and special sibling mark ("//")
  headerLine = _
   WorksheetFunction.Transpose(
     WorksheetFunction.Transpose(Selection.Rows(1)))
  singleCell = (TypeName(headerLine) = "String")
  result = parseHeader(rootNodeName, IIf(singleCell, _
             Array("", headerLine), headerLine))
  ' walk through the whole data to build
  ' the actual XML string (into testXML)
  resultXML = ""
  For i = 2 To Selection.Rows.Count
    lineData = WorksheetFunction.Transpose(
                 WorksheetFunction.Transpose(Selection.Rows(i)))
    resultXML = resultXML & writeLine(IIf(singleCell, _<BR>                                      Array("", lineData), lineData))
  Next
  ' finally finish the XML and reset the static vars
  resultXML = resultXML & writeLine(Null)
End Function

IIf(singleCell, Array("", lineData), lineData) is used to overcome the special treatment of single cells in WorksheetFunction.Transpose.

Points of Interest

TestSheet

TestSheet.xls is both a demonstration tool and also a collection of "unit" tests for the algorithm, the idea is to add XML examples (flattened), correctly preparing it to be processed back into XML and then run tests.

The way to add tests:

  1. Add the flattened and corrected table to the TestSheet.xls in a sheet ending in "Tests", having at least one empty row between the previous and the next test case.

  2. Create a comment in the root node name's cell (left/topmost cell) that contains the original XML. This is then compared against the produced XML, which is also stored in the original XML neighbour cell's comment.

All tests are run by calling the runTestsheets procedure (accessible also with the running man icon in the test sheet toolbar):

The dustbin is for clearing the produced XML comments from the current sheet.

Single test cases can be executed in the current range (select root node name cell) by pressing Ctrl+r, a whole sheet of test cases can be executed by pressing Ctrl+R.

Ctrl+b pressed while having selected a cell containing an original XML text will create a test case in a new workbook, which can then be appended to the test sheet.

The sheet PubsQueryTests contains queries that will probably fail in your environment, these demonstrate the production of XML out of database queries (using MS SQL Server and the pubs demo database). Please update or delete accordingly.

Limitations

Generally, pay close attention to the ordering of columns and constraints on the data as described above, since the algorithm in writeLine doesn't check for validity, thus producing invalid XML in case of failing to follow preparation steps correctly.

In mixed content nodes, the only way to correctly (re)produce the XML is with the content being right after the node name. There's currently no way to produce mixed content nodes with more than one text node (e.g., <node>text1<subnode>Test</subnode>text2</node> and the like).

Same sequential parent nodes are "factored" out by the flattener, so the unflattening algorithm treats them as being factored out, which means there is no way to exactly reproduce (<a><b>test1</b></a><a><b>test2</b></a>, this would be processed as <a><b>test1</b><b>test2</b></a>, which is semantically equal, but not the same...).

When converting Excel content with the current test procedure (testXML, see above), date values become numbers because the processed row is converted into an array with WorksheetFunction.Transpose. If you want to create XML from date values, either convert them to text fields before (as I did for testing), or implement a different array creation method (i.e., cell by cell).

History

2007-01-07: Updated add-in to produce unicode and immediately write XML (no collection in strXML anymore -> memory issues). Also there's no need to give the header context anymore, I decided to put that into global variables.

License

This article, along with any associated source code and files, is licensed under The GNU Lesser General Public License (LGPLv3)

Share

About the Author

rkapl
Software Developer (Senior) Austrian Federal Financing Agency
Austria Austria
No Biography provided

Comments and Discussions

 
QuestionThe macro is not working in mac excel 2011 PinmemberBharath L Narasimman8-Jul-14 0:36 
GeneralUnderstand the code in layman terms PinmemberLeoLee9115-Feb-11 22:57 
GeneralEmpty elements PinmemberLuis Agudelo2-Mar-10 1:59 
GeneralQuestion... PinmemberRaghutoo20-Feb-07 10:14 
GeneralRe: Question... Pinmemberrkapl21-Feb-07 10:44 
GeneralRe: Question... PinmemberRaghutoo28-Feb-07 3:22 
QuestionNeed a help PinmemberNikhil Badgujar5-Feb-07 9:27 
First of all thank you for giving an extra ordinary guiding.
Still i need your help, Confused | :confused:
- to parse & fetch the data from various MS-Excel file
- store it in database
- And place in to a final MS-Excel file (according to appropriate cell)which is
viewing the reports.
 
Please reply me...
nikhilbadgujar@gmail.com
 
Nikhil Badgujar.
AnswerRe: Need a help Pinmemberrkapl7-Feb-07 11:12 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web01 | 2.8.1411019.1 | Last Updated 22 Jan 2007
Article Copyright 2006 by rkapl
Everything else Copyright © CodeProject, 1999-2014
Layout: fixed | fluid