table2xml is an algorithm having two functions that allow the conversion of tabular data to XML without using XSLT. This is achieved by reverting the "Flattener" methodology used by Excel to convert the XML tree format to a two-dimensional table (see Opening XML Files in Excel and INFO: Microsoft Excel 2002 and XML).
This reversion is achieved by:
- (possibly) modifying the flattened table a bit to enable a simpler processing of the data,
- sequentially processing the data column- and row wise.
The whole algorithm is done without the aid of any XML library (e.g., MSXML), so it lends itself to easy translation into other environments and languages (I'm currently working on a Java version, a Perl version is available here, a VB/VBScript solution should be straightforward).
A solid understanding of the Excel Flattener is helpful to understand the requirements for arranging the columns respecting the restrictions on the data. Good introductory material on this are the two MSDN articles above, but the best way to understand the Flattener is to import sample XML files (be aware that this XML directive need to be present: "
<?xml version="1.0"?>") into Excel and see the effects.
Using the code
In the Excel add-in table2xml.xla, there are two functions:
Function parseHeader(givenRootNodeName As String, header As Variant) _<BR> As Boolean
which is used to read the header information into the four "assisting" global arrays:
colPaths (the path information in the header, excluding the attribute name)
attrNames (the attribute name if the column path specifies an attribute (../@name))
isIDCol (whether the current column is an "id" column (../#id))
specialCommonSibling (specifies the columns marked with "//" at the beginning)
Function writeLine(lineData As Variant) As String
which is used to return the XML for the current row (called for each row, XML has to be collected), the row data being contained in
To actually use the code, add a reference to table2xml.xla (Testsheet.xls demonstrates in the
Workbook.open procedure how to do that programmatically), prepare the range/table as described next, and use the functions as demonstrated in Producing the XML.
Preparing the Range/Table
Open the target XML file in Excel (don't forget the XML directive there: "<?xml version="1.0"?>")
#agg node columns (will produce an error in
move the common root (or the common subnode) siblings leftmost of the root (or resp. Subnode)
For nested common sibling nodes (e.g.,
<root><a><b>test1</b><c>test2</c></a><otherData>…<root>), write a double slash ("//") at the beginning of the last node within the nested sibling.
Example (also includes column moving as in the examples above):
For a first column of a subnode list that is not being a "primary key" column (i.e., having empty cells or continuous equal values), introduce an artificial
Producing the XML
First, we need to invoke
parseHeader, using the (header) line with the path information.
After parsing the header info, the table data can be processed row by row by calling
writeLine, providing the "assisting" arrays as references. The current data row is provided in
lineData, returning the XML for the current row (needs to be collected)
A final call to
writeLine restores the static variables and finalizes the (collected) XML string (closes any open tags).
A complete procedure example to "unflatten" an Excel range (into a string variable
testXML) produced by the XML-flattener is as follows:
Function testXML() As String
rootNodeName = Replace(Selection.Cells(1, 1).Value, "/", "")
ActiveSheet.Range(Selection.Cells(2, 1), _
headerLine = _
singleCell = (TypeName(headerLine) = "String")
result = parseHeader(rootNodeName, IIf(singleCell, _
Array("", headerLine), headerLine))
resultXML = ""
For i = 2 To Selection.Rows.Count
lineData = WorksheetFunction.Transpose(
resultXML = resultXML & writeLine(IIf(singleCell, _<BR> Array("", lineData), lineData))
resultXML = resultXML & writeLine(Null)
IIf(singleCell, Array("", lineData), lineData) is used to overcome the special treatment of single cells in
Points of Interest
TestSheet.xls is both a demonstration tool and also a collection of "unit" tests for the algorithm, the idea is to add XML examples (flattened), correctly preparing it to be processed back into XML and then run tests.
The way to add tests:
Add the flattened and corrected table to the TestSheet.xls in a sheet ending in "Tests", having at least one empty row between the previous and the next test case.
Create a comment in the root node name's cell (left/topmost cell) that contains the original XML. This is then compared against the produced XML, which is also stored in the original XML neighbour cell's comment.
All tests are run by calling the
runTestsheets procedure (accessible also with the running man icon in the test sheet toolbar):
The dustbin is for clearing the produced XML comments from the current sheet.
Single test cases can be executed in the current range (select root node name cell) by pressing Ctrl+r, a whole sheet of test cases can be executed by pressing Ctrl+R.
Ctrl+b pressed while having selected a cell containing an original XML text will create a test case in a new workbook, which can then be appended to the test sheet.
The sheet PubsQueryTests contains queries that will probably fail in your environment, these demonstrate the production of XML out of database queries (using MS SQL Server and the pubs demo database). Please update or delete accordingly.
Generally, pay close attention to the ordering of columns and constraints on the data as described above, since the algorithm in
writeLine doesn't check for validity, thus producing invalid XML in case of failing to follow preparation steps correctly.
In mixed content nodes, the only way to correctly (re)produce the XML is with the content being right after the node name. There's currently no way to produce mixed content nodes with more than one text node (e.g.,
<node>text1<subnode>Test</subnode>text2</node> and the like).
Same sequential parent nodes are "factored" out by the flattener, so the unflattening algorithm treats them as being factored out, which means there is no way to exactly reproduce (
<a><b>test1</b></a><a><b>test2</b></a>, this would be processed as
<a><b>test1</b><b>test2</b></a>, which is semantically equal, but not the same...).
When converting Excel content with the current test procedure (
testXML, see above), date values become numbers because the processed row is converted into an array with
WorksheetFunction.Transpose. If you want to create XML from date values, either convert them to text fields before (as I did for testing), or implement a different array creation method (i.e., cell by cell).
2007-01-07: Updated add-in to produce unicode and immediately write XML (no collection in strXML anymore -> memory issues). Also there's no need to give the header context anymore, I decided to put that into global variables.