DataSets, XML Schemas, and Excel - Oh My!






4.66/5 (15 votes)
This article demonstrates the amazing transformation power of the DataSet with agnostic XML schemas, XML, and Excel Read/Write (all with colourful XML rendering using XmlBrowser).
Introduction
This article explores the power of the DataSet
in a way that you might not be familiar with. Two important (and very helpful) aspects of the DataSet
are uncovered here:
- A
DataSet
can be used to write to Excel and read from Excel using anOleDbConnection
, very easily. (Note: I use these functions in my data-driven unit tests.) - A
DataSet
(or XML document), in conjunction with an agnostic XML schema (not aDataSet
XML schema) can be used to shape/reshape XML data.
Background
I remember the DAO (Data Access Objects) days and the ADO Recordset
days. I also remember being skeptical when Microsoft introduced yet another library to manipulate data. That was when .NET initially came out. My skepticism waned away very quickly when I started to delve under the covers of the DataSet
. Microsoft introduced a very powerful tool with the DataSet
(even if it isn't sold the way I'm going to sell it to you).
Going back a few years ago, I had designed/developed an Enterprise Management Reporting system for a global company that required MSMQ driven server-side automation of Office. That was an unadvisable task, according to Microsoft - but it was achieved nonetheless (with macros and the like).
Then, I discovered the beauty of writing to and reading from Excel using anOleDbConnection
. This meant I could limit the execution time to just the time required to running macros for the manipulation of data (I didn't need Excel to write to Excel anymore). That greatly improved the efficiency, and reduced the risk of problems because the Excel instance was used for a much shorter time.
I also remember a time when I had to use an XmlWriter
or the FOR XML
clause in SQL Server to build XML data. Using the Relations
collection of ADO.NET to build formatted XML? - also a thing of the past. But, those were my younger days, I rejoiced when I found that I could read XML schemas into a DataSet
and out came the formatted XML!
And, if that wasn't enough to pique your interest, how about the ability (in the attached code) to take an XML document in one format and have it loaded into a DataSet
with XML schema in another format? Who would have thought? A DataSet
as a Transformation Engine - you're kidding right? (I'm not!)
Now, you might say that .NET already provides the ability to shape XML with Strongly-Typed DataSets. But, the problem is that you don't necessarily want .NET DataSet
XML schemas. I have been on a few projects where the tool used to develop XML schema was not VS2005. Admit it, most times, you need to use agnostic XML schemas. The development team isn't necessarily responsible for XML schemas. They don't own the XML schemas. After all, best practice requires us to separate the presentation of the data from the retrieval of data. So, how can we implement a best practice here?
By relaxing the constraints, we can use the XML schema to build the data, then use the XML schema to validate the data. Kill two birds with one stone - so to speak. Enough of the sales pitch - let's take a look at what I am talking about.
Using the code
There are two main modules in this project.
Name | Description |
DataSetMethods |
This class has three methods:
|
ExcelFunctions |
This class contains two overloads to ReadFromExcel to a DataSet and two overloads to WriteToExcel from a DataSet . |
DataSetMethods
code:
/// <summary />
/// This method gathers Table\Column information from a DataSet.
/// It also gathers Relation information from a DataSet.
/// </summary />
public static DataSet gatherMetaData(DataSet ds)
{
XmlDocument xd = new XmlDocument();
xd.Load(@"Metadata Xsd\MetaData.xsd");
DataSet metaData = setupDataSet(xd, new DataSet(),false);
for (int i = 0; i < ds.Tables.Count; i++)
{
for (int j = 0; j < ds.Tables[i].Columns.Count; j++)
{
DataRow dr = metaData.Tables["Table"].NewRow();
dr["Name"] = ds.Tables[i].TableName;
dr["Column"] = ds.Tables[i].Columns[j].ColumnName;
metaData.Tables["Table"].Rows.Add(dr);
}
}
for (int i = 0; i < ds.Relations.Count; i++)
{
for (int j = 0; j < ds.Relations[i].ParentColumns.Length; j++)
{
for (int k = 0; k < ds.Relations[i].ChildColumns.Length; k++)
{
DataRow dr = metaData.Tables["Relations"].NewRow();
dr["ParentTable"] = ds.Relations[i].ParentTable;
dr["ParentColumn"] = ds.Relations[i].ParentColumns[j].ColumnName;
dr["ChildTable"] = ds.Relations[i].ChildTable;
dr["ChildColumn"] = ds.Relations[i].ChildColumns[j].ColumnName;
metaData.Tables["Relations"].Rows.Add(dr);
}
}
}
return metaData;
}
/// <summary />
/// Set up a dataset with an Xml Schema that is DataSet friendly
/// </summary />
public static DataSet setupDataSet(XmlDocument xmlSchema, DataSet ds, bool isDataSet)
{
XmlAttribute msdataAttribute = null;
XmlNode schemaNode = null;
StringReader sr = null;
XmlReader xr = null;
try
{
if (isDataSet)
{
msdataAttribute = xmlSchema.CreateAttribute("xmlns:msdata",
"http://www.w3.org/2000/xmlns/");
msdataAttribute.Value = "urn:schemas-microsoft-com:xml-msdata";
schemaNode = xmlSchema.SelectSingleNode(
"descendant::*[local-name() = 'schema']");
schemaNode.Attributes.Append(msdataAttribute);
msdataAttribute = xmlSchema.CreateAttribute("msdata:IsDataSet",
"urn:schemas-microsoft-com:xml-msdata");
msdataAttribute.Value = "true";
schemaNode.SelectSingleNode("*[local-name() = 'element']"
).Attributes.Append(msdataAttribute);
}
sr = new StringReader(xmlSchema.OuterXml);
xr = XmlReader.Create(sr);
ds.ReadXmlSchema(xr);
ds.EnforceConstraints = false;
}
finally
{
msdataAttribute = null;
schemaNode = null;
sr = null;
xr = null;
}
return ds;
}
/// <summary />
/// Load a Dataset (with XmlSchema applied) with data from an XmlDocument
/// </summary />
public static DataSet LoadXmlData(XmlDocument xmlData, DataSet ds)
{
DataSet tempds = null;
StringReader sr = null;
XmlReader xmlDataReader = null;
try
{
tempds = new DataSet();
sr = new StringReader(xmlData.OuterXml);
xmlDataReader = XmlReader.Create(sr);
tempds.ReadXml(xmlDataReader);
ds.Merge(tempds, false, MissingSchemaAction.Ignore);
}
finally
{
tempds = null;
sr = null;
xmlDataReader = null;
}
return ds;
}
ExcelFunctions
code:
#region Constants
/// <summary />
/// string to use for setting up connection string to Excel
/// </summary />
private const string _excelConnectionString =
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"{0}\";User ID=" +
"Admin;Password=;Extended Properties=\"Excel 8.0;HDR=YES\";";
/// <summary />
/// select statement to read from Excel
/// </summary />
private const string _excelSelect = "select * from [{0}]";
/// <summary />
/// tablename column for DataRow
/// </summary />
private const string _tableName = "TABLE_NAME";
/// <summary />
/// CREATE TABLE Template
/// </summary />
private const string _tableCreate = "CREATE TABLE [{0}] (";
/// <summary />
/// COLUMN Template for CREATE TABLE
/// </summary />
private const string _tableColumn = "[{0}] {1}{2}";
#endregion
#region Private Methods
/// <summary />
/// Very simple function to specify Excel DataType mapping.
/// </summary />
private static string getColumnType(DataColumn dc)
{
string columnType = "TEXT";
switch (dc.DataType.ToString())
{
case "System.Int64" :
case "System.Double":
case "System.Int32" :
columnType = "NUMERIC";
break;
default:
columnType = "TEXT";
break;
}
return columnType;
}
#endregion
#region Public Methods
#region WriteToExcel(DataSet ds)
/// <summary />
/// Write data from a dataset to a new filename.
/// </summary />
public static void WriteToExcel(DataSet ds)
{
WriteToExcel(ds,ds.DataSetName + ".xls",false);
}
#endregion
#region WriteToExcel(DataSet ds, String fileName, bool append)
/// <summary />
/// Write data from a dataset to a filename.
/// This method can either create a new file or append to
/// an existing Excel file. If append is specified and file does
/// not exist, the file will be created.
/// </summary />
public static void WriteToExcel(DataSet ds, String fileName, bool append)
{
string excelConnectionString = string.Format(_excelConnectionString, fileName);
OleDbConnection excelFile = null;
OleDbCommand excelCmd = null;
OleDbDataAdapter excelDataAdapter = null;
OleDbCommandBuilder excelCommandBuilder = null;
StringBuilder sb = null;
try
{
GC.Collect();
if (File.Exists(fileName) && !append) File.Delete(fileName);
excelFile = new OleDbConnection(excelConnectionString);
excelFile.Open();
// write each DataTable to Excel Spreadsheet
foreach (DataTable dt in ds.Tables)
{
// file does not exist or we don't want to append
if (!File.Exists(fileName) || !append)
{
// build the CREATE TABLE statement
sb = new StringBuilder();
sb.AppendFormat(_tableCreate, dt.TableName);
foreach (DataColumn dc in ds.Tables[dt.TableName].Columns)
{
sb.AppendFormat(_tableColumn, dc.ColumnName,
getColumnType(dc)
, (dc.Ordinal == dt.Columns.Count - 1 ?
")" : ","));
}
excelCmd = new OleDbCommand(sb.ToString(), excelFile);
excelCmd.ExecuteNonQuery();
}
// use the command builder to generate insert
// command for DataSet Update to work
excelDataAdapter =
new OleDbDataAdapter(string.Format(_excelSelect,dt.TableName), excelFile);
excelCommandBuilder = new OleDbCommandBuilder(excelDataAdapter);
excelCommandBuilder.QuotePrefix = "[";
excelCommandBuilder.QuoteSuffix = "]";
try
{
excelDataAdapter.InsertCommand =
excelCommandBuilder.GetInsertCommand();
excelDataAdapter.Update(ds, dt.TableName);
}catch {}
}
}
finally
{
sb = null;
if(excelDataAdapter != null) excelDataAdapter.Dispose();
excelDataAdapter = null;
excelCommandBuilder = null;
if(excelCmd != null) excelCmd.Dispose();
excelCmd = null;
if (excelFile != null)
{
excelFile.Close();
excelFile.Dispose();
}
excelFile = null;
}
}
#endregion
#region ReadFromExcel(string fileName)
/// <summary />
/// Read from an Excel file into a new DataSet
/// </summary />
public static DataSet ReadFromExcel(string fileName)
{
return ReadFromExcel(fileName, new DataSet());
}
#endregion
#region ReadFromExcel(string fileName, DataSet ds)
/// <summary />
/// Read from an Excel file into an existing DataSet
/// </summary />
public static DataSet ReadFromExcel(string fileName, DataSet ds)
{
string excelConnectionString = string.Format(_excelConnectionString, fileName);
OleDbConnection excelFile = null;
DataTable schemaTable;
OleDbDataAdapter excelDataAdapter = null;
try
{
excelFile = new OleDbConnection(excelConnectionString);
excelFile.Open();
schemaTable = excelFile.GetOleDbSchemaTable(OleDbSchemaGuid.Tables,
new object[] { null, null, null, "TABLE" });
// Read each DataTable (i.e. Excel Spreadsheet) into the DataSet
foreach (DataRow dr in schemaTable.Rows)
{
excelDataAdapter = new OleDbDataAdapter(dr[_tableName].ToString(), excelFile);
excelDataAdapter.SelectCommand.CommandType = CommandType.TableDirect;
excelDataAdapter.AcceptChangesDuringFill = false;
string table = dr[_tableName].ToString().Replace("$",
string.Empty).Replace("'", string.Empty);
if (dr[_tableName].ToString().Contains("$"))
excelDataAdapter.Fill(ds, table);
}
excelFile.Close();
}
finally
{
if(excelDataAdapter != null) excelDataAdapter.Dispose();
excelDataAdapter = null;
schemaTable = null;
if (excelFile != null)
{
excelFile.Close();
excelFile.Dispose();
}
excelFile = null;
}
return ds;
}
#endregion
#endregion
Points of interest
This project uses the XmlBrowser
control that I developed. It can be found in another CodeProject article, here. I also originally wrote the Excel functions on my blog a couple of years ago. I fixed the code a bit to make it better. It still has the occasional problem when overwriting the same Excel file multiple times (about 10 or more times). You can find the original version of my code here.
Using the application
When you select an XML schema from the dropdown (and click on Render and then the Save buttons), it will create a file called MetaData.xls. This contains all the information that you would need to determine what columns are required by the DataSet
for shaping to work. Clicking on the Save Data button will save the output DataSet
to a file called OutputData.xls. You have to click on the Render button to see the transformation result.
History
- 19-Mar-2008
- 20-Mar-2008
- 26-Mar-2008
- 27-Mar-2008
- 04-Apr-2008
- 07-Apr-2008
- 14-Apr-2008
Initial release.
Separated out the writing of metadata to the DataGrid
/Excel file. This will be done when the user clicks on the Get button. Added a dropdown for XML files. If the user selects an XML file, this is used as the data source instead of reading from Excel.
Added a tab control with three tabs: Input XML, Target Schema, and Output XML, so that the user can see what the transformation is doing. Added a Render button so that the user can click this after selecting Input XML and XML Schema from the dropdowns. Added the isDataSet
parameter to the setupDataSet
method. Added the Statistics table to capture DataTable
names and DataTable
row count.
Added Save Data button to save the resulting DataSet
to OutputData.xls. Created a project folder called Excel Files so the user can add more than one Excel file for rendering. Made minor fixes to Excel read/write functions to handle special characters.
Modified the XmlBrowser
control, and included in a new version (2.0) of this library which requires SAXON. This is an alternate parser with really cool functionality.
Added the Vanilla XML option which will render large XML documents really fast using the Microsoft parser. Doesn't render namespace or CDATA
nodes.
Modified XmlRender
to render namespace and CDATA
nodes really fast using the Microsoft parser. Allows to save an empty data.xls when only selecting XML schema (this allows the user to build an Excel data source for use with an XML schema to test transformations).