Click here to Skip to main content
Click here to Skip to main content
Go to top

DataSets, XML Schemas, and Excel - Oh My!

, 24 Apr 2008
Rate this:
Please Sign up or sign in to vote.
This article demonstrates the amazing transformation power of the DataSet with agnostic XML schemas, XML, and Excel Read/Write (all with colourful XML rendering using XmlBrowser).

Introduction

This article explores the power of the DataSet in a way that you might not be familiar with. Two important (and very helpful) aspects of the DataSet are uncovered here:

  • A DataSet can be used to write to Excel and read from Excel using an OleDbConnection, very easily. (Note: I use these functions in my data-driven unit tests.)
  • A DataSet (or XML document), in conjunction with an agnostic XML schema (not a DataSet XML schema) can be used to shape/reshape XML data.

DataSetMagic_30.JPG

Background

I remember the DAO (Data Access Objects) days and the ADO Recordset days. I also remember being skeptical when Microsoft introduced yet another library to manipulate data. That was when .NET initially came out. My skepticism waned away very quickly when I started to delve under the covers of the DataSet. Microsoft introduced a very powerful tool with the DataSet (even if it isn't sold the way I'm going to sell it to you).

Going back a few years ago, I had designed/developed an Enterprise Management Reporting system for a global company that required MSMQ driven server-side automation of Office. That was an unadvisable task, according to Microsoft - but it was achieved nonetheless (with macros and the like).

Then, I discovered the beauty of writing to and reading from Excel using an OleDbConnection. This meant I could limit the execution time to just the time required to running macros for the manipulation of data (I didn't need Excel to write to Excel anymore). That greatly improved the efficiency, and reduced the risk of problems because the Excel instance was used for a much shorter time.

I also remember a time when I had to use an XmlWriter or the FOR XML clause in SQL Server to build XML data. Using the Relations collection of ADO.NET to build formatted XML? - also a thing of the past. But, those were my younger days, I rejoiced when I found that I could read XML schemas into a DataSet and out came the formatted XML!

And, if that wasn't enough to pique your interest, how about the ability (in the attached code) to take an XML document in one format and have it loaded into a DataSet with XML schema in another format? Who would have thought? A DataSet as a Transformation Engine - you're kidding right? (I'm not!)

Now, you might say that .NET already provides the ability to shape XML with Strongly-Typed DataSets. But, the problem is that you don't necessarily want .NET DataSet XML schemas. I have been on a few projects where the tool used to develop XML schema was not VS2005. Admit it, most times, you need to use agnostic XML schemas. The development team isn't necessarily responsible for XML schemas. They don't own the XML schemas. After all, best practice requires us to separate the presentation of the data from the retrieval of data. So, how can we implement a best practice here?

By relaxing the constraints, we can use the XML schema to build the data, then use the XML schema to validate the data. Kill two birds with one stone - so to speak. Enough of the sales pitch - let's take a look at what I am talking about.

Using the code

There are two main modules in this project.

Name Description
DataSetMethods This class has three methods:
  • gatherMetaData - used to gather table/column information and relations within the DataSet.
  • setupDataSet - takes an agnostic XML schema and makes it DataSet-ready.
  • LoadXmlData - used to load XML into a DataSet with a target XML schema already applied to it.
ExcelFunctions This class contains two overloads to ReadFromExcel to a DataSet and two overloads to WriteToExcel from a DataSet.

DataSetMethods code:

/// <span class="code-SummaryComment"><summary /></span>
/// This method gathers Table\Column information from a DataSet.
/// It also gathers Relation information from a DataSet.
/// <span class="code-SummaryComment"></summary /></span>

public static DataSet gatherMetaData(DataSet ds)
{
    XmlDocument xd = new XmlDocument();
    xd.Load(@"Metadata Xsd\MetaData.xsd");
    DataSet metaData = setupDataSet(xd, new DataSet(),false);

    for (int i = 0; i < ds.Tables.Count; i++)
    {
        for (int j = 0; j < ds.Tables[i].Columns.Count; j++)
        {
            DataRow dr = metaData.Tables["Table"].NewRow();
            dr["Name"] = ds.Tables[i].TableName;
            dr["Column"] = ds.Tables[i].Columns[j].ColumnName;
            metaData.Tables["Table"].Rows.Add(dr);
        }
    }

    for (int i = 0; i < ds.Relations.Count; i++)
    {
        for (int j = 0; j < ds.Relations[i].ParentColumns.Length; j++)
        {
            for (int k = 0; k < ds.Relations[i].ChildColumns.Length; k++)
            {
                DataRow dr = metaData.Tables["Relations"].NewRow();
                dr["ParentTable"] = ds.Relations[i].ParentTable;
                dr["ParentColumn"] = ds.Relations[i].ParentColumns[j].ColumnName;
                dr["ChildTable"] = ds.Relations[i].ChildTable;
                dr["ChildColumn"] = ds.Relations[i].ChildColumns[j].ColumnName;
                metaData.Tables["Relations"].Rows.Add(dr);
            }
        }
    }
    return metaData;
}

/// <span class="code-SummaryComment"><summary /></span>
/// Set up a dataset with an Xml Schema that is DataSet friendly
/// <span class="code-SummaryComment"></summary /></span>
public static DataSet setupDataSet(XmlDocument xmlSchema, DataSet ds, bool isDataSet)
{
    XmlAttribute msdataAttribute = null;
    XmlNode schemaNode = null;
    StringReader sr = null;
    XmlReader xr = null;
    try
    {
        if (isDataSet)
        {
            msdataAttribute = xmlSchema.CreateAttribute("xmlns:msdata",
                "http://www.w3.org/2000/xmlns/");

            msdataAttribute.Value = "urn:schemas-microsoft-com:xml-msdata";
            schemaNode = xmlSchema.SelectSingleNode(
                "descendant::*[local-name() = 'schema']");

            schemaNode.Attributes.Append(msdataAttribute);
            
            msdataAttribute = xmlSchema.CreateAttribute("msdata:IsDataSet",
                "urn:schemas-microsoft-com:xml-msdata");
            msdataAttribute.Value = "true";

            schemaNode.SelectSingleNode("*[local-name() = 'element']"
                ).Attributes.Append(msdataAttribute);
        }
        sr = new StringReader(xmlSchema.OuterXml);
        xr = XmlReader.Create(sr);
        ds.ReadXmlSchema(xr);
        ds.EnforceConstraints = false;
    }
    finally
    {
        msdataAttribute = null;
        schemaNode = null;
        sr = null;
        xr = null;
    }
    return ds;
}

/// <span class="code-SummaryComment"><summary /></span>
/// Load a Dataset (with XmlSchema applied) with data from an XmlDocument
/// <span class="code-SummaryComment"></summary /></span>
public static DataSet LoadXmlData(XmlDocument xmlData, DataSet ds)
{
    DataSet tempds = null;
    StringReader sr = null;
    XmlReader xmlDataReader = null;
    try
    {
        tempds = new DataSet();
        sr = new StringReader(xmlData.OuterXml);
        xmlDataReader = XmlReader.Create(sr);
        tempds.ReadXml(xmlDataReader);
        ds.Merge(tempds, false, MissingSchemaAction.Ignore);
    }
    finally
    {
        tempds = null;
        sr = null;
        xmlDataReader = null;
    }
    return ds;
}

ExcelFunctions code:

#region Constants

/// <span class="code-SummaryComment"><summary /></span>
/// string to use for setting up connection string to Excel
/// <span class="code-SummaryComment"></summary /></span>
private const string _excelConnectionString = 
    "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"{0}\";User ID=" + 
    "Admin;Password=;Extended Properties=\"Excel 8.0;HDR=YES\";";

/// <span class="code-SummaryComment"><summary /></span>
/// select statement to read from Excel
/// <span class="code-SummaryComment"></summary /></span>
private const string _excelSelect = "select * from [{0}]";

/// <span class="code-SummaryComment"><summary /></span>
/// tablename column for DataRow
/// <span class="code-SummaryComment"></summary /></span>
private const string _tableName = "TABLE_NAME";

/// <span class="code-SummaryComment"><summary /></span>
/// CREATE TABLE Template
/// <span class="code-SummaryComment"></summary /></span>
private const string _tableCreate = "CREATE TABLE [{0}] (";

/// <span class="code-SummaryComment"><summary /></span>
/// COLUMN Template for CREATE TABLE
/// <span class="code-SummaryComment"></summary /></span>
private const string _tableColumn = "[{0}] {1}{2}";
#endregion

#region Private Methods
/// <span class="code-SummaryComment"><summary /></span>
/// Very simple function to specify Excel DataType mapping.
/// <span class="code-SummaryComment"></summary /></span>
private static string getColumnType(DataColumn dc)
{
    string columnType = "TEXT";
    switch (dc.DataType.ToString())
    {
        case "System.Int64" :
        case "System.Double":
        case "System.Int32" :
            columnType = "NUMERIC";
            break;
        default:
            columnType = "TEXT";
            break;
    }
    return columnType;
}

#endregion

#region Public Methods

#region WriteToExcel(DataSet ds)

/// <span class="code-SummaryComment"><summary /></span>
/// Write data from a dataset to a new filename.
/// <span class="code-SummaryComment"></summary /></span>
public static void WriteToExcel(DataSet ds)
{
    WriteToExcel(ds,ds.DataSetName + ".xls",false);
}
#endregion

#region WriteToExcel(DataSet ds, String fileName, bool append)
/// <span class="code-SummaryComment"><summary /></span>
/// Write data from a dataset to a filename.
/// This method can either create a new file or append to
/// an existing Excel file. If append is specified and file does
/// not exist, the file will be created.
/// <span class="code-SummaryComment"></summary /></span>

public static void WriteToExcel(DataSet ds, String fileName, bool append)
{
    string excelConnectionString = string.Format(_excelConnectionString, fileName);
    OleDbConnection excelFile = null;
    OleDbCommand excelCmd = null;
    OleDbDataAdapter excelDataAdapter = null;
    OleDbCommandBuilder excelCommandBuilder = null;
    StringBuilder sb = null;
    try
    {
        GC.Collect();
        if (File.Exists(fileName) && !append) File.Delete(fileName);
        excelFile = new OleDbConnection(excelConnectionString);
        excelFile.Open();
        // write each DataTable to Excel Spreadsheet
        foreach (DataTable dt in ds.Tables)
        {
            // file does not exist or we don't want to append
            if (!File.Exists(fileName) || !append)
            {
                // build the CREATE TABLE statement
                sb = new StringBuilder();
                sb.AppendFormat(_tableCreate, dt.TableName);
                foreach (DataColumn dc in ds.Tables[dt.TableName].Columns)
                {
                    sb.AppendFormat(_tableColumn, dc.ColumnName, 
                        getColumnType(dc)
                        , (dc.Ordinal == dt.Columns.Count - 1 ?
                        ")" : ","));
                }
                excelCmd = new OleDbCommand(sb.ToString(), excelFile);
                excelCmd.ExecuteNonQuery();
            }
            // use the command builder to generate insert
            // command for DataSet Update to work
            excelDataAdapter = 
              new OleDbDataAdapter(string.Format(_excelSelect,dt.TableName), excelFile);
            excelCommandBuilder = new OleDbCommandBuilder(excelDataAdapter);
            excelCommandBuilder.QuotePrefix = "[";
            excelCommandBuilder.QuoteSuffix = "]";
            try
            {
                excelDataAdapter.InsertCommand = 
                       excelCommandBuilder.GetInsertCommand();
                excelDataAdapter.Update(ds, dt.TableName);
            }catch {}
        }
    }
    finally
    {
        sb = null;
        if(excelDataAdapter != null) excelDataAdapter.Dispose();
        excelDataAdapter = null;
        excelCommandBuilder = null;
        if(excelCmd != null) excelCmd.Dispose();
        excelCmd = null;
        if (excelFile != null)
        {
            excelFile.Close();
            excelFile.Dispose();
        }
        excelFile = null;
    }
}

#endregion

#region ReadFromExcel(string fileName)

/// <span class="code-SummaryComment"><summary /></span>
/// Read from an Excel file into a new DataSet
/// <span class="code-SummaryComment"></summary /></span>
public static DataSet ReadFromExcel(string fileName)
{
    return ReadFromExcel(fileName, new DataSet());
}
#endregion

#region ReadFromExcel(string fileName, DataSet ds)
/// <span class="code-SummaryComment"><summary /></span>
/// Read from an Excel file into an existing DataSet
/// <span class="code-SummaryComment"></summary /></span>
public static DataSet ReadFromExcel(string fileName, DataSet ds)
{
    string excelConnectionString = string.Format(_excelConnectionString, fileName);
    OleDbConnection excelFile = null;
    DataTable schemaTable;
    OleDbDataAdapter excelDataAdapter = null;
    try
    {
        excelFile = new OleDbConnection(excelConnectionString);
        excelFile.Open();
        schemaTable = excelFile.GetOleDbSchemaTable(OleDbSchemaGuid.Tables,
        new object[] { null, null, null, "TABLE" });
        // Read each DataTable (i.e. Excel Spreadsheet) into the DataSet
        foreach (DataRow dr in schemaTable.Rows)
        {
            excelDataAdapter = new OleDbDataAdapter(dr[_tableName].ToString(), excelFile);
            excelDataAdapter.SelectCommand.CommandType = CommandType.TableDirect;
            excelDataAdapter.AcceptChangesDuringFill = false;
            string table = dr[_tableName].ToString().Replace("$", 
                           string.Empty).Replace("'", string.Empty);
            if (dr[_tableName].ToString().Contains("$"))
                excelDataAdapter.Fill(ds, table);
        }
        excelFile.Close();
    }
    finally
    {
        if(excelDataAdapter != null) excelDataAdapter.Dispose();
        excelDataAdapter = null;
        schemaTable = null;
        if (excelFile != null)
        {
            excelFile.Close();
            excelFile.Dispose();
        }
        excelFile = null;
    }
    return ds;
}
#endregion

#endregion

Points of interest

This project uses the XmlBrowser control that I developed. It can be found in another CodeProject article, here. I also originally wrote the Excel functions on my blog a couple of years ago. I fixed the code a bit to make it better. It still has the occasional problem when overwriting the same Excel file multiple times (about 10 or more times). You can find the original version of my code here.

Using the application

When you select an XML schema from the dropdown (and click on Render and then the Save buttons), it will create a file called MetaData.xls. This contains all the information that you would need to determine what columns are required by the DataSet for shaping to work. Clicking on the Save Data button will save the output DataSet to a file called OutputData.xls. You have to click on the Render button to see the transformation result.

History

  • 19-Mar-2008
  • Initial release.

  • 20-Mar-2008
  • Separated out the writing of metadata to the DataGrid/Excel file. This will be done when the user clicks on the Get button. Added a dropdown for XML files. If the user selects an XML file, this is used as the data source instead of reading from Excel.

  • 26-Mar-2008
  • Added a tab control with three tabs: Input XML, Target Schema, and Output XML, so that the user can see what the transformation is doing. Added a Render button so that the user can click this after selecting Input XML and XML Schema from the dropdowns. Added the isDataSet parameter to the setupDataSet method. Added the Statistics table to capture DataTable names and DataTable row count.

  • 27-Mar-2008
  • Added Save Data button to save the resulting DataSet to OutputData.xls. Created a project folder called Excel Files so the user can add more than one Excel file for rendering. Made minor fixes to Excel read/write functions to handle special characters.

  • 04-Apr-2008
  • Modified the XmlBrowser control, and included in a new version (2.0) of this library which requires SAXON. This is an alternate parser with really cool functionality.

  • 07-Apr-2008
  • Added the Vanilla XML option which will render large XML documents really fast using the Microsoft parser. Doesn't render namespace or CDATA nodes.

  • 14-Apr-2008
  • Modified XmlRender to render namespace and CDATA nodes really fast using the Microsoft parser. Allows to save an empty data.xls when only selecting XML schema (this allows the user to build an Excel data source for use with an XML schema to test transformations).

References

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

George Zabanah
Architect
Australia Australia
George Zabanah has been architecting and developing solutions commercially from start to finish for over 11 years. He has been programming since he was 11 (for over 25 years now!) using many different technologies (almost all Microsoft). George is proficient in many languages and employs best practices wherever possible using design patterns, .NET, XSLT, XML, Regular Expressions, various flavours of SQL (to name just a few). His favourite tools, however, are the whiteboard and Sparx Enterprise Architect/Powerpoint. Many waking moments have been spent by George thinking about solutions in his head (even while watching tv). His best moments are the "Eureka" moments when he wakes up from a good sleep (or after watching tv) to find that his latest challenge has been solved!

Comments and Discussions

 
GeneralThanks! Pinmembermuunkky25-May-09 13:22 
GeneralNot able to insert data to the Excel Pinmemberhl6a21-Jul-08 4:09 
GeneralRe: Not able to insert data to the Excel -Ignore it. Pinmemberhl6a21-Jul-08 4:38 
QuestionRe: Not able to insert data to the Excel -Ignore it. PinmemberGeorge Zabanah22-Jul-08 16:33 
AnswerRe: Not able to insert data to the Excel -Ignore it. PinmemberRegentZ7-Dec-08 19:36 
GeneralRe: Not able to insert data to the Excel -Ignore it. PinmemberGeorge Zabanah8-Dec-08 0:46 
AnswerRe: Not able to insert data to the Excel -Ignore it. PinmemberRegentZ8-Dec-08 15:02 
GeneralRe: Not able to insert data to the Excel -Ignore it. PinmemberGeorge Zabanah14-Dec-08 23:50 
GeneralRe: Not able to insert data to the Excel -Ignore it. PinmemberDrDave20-Nov-09 6:57 
QuestionAm I missing something? Pinmemberhike344-Jun-08 1:13 
AnswerRe: Am I missing something? PinmemberGeorge Zabanah4-Jun-08 1:41 
GeneralRe: Am I missing something? Pinmemberhike344-Jun-08 4:09 
QuestionRe: Am I missing something? [modified] Pinmemberhike344-Jun-08 21:08 
QuestionSystem.ArgumentException PinmemberNescient29-May-08 10:14 
AnswerRe: System.ArgumentException PinmemberGeorge Zabanah4-Jun-08 1:54 
QuestionThe referenced component 'XmlRender' could not be found. PinmemberNescient28-May-08 9:42 
Question'XmlRender' is now found! :) ...but 'MSXML2' is missing... :( [modified] PinmemberNescient28-May-08 10:32 
AnswerRe: 'XmlRender' is now found! :) ...but 'MSXML2' is missing... :( PinmemberGeorge Zabanah29-May-08 1:39 
AnswerBack to square one. [modified] PinmemberNescient29-May-08 10:01 
Generalstop using UP YOURS! in your titles PinmemberThanks for all the fish25-Apr-08 3:02 
GeneralexcelDataAdapter.Update() fails for lack of key column information PinmemberBig Dog18-Apr-08 14:40 
GeneralRe: excelDataAdapter.Update() fails for lack of key column information PinmemberGeorge Zabanah18-Apr-08 20:57 
Questionset height and width for webpart when move one webpart from one zone to another zone Pinmembervinnuvineet8-Apr-08 2:11 
GeneralRe: set height and width for webpart when move one webpart from one zone to another zone PinmemberGeorge Zabanah8-Apr-08 13:33 
GeneralError Question Pinmemberpolock_112328-Mar-08 10:01 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web02 | 2.8.140922.1 | Last Updated 24 Apr 2008
Article Copyright 2008 by George Zabanah
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid