Click here to Skip to main content
Click here to Skip to main content

Convert Excel to XML file/XML Schema, and validate Excel against XML Schema

By , 2 Jun 2005
 

Sample Image - Excel2XML.jpg

Introduction

Often we are in a need to convert Excel data to XML stream/XML file which can be used as a feed to various applications like web services or middle tiers such as BizTalk 2004. There will be many situations where we need to validate the format of Excel data sheet against a specified XML schema. We will also be required to generate XML schema based on an Excel Work Sheet. This utility along with the library will help you to accomplish the same.

The following are the salient features of this library:

  1. Usage of Microsoft Jet Engine to connect to Excel.
  2. Conversion of Excel Worksheet/Workbook to XML file and XML Schema.
  3. Generation of XML file and XML Schema based on provided range.
  4. Validation of Excel Worksheet/Workbook against the provided XML Schema.
  5. Provision of batch processing capability.

In this article, we will discuss the implementation of the library functions. The library contains the core functionality to access and manipulate Excel data.

The utility will merely call the appropriate functions from the library. In this way, one can use this same library in ASP.NET applications also with minute changes.

Connectivity Options

There are two ways to manipulate an Excel file. It can be done either by using Microsoft Office Component (check out here) or with Microsoft Jet Engine.

As per Microsoft recommendation, it is not advisable to use Office components on the server. It means that if you want to use this library for a server application, it’s not a good idea to use the Office component. So the connection will be done using Jet Engine.

Connection to Excel using Jet Engine

To connect to Excel, one can use OleDb objects that will treat Excel as a database, and then the required information can be easily fetched by using SQL queries. The important steps that have to be considered while connecting to Excel are as follows:

  • Connection String:

    The connection string should be set to the OleDbConnection object. This is very critical as Jet Engine might not give a proper error message if the appropriate details are not given.

    Syntax: Provider=Microsoft.Jet.OLEDB.4.0;Data Source=<Full Path of Excel File>; Extended Properties="Excel 8.0; HDR=No; IMEX=1".

  • Definition of Extended Properties:
    • Excel = <No>

      One should specify the version of Excel Sheet here. For Excel 2000 and above, it is set it to Excel 8.0 and for all others, it is Excel 5.0.

    • HDR= <Yes/No>

      This property will be used to specify the definition of header for each column. If the value is ‘Yes’, the first row will be treated as heading. Otherwise, the heading will be generated by the system like F1, F2 and so on.

    • IMEX= <0/1/2>

      IMEX refers to IMport EXport mode. This can take three possible values.

      • IMEX=0 and IMEX=2 will result in ImportMixedTypes being ignored and the default value of ‘Majority Types’ is used. In this case, it will take the first 8 rows and then the data type for each column will be decided.
      • IMEX=1 is the only way to set the value of ImportMixedTypes as Text. Here, everything will be treated as text.

For more info regarding Extended Properties, check this out.

Loading of data in to Dataset

After successfully connecting to Excel using Jet Engine, it is easy to the load the data in to DataSet. One has to write a query similar to ANSI-92 with the only changes being that each Excel sheet will be treated as a table and the table name will be the sheet name with “$”. The range can also be specified after the “$” sign.

public DataTable GetDataTable(string strSheetName)
{
  try
  {
    string strComand;

    if(strSheetName.IndexOf("|") > 0)
    {
      /* if Range is provided.*/
      SheetName = strSheetName.Substring(0,strSheetName.IndexOf("|"));
      Range = strSheetName.Substring(strSheetName.IndexOf("|")+1);
      strComand="select * from ["+SheetName+"$"+Range+"]"; 
    }
    else
    {
      strComand="select * from ["+strSheetName+"$]"; 
    }

    daAdapter=new OleDbDataAdapter(strComand,cn);
    DataTable dt=new DataTable(strSheetName);
    daAdapter.FillSchema(dt,SchemaType.Source);
    daAdapter.Fill(dt); 
    cn.Close(); 
    return dt ;
  }

  catch(Exception ex)
  {
    throw new Exception(ex.Message); 
  }
}

Conversion of Dataset to XML data and XML Schema

After loading the data in to DataSet, GetXML and GetXMLSchema functions will be used to generate XML data as well as XML Schema. This tool can also be used to generate a single file with XML Schema and XML data together. But this should not be done as XML schema is also an XML data. The following code snippet shows the same which should be avoided at any cost.

public string GetXML(string strSheetName, Boolean _blnSchema)
{
  DataSet ds=new DataSet(); 
  ds.Tables.Add(this.GetDataTable(strSheetName)); 
  if(_blnSchema)
    return ds.GetXmlSchema()+ds.GetXml(); 
  else
    return ds.GetXml(); 
}

Validation against XML Schema

Once the data is generated in the form of XML, we can validate the same using the XmlValidatingReader object from System.Xml.Schema.

public string ValidateXML(string strSchemaFile,string WorkSheet)
{
  strParseError="";
  XmlParserContext context = new XmlParserContext(null, 
                   new XmlNamespaceManager(new NameTable()), 
                   null, XmlSpace.None);
  XmlTextReader xmlReader = new XmlTextReader(this.GetXML(WorkSheet, 
                false), XmlNodeType.Element, context);
  XmlValidatingReader objValidator = new XmlValidatingReader(xmlReader);

  /* set the validation type to use an XSD schema */
  objValidator.ValidationType = ValidationType.Schema;
  XmlSchemaCollection objSchemaCol = new XmlSchemaCollection();
  objSchemaCol.Add("", strSchemaFile);
  objValidator.Schemas.Add(objSchemaCol);
  objValidator.ValidationEventHandler += 
     new ValidationEventHandler(ShowCompileErrors);

  try
  {
    while (objValidator.Read()) { }
  }
  catch(Exception objError)
  {
    throw new Exception(objError.Message); 
  }
  finally
  {
    xmlReader.Close();
  }

  return strParseError;
}

private void ShowCompileErrors(object sender, ValidationEventArgs args)
{
  strParseError+="::"+args.Message+"\r\n" ; 
}

Batch Process

Batch process will take the array of strings, and then by using the above functions will generate XML file and Schema per Workbook.

Future Enhancements

Generation of Excel Worksheet based on given XML Schema.

Download Binaries

You can download the working application here [9.48 Kb].

History

  • Version 1.0 released on 3rd June, 2005.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

About the Author

Gaurang Desai
Web Developer
United States United States
Member
Currently he is working as Senior Soft. Engineer at Cognizant Technology Solution.He is involved in various project activities like System Architecture, Design, and Development. He is fond of conduction training for various technologies. He has have worked on various language and platforms. He is Microsoft and Oracle Certified professional. He is spending quantity and quality time in .Net world. He had also spoiled his hand with java, too.
If work is not demanding, he spends good time with his wife, Purvi.He
blogs
at WebDevs.com.

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
QuestionExcellent Utilitymemberinfovik15 Feb '13 - 23:24 
This is very useful utility for uploading Data from Excel sheet. Great work.
QuestiontargetNamespace parameter name errormembervsrawat116 Oct '12 - 20:17 
I downloaded and unzipped, I ran the exe file, gave my xls file name, selected sheet, left range blank, it could upload the data successfully.
 
however, when I give my xsd file, and ask to validate, it says
"targetNamespace parameter '' should be the same value as the targetNamespace 'aca' of the schema."
 
'aca' is the targetNamespace in my xsd schema.
 
I am at a loss, is there any error within the xsd file, or some setting I am missing. How to resolve?
 
Thanks.
GeneralMy vote of 1memberMember 41834903 Oct '12 - 0:32 
kjhk
Questionconvert excel to xml schema and xmlmemberjpp99918 Mar '12 - 5:41 
I have downloaded, unzipped the source file.
How do I get it run to with my excel file? Do I have to integrate it with Excel 2007 or what steps need to be done?
-------------------------------
Convert Excel to XML file/XML Schema, and validate Excel against XML Schema
By Gaurang Desai | 3 Jun 2005
.NET1.0.NET1.1Win2KWinXPWin2003VS.NET2003C#XMLDevIntermediate
Provides a utility and code library to convert an Excel file to XML file as well as generate XML Schema. It validates an Excel file against XML Schema.
 
Download source files - 5.84 Kb
Download demo files - 8.37 Kb
Generalerror in excel uploadingmemberspydeehunk6 May '11 - 1:49 
its working fine when in my excel if any column have heterogeneous data then it show null for that distinct data in cell.
let suppose i have excel sheet data like below
 

EmpCode Title
999312 mr
34ert Miss
so for this excel i get null value in cell first column second row
can you help me out
GeneralRe: error in excel uploadingmemberspydeehunk8 May '11 - 23:21 
just check out the connection string for oledb connection object. If IEMX Property is not set to 1 please do it because it ignore the datatype of column and take heterogeneous datatype .
GeneralCan u please helpmemberMember 788609730 Apr '11 - 20:57 
Hi can anyone please tell me how to use this?
 
I have an xsd file and using this schema i wanna convert my excel sheet into xml...
 

How can I do this using this software??
GeneralMy vote of 5memberkiranmayi pinupa15 Feb '11 - 21:03 
This was very useful.Thanks
GeneralGreat Job!memberLuca1009014 Jun '10 - 22:30 
great job.. usefull. thanks a lot
GeneralMy vote of 1memberTheArchitectmc1 Apr '10 - 2:24 
There is no demo file in the demo, no executables!
GeneralMy vote of 1memberCarson Palmer4 Jan '10 - 10:34 
Doesn't work
GeneralExcel files .xlsx is not supported !!memberWaleedS25 Aug '09 - 0:08 
Hello Gaurang,
 
I try to open one of the new xlsx files created by Excel2007 but it said "Externetable not in the excpected format".
 
How can I correct this to work with the excel2007 files..
 
Thanks, Best Regards
Waleed Seada

GeneralRe: Excel files .xlsx is not supported !!membergg423714 Oct '09 - 4:31 
You could try GemBox Excel .NET Component.
Amongst others formats like XLS, CSV, ODS it also supports XLSX.
You could easly export your Excel file to DataSet.
Feature to export Excel files to HTML is extensible so you could override its behavior and create your custom XML files from Excel files.
Questionfrom where i get ExcelXML reference?memberYour Friend Manoj22 Jun '09 - 18:55 
Hi,
 
i tried to run your application but got error of ExcelXML reference. could you please tell me what i have to do?
 
-Manoj
 
Manoj Patel
SI-Engineer
CMC Ltd.
Mumbai

GeneralIt's Cool!memberchinaontology26 Nov '08 - 20:42 
Cool! It help me a lot, thanks from china. Smile | :)
GeneralConnection to Excel using Jet EnginememberRebecca Fox28 Jul '08 - 10:26 
I have downloaded the application and sample data. I am confused as to how to connect to Excel using the Jet Engine.
 
How do I get to the screen that's shown in the example that shows the "Excel to XML v.1" screenshot where I assume I'll be able to identify the Excel Spreadsheet rows and columns to convert to XML Scheme.
 
My only problem is being able to access this screen shown in the example...how do I get in???
 
Anyone that can help me with this question and resolve my problem will be my Hero for the day!
 
A prompt reply will be greatly appreciated!Confused | :confused:
QuestionI am not able to open the binary code .memberSavvy16 May '08 - 0:58 
Confused | :confused: Can i use the code for converting a xsl sheet to xml in its simple form.Can it be batch processed.
Can you guide me the usage of this tool .
NewsRe: I am not able to open the binary code .membercnataraj24 May '08 - 2:15 
Frown | :( Me too.. Not only this page, from all over the site, i can't download the source code
GeneralIt's Great ArticlememberDEEPCHAND_KOSTA17 Apr '07 - 1:31 
How we can convert any type of file Like(Txt.,TabSaparated,etc..) in to XML FILE OR XML Schemas

 
DCKOSTA

Generaljava code to generate xml scehma from xml filememberSameer_Iqbal3 Feb '07 - 7:25 
can any one please help me to generate the XMl schema and DTD schema from an xml file using java in IDE net beans, jdk1.5
GeneralGeneral Conversion of tabular data into XML (VBA)memberrkapl9 Jan '07 - 5:11 
Hi Gaurang!
 
I implemented another solution that is able to convert to XML without (almost) any limitations: Table2XML
 
It works by reverting the "flattener" methodology that is applied when Excel loads XML files. With a few changes to column order (and adding/deleting some columns) you have a very flexible way to convert tabular data into XML. The algorithm doesn't need any add-in (as MSXML), so it's easy to rewrite it in any other environment/language (I've already created a perl version here).
 
-regards
 
Roland
 

GeneralMissing some data from first rowmemberQAKiani24 Nov '06 - 22:27 
Hi,
 
Nice article to read.
 
I was going through the application and i found out that When "Get Data" is clicked it doesnot retrieves all the data from the Excel sheet into the DataGrid, rather it misses all the cell's data of first row beside first cell. Same thing is observed in the XML.... So there is definitely something wrong, E.g. if the file contains the data like
 
Name TelNo Address
AAA 111 A1
BBB 222 B2
CCC 333 C3
 
but the application reads it like
 
Name (null) (null)
AAA 111 A1
BBB 222 B2
CCC 333 C3
 
I dont know that whether other users have faced the same problem or its only me who is missing out something. So any idea to solve this prob.
 
Regards,
QAKiani.
GeneralRe: Missing some data from first rowmemberQAKiani25 Nov '06 - 10:22 
After viewing the code i realized that the connection string to the Excel was not using IMEX option and if IMEX=1 is used then the problem is resolved and then the program reads the data correctly.
 
Just thought to share it.
 
Regards,
QAKiani.
GeneralRe: Missing some data from first rowmemberahuitt5 Mar '12 - 10:28 
I found that my data had column headings in row 1. So lets say my data looked like:
 
ID First Last
1 Joe Smith
2 Jane Smith
 
When you click "get data" is brings in the first row as data - not as column headings (or element names). Instead, it uses the column headers "F1" and "F2" and considers anything in the first row as data. It uses all the data in each column to determine the data type - in my case, it considers column one to be "double" and the next two columns as "string".
 
Here's the problem. When it considers column one to be "double" but then brings in row one, which has "ID" (a string value) then is basically "errors" out and just puts "(null)" in because it won't convert "ID" into a double type value.
 
I'm guessing something like this is happening in your case as well.
 
Hope this helps someone. D'Oh! | :doh:
GeneralCould not find installable ISAM OleDB.NetmemberHameer Saleem2 Sep '06 - 23:53 
Hi,
 
Nice article dude.
I have a small problem. I was doing something similar and got the exception:
 
"Could not find installable ISAM OleDB.Net"
 
when oleDBDataAdapter.Fill line is reached.
 
Can you tell me why is it coming up.
 

 

 
Regards,
Hameer Saleem
GeneralRe: Could not find installable ISAM OleDB.NetmemberGaurang Desai7 Sep '06 - 18:51 
Check my blog for this error. http://blogs.wdevs.com/gaurang/archive/2005/06/15/5112.aspx#14464
 
Hope that solve you prob..
 

Thanks
Gaurang
Generalhelpmemberzhanglei.net200417 Apr '06 - 2:11 
hi:
 
it's a good job,but when i debug it ,there's always an error occured.
 
private ExcelXML.ExcelToXML obj;
 
Error:The referenced project 'ExelXML' does not exist. ShowExcelAndXML
 
why ?
 
Thank u

 
i am studying C#,wish to discuss with u
GeneralRe: helpmemberQaisar Aftab24 Nov '06 - 20:48 
I think that you havenot added the reference of ExcelXML library in the References section of the proj. Add the reference and you then you will be fine.
 
Regards.
QAKiani.
GeneralRe: helpmemberzhanglei.net20047 Jan '07 - 14:53 
thank u for your answer,i had seen.
 
i am studying C#,wish to discuss with u

GeneralA Caution about Mixed Data TypesmemberAleks K.11 Jan '06 - 15:01 
thanks, good job, but my excel sheet was exported not correctly, i founded
 
http://support.microsoft.com/default.aspx?scid=kb;en-us;257819[^]
 
As stated previously, ADO must guess at the data type for each column in your Excel worksheet or range. (This is not affected by Excel cell formatting settings.) A serious problem can arise if you have numeric values mixed with text values in the same column. Both the Jet and the ODBC Provider return the data of the majority type, but return NULL (empty) values for the minority data type. If the two types are equally mixed in the column, the provider chooses numeric over text.
 
To work around this problem for read-only data, enable Import Mode by using the setting "IMEX=1" in the Extended Properties section of the connection string
 

And it works mit imex=1 !

GeneralRe: A Caution about Mixed Data TypesmemberGaurang Desai11 Jan '06 - 16:44 
Thanks Alerks for you comments... yes that's true... if your column has mix data you need to set IMEX=1. I guess i have mention the same point in my article and i fear you miss it.. anyway... thanks again for sharing above URL.
GeneralFile Path InformationmemberStryder123 Nov '05 - 8:17 

Might want to look at the System.IO.Path object. You will find it much easier and more accurate than:
string strFile = strFileName.Substring(strFileName.LastIndexOf("\\")+1)
 
Thanks for the article! It has been a big help with the project I am currently working on.
 

 
Nathan Hartley
http://iLothlorien.com
GeneralIt would be nice to see....memberPreky17 Nov '05 - 3:05 
Fast generation of large DataTable/DataSet to Excel Worksheet Smile | :)
Latly I was doing something about that and still searching for solution how to convert DataTable fast enough to Excel worksheet.
I'm talking about 10,000 rows or even much more.
 
I tried this for 300 rows and 70 columns:
1)DataSet->XML with XSLT -> Excel xsl file.
it last for about 8 minutes, with full formating all cells.
 
2)using Excel API's it lasted about 1 minute, but without formating, just filling cell by cell in memory, not showing filling the cells -> but i didn't tried 10,000 rows it would last for a very long period of time...
 

Any suggestions?
 
P.S. I thought of using Xml Schema to get XML data rows representation, but didn't find (yet) anything about it.
 

 
Preky
GeneralRe: It would be nice to see....memberGaurang Desai19 Nov '05 - 16:08 
Hi Preky.. sorry for delayed reply..
 
if the source for DataSet it Database and i guess it should be database only as it is matter of 10000 records... so in this case
 
i would consider datareder and XMLdocument object instead of dataset->xml... i guess it will give you better performance...
 
but i have not given a try....
 
best of luck..
 
gaurang

GeneralRe: It would be nice to see....memberPreky20 Nov '05 - 0:48 
I have tried that but after all you would need XSLT to traslate to Excel. Finaly the best solution was creating by "hands" the proper XML Excel format row by row and cell by cell. I works the best after testing different ways of implementation.
 
Preky
Questioncan you helpmembertejas merh29 Aug '05 - 23:34 
hi gaurang
 
let me first of all warn you that I don't even know the P of progamming but I need some help. I came across this site thru a google search for conversion from Excel to XML. I need to convert a list of fields which are in a step form, i.e. one under the other from Excel to xml in a particular format. Can you help me?
If you don't understand what I am saying, I can actually show you the xml file how I want the data and the excel sheet of the raw data.
 
tejas
Generalsheet list bugmemberspneumann4 Aug '05 - 5:30 
there is a small flaw in the code that gathers the sheet names. if a sheet name has embedded spaces, the following code won't remove the $ correctly. The sheet name is returned inside single quotes....
 
excelSheets[i] = strSheetTableName.Substring(0,strSheetTableName.Length-1);

GeneralRe: sheet list bugmemberTrondR23 Nov '05 - 5:03 
Additonally it might be a good idea to check that the '$' character is actually present at the end of the name. The OleDbSchemaGuid.Tables may contain other objects than Sheets. The following fix this and the embeded space flaw:
 
...
ArrayList excelSheets = new ArrayList();
 
// Add the sheet name to the string array.
foreach(DataRow row in dataTable.Rows)
{
string strSheetTableName = row["TABLE_NAME"].ToString();
if(strSheetTableName.EndsWith("$"))
{
excelSheets.Add(strSheetTableName.Substring(0,strSheetTableName.Length-1));
}
else if(strSheetTableName.EndsWith("$'"))
{
excelSheets.Add(strSheetTableName.Substring(1,strSheetTableName.Length-3));
}
}
...
 
TrondRSmile | :)
 
-- modified at 11:52 Wednesday 23rd November, 2005
GeneralRe: sheet list bugmemberwalnut081221 Sep '09 - 19:23 
thanks TrondR
very helpful!! ^^
GeneralSpreadsheetMLmemberGabhan Berry7 Jun '05 - 22:43 
How does this compare to Excel's XML file format, SpreadsheetML? You can convert Excel files into an XML spreadsheet by simply saving the spreadsheet in XML format (Excel XP and above).
 
If you need the XML to adhere to a specific schema, could you then use XSLT to transform Excel's XML file into the required XML structure?
GeneralRe: SpreadsheetMLmemberGaurang Desai8 Jun '05 - 0:20 
yeap...
 
but i have few question to you..
 
1. Can u convert multiple files as the same time...That is Batch Process
2. Can u verify the submitted Excel file against specified Schema.
3. Can u generate the XML schema that contains only range of Cells.
 
and my main intension is to develop xml2xml library that can be called/used by any process that need to manipulate Excel file without Excel Application object, and send it to some application that only understand xml.
 
this is not possible without this library...
 
sorry for my bad English...

GeneralRe: SpreadsheetMLmemberGabhan Berry9 Jun '05 - 21:53 
Gaurang Desai wrote:
Can u convert multiple files as the same time...That is Batch Process
 
- Yes but you'd have to code that yourself (easy to do though)
 

Gaurang Desai wrote:
Can u verify the submitted Excel file against specified Schema
 
- yes
 

Gaurang Desai wrote:
Can u generate the XML schema that contains only range of Cells.
 
- you can save a range of cells as spreadsheetML, but it can be a little tricky to do.
 
The Excel API is fairly comprehensive, but it can be tricky to use. However, of course, to do what I am talking about here you would need programmatic access to the Excel API meaning that it would have to be installed on the computer running the code.
 
One other point to keep in mind is that spreadsheetML will contain formatting information i.e. what the colours of the cells are, what font is used etc.. I presuming that this is not something that your program would keep?
 


GeneralRe: SpreadsheetMLmemberGaurang Desai9 Jun '05 - 23:43 
hi gabhan,
 
frankly speaking I am not that much aware of spreadsheetML. but it's part of Excel Office Application. as I said in article, we should not use Office application on server. as Office application is developed targeting Desktop Environment. and I guess it works only with Office 2003.
 
Second thing, this library is developed to work at middle tier and pass the generated XML to destination layer, such as BizTalk.
 
anyway, thanks for pointing me to spreadsheetML, it might be good area to explore.

GeneralVery goodmemberKant3 Jun '05 - 4:23 
My 5 Cool | :cool:
 

Smoking kills. If you're killed, you've lost a very important part of your life. -- Brooke Shields, during an interview to become spokesperson for federal anti-smoking campaign.

This signature was created by "Code Project Quoter".
GeneralRe: Very goodmemberGaurang Desai7 Jun '05 - 2:36 
Thanks...Smile | :)
GeneralRe: Very goodmemberIrish.Kevin17 Jul '06 - 11:22 
I gave this a 5 too. I have a dataset that has 778 lines with 53 items per line, and I was wondering how to write some code to convert it into an XML format and voila- it's already been done. I love the web!
 
Thanks for a good article and the good code.
 
Cheers,
Kevin

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web02 | 2.6.130516.1 | Last Updated 3 Jun 2005
Article Copyright 2005 by Gaurang Desai
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid