Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

How to Read and Write .xlsx (Excel 2007) file - Part I

0.00/5 (No votes)
27 Jun 2011 1  
Read and Write XLSX file via C# .NET - Part I

Introduction

Few days ago, I was trying to write an xlsx file. I thought it would be as easy as writing an xls file when you just need to write the html to Response with correct ContentEncoding, so I tried but failed. Actually, it created the xlsx file but when I tried to open it I got an error. Hmmmm, what did I do wrong? Anyway, I started Googling and found a few facts about xlsx.

Xlsx Facts

Xlsx is actually a zip file which contains a set of XML files, some files define the data, some define the metadata, some define style sheets. If you want to see it, you just need to change the extension of Excel file from .xlsx to .zip file and then extract it. You will see the following folders and a file at the root level.

  1. _rels
  2. docProps
  3. xl
  4. [Content_Types].xml

Before digging into the code, let's see what these files are for. Currently I am not going to cover all Excel features, so I am just starting it with the basics. Well, I have an xlsx file which is as follows:

1.JPG

As you can see, the above file is a very simple xlsx file containing two rows and three columns and every cell is representing data. As you can see, A1 and A2 are representing numeric type data while cells B1, B2, C1, C2 are representing string type data. Now, let's see what happens if we convert it to zip and extract it.

Currently we are focusing on two important files that need to be understood.

  1. sheet1.xml (or "yoursheetname.xml" in my case sheet name is "sheet1") (resides in xl\worksheets)
  2. sharedStrings.xml (resides in xl folder)

Sheet1.XML

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" 
  xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
 <dimension ref="A1:C2" />
 <sheetViews>
   <sheetView tabSelected="1" workbookViewId="0">
   <selection />
   </sheetView>
 </sheetViews>
 <sheetFormatPr defaultRowHeight="15" />
 <sheetData>
   <row r="1" spans="1:3">
     <c r="A1"><v>1</v></c>
     <c r="B1" t="s"><v>0</v>
     </c><c r="C1" t="s"><v>1</v></c>
   </row>
   <row r="2" spans="1:3">
     <c r="A2"><v>2</v></c>
     <c r="B2" t="s"><v>2</v></c>
     <c r="C2" t="s"><v>1</v></c>
   </row>
 </sheetData>
 <pageMargins left="0.7" right="0.7" top="0.75" 
	bottom="0.75" header="0.3" footer="0.3" />
</worksheet>

Well, there is a lot to understand in sheet1.xml but currently we are focusing on a few.

  1. WorkSheet (it is the parent node or root node)
  2. dimension (the value of attribute named "ref" would be the starting and ending cell number that contains the data of Excel file. Excel file that I am using in this example contains data from A1 to C2 as you can see in the Excel file image.
  3. sheetData (that is the main node which contains all the information about rows, columns and data)

Sheet Data contains child nodes named "row" which represent the rows in Excel file where the value of attribute "r" represents the name of row. Our Excel file contains two rows and sheet1.xml file contains two rows tags. Now let's look at how it represents cell information.

Under the row node, there is a node named c representing cell information.
where value of:

  • attribute "r" represents name of cell,
  • attribute "t" represents type of data that cell contains

Under the c node, there is node v which represents the value that cell contains. But the question is if it represents the value of cell, then what happened to the value of cell of B1, B2, C1, C2. Here comes sharedstring.xml into play.

sharedString.XML

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
    count="3" uniqueCount="3">
 <si><t>My First Name</t></si>
 <si><t>My Last Name</t></si>
 <si><t>Your FirstName</t></si>
</sst>

Shared string file contains string values that reside in Excel file (contains only unique string values), here in our Excel file you can see that we defined My Last Name twice but it shows only once in sharedstring.xml.

si (it is the node that contains string value).

Relation b/w sheet.xml and sharedString.xml

4.jpg

As the image shows, c node has value "s" of attribute t represents the data of string type and v node actually contains the index of data which resides in sharedString.xml.

Next Part

In the next part, we will see the code for how we are going to parse these two XML files to write and read Excel file, and few other methods.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here