Click here to Skip to main content
15,881,027 members
Articles / Web Development / ASP.NET

How to Read and Write .xlsx (Excel 2007) file - Part I

Rate me:
Please Sign up or sign in to vote.
4.61/5 (18 votes)
27 Jun 2011CPOL3 min read 125.4K   61   18
Read and Write XLSX file via C# .NET - Part I

Introduction

Few days ago, I was trying to write an xlsx file. I thought it would be as easy as writing an xls file when you just need to write the html to Response with correct ContentEncoding, so I tried but failed. Actually, it created the xlsx file but when I tried to open it I got an error. Hmmmm, what did I do wrong? Anyway, I started Googling and found a few facts about xlsx.

Xlsx Facts

Xlsx is actually a zip file which contains a set of XML files, some files define the data, some define the metadata, some define style sheets. If you want to see it, you just need to change the extension of Excel file from .xlsx to .zip file and then extract it. You will see the following folders and a file at the root level.

  1. _rels
  2. docProps
  3. xl
  4. [Content_Types].xml

Before digging into the code, let's see what these files are for. Currently I am not going to cover all Excel features, so I am just starting it with the basics. Well, I have an xlsx file which is as follows:

1.JPG

As you can see, the above file is a very simple xlsx file containing two rows and three columns and every cell is representing data. As you can see, A1 and A2 are representing numeric type data while cells B1, B2, C1, C2 are representing string type data. Now, let's see what happens if we convert it to zip and extract it.

Currently we are focusing on two important files that need to be understood.

  1. sheet1.xml (or "yoursheetname.xml" in my case sheet name is "sheet1") (resides in xl\worksheets)
  2. sharedStrings.xml (resides in xl folder)

Sheet1.XML

XML
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" 
  xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
 <dimension ref="A1:C2" />
 <sheetViews>
   <sheetView tabSelected="1" workbookViewId="0">
   <selection />
   </sheetView>
 </sheetViews>
 <sheetFormatPr defaultRowHeight="15" />
 <sheetData>
   <row r="1" spans="1:3">
     <c r="A1"><v>1</v></c>
     <c r="B1" t="s"><v>0</v>
     </c><c r="C1" t="s"><v>1</v></c>
   </row>
   <row r="2" spans="1:3">
     <c r="A2"><v>2</v></c>
     <c r="B2" t="s"><v>2</v></c>
     <c r="C2" t="s"><v>1</v></c>
   </row>
 </sheetData>
 <pageMargins left="0.7" right="0.7" top="0.75" 
	bottom="0.75" header="0.3" footer="0.3" />
</worksheet>

Well, there is a lot to understand in sheet1.xml but currently we are focusing on a few.

  1. WorkSheet (it is the parent node or root node)
  2. dimension (the value of attribute named "ref" would be the starting and ending cell number that contains the data of Excel file. Excel file that I am using in this example contains data from A1 to C2 as you can see in the Excel file image.
  3. sheetData (that is the main node which contains all the information about rows, columns and data)

Sheet Data contains child nodes named "row" which represent the rows in Excel file where the value of attribute "r" represents the name of row. Our Excel file contains two rows and sheet1.xml file contains two rows tags. Now let's look at how it represents cell information.

Under the row node, there is a node named c representing cell information.
where value of:

  • attribute "r" represents name of cell,
  • attribute "t" represents type of data that cell contains

Under the c node, there is node v which represents the value that cell contains. But the question is if it represents the value of cell, then what happened to the value of cell of B1, B2, C1, C2. Here comes sharedstring.xml into play.

sharedString.XML

XML
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
    count="3" uniqueCount="3">
 <si><t>My First Name</t></si>
 <si><t>My Last Name</t></si>
 <si><t>Your FirstName</t></si>
</sst>

Shared string file contains string values that reside in Excel file (contains only unique string values), here in our Excel file you can see that we defined My Last Name twice but it shows only once in sharedstring.xml.

si (it is the node that contains string value).

Relation b/w sheet.xml and sharedString.xml

4.jpg

As the image shows, c node has value "s" of attribute t represents the data of string type and v node actually contains the index of data which resides in sharedString.xml.

Next Part

In the next part, we will see the code for how we are going to parse these two XML files to write and read Excel file, and few other methods.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Technical Writer
United States United States
<Profile>
<Profession>
I am a Software Engineer from Karachi Pakistan, Being a programmer obviously I love to code but I love to code that adds something new in my knowledge else I do copy paste usually Smile | :)
</Profession>
<Education>
Have done Master and Bachelors of Computer Science from Karachi University Pakistan
</Education>
<Interests>
Anything I found interesting technically or non-technically so nothing specific Wink | ;)
</Interests>
</Profile>

Comments and Discussions

 
GeneralMy vote of 4 Pin
Wooters10-Jun-11 5:51
Wooters10-Jun-11 5:51 
GeneralRe: My vote of 4 Pin
M I developer10-Jun-11 10:56
professionalM I developer10-Jun-11 10:56 
GeneralRe: My vote of 4 [modified] Pin
M I developer28-Jun-11 2:23
professionalM I developer28-Jun-11 2:23 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.