Click here to Skip to main content
Click here to Skip to main content

How to Read and Write .xlsx (Excel 2007) file - Part I

, 27 Jun 2011 CPOL
Rate this:
Please Sign up or sign in to vote.
Read and Write XLSX file via C# .NET - Part I

Introduction

Few days ago, I was trying to write an xlsx file. I thought it would be as easy as writing an xls file when you just need to write the html to Response with correct ContentEncoding, so I tried but failed. Actually, it created the xlsx file but when I tried to open it I got an error. Hmmmm, what did I do wrong? Anyway, I started Googling and found a few facts about xlsx.

Xlsx Facts

Xlsx is actually a zip file which contains a set of XML files, some files define the data, some define the metadata, some define style sheets. If you want to see it, you just need to change the extension of Excel file from .xlsx to .zip file and then extract it. You will see the following folders and a file at the root level.

  1. _rels
  2. docProps
  3. xl
  4. [Content_Types].xml

Before digging into the code, let's see what these files are for. Currently I am not going to cover all Excel features, so I am just starting it with the basics. Well, I have an xlsx file which is as follows:

1.JPG

As you can see, the above file is a very simple xlsx file containing two rows and three columns and every cell is representing data. As you can see, A1 and A2 are representing numeric type data while cells B1, B2, C1, C2 are representing string type data. Now, let's see what happens if we convert it to zip and extract it.

Currently we are focusing on two important files that need to be understood.

  1. sheet1.xml (or "yoursheetname.xml" in my case sheet name is "sheet1") (resides in xl\worksheets)
  2. sharedStrings.xml (resides in xl folder)

Sheet1.XML

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" 
  xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
 <dimension ref="A1:C2" />
 <sheetViews>
   <sheetView tabSelected="1" workbookViewId="0">
   <selection />
   </sheetView>
 </sheetViews>
 <sheetFormatPr defaultRowHeight="15" />
 <sheetData>
   <row r="1" spans="1:3">
     <c r="A1"><v>1</v></c>
     <c r="B1" t="s"><v>0</v>
     </c><c r="C1" t="s"><v>1</v></c>
   </row>
   <row r="2" spans="1:3">
     <c r="A2"><v>2</v></c>
     <c r="B2" t="s"><v>2</v></c>
     <c r="C2" t="s"><v>1</v></c>
   </row>
 </sheetData>
 <pageMargins left="0.7" right="0.7" top="0.75" 
	bottom="0.75" header="0.3" footer="0.3" />
</worksheet>

Well, there is a lot to understand in sheet1.xml but currently we are focusing on a few.

  1. WorkSheet (it is the parent node or root node)
  2. dimension (the value of attribute named "ref" would be the starting and ending cell number that contains the data of Excel file. Excel file that I am using in this example contains data from A1 to C2 as you can see in the Excel file image.
  3. sheetData (that is the main node which contains all the information about rows, columns and data)

Sheet Data contains child nodes named "row" which represent the rows in Excel file where the value of attribute "r" represents the name of row. Our Excel file contains two rows and sheet1.xml file contains two rows tags. Now let's look at how it represents cell information.

Under the row node, there is a node named c representing cell information.
where value of:

  • attribute "r" represents name of cell,
  • attribute "t" represents type of data that cell contains

Under the c node, there is node v which represents the value that cell contains. But the question is if it represents the value of cell, then what happened to the value of cell of B1, B2, C1, C2. Here comes sharedstring.xml into play.

sharedString.XML

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
    count="3" uniqueCount="3">
 <si><t>My First Name</t></si>
 <si><t>My Last Name</t></si>
 <si><t>Your FirstName</t></si>
</sst>

Shared string file contains string values that reside in Excel file (contains only unique string values), here in our Excel file you can see that we defined My Last Name twice but it shows only once in sharedstring.xml.

si (it is the node that contains string value).

Relation b/w sheet.xml and sharedString.xml

4.jpg

As the image shows, c node has value "s" of attribute t represents the data of string type and v node actually contains the index of data which resides in sharedString.xml.

Next Part

In the next part, we will see the code for how we are going to parse these two XML files to write and read Excel file, and few other methods.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

M I developer
Technical Lead 10pearls
United States United States
<Profile>
<Profession>
I am a Software Engineer from Karachi Pakistan, Being a programmer obviously I love to code but I love to code that adds something new in my knowledge else I do copy paste usually Smile | :)
</Profession>
<Education>
Have done Master and Bachelors of Computer Science from Karachi University Pakistan
</Education>
<Interests>
Anything I found interesting technically or non-technically so nothing specific Wink | ;)
</Interests>
</Profile>

Comments and Discussions

 
QuestionWhat about the values of A1 and A2 PinmemberMember 106415158-Aug-14 0:17 
QuestionMessage Automatically Removed PinmemberJamie Calops20-Oct-13 23:46 
GeneralMy vote of 5 PinmemberVitorHugoGarcia23-Apr-13 7:15 
GeneralRe: My vote of 5 PinmemberM I developer24-Apr-13 2:29 
QuestionEncoding formula ??? PinmemberMember 966179217-Dec-12 0:35 
Questioncreate excel 2003 file with exellibrary.dll Pinmembermike231528-Jun-11 3:38 
GeneralReady Libraries PinmemberSarafian11-Jun-11 0:30 
GeneralRe: Ready Libraries PinmemberRichard Deeming14-Jun-11 7:47 
GeneralRe: Ready Libraries PinmemberDaveRRR14-Jun-11 8:48 
ExcelDataReader. There's a clue in the name. It can only read Excel files. It cannot write them. Its latest release was almost 2 years ago and there appear to be quite a few bugs still left unresolved. The source code is not available so we cannot fix those bugs either.
 
The Microsoft Open XML SDK 2.0 is very powerful but it has 2 shortcomings. It is quite low-level so we have to write quite a bit of code around it to achieve anything. It also uses the Packaging API which I believe is the reason why writing anything more than a few 1,000 rows (of mostly string values) is grindingly slow.
 
EPPlus and OpenExcel are fairly new so I haven't tried them out yet. They both sit on top of the Microsoft Open XML SDK, which should make it much easier to use. However, neither of them have made the source code available. This could leave us open to significant problems if we encounter a bug. They both claim to be quick so I will have to try them out. I suspect they will be found wanting because the Packaging API will cause performance issues.
GeneralRe: Ready Libraries Pinmemberfrankazoid28-Jun-11 1:31 
GeneralRe: Ready Libraries PinmemberDharmesh Hadiyal28-Jan-14 1:56 
GeneralMessage Automatically Removed Pinmemberclaudiosindl7-Sep-12 2:55 
GeneralRe: Ready Libraries PinmemberSarafian7-Sep-12 3:46 
GeneralMy vote of 4 PinmemberWooters10-Jun-11 6:51 
GeneralRe: My vote of 4 PinmemberM I developer10-Jun-11 11:56 
GeneralRe: My vote of 4 [modified] PinmemberM I developer28-Jun-11 3:23 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web04 | 2.8.141223.1 | Last Updated 27 Jun 2011
Article Copyright 2011 by M I developer
Everything else Copyright © CodeProject, 1999-2014
Layout: fixed | fluid