Few days ago, I was trying to write an xlsx file. I thought it would be as easy as writing an xls file when you just need to write the html to
Response with correct
ContentEncoding, so I tried but failed. Actually, it created the xlsx file but when I tried to open it I got an error. Hmmmm, what did I do wrong? Anyway, I started Googling and found a few facts about xlsx.
Xlsx is actually a zip file which contains a set of XML files, some files define the data, some define the metadata, some define style sheets. If you want to see it, you just need to change the extension of Excel file from .xlsx to .zip file and then extract it. You will see the following folders and a file at the root level.
Before digging into the code, let's see what these files are for. Currently I am not going to cover all Excel features, so I am just starting it with the basics. Well, I have an xlsx file which is as follows:
As you can see, the above file is a very simple xlsx file containing two rows and three columns and every cell is representing data. As you can see, A1 and A2 are representing numeric type data while cells B1, B2, C1, C2 are representing string type data. Now, let's see what happens if we convert it to zip and extract it.
Currently we are focusing on two important files that need to be understood.
- sheet1.xml (or "yoursheetname.xml" in my case sheet name is "
sheet1") (resides in xl\worksheets)
- sharedStrings.xml (resides in xl folder)
="1.0" ="utf-8" ="yes"
<dimension ref="A1:C2" />
<sheetView tabSelected="1" workbookViewId="0">
<sheetFormatPr defaultRowHeight="15" />
<row r="1" spans="1:3">
<c r="B1" t="s"><v>0</v>
</c><c r="C1" t="s"><v>1</v></c>
<row r="2" spans="1:3">
<c r="B2" t="s"><v>2</v></c>
<c r="C2" t="s"><v>1</v></c>
<pageMargins left="0.7" right="0.7" top="0.75"
bottom="0.75" header="0.3" footer="0.3" />
Well, there is a lot to understand in sheet1.xml but currently we are focusing on a few.
WorkSheet (it is the parent node or root node)
dimension (the value of attribute named "ref" would be the starting and ending cell number that contains the data of Excel file. Excel file that I am using in this example contains data from A1 to C2 as you can see in the Excel file image.
sheetData (that is the main node which contains all the information about rows, columns and data)
Sheet Data contains child nodes named "
row" which represent the rows in Excel file where the value of attribute "
r" represents the name of row. Our Excel file contains two rows and sheet1.xml file contains two rows tags. Now let's look at how it represents cell information.
Under the row node, there is a node named c representing cell information.
where value of:
- attribute "r" represents name of cell,
- attribute "t" represents type of data that cell contains
Under the c node, there is node v which represents the value that cell contains. But the question is if it represents the value of cell, then what happened to the value of cell of B1, B2, C1, C2. Here comes sharedstring.xml into play.
="1.0" ="utf-8" ="yes"
<si><t>My First Name</t></si>
<si><t>My Last Name</t></si>
Shared string file contains
string values that reside in Excel file (contains only unique
string values), here in our Excel file you can see that we defined My Last Name twice but it shows only once in sharedstring.xml.
si (it is the node that contains string value).
Relation b/w sheet.xml and sharedString.xml
As the image shows, c node has value "s" of attribute t represents the data of
string type and v node actually contains the index of data which resides in sharedString.xml.
In the next part, we will see the code for how we are going to parse these two XML files to write and read Excel file, and few other methods.