Click here to Skip to main content
15,881,882 members
Articles / Productivity Apps and Services / Microsoft Office / Microsoft Excel
Tip/Trick

C# How To Read .xlsx Excel File With 3 Lines of Code

Rate me:
Please Sign up or sign in to vote.
4.95/5 (91 votes)
29 Jul 2014CPOL3 min read 3.3M   48.5K   133   214
Read rows and cells from an xlsx-file: quick, dirty, effective.

Introduction

Our customers often have to import data from very simple Excel *.xslx-files into our software product: with some relevant rows and cells in rather primitive worksheets of an Excel workbook, and that's it. But we do not want to use large DLL's or third party software for that. Therefore we produced a small solution for our needs. It could be useful for you, too: 

Using the code

Download the "Excel.dll" (12 kByte, you need .net 4.5!) and add it as a reference to your project. Or adapt the compact source code before - with only the relevant classes "Workbook", "Worksheet", "Row" and "Cell" and 45 C# code lines doing important work in total. Then read the worksheets, rows and cells from any Excel *.xlsx-file in your program like so:

C#
foreach (var worksheet in Workbook.Worksheets(@"C:\ExcelFile.xlsx")
    foreach (var row in worksheet.Rows)
        foreach (var cell in row.Cells)
            // if (cell != null) // Do something with the cells

Here you iterate through the worksheets, the rows (and the cells of each row) of the Excel file within three lines of code.

Points of Interest

This article (written by M I developer) describes all the theoretical background, if you are interested in it. We based our solution on the integrated ZIP-library in .net 4.5 and the standard XML-serializer of .net.

If you want to adapt our solution to your needs: edit the simple source code for the Excel.dll. This is how it works:

Maybe you did not know that xlsx-files are ZIP-files. And the text strings of the Excel cells of all worksheets per workbook are always stored in a file named "xl/sharedStrings.xml", while the worksheets are called "xl/worksheets/sheet[1...n].xml".

So we have to unzip and deserialize the relevant XML files in the Excel xlsx-file:

C#
using System.IO.Compression;

public static IEnumerable<worksheet> Worksheets(string ExcelFileName)
{
    worksheet ws;

    using (ZipArchive zipArchive = ZipFile.Open(ExcelFileName, ZipArchiveMode.Read))
    {
        SharedStrings = DeserializedZipEntry<sst>(GetZipArchiveEntry(zipArchive, @"xl/sharedStrings.xml"));
        foreach (var worksheetEntry in (WorkSheetFileNames(zipArchive)).OrderBy(x => x.FullName))
        {
            ws = DeserializedZipEntry<worksheet>(worksheetEntry);
            ws.ExpandRows();
            yield return ws;
        }
    }
}

As you see, we also have to find all worksheets of the workbook at the beginning. We filter the ZIP-archive entries for that:

C#
private static IEnumerable<ziparchiveentry> WorkSheetFileNames(ZipArchive ZipArchive)
{
    foreach (var zipEntry in ZipArchive.Entries)
        if (zipEntry.FullName.StartsWith("xl/worksheets/sheet"))
            yield return zipEntry;
}

For deserialization of each XML formatted ZIP-entry (see also this article written by Md. Rashim uddin) we use this generic method:

C#
private static T DeserializedZipEntry<T>(ZipArchiveEntry ZipArchiveEntry)
{
    using (Stream stream = ZipArchiveEntry.Open())
        return (T)new XmlSerializer(typeof(T)).Deserialize(XmlReader.Create(stream));
}

Therefore the XML-structures have to be reflected in our classes. Here you see the "sst"-class and the "SharedString"-class for the XML in the "shared strings table":

XML
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="72" uniqueCount="6">
  <si>
    <t>Text A</t>
  </si>
  <si>
    <t>Text B</t>
  </si>
</sst>
C#
public class sst
{
    [XmlElement("si")]
    public SharedString[] si;

    public sst()
    {
    }
}

public class SharedString
{
    public string t;
}

The same strategy we also use for the "worksheet" -XML-file in the ZIP-file. There we focus on the XML-elements and -attributes "row", "c", "v", "r" and "t". All the work is done again by the XmlSerializer:

XML
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<dimension ref="A1:F12"/>
<sheetViews>
  <sheetView workbookViewId="0"></sheetView>
</sheetViews>
<sheetFormatPr baseColWidth="10" defaultRowHeight="15"/>
<sheetData>
  <row r="1">
    <c r="A1" t="s">
      <v>0</v>
    </c>
    <c r="B1" t="s">
      <v>1</v>
    </c>
    <c r="C1" t="s">
      <v>2</v>
    </c>
  </row>
</sheetData>
</worksheet>
C#
public class worksheet
{
    [XmlArray("sheetData")]
    [XmlArrayItem("row")]
    public Row[] Rows;

    public class worksheet
    {
    }
}
public class Row
{
    [XmlElement("c")]
    public Cell[] FilledCells;
}
public class Cell
{
    [XmlAttribute("r")]
    public string CellReference;
    [XmlAttribute("t")]
    public string tType = "";
    [XmlElement("v")]
    public string Value;
}

Of course we have to do a little bit in order to convert the usual Excel cell references like "A1", "B1" and so on to column indices. That is done via the setter of "CellReference" in the "Cell"-class (here we also derive the maximum column index for the whole worksheet) ...

C#
[XmlAttribute("r")]
public string CellReference
{
    get
    {
        return ColumnIndex.ToString();
    }
    set
    {
        ColumnIndex = worksheet.GetColumnIndex(value);
        if (ColumnIndex > worksheet.MaxColumnIndex)
            worksheet.MaxColumnIndex = ColumnIndex;
    }
}

... and a small method named "GetColumnIndex()":

C#
private int GetColumnIndex(string CellReference)
{
    string colLetter = new Regex("[A-Za-z]+").Match(CellReference).Value.ToUpper();
    int colIndex = 0;

    for (int i = 0; i < colLetter.Length; i++)
    {
        colIndex *= 26;
        colIndex += (colLetter[i] - 'A' + 1);
    }
    return colIndex - 1;
}

The last challenge has to do with the fact, that the Excel file does not contain empty Excel cells. So the tiny methods "ExpandRows()" and "ExpandCells()" handle that problem:

C#
public void ExpandRows()
{
    foreach (var row in Rows)
        row.ExpandCells(NumberOfColumns);
}

public void ExpandCells(int NumberOfColumns)
{
    Cells = new Cell[NumberOfColumns];
    foreach (var cell in FilledCells)
        Cells[cell.ColumnIndex] = cell;
    FilledCells = null;
}

In the end we have an array of all rows and an array of all cells for each row representing all columns of the specific Excel worksheet. Empty cells are null in the array, but the ColumnIndex of each cell in "Row.Cells[]" corresponds with the actual Excel column of each cell.

Unfortunately the XML format is not very clear about how to interpret the value of the Excel cells. We tried to do it like so, but any hint for improvement would be appreciated:

C#
if (tType.Equals("s"))
{
    Text = Workbook.SharedStrings.si[Convert.ToInt32(_value)].t;
    return;
}
if (tType.Equals("str"))
{
    Text = _value;
    return;
}
try
{
    Amount = Convert.ToDouble(_value, CultureInfo.InvariantCulture);
    Text = Amount.ToString("#,##0.##");
    IsAmount = true;
}
catch (Exception ex)
{
    Amount = 0;
    Text = String.Format("Cell Value '{0}': {1}", _value, ex.Message);
}

Besides, when you know that an Excel cell contains a date as its value, you can use this method for conversion:

C#
public static DateTime DateFromExcelFormat(string ExcelCellValue)
{
    return DateTime.FromOADate(Convert.ToDouble(ExcelCellValue));
}

Let me know how the total Excel.DLL works in your environment - and have fun with it!

History

26.7.2014 - Posted initially.

26.7.2014 - Files uploaded here.

26.7.2014 - Explanations added, formatting improved.

28.7.2014 - More explanation added. Deserialization slightly improved.

29.7.2014 - Date conversion from Excel format to DateTime via DateTime.FromOADate()

29.7.2014 - Essence of class "worksheet" added.

31.7.2014 - Comments added in the source code, XML-documentation added to the dll

31.7.2014 - The program now reads all worksheets from a workbook, not only the first one

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Architect www.schoder.uk
United Kingdom United Kingdom
I am a software architect and electronic musician.

Comments and Discussions

 
QuestionCan we modify the excel after reading ? Pin
Member 1180247530-Dec-23 23:47
Member 1180247530-Dec-23 23:47 
QuestionIs there a way to open the Excel file if it is opened by another process Pin
Member 1056115622-Apr-21 9:54
Member 1056115622-Apr-21 9:54 
QuestionTop Pin
Member 1403273615-Dec-20 0:26
Member 1403273615-Dec-20 0:26 
SuggestionIn Memory Processing Pin
JunoDev2-Oct-18 6:33
JunoDev2-Oct-18 6:33 
QuestionNull reference error: a solution Pin
OriginalGriff15-Dec-17 23:45
mveOriginalGriff15-Dec-17 23:45 
BugException with the given xlsx Pin
ndruha13-Oct-17 4:58
ndruha13-Oct-17 4:58 
GeneralRe: Exception with the given xlsx Pin
OriginalGriff15-Dec-17 23:45
mveOriginalGriff15-Dec-17 23:45 
QuestionDidnt work on Loadrunner Pin
Member 1142843329-Sep-17 13:19
Member 1142843329-Sep-17 13:19 
QuestionNames of worksheets Pin
NikolaB11-Sep-17 6:35
NikolaB11-Sep-17 6:35 
Praisenice code Pin
Andres A. Avila17-Apr-17 18:44
Andres A. Avila17-Apr-17 18:44 
Questionxlsx-File in use after reading cells in C#? Pin
Member 1302114710-Apr-17 4:45
Member 1302114710-Apr-17 4:45 
Hi Dietmar,
I have used your excel.dll and it works great!
There is only one problem I have. after I have read some Cell-values in my programm, and even closing my program, this excel-file is locked for using in excel. When I try to open it, there comes a warning that I can only read this file because it is still used by another program. After I have shut down my computer and restart it, the excel-file is writeble again (until I use it again im my program).
Even if I only do this part of my program, it is locked:
foreach (var worksheet in Workbook.Worksheets(FileName))
{
break;
}
Can you tell me what I can do to solve this?
AnswerRe: xlsx-File in use after reading cells in C#? Pin
Member 1170666325-Apr-17 7:58
Member 1170666325-Apr-17 7:58 
AnswerRe: xlsx-File in use after reading cells in C#? Pin
vaynenick19-Jun-17 20:13
vaynenick19-Jun-17 20:13 
QuestionCan i R/W to excel using this ?? Pin
Member 1269891723-Aug-16 21:42
Member 1269891723-Aug-16 21:42 
BugIt does not work Pin
voquanghoa4-Jul-16 7:19
voquanghoa4-Jul-16 7:19 
GeneralRe: It does not work Pin
dietmar paul schoder4-Jul-16 8:20
professionaldietmar paul schoder4-Jul-16 8:20 
QuestionSelecting Worksheets Pin
branta.canadensis27-Jun-16 3:33
branta.canadensis27-Jun-16 3:33 
AnswerRe: Selecting Worksheets Pin
dietmar paul schoder27-Jun-16 3:40
professionaldietmar paul schoder27-Jun-16 3:40 
QuestionCode skipped Pin
Member 1259965523-Jun-16 2:45
Member 1259965523-Jun-16 2:45 
AnswerRe: Code skipped Pin
dietmar paul schoder23-Jun-16 3:28
professionaldietmar paul schoder23-Jun-16 3:28 
QuestionHow could I read defined names or list of defined names in Name Manager? Pin
LeonOX21-Jun-16 1:43
LeonOX21-Jun-16 1:43 
AnswerRe: How could I read defined names or list of defined names in Name Manager? Pin
dietmar paul schoder21-Jun-16 3:03
professionaldietmar paul schoder21-Jun-16 3:03 
AnswerRe: How could I read defined names or list of defined names in Name Manager? Pin
LeonOX22-Jun-16 0:11
LeonOX22-Jun-16 0:11 
QuestionNull value returned on specific <si> elements. Pin
Member 120772587-Feb-16 7:56
Member 120772587-Feb-16 7:56 
AnswerRe: Null value returned on specific <si> elements. Pin
dietmar paul schoder7-Feb-16 12:06
professionaldietmar paul schoder7-Feb-16 12:06 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.