Click here to Skip to main content
11,635,788 members (76,777 online)
Click here to Skip to main content

Fast Excel file reader with basic functionality

, 1 Jul 2006 397.6K 11.6K 208
Rate this:
Please Sign up or sign in to vote.
A set of managed classes for native reading of data from MS Excel files

Introduction

When I was writing one of the projects (multi-format data parser) I had to deal with Microsoft Excel files.

First of all I tried to read data via COM interop, but this way was very slow (and sometimes input files have up to 100,000 rows and even more) and required Excel to be installed on the target machine.

Then I tried to read data via Excel's OleDb driver. When DataReader ran through data rows without reading them (i.e. while (rd.Read());) it was quite fast, but retrieving data was still very slow.

When I had tried out one of commercial products (TMS FlexCel Studio), it showed rather good results (several seconds for loading 17 MB file, 2 sheets, ~ 80 000 rows), but access to them was not very fast (at least not suitable for displaying data using DataGridView in virtual mode).

So, I decided to write my own reader for Excel files.

Background

First of all I decided that the reader will produce a DataSet object, with sheets represented as DataTables within it. Access to DataTable is rather fast, and its TableName property is suitable for sheet title.

To read an Excel file, we need to get BIFF stream from xls OLE2 container. Working with OLE2 container is implemented in the following classes:

  • XlsHeader (file header representation)
  • XlsFat (support for file system in OLE2 container)
  • XlsRootDirectory (directory tree representation)
  • XlsDirectoryEntry, and
  • XlsStream (file stream)

Note that implementation of OLE2 container object model is limited (e.g. I didn't write any MiniFAT support code) and Excel-specific (XlsHeader supports only MS Excel header field values, any other will produce InvalidHeaderException).

Now we can parse BIFF stream with spreadsheet data. For handy reading data from stream we have a XlsBiffStream class derived from XlsStream. It provides a set of methods for reading XlsBiffRecord common structures. And then, each structure is converted to the appropriate type. Note that only a few record types are supported. All other structures are read as basic XlsBiffRecord and are ignored.

My implementation of reader ignores any text formatting, graphics, embedded objects and charts. Only cell values are read. When a cell contains a formula, the last calculated formula value is used as the cell value. Also, only indexed sheets are read (i.e. sheets with index data) - I didn't write not-indexed sheets support due to my laziness (I didn't come across any file without an index yet).

Using the code

Now, it's very easy to use the parser. All you need is a Stream with an Excel file.

FileStream fs = new FileStream(@"c:\file.xls", 
        FileMode.Open, FileAccess.Read);
ExcelDataReader rd = new ExcelDataReader(fs);
fs.Close();

// Now we can access data:
DataSet data = rd.WorkbookData;

Points Of Interest

Latest versions of Excel use SST (Shared String Table) to store string information, and text cells only reference string by index. Also, the size of one BIFF section is limited, and SST can be rather big, so it uses the so-called CONTINUE sections. Every string in SST can be saved as Unicode (2 byte) or as ANSI (single byte per symbol). But when the string is broken with the continue section, it can change its encoding from one to another! This took me a lot of debugging, and I wish nobody has to do the same again...

Results

This implementation of reader is rather nonoptimal (I was pressed for time), but it showed a good speed. It surely beats any interop or OLE method and even some commercial analogs.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

Share

About the Author

Messir
Web Developer
Russian Federation Russian Federation
Alex
.NET Developer
Russian Federation
rakemaker@gmail.com

You may also be interested in...

Comments and Discussions

 
QuestionInvalid signature error Pin
Member 110195831-Oct-14 2:25
memberMember 110195831-Oct-14 2:25 
Question.xlsx files Pin
Nikolaj jensen25-Sep-14 3:37
memberNikolaj jensen25-Sep-14 3:37 
QuestionSkip hidden rows Pin
ecunha15-Sep-14 5:34
memberecunha15-Sep-14 5:34 
QuestionReading Big Excel Files Pin
Rubal Walia8-Aug-14 3:17
professionalRubal Walia8-Aug-14 3:17 
QuestionSee also this: Pin
Dietmar Schoder29-Jul-14 5:28
professionalDietmar Schoder29-Jul-14 5:28 
Bug16 Columns problem Pin
MarcoAurelio10-Jul-12 23:19
memberMarcoAurelio10-Jul-12 23:19 
Questionsolution problem only read 16 columns Pin
linhphuongaard12-Jun-12 1:38
memberlinhphuongaard12-Jun-12 1:38 
QuestionHow to read many columns Pin
linhphuongaard11-Jun-12 6:53
memberlinhphuongaard11-Jun-12 6:53 
QuestionLabels read wrong Pin
dmitry_andrianov13-Feb-12 7:54
memberdmitry_andrianov13-Feb-12 7:54 
AnswerDateTime value are stored as floating point Pin
edvox113813-Oct-11 19:49
memberedvox113813-Oct-11 19:49 
QuestionOnly 16 columns can be imported Pin
Osman Kazi30-Sep-11 11:38
memberOsman Kazi30-Sep-11 11:38 
QuestionTrying to open an invalid file keeps the file open Pin
airadier23-Aug-11 2:58
memberairadier23-Aug-11 2:58 
GeneralMy vote of 4 Pin
stuckne12-Feb-11 5:55
memberstuckne12-Feb-11 5:55 
GeneralRe: My vote of 4 Pin
Nguyen Thanh Hai26-Jul-11 6:20
memberNguyen Thanh Hai26-Jul-11 6:20 
GeneralOops! Trying to read stream from FAT area. Pin
referee213-Dec-10 20:00
memberreferee213-Dec-10 20:00 
GeneralRe: Oops! Trying to read stream from FAT area. Pin
flash6655-Apr-11 22:36
memberflash6655-Apr-11 22:36 
Generalseems does not support multiple worksheet Pin
bluesky44851-Nov-10 2:08
memberbluesky44851-Nov-10 2:08 
GeneralDownload source Pin
sutter1828-Sep-10 22:37
membersutter1828-Sep-10 22:37 
GeneralRe: Download source Pin
blam202021-Jan-11 14:35
memberblam202021-Jan-11 14:35 
GeneralThis library doesn't seem to include the top row as headers or support xlsx format Pin
akantro8-Aug-10 3:14
memberakantro8-Aug-10 3:14 
GeneralRe: This library doesn't seem to include the top row as headers or support xlsx format Pin
kaferkopmetluise15-Apr-12 22:43
memberkaferkopmetluise15-Apr-12 22:43 
GeneralFor excel 2007 & 2010 Pin
Member 440278925-Jul-10 21:35
memberMember 440278925-Jul-10 21:35 
GeneralHidden sheets Pin
referee28-Jul-10 0:34
memberreferee28-Jul-10 0:34 
QuestionUpdate Code Pin
Joel Lovison13-Oct-09 12:25
memberJoel Lovison13-Oct-09 12:25 
GeneralHyperlinks Pin
nKognito6-Oct-09 21:39
membernKognito6-Oct-09 21:39 
GeneralError executing library in MONO Pin
eyanson26-Sep-09 12:43
membereyanson26-Sep-09 12:43 
GeneralRe: Error executing library in MONO Pin
gg423714-Oct-09 4:16
membergg423714-Oct-09 4:16 
Generalplease explain the code to me please its urgent............. Pin
prasanth.shenoy25-Jun-09 18:12
memberprasanth.shenoy25-Jun-09 18:12 
GeneralFailed to read column 16 Pin
alper ebicoglu11-Jun-09 22:34
memberalper ebicoglu11-Jun-09 22:34 
GeneralRe: Failed to read column 16 Pin
Richard at BDG20-Aug-09 4:30
memberRichard at BDG20-Aug-09 4:30 
QuestionReading a protected workbook Pin
Hervé29-Mar-09 12:41
memberHervé29-Mar-09 12:41 
QuestionDate Is not Code Pin
jatin_chauhan26-Mar-09 0:08
memberjatin_chauhan26-Mar-09 0:08 
GeneralRe: Date Is not Code Pin
Girijesh116-Apr-09 5:12
memberGirijesh116-Apr-09 5:12 
AnswerRe: Date Is not Code Pin
Murat YILMAZ3-May-09 22:34
memberMurat YILMAZ3-May-09 22:34 
GeneralRe: Date Is not Code Pin
Abu Bakr El-seddiq4-Apr-10 6:08
memberAbu Bakr El-seddiq4-Apr-10 6:08 
GeneralRe: Date Is not Code Pin
Glaukos26-Apr-10 13:42
memberGlaukos26-Apr-10 13:42 
QuestionExcel Reader does not give any result with some excel files Pin
Sandeep Maharana25-Feb-09 22:21
memberSandeep Maharana25-Feb-09 22:21 
AnswerRe: Excel Reader does not give any result with some excel files Pin
npcomplete125-Mar-09 7:35
membernpcomplete125-Mar-09 7:35 
QuestionLots of 'NOT FOUND #XXXXX' Pin
RG1-Feb-09 1:10
memberRG1-Feb-09 1:10 
AnswerRe: Lots of 'NOT FOUND #XXXXX' [modified] Pin
npcomplete12-Feb-09 2:09
membernpcomplete12-Feb-09 2:09 
GeneralRe: Lots of 'NOT FOUND #XXXXX' Pin
RG3-Feb-09 2:37
memberRG3-Feb-09 2:37 
GeneralRe: Lots of 'NOT FOUND #XXXXX' Pin
外城25-Jan-10 16:12
member外城25-Jan-10 16:12 
General'NOT FOUND #xx' proposed solution Pin
npcomplete19-Dec-08 15:53
membernpcomplete19-Dec-08 15:53 
GeneralRe: 'NOT FOUND #xx' proposed solution Pin
linuxsuperfans13-Dec-08 19:56
memberlinuxsuperfans13-Dec-08 19:56 
GeneralRe: 'NOT FOUND #xx' proposed solution Pin
npcomplete117-Dec-08 13:06
membernpcomplete117-Dec-08 13:06 
GeneralRe: 'NOT FOUND #xx' proposed solution Pin
npcomplete121-Dec-08 14:24
membernpcomplete121-Dec-08 14:24 
Questiononly can read digital Pin
linuxsuperfans5-Dec-08 22:10
memberlinuxsuperfans5-Dec-08 22:10 
AnswerRe: only can read digital Pin
linuxsuperfans6-Dec-08 2:00
memberlinuxsuperfans6-Dec-08 2:00 
AnswerRe: only can read digital Pin
npcomplete19-Dec-08 7:24
membernpcomplete19-Dec-08 7:24 
AnswerRe: only can read digital Pin
npcomplete19-Dec-08 16:06
membernpcomplete19-Dec-08 16:06 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web02 | 2.8.150728.1 | Last Updated 1 Jul 2006
Article Copyright 2006 by Messir
Everything else Copyright © CodeProject, 1999-2015
Layout: fixed | fluid