Click here to Skip to main content
Click here to Skip to main content

Faster MS Excel Reading using Office Interop Assemblies

By , 31 Mar 2005
 

Contents

Summary:

OleDB provides fast read access to Excel data, but it didn't meet my specific needs, which included accessing only certain columns and data validation. While this article will not get into these specifics, it does explain the concepts used to read Excel data in a fairly quick manner. My first pass accessed each cell one by one, which is slow and used a lot of CPU. So in looking for a better way to accomplish my goals with reasonable CPU and speed, I experimented with the Office 2003 Interop Assemblies. I found, in my opinion, a decent way to accomplish the needed speed. CPU usage can still be high, but at an acceptable trade off for my needs. My attempts at finding an article to address this situation came up short, therefore, I am writing one.

The Implementation:

Setup

We first need to setup a project with references to the Interop Assemblies. Lars-Inge Tnnessen has written a good article about this here: An introduction on how to control Excel 2003 with J#.NET. It's for J# but should translate to C# without too much effort.

Once referenced, you can add the following using statement:

    using Microsoft.Office.Interop.Excel;

I've created a console application and just kept all the code within the main method. I've done this to make it a bit easier to follow. Next, we need to setup the objects that we'll be working with.

    ApplicationClass app = new ApplicationClass(); // the Excel application.
    // the reference to the workbook,
    // which is the xls document to read from.
    Workbook book = null;
    // the reference to the worksheet,
    // we'll assume the first sheet in the book.
    Worksheet sheet = null;
    Range range = null;
    // the range object is used to hold the data
    // we'll be reading from and to find the range of data.

The following options will help speed up the Excel application. They can also be set to true which will help us see what's going on with the document while debugging.

    app.Visible = false;
    app.ScreenUpdating = false;
    app.DisplayAlerts = false;

Now that the application setup is out of the way, we can open an XLS document and get the first worksheet in the workbook. Excel seems to prefer a full path to the document. As such, I get the current executing directory and move up two directories to the XLS document.

    string execPath = 
         Path.GetDirectoryName(Assembly.GetExecutingAssembly().CodeBase);

    book = app.Workbooks.Open(execPath + @"\..\..\Book1.xls", 
           Missing.Value, Missing.Value, Missing.Value, 
           Missing.Value, Missing.Value, Missing.Value, Missing.Value, 
           Missing.Value, Missing.Value, Missing.Value, Missing.Value, 
           Missing.Value, Missing.Value, Missing.Value);
    sheet = (Worksheet)book.Worksheets[1];

Finding the range of data

Next, we need to get an initial range to work with. We'll start with A1, you can start with the row your data starts on, to exclude the header information.

    range = sheet.get_Range("A1", Missing.Value);

Now that we have a range to work with, we can use the get_End method of the Range object and the XlDirection enumeration to specify which direction to find the end. We'll go to the right first and down second. The get_End stops at the first empty cell. And works on the first row or column in the range. So based on our initial range selection of A1, it will look for the first empty cell in row 1 moving to the right from column A.

    range = range.get_End(XlDirection.xlToRight);

In this example, it will find cell F1 is empty, and return E1 as the end range. We'll use this range, meaning cell E1, to find the end of the data moving down.

    range = range.get_End(XlDirection.xlDown);

This will get us to cell E20. Using this method, we get the bottom right cell with data. Now we can obtain the full range of data using the starting cell, A1, and the ending cell, E20. In order to get the address of the cell from the Range, we use the get_Address method of the Range object. The XlReferenceStyle specifies the format of the address returned. We want xlA1 because the get_Range method expects that format. The following returns a string containing E20:

    string downAddress = range.get_Address(
        false, false, XlReferenceStyle.xlA1, 
        Type.Missing, Type.Missing);

We'll use the get_Range method to get a range from A1 to E20.

    range = sheet.get_Range("A1", downAddress);

We now have a reference to the data.

Reading the data

Range objects will return their data in a two dimensional array of objects with the Value2 property. Dimension one represents the rows, while dimension two represents the columns. This is much faster than reading the data cell by cell.

    object[,] values = (object[,])range.Value2;

    Console.WriteLine("Row Count: " + values.GetLength(0).ToString());
    Console.WriteLine("Col Count: " + values.GetLength(1).ToString());

With the values object array, all we need to do is loop through to get the data. We'll start by writing out the column numbers.

    Console.Write("");
    for (int j = 1; j <= values.GetLength(1); j++) {
        Console.Write("{0}", j);
    }

What we really want is the data, so we'll need to loop through the rows, then the columns to access each value in values.

    Console.WriteLine();
    for (int i = 1; i <= values.GetLength(0); i++) {
        Console.Write("{0}", i);
        for (int j = 1; j <= values.GetLength(1); j++) {
            Console.Write("{0}", values[i, j]);
        }
        Console.WriteLine();
    }

Clean up

In order for the GC to collect the objects, which can have a large memory footprint, we want to set the references to null, close the workbook, and quit the Excel application.

    range = null;
    sheet = null;
    if (book != null)
        book.Close(false, Missing.Value, Missing.Value);
    book = null;
    if (app != null)
        app.Quit();
    app = null;

Timed access

Included in the sample code is a timed access class which gets the range and then reads it two different ways (see screen shot at top). First it reads it using the method described above. Secondly, it loops through the rows and columns, reading each value from a Range object. Running this will illustrate the difference in time between the methods.

Conclusion:

The interop assemblies provide a lot of options for working with Excel data, both reading and writing. Some experimenting with the object model can yield decent performance and expose some very useful options which are not obvious from reading the documentation. Writing to Excel can be done fairly quickly using the same technique. See An introduction on how to control Excel 2003 with J#.NET for more detail in writing to Excel using this method.

Revision History:

  • 2005-03-27 - Initial writing.

References:

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

About the Author

Dusty Candland
United States United States
Member
No Biography provided

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
GeneralDetermining range - slight improvement [modified]membercivil_monkey76 Jul '12 - 6:04 
If I'm still allowed to add / reply to such an old post Smile | :) My apologies if the below code has already been posted.
 
The code:
range = range.get_End(XlDirection.xlToRight);
range = range.get_End(XlDirection.xlDown);
 
which returns the location of the first empty cell in both the right hand and down direction,
 
can be replaced by:
range = range.SpecialCells(XlCellType.xlCellTypeLastCell);
 
Which selects the right- and downmost used cell. Slight difference, where the latter snippet allows you to work with tables with empty rows and/or columns.
 
Tested with Visual Studio and Excel 2010 on win7 machine.

modified 16 Aug '12 - 10:33.

QuestionCommentmemberJanith Kalhara24 Apr '12 - 2:42 
A great explanation. thnkz
Janith

GeneralMy vote of 5membertec-goblin17 Dec '10 - 0:04 
Very helpful. It works with an open file AND is faster than reading cell by cell Smile | :)
GeneralMy vote of 5memberSteve van Niman19 Aug '10 - 6:54 
Amazingly faster - Took a 1:30 second process to under a second. Watch out though, returned array is NOT zero based and is rows, then columns
Questionhow to read namedRange from Excel?memberPerlDev5 Nov '09 - 7:34 
If the worksheet has some namedRange, how to read it out?
NewsGet Rangememberdroka19844 Aug '09 - 0:15 
I have a way to get range value faster than you:
 
object oMissing = System.Reflection.Missing.Value;
Excel.Range range = sheet.get_Range("A1",oMissing);
string address = range.CurrentRegion.get_Address(oMissing, oMissing, XlReferenceStyle.xlA1, oMissing, oMissing);
range = sheet.get_Range (address,oMissing);
 
Confused | :confused:
GeneralAlternativememberFilipKrnjic8 Jul '09 - 23:46 
Hi,
 
there is much faster and easier way to deal with Excel files from .NET then Interop. Try using GemBox spreadsheet component which is free for commercial use if you need to work with less then 150 rows.
 
Filip
GemBox.Spreadsheet for .NET - Easily read and write Excel (XLS, XLSX or CSV) files or export to HTML files from your .NET apps.
RantRe: Alternativememberauxcom18 Mar '10 - 5:43 
why everywhere I go I always see you!!!
GeneralError when running TYPE_E_INVDATAREADmemberGigavisto7 Jul '08 - 1:03 
I had this exception when running the code
System.Runtime.InteropServices.COMException
Exception from HRESULT: 0x80028018 (TYPE_E_INVDATAREAD)
The solution is here:
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=384846&SiteID=1
or here:
http://support.microsoft.com/kb/320369
Generalcreating excel sheet using asp.netmembers.mehrait22 Oct '07 - 2:38 
i m creating an excel sheet using asp.net (vb). i need to use data validation
using a list in excel which will be created dynamically. how can i create it ? plz help me out of this problem. i m explaining the problem as follows:
i have to create an excel having fields:
firstname,lastname,country, city
firstname and last name will be entered by user in the excel sheet
but country and city will have to be populated from the database.
later on after filling the excel sheet i have to import this excel and all the values will be saved in the database table.
how can i do it...........

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web04 | 2.6.130523.1 | Last Updated 31 Mar 2005
Article Copyright 2005 by Dusty Candland
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid