Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

Faster MS Excel Reading using Office Interop Assemblies

0.00/5 (No votes)
31 Mar 2005 2  
An article on reading Excel documents using C# and the MS Office 2003 Interop Assemblies.

Contents

Summary:

OleDB provides fast read access to Excel data, but it didn't meet my specific needs, which included accessing only certain columns and data validation. While this article will not get into these specifics, it does explain the concepts used to read Excel data in a fairly quick manner. My first pass accessed each cell one by one, which is slow and used a lot of CPU. So in looking for a better way to accomplish my goals with reasonable CPU and speed, I experimented with the Office 2003 Interop Assemblies. I found, in my opinion, a decent way to accomplish the needed speed. CPU usage can still be high, but at an acceptable trade off for my needs. My attempts at finding an article to address this situation came up short, therefore, I am writing one.

The Implementation:

Setup

We first need to setup a project with references to the Interop Assemblies. Lars-Inge Tnnessen has written a good article about this here: An introduction on how to control Excel 2003 with J#.NET. It's for J# but should translate to C# without too much effort.

Once referenced, you can add the following using statement:

    using Microsoft.Office.Interop.Excel;

I've created a console application and just kept all the code within the main method. I've done this to make it a bit easier to follow. Next, we need to setup the objects that we'll be working with.

    ApplicationClass app = new ApplicationClass(); // the Excel application.

    // the reference to the workbook,

    // which is the xls document to read from.

    Workbook book = null;
    // the reference to the worksheet,

    // we'll assume the first sheet in the book.

    Worksheet sheet = null;
    Range range = null;
    // the range object is used to hold the data

    // we'll be reading from and to find the range of data.

The following options will help speed up the Excel application. They can also be set to true which will help us see what's going on with the document while debugging.

    app.Visible = false;
    app.ScreenUpdating = false;
    app.DisplayAlerts = false;

Now that the application setup is out of the way, we can open an XLS document and get the first worksheet in the workbook. Excel seems to prefer a full path to the document. As such, I get the current executing directory and move up two directories to the XLS document.

    string execPath = 
         Path.GetDirectoryName(Assembly.GetExecutingAssembly().CodeBase);

    book = app.Workbooks.Open(execPath + @"\..\..\Book1.xls", 
           Missing.Value, Missing.Value, Missing.Value, 
           Missing.Value, Missing.Value, Missing.Value, Missing.Value, 
           Missing.Value, Missing.Value, Missing.Value, Missing.Value, 
           Missing.Value, Missing.Value, Missing.Value);
    sheet = (Worksheet)book.Worksheets[1];

Finding the range of data

Next, we need to get an initial range to work with. We'll start with A1, you can start with the row your data starts on, to exclude the header information.

    range = sheet.get_Range("A1", Missing.Value);

Now that we have a range to work with, we can use the get_End method of the Range object and the XlDirection enumeration to specify which direction to find the end. We'll go to the right first and down second. The get_End stops at the first empty cell. And works on the first row or column in the range. So based on our initial range selection of A1, it will look for the first empty cell in row 1 moving to the right from column A.

    range = range.get_End(XlDirection.xlToRight);

In this example, it will find cell F1 is empty, and return E1 as the end range. We'll use this range, meaning cell E1, to find the end of the data moving down.

    range = range.get_End(XlDirection.xlDown);

This will get us to cell E20. Using this method, we get the bottom right cell with data. Now we can obtain the full range of data using the starting cell, A1, and the ending cell, E20. In order to get the address of the cell from the Range, we use the get_Address method of the Range object. The XlReferenceStyle specifies the format of the address returned. We want xlA1 because the get_Range method expects that format. The following returns a string containing E20:

    string downAddress = range.get_Address(
        false, false, XlReferenceStyle.xlA1, 
        Type.Missing, Type.Missing);

We'll use the get_Range method to get a range from A1 to E20.

    range = sheet.get_Range("A1", downAddress);

We now have a reference to the data.

Reading the data

Range objects will return their data in a two dimensional array of objects with the Value2 property. Dimension one represents the rows, while dimension two represents the columns. This is much faster than reading the data cell by cell.

    object[,] values = (object[,])range.Value2;

    Console.WriteLine("Row Count: " + values.GetLength(0).ToString());
    Console.WriteLine("Col Count: " + values.GetLength(1).ToString());

With the values object array, all we need to do is loop through to get the data. We'll start by writing out the column numbers.

    Console.Write("");
    for (int j = 1; j <= values.GetLength(1); j++) {
        Console.Write("{0}", j);
    }

What we really want is the data, so we'll need to loop through the rows, then the columns to access each value in values.

    Console.WriteLine();
    for (int i = 1; i <= values.GetLength(0); i++) {
        Console.Write("{0}", i);
        for (int j = 1; j <= values.GetLength(1); j++) {
            Console.Write("{0}", values[i, j]);
        }
        Console.WriteLine();
    }

Clean up

In order for the GC to collect the objects, which can have a large memory footprint, we want to set the references to null, close the workbook, and quit the Excel application.

    range = null;
    sheet = null;
    if (book != null)
        book.Close(false, Missing.Value, Missing.Value);
    book = null;
    if (app != null)
        app.Quit();
    app = null;

Timed access

Included in the sample code is a timed access class which gets the range and then reads it two different ways (see screen shot at top). First it reads it using the method described above. Secondly, it loops through the rows and columns, reading each value from a Range object. Running this will illustrate the difference in time between the methods.

Conclusion:

The interop assemblies provide a lot of options for working with Excel data, both reading and writing. Some experimenting with the object model can yield decent performance and expose some very useful options which are not obvious from reading the documentation. Writing to Excel can be done fairly quickly using the same technique. See An introduction on how to control Excel 2003 with J#.NET for more detail in writing to Excel using this method.

Revision History:

  • 2005-03-27 - Initial writing.

References:

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here