Click here to Skip to main content
Click here to Skip to main content

Different Ways to Access Excel 2003 Workbooks using C#

, 21 Feb 2012
Rate this:
Please Sign up or sign in to vote.
Comparison of the Excel Object Library and the native C# library NPOI to extract information from .xls files.

Excel 2003 and Visual Studio 2010

The need to extract information from existing Excel 2003 Workbooks aroused (.xls file extension) within one of our projects. Using the newer XML based formats was not an option on our customer’s side. The Visual Studio Tools for Office of the selected IDE Visual Studio 2010 does not support Excel 2003 anymore (see Features Available by Office Application and Project Type, http://msdn.microsoft.com/en-us/library/aa942839.aspx).

Therefore COM Interop of the Microsoft Excel 11.0 Object Library was the remaining way while focusing on Microsoft provisions. But this solution depends on the installed version of Excel. So to be able to use and test the Excel 2003 binding, it is necessary to install Excel 2003 first.

The attached download (Compare_COM_vs_NPOI_v11_20120221.zip - 1 MB) of this article provides a sample Excel file too. It has some sheets with content and calculations.

322469/screenshot_excel.png

Access an Excel 2003 file using the Excel 11.0 Object Library

The following code snippet shows a simple traversal of the whole content of an Excel 2003 file. At the beginning, an instance of Excel Application is used to open the .xls file. Afterwards, all the Worksheets are traversed row by row and column by column. The value and the comment of each of the cells is printed to the console screen.

A comprehensive step by step tutorial on the COM interop side is provided in the article "Opening and Navigating Excel with C#" (http://www.codeproject.com/Articles/5123/Opening-and-Navigating-Excel-with-C).

try
{
    xlApp = new Excel.Application();

    if (null == xlApp)
    {
        Console.WriteLine("Excel could not be started. Check that your " + 
          "office installation and project references are correct.");
        return;
    }

    bool openReadOnly = true;
    xlWorkbook = xlApp.Workbooks.Open(extractFile,
        0, openReadOnly, 5, "", "", false, 
        Excel.XlPlatform.xlWindows, "",
        true, false, 0, true, false, false);
    if (null == xlWorkbook)
    {
        Console.WriteLine(string.Format(
          "Excel Workbook '{0}' could not be opened.", extractFile));
    }

    Console.WriteLine();
    foreach (Excel.Worksheet xlSheet in xlWorkbook.Worksheets)
    {
        if (null != xlSheet)
        {
            Console.WriteLine();
            Console.WriteLine("***   Worksheet " + xlSheet.Name + "   ***");

            Excel.Range usedRange = xlSheet.UsedRange;
            if ((null != usedRange) && (null != usedRange.Cells))
            {
                xlCells = usedRange.Cells;
                for (int rowNumber = 1; rowNumber <= usedRange.Rows.Count; ++rowNumber)
                {
                    if (1 == (rowNumber % 10)) { Console.WriteLine(); }
                    Console.Write(string.Format("Row {0,2}: ", rowNumber));
                    for (int colNumber = 1; colNumber <= usedRange.Columns.Count; ++colNumber)
                    {
                        dynamic cell = xlCells[rowNumber, colNumber];
                        if (null != cell)
                        {
                            string value = GetValue(cell);
                            string comment = GetComment(cell);
                            Console.Write(string.Format("{0}{1};",
                                string.IsNullOrWhiteSpace(value) ? "" : value,
                                string.IsNullOrWhiteSpace(comment) ? "" : " [" + comment + "]"));
                        }
                        else
                        {
                            Console.Write(";");
                        }
                    }
                    Console.WriteLine();
                }
            }
            Marshal.FinalReleaseComObject(xlSheet);
            Console.WriteLine();
        }
    }
    xlWorkbook.Close(Type.Missing, Type.Missing, Type.Missing);
    xlApp.Quit();
}
catch (Exception ex)
{
    Console.WriteLine("Exception: " + ex.Message);
    if (null != ex.InnerException)
    {
        Console.WriteLine("Inner Exception: " + ex.InnerException.Message);
    }
}
finally
{
    if (null != xlUsedRange) { Marshal.FinalReleaseComObject(xlUsedRange); xlUsedRange = null; }
    if (null != xlCells) { Marshal.FinalReleaseComObject(xlCells); xlCells = null; }
    if (null != xlWorkbook) { Marshal.FinalReleaseComObject(xlWorkbook); xlWorkbook = null; }
    if (null != xlApp) { Marshal.FinalReleaseComObject(xlApp); xlApp = null; }
}

To extract the content from a cell, the method GetValue is used.

private static string GetValue(dynamic cell)
{
    string ret = string.Empty;
    if (null == cell) { return ret; }
    Excel.Range singleCell = cell as Excel.Range;
    if (null == singleCell) { return ret; }
    if (null != singleCell.Text)
    {
        ret = singleCell.Text as string;
    }
    if (null == ret) { return string.Empty; }
    return ret.Replace("\n", " "); // remove line break
}

The comment of the cell is extracted with the method GetComment.

private static string GetComment(dynamic cell)
{
    string ret = string.Empty;
    if ((null == cell) || (null == cell.Comment)) { return ret; }
    ret = cell.Comment.Text as string;
    if (null == ret) { return string.Empty; }
    return ret.Replace("\n", " "); // remove line break
}

Access Excel 2003 File using the NPOI Library - Step by Step

An alternative access method is provided by the library NPOI. This library is provided on Microsoft CodePlex (http://npoi.codeplex.com/) and developed on GitHub (https://github.com/tonyqus/npoi). The implementation is performed in C# only and provides therefore a pure managed access to Excel files.

The NPOI library itself is ported from a Java project named Apache POI (Apache POI - the Java API for Microsoft Documents, http://poi.apache.org/). Both libraries are mature and actively developed.

The following code snippets use the NPOI library to print the whole content of an Excel 2003 file. Initially, the standard stream reader is used to read the XLS file. The usual standard flags can be applied to the stream reader as necessary. The stream is read from the POIFSFileSystem class. This class reads and writes Microsoft's OLE 2 Compound document format. This is the underlying base format of the Excel file format. Finally, HSSFWorkbook reflects the structure of the Excel file that is visible to the Excel user.

using (StreamReader input = new StreamReader(extractFile))
{
    IWorkbook workbook = new HSSFWorkbook(new POIFSFileSystem(input.BaseStream));
    if (null == workbook)
    {
        Console.WriteLine(string.Format("Excel Workbook '{0}' could not be opened.", extractFile));
        return;
    }
    ...
}

Next, the formulas of the Workbook are evaluated and an instance of a data formatter is created.

IFormulaEvaluator formulaEvaluator = new HSSFFormulaEvaluator(workbook);
DataFormatter dataFormatter = new HSSFDataFormatter(new CultureInfo("en-US"));

The individual sheets of the Workbook are accessible by number, name, or using an iterator. This is the iterator based example.

foreach (ISheet sheet in workbook)
{
    Console.WriteLine("\n\n***   Worksheet " + sheet.SheetName + "   ***");
    ...
}

Accessing the rows of a given sheet is done using an iterator, again. The row number is gathered from the IRow type variable and printed to console output.

foreach (IRow row in sheet)
{
    Console.Write(string.Format("Row {0,2}: ", row.RowNum + 1));
}

Now the content from each cell of the row is extracted, formatted, and written to the console. The same applies for the comments that are associated with the cell.

foreach (ICell cell in row)
{
    string value = GetValue(cell, dataFormatter, formulaEvaluator);
    string commentText = GetComment(cell);
    Console.Write(string.Format("{0}{1};",
        string.IsNullOrWhiteSpace(value) ? "" : value,
        string.IsNullOrWhiteSpace(commentText) ? "" : " [" + commentText + "]"));
}

Access Excel 2003 File using the NPOI Library - Complete Example

This is the complete code snippet to compare it with the COM Interop snippet above.

try
{
    using (StreamReader input = new StreamReader(extractFile))
    {
        IWorkbook workbook = new HSSFWorkbook(new POIFSFileSystem(input.BaseStream));
        if (null == workbook)
        {
            Console.WriteLine(string.Format("Excel Workbook '{0}' could not be opened.", extractFile));
            return;
        }
        // calculates/updates the formulas on the given workbook
        IFormulaEvaluator formulaEvaluator = new HSSFFormulaEvaluator(workbook);
        DataFormatter dataFormatter = new HSSFDataFormatter(new CultureInfo("en-US"));

        Console.WriteLine();
        foreach (ISheet sheet in workbook)
        {
            Console.WriteLine("\n\n***   Worksheet " + sheet.SheetName + "   ***");
            foreach (IRow row in sheet)
            {
                if (0 == row.RowNum % 10) { Console.WriteLine(); }
                Console.Write(string.Format("Row {0,2}: ", row.RowNum + 1));
                foreach (ICell cell in row)
                {
                    string value = GetValue(cell, dataFormatter, formulaEvaluator);
                    string commentText = GetComment(cell);
                    Console.Write(string.Format("{0}{1};",
                        string.IsNullOrWhiteSpace(value) ? "" : value,
                        string.IsNullOrWhiteSpace(commentText) ? "" : " [" + commentText + "]"));
                }
                Console.WriteLine();
            }
        }
    }
}
catch (Exception ex)
{
    Console.WriteLine("Exception: " + ex.Message);
    if (null != ex.InnerException)
    {
        Console.WriteLine("Inner Exception: " + ex.InnerException.Message);
    }
}

For extracting the content from the cell, the method GetValue is used, again. But this time, it is the NPOI variant:

private static string GetValue(ICell cell, DataFormatter dataFormatter, 
                      IFormulaEvaluator formulaEvaluator)
{
    string ret = string.Empty;
    if (null == cell) { return ret; }
    ret = dataFormatter.FormatCellValue(cell, formulaEvaluator);
    return ret.Replace("\n", " "); // remove line break
}

The comment of the cell is extracted with the method GetComment.

private static string GetComment(ICell cell)
{
    string ret = string.Empty;
    if ((null == cell) || (null == cell.CellComment)) { return ret; }
    IRichTextString str = cell.CellComment.String;
    if (str != null && str.Length > 0)
    {
        ret = str.ToString();
    }
    return ret.Replace("\n", " "); // remove line break
}

Here is a screenshot showing some of the extracted information.

322469/screenshot_extract_beginning_600px.png

Comparing NPOI and the Excel Object Library

Comparing the two snippets, it becomes obvious that NPOI provides a modern interface, e.g., it is possible to iterate the instances using foreach or the resources are automatically and safely freed. Additionally, NPOI provides a strongly type based interface that most C# developers are expecting today. The library API is designed in a clean C# way. This is in contrast to the COM Interop where most parameter types are objects.

To our surprise, the NPOI implementation is way faster than the COM alternative. The programs in the accompanying archive contain the Stopwatch used for measurements. The test machine is a Core i7 860@2.8GHz, 3 GB RAM, Win7 32 bit. Executing both programs multiple times, COM Interop requires around 2500 - 4200 ms to complete whereas the NPOI program requires only 350 - 410 ms. Even larger samples used in our project still return nearly immediately. This is especially true when avoiding the lengthy console output. But try it with your own samples.

The documentation of the NPOI library methods are available at the same level provided by standard .NET classes including Intellisense help and navigation to the documented method. The COM Interop DLL provides only the names of the methods and a parameter list where the structured types are all of the meaningless base object type. The NPOI library is available with full source code. The source code is licensed under Apache License 2.0 and can be used in any project, even commercial and proprietary ones.

Opening the Excel file for extraction as a read only file becomes natural in NPOI because of the StreamReader. COM Interop requires the use of the Open method with a long parameter list. When extracting information from a customer provided .xls file, it prints the content as desired, but additionally a "Save changes" dialog pops up after each invocation when using COM interop. Opening the .xls file as a read only file does not avoid this unwanted dialog. The reason seems to be an update of the formulas during the access of some cells. This is in contrast to NPOI that does not change the .xls file.

Excel is always running in a single instance on the PC. Therefore, it must not be used in a server setting. This limitation does not exist for NPOI.

The download archive of this article provides a Visual Studio 2010 solution with three projects. Each of the projects extracts the whole information of an Excel file. The first named ReadExcelUsingCom uses the COM Interop DLL to perform the task. This is the project where the above example at the beginning of this article is extracted. The second one named ReadExcelUsingNPOI is the example using NPOI that is explained in detail above. This program shows all the content from the Excel file. But the COM Interop program shows empty rows, too. Therefore a third program was developed that produces exactly the same output as the COM example even with NPOI. This last program is contained in the project ReadExcelUsingNPOICloneCom.

In addition to the three programs, the solution files have an up to date version of the NPOI DLL included. So these examples are expected to compile and run just out of the box. Of course, Excel 2003 is missing and must be installed separately to get the COM Interop example working. To repeat it, it is required to install Excel 2003, other versions are not supported.

The presented comparison of the NPOI library and the COM Interop of Excel 11.0 Object Library can be summarized as follows:

  • NPOI provides a strongly type based interface.
  • The standalone NPOI library does not depend on the installed version of Excel.
  • NPOI is significantly faster even when tested with only three small sheets.
  • NPOI does not change .xls file when reading it.
  • Only the Excel 11.0 Object Library is supported by Microsoft.
  • The Open Source NPOI library provides full source code.

History

  • Initial article
  • Updated download and article to NPOI 1.2.5 pre release as of 2012-02-21, svn r309
  • Source code of the NPOI library moved to https://github.com/tonyqus/npoi. Fixed reference above

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Christian Leutloff
xox Industrie IT GmbH, Aachen, Germany
Germany Germany
I am a software engineer working on industrial measurement systems. Most of my software is written in C++ and running on QNX or Linux. Accompanying software for PCs is written in WPF/C#.

Comments and Discussions

 
QuestionThis could be interesting here: PinprofessionalDietmar Schoder29-Jul-14 10:36 
QuestionWriteProtectWorkbook Example PinmemberProgramm3r25-Aug-13 20:24 
AnswerRe: WriteProtectWorkbook Example PinmemberChristian Leutloff27-Aug-13 23:57 
Questionwrite to existing file Pinmembermbowles20129-Jan-13 7:02 
QuestionIs NPOI suitable for interactive applications? Pinmembermmclean07-Mar-12 12:00 
AnswerRe: Is NPOI suitable for interactive applications? PinmemberChristian Leutloff7-Mar-12 23:04 
QuestionI recommend NPOI too PinmemberDaveRRR27-Feb-12 7:10 
QuestionVery useful! PinmemberChristoff3301-Feb-12 5:07 
AnswerRe: Very useful! PinmemberP. Vasudeva Kiran6-Feb-12 12:49 
GeneralRe: Very useful! like it. Pinmembershacohs14-Mar-12 21:40 
AnswerRe: Very useful! PinmemberBruce Goodman5-Jun-12 21:53 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web01 | 2.8.140827.1 | Last Updated 21 Feb 2012
Article Copyright 2012 by Christian Leutloff
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid