Click here to Skip to main content
Click here to Skip to main content

Using OleDb to Import Text Files (tab, CSV, custom)

By , 15 Jul 2008
 

Introduction

I have been browsing the Web for a good and simple class to handle delimited file imports. My current assignment has an import option that needs to deal with that. However, the current implementation (using StreamReader) is not good enough. It doesn't handle all the exceptions you encounter with delimited files. I found a number of examples on the Internet, but none of them really suited my needs. What I really missed was a simple example that I could extend so that it would suit my needs. So, being the developer that I am, I created my own class to import delimited files. After this was completed, I though I'd share it with others as an example.

Using StreamReader

The easiest way to process delimited files is to use a StreamReader object. You then simply open the file, read each line and then use the split method to get the various column values. For example:

public void ImportDelimitedFile(string filename, string delimiter)
{
    using (StreamReader file = new StreamReader(filename))
    {
        string line;

        while ((line = file.ReadLine()) != null)
        {
            if (line.Trim().Length > 0)
            {
                string[] columns = line.Split(delimiter, StringSplitOptions.None);
         
                // Add code to process the columns
            }
        }
    }
}

In a lot of cases this works just fine, but there are limitations to this scenario:

  • It's difficult to split a line into columns. For example, when you use Comma Separated File (CSV) it is well possible that a comma is in one of the columns. Using a simple string.Split is therefore not an option.
  • When you only need certain columns or lines, you will need to scan all of them and handle all lines and filter what you need.
  • It's not possible to return to a previous line.

Using the Jet Engine

The above mentioned problems are eliminated when you use the Jet engine. The following code shows how a CSV file can be processed:

public void ImportCsvFile(string filename)
{
    FileInfo file = new FileInfo(filename);

    using (OleDbConnection con = 
            new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"" +
            file.DirectoryName + "\";
            Extended Properties='text;HDR=Yes;FMT=Delimited(,)';"))
    {
        using (OleDbCommand cmd = new OleDbCommand(string.Format
                                  ("SELECT * FROM [{0}]", file.Name), con))
        {
            con.Open();
 
            // Using a DataReader to process the data
            using (OleDbDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Process the current reader entry...
                }
            }

            // Using a DataTable to process the data
            using (OleDbDataAdapter adp = new OleDbDataAdapter(cmd))
            {
                DataTable tbl = new DataTable("MyTable");
                adp.Fill(tbl);

                foreach (DataRow row in tbl.Rows)
                {
                    // Process the current row...
                }
            }
        }
    }
} 

As you can see in the example, once you have the Command object, you have the option of using anything a command object will allow you to do. You could process the file using a DataReader object, create a DataTable object containing the data or even add a where clause to the CommandText of your Command object to specify better which data is to be imported.

Helper Class

Using this, and the information provided in this Microsoft article, I created a small class that allows you to import delimited files. The class is very basic, but can easily be extended to suit your specific needs. This class will solve the most important issues when you're going to use Jet as your import engine.

Listed below are some things you need to consider when you are importing delimited files, be it with this class or using custom code:

  • The Jet engine makes assumptions about the content of the file. This can result in incorrect imports. For example, it might think a column contains date values. But in fact, your file should treat the columns as a string. In these cases, you should create a Schema.Ini file that describes the type of value for each column. The class creates a Schema.Ini file before it opens the delimited file, but only to specify what the delimiter is. You may want to change this to use pre-defined INI files that describe your input file. Details on the Schema.Ini file can be found here.
  • The class uses an OleDbDataReader to read each line in the import file. But it is easily replaced with the option of adding the data into a DataSet or DataTable object. It's also possible to use SqlBulkCopy to instantly insert all the data into a SQL server database.
  • The above mentioned Microsoft article is the best starting point for this type of import. You might want to read that before you start building imports for delimited files, even if this class is useful to you. The article provides interesting background information and links to various Microsoft resources with more details and information.
  • The helper class uses an event to allow you to handle the information being read. You can, of course, also provide an overridable method.

Valuable Resources

The information I used to build this class was found on the Internet. I used the following resources:

Disclaimer

The code presented in the helper class is not an all-purpose import solution. It's just a basic class to help you build your own import class. If you need other import types, or a way to influence the content of the default Schema.ini file, you will need to do that yourself. If you find any problems, please feel free to point them out to me.

History

  • 15th July, 2008: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Jan Schreuder
Software Developer (Senior)
Netherlands Netherlands
Member
I'm a professional software developer for a large company in the Netherlands. I have been developing software since 1988 in C, Visual Basic and C#.
 
I also blog about my work and .Net related issues on my weblog: http://bloggingabout.net/blogs/jschreuder/

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
Questiondoesn't work perfect..?memberFreddieH8530 Jun '09 - 1:58 
This was just what I was looking for, but...it doesn't seem to work with double quotes around the data fields. I just got a fieldcount of 1 back in my reader (See the results below).
 
Is there a way to fix this, because i can't seem to find a fix anywhere and I don't want to implement a complete library like FileHelper.
 
Thnx in advance.
 
Regards Menno
 
Results of the example program:
 
- Start TabDelimited ------------------------------
Text line 1 | 3-7-2008 0:00:00 | 879665
Text line 2 | 9-3-1963 0:00:00 | 402500
Text line 3 | 17-4-1967 0:00:00 | 280000
Text line 4 | 22-9-2005 0:00:00 | 0
- End TabDelimited --------------------------------
 
- Start CsvDelimited ------------------------------
Text, line 1 | Text, line 1
Text, line 2 | Text, line 2
Text, line 3 | Text, line 3
Text, line 4 | Text, line 4
- End CsvDelimited --------------------------------
 
modified on Tuesday, June 30, 2009 8:39 AM

AnswerRe: doesn't work perfect..?memberJan Schreuder30 Jun '09 - 11:30 
It should work with quotes around the fields. I'll check and let you know. If there's a fix, I'll implement it in the code.
GeneralRe: doesn't work perfect..?memberJan Schreuder30 Jun '09 - 11:37 
I checked the code and it works fine, at least, I think so. My demo application has a comma separated file with double quotes around the data fields. Could you add one or more lines to this thread so I can look at your data?
GeneralRe: doesn't work perfect..?memberFreddieH8530 Jun '09 - 22:45 
Thanx for your reaction.
 
I checked it some more and it seems that it was a problem with a reg-key of the Jet-engine. When i change this key to "Delimited(,)" or delete it the code works perfect, but when i have a csv-file seperated with semicolons I just got 1 column back (see code below, the delimiter is determined with reading the first line).
 
Any advise on how I can fix this? Thanx in advance.
 
Code:
 
FileInfo file = new FileInfo(FileName);
string connString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"" + file.DirectoryName
+ "\";Extended Properties='text;HDR=NO;FMT=Delimited(" + delimiter[0] + ");'";
using (OleDbConnection con = new OleDbConnection(connString))
{
using (OleDbCommand cmd = new OleDbCommand(string.Format("SELECT * FROM [{0}]", file.Name), con))
{
con.Open();
Dt = new DataTable();
OleDbDataAdapter adp = new OleDbDataAdapter(cmd);
DataTable dummyDt = new DataTable();
adp.Fill(dummyDt);
 
//process the datatable
}
}
GeneralRe: doesn't work perfect..? Fixed itmemberFreddieH8530 Jun '09 - 23:59 
Just got the bug fixed.
 
For the semicolon-seperated file I now use a schema file where I define the deliter and for the comma-seperated file I don't use a schema file. It works perfect for these to files.
 
Thanx anyway for your effort to help me.
 
Regards, Menno
GeneralUsefulmembereamonnkelly6 Mar '09 - 0:57 
Just what I was looking for ... Thanks.
GeneralNice, clean and simple!memberthompsons16 Nov '08 - 11:27 
Jan,
Thanks for this article.
 
Regards,
Steve.
GeneralNice ClassmemberP.Joshi10 Oct '08 - 3:23 
Jan Schreuder,
 
I found this very helpful as my own code was failing due to coma used as part of data content in one of the column.
Thank you very much.
 
Paresh Joshi Wink | ;)
GeneralNot another onememberPIEBALDconsult15 Jul '08 - 4:57 
Does article add anything that this[^] one doesn't have?
GeneralRe: Not another onememberJan Schreuder15 Jul '08 - 5:53 
Well, yes and no. I'll start with the No first. The article you specify in your link (and which I list as a resource) describes the basic mechanism. So from that point of view, there is nothing new.
 
But yes, because the class I provide can be added to your set of libraries, or simply to your application. And from there, you can start using it. I also provide links to more information, which I found to be useful while building my applications around the class I included in this article.

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web02 | 2.6.130523.1 | Last Updated 15 Jul 2008
Article Copyright 2008 by Jan Schreuder
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid