A Modified C# Implementation of Tony Selke's TextFieldParser

By WendellH, 27 Feb 2005
 

Application after parsing 1,500,000 fields.

Introduction

One day while browsing the Code Project, I found an excellent article by Tony Selke, 'Wrapper Class for Parsing Fixed-Width or Delimited Text Files'. I decided to port the code to C# because that is my language of choice. While doing this, I also added a couple of features:

  1. I added the ability to put the schema for the text file in an XML file.
  2. I added the ability to parse the text file directly to a DataTable.

This is my first article submitted to the Code Project, so be gentle.

What can it do?

The library imports delimited or fixed-width files in one of two ways. The developer can subscribe to the RecordFound event and decide what to do with each record, or the file can be imported directly into a DataTable.

How does it work?

The developer sets up a schema either with code or in an XML schema file. This determines the data types that will be used in the DataTable. Based on the schema, the text values are parsed and converted to the respective data types and either put in a DataTable or simply passed to the calling object as an event.
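The conversion step described above can be illustrated with a minimal sketch. This is not the library's code: SchemaConversionSketch and ToTable are hypothetical names, and the sketch assumes each column's type is given as a TypeCode that maps to a System.* type of the same name (Int32, Double, Boolean, and so on). It uses only Convert.ChangeType and DataTable from the .NET Framework.

```csharp
using System;
using System.Data;
using System.Globalization;

public static class SchemaConversionSketch
{
    // Hypothetical helper: builds a one-row DataTable from raw text values,
    // using one TypeCode per column as a stand-in for the schema.
    public static DataTable ToTable(string[] names, TypeCode[] types, string[] rawValues)
    {
        DataTable table = new DataTable();
        for (int i = 0; i < names.Length; i++)
        {
            // Assumption: the TypeCode name matches a System.* type (Int32 -> System.Int32).
            table.Columns.Add(names[i], Type.GetType("System." + types[i].ToString()));
        }

        object[] converted = new object[rawValues.Length];
        for (int i = 0; i < rawValues.Length; i++)
        {
            // Convert the text to the column's type; throws if the text is not valid for that type.
            converted[i] = Convert.ChangeType(rawValues[i], types[i], CultureInfo.InvariantCulture);
        }
        table.Rows.Add(converted);
        return table;
    }

    public static void Main()
    {
        DataTable t = ToTable(
            new string[] { "LineNumber", "Double", "Boolean" },
            new TypeCode[] { TypeCode.Int32, TypeCode.Double, TypeCode.Boolean },
            new string[] { "1", "3.5", "true" });

        // Print each converted value with the column type it was stored as.
        foreach (DataColumn c in t.Columns)
            Console.WriteLine(c.ColumnName + ": " + t.Rows[0][c] + " (" + c.DataType.Name + ")");
    }
}
```

A value that cannot be converted (for example "abc" for an Int32 column) makes Convert.ChangeType throw a FormatException, which is the kind of failure a parser has to report per record.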

Using the code

First, add a reference to the library. Then add the using statement at the top of your source file.

using WhaysSoftware.Utilities.FileParsers;

Create an instance of the TextFieldParser object.

TextFieldParser tfp = new TextFieldParser(filePath);

If you will be using an XML schema file, use the constructor that has the 'schemaFile' parameter.

TextFieldParser tfp = new TextFieldParser(filePath, schemaPath);

The following is an example of what the XML schema file looks like:

<TABLE Name="TEST" FileFormat="Delimited" ID="Table1">
    <FIELD Name="LineNumber" DataType="Int32" />
    <FIELD Name="Quoted String" DataType="String" Quoted="true" />
    <FIELD Name="Unquoted String" DataType="String" Quoted="false" />
    <FIELD Name="Double" DataType="Double" />
    <FIELD Name="Boolean" DataType="Boolean" />
    <FIELD Name="Decimal" DataType="Decimal" />
    <FIELD Name="DateTime" DataType="DateTime" />
    <FIELD Name="Int16" DataType="Int16" />
</TABLE>

A complete description of the schema file attributes is included with the source code. Here is an example of the same schema, but defined in code.

TextFieldCollection fields = new TextFieldCollection();
fields.Add(new TextField("Line Number", TypeCode.Int32));
fields.Add(new TextField("Quoted String", TypeCode.String, true));
fields.Add(new TextField("Unquoted String", TypeCode.String, false));
fields.Add(new TextField("Double", TypeCode.Double));
fields.Add(new TextField("Boolean", TypeCode.Boolean));
fields.Add(new TextField("Decimal", TypeCode.Decimal));
fields.Add(new TextField("DateTime", TypeCode.DateTime));
fields.Add(new TextField("Int16", TypeCode.Int16));
tfp.TextFields = fields;

Now you can either subscribe to the RecordFound event if you want to do something custom with the records...

tfp.RecordFound += new RecordFoundHandler(tfp_RecordFound);
tfp.ParseFile();
...
private void tfp_RecordFound(ref int CurrentLineNumber, 
                                         TextFieldCollection TextFields)
{
    //Do something with the TextFields parameter
}

or you can call ParseToDataTable to get the results in a DataTable.

DataTable dt = tfp.ParseToDataTable();

Note: Even when calling ParseToDataTable, the RecordFound event is still fired.

You can also subscribe to the RecordFailed event to be notified when a record fails to parse. In the event handler, you can decide whether parsing should continue.

tfp.RecordFailed += new RecordFailedHandler(tfp_RecordFailed);
...
private void tfp_RecordFailed(ref int CurrentLineNumber, 
        string LineText, string ErrorMessage, ref bool Continue)
{
    MessageBox.Show("Error: " + ErrorMessage + Environment.NewLine + 
                            "Line: " + LineText);
}
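The handler above only displays the error. The sketch below shows one way the Continue parameter might be used to stop after too many bad records. It assumes the parser stops once a handler sets Continue to false, which the article implies but does not spell out. The delegate declared here is a stand-in that copies the handler signature shown above (the real RecordFailedHandler ships with the library), and RecordFailedSketch, Run and MaxFailures are hypothetical names used only for illustration.

```csharp
using System;

// Stand-in delegate copying the handler signature shown in the article;
// the real RecordFailedHandler is defined by the library.
public delegate void RecordFailedHandler(ref int CurrentLineNumber,
    string LineText, string ErrorMessage, ref bool Continue);

public static class RecordFailedSketch
{
    private const int MaxFailures = 2;   // hypothetical threshold
    private static int failures;

    // Handler: tolerate bad records until the threshold is reached, then ask to stop.
    public static void OnRecordFailed(ref int CurrentLineNumber,
        string LineText, string ErrorMessage, ref bool Continue)
    {
        failures++;
        Continue = failures < MaxFailures;
    }

    // Hypothetical driver imitating a parser loop. A line "fails" when it is
    // not an integer. Returns the number of lines visited before stopping.
    public static int Run(string[] lines, RecordFailedHandler onFailed)
    {
        failures = 0;
        int visited = 0;
        for (int line = 1; line <= lines.Length; line++)
        {
            visited++;
            int ignored;
            if (!int.TryParse(lines[line - 1], out ignored))
            {
                bool keepGoing = true;
                onFailed(ref line, lines[line - 1], "Not an integer.", ref keepGoing);
                if (!keepGoing) break;   // assumption: the parser honours Continue == false
            }
        }
        return visited;
    }

    public static void Main()
    {
        string[] lines = { "1", "x", "3", "y", "5" };
        Console.WriteLine(Run(lines, new RecordFailedHandler(OnRecordFailed)));
    }
}
```

With the sample input, the second failure (line 4) reaches the threshold, so the loop stops before line 5 is read.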

That's it. I look forward to your comments and suggestions.

History

  • 02/27/2005

    Initial release.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

WendellH
Web Developer
United States
I have a BA in Computer Science from a small college in Indiana. I have been programming for about 7 years, mostly business applications.


Comments and Discussions

 
Question: Error when single-field file missing end quote (PeterGomis, 10 May '07 - 8:13)
I've come across a problem parsing a file that has a single field of quoted string values in it, such as:
 
"Value 1"
"Value 2"
"Value 2"
...
 
If one of the above values is missing the end quote, an IndexOutOfRangeException occurs. This happens in the RecombineQuotedFields(ref string[] fields) method on the following line within the do loop:
 
firstChar = fields[x][0];
 
What is the intended logic in the RecombineQuotedFields method?
 
Please advise if you can. Thanks.
Answer: Re: Error when single-field file missing end quote (WendellH, 10 May '07 - 10:04)
First of all, thank you for the comment.
 
Wow, hard for me to remember... I wrote this so long ago.
 
In the example you show with "Value 1", "Value 2", etc...
If one of the values is missing an ending quote, then I would say the file is not in correct CSV format. I will need to see what the CSV standards say about that.
Anyway if it is incorrect, it should fail gracefully, otherwise it should work.
I will update the application and library when I get a chance.
 
To answer your question about the logic...
 
The library uses the String.Split() method to split each line on the delimiter. Because a string field value can itself contain the delimiter, I have to check whether Split cut the line at a delimiter when it shouldn't have. If so, I need to recombine the field values that were split incorrectly.
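As a rough illustration of that split-then-recombine idea, here is a minimal standalone sketch. It is not the library's RecombineQuotedFields method: RecombineSketch and SplitQuoted are hypothetical names, the sketch ignores doubled quotes inside a value, and it throws FormatException where the library throws ApplicationException.

```csharp
using System;
using System.Collections.Generic;

public static class RecombineSketch
{
    // Splits a line on the delimiter, then glues back together any pieces
    // that String.Split cut apart inside a quoted value.
    public static string[] SplitQuoted(string line, char delimiter, char quote)
    {
        string[] pieces = line.Split(delimiter);
        List<string> fields = new List<string>();
        int i = 0;
        while (i < pieces.Length)
        {
            string field = pieces[i];
            // A piece that opens a quote but does not close it was split too early.
            if (field.Length > 0 && field[0] == quote)
            {
                while (!IsClosed(field, quote))
                {
                    i++;
                    // No more pieces to read ahead to: the quote was never closed.
                    if (i >= pieces.Length)
                        throw new FormatException("There was an unclosed quotation mark.");
                    field = field + delimiter + pieces[i];
                }
            }
            fields.Add(field);
            i++;
        }
        return fields.ToArray();
    }

    // A quoted field is closed when it has at least two characters and ends with the quote.
    private static bool IsClosed(string field, char quote)
    {
        return field.Length >= 2 && field[field.Length - 1] == quote;
    }

    public static void Main()
    {
        foreach (string f in SplitQuoted("1,\"Smith, John\",x", ',', '"'))
            Console.WriteLine(f);
    }
}
```

The bounds check before reading the next piece is what keeps a single-field line with a missing end quote from running past the end of the array.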
 
I hope this helps. :)
General: Re: Error when single-field file missing end quote (PeterGomis, 11 May '07 - 7:48)
I spent some time looking at the code, and determined that the code block was not handling a file with a single field. It seems that the RecombineQuotedFields method attempts to recombine fields that might have the delimiter character in the data. When reading a field that does not end with a quote, it looks at the next field to see if it needs to combine the two. If the file only has one field, and it doesn't have an end quote for some reason, the read-ahead to the next field blows up with the IndexOutOfRangeException.
 
All we did was add a test to see whether we are already at the upper bound of the array. If we are, we don't bother looking at the next field to attempt to recombine.
 
Within the RecombineQuotedFields method, the do-loop now looks like this:
 
do
{
    //skip to the next item
    x += 1;

    //is there another field to check against?
    if (x <= fields.GetUpperBound(0))
    {
        //get the new "endpoints"
        firstChar = fields[x][0];
        lastChar = fields[x][fields[x].Length - 1];
        //this field better not start with a quote
        if (firstChar == m_TextFieldSchema.QuoteCharacter && fields[x].Length > 1)
            throw new ApplicationException("There was an unclosed quotation mark.");

        // recombine the items
        fields.SetValue(String.Concat(fields[startIndex].ToString(), m_TextFieldSchema.Delimiter, fields[x].ToString()), startIndex);
        //flush the unused array element
        Array.Clear(fields, x, 1);
    }
    else
    {
        throw new ApplicationException("There was an unclosed quotation mark.");
    }
} while (lastChar != quoteChar);

Last Updated 27 Feb 2005
Article Copyright 2005 by WendellH