Wrapper Class for Parsing Fixed-Width or Delimited Text Files

15 Feb 2005 | CPOL | 4 min read
This is a utility class that will allow you to parse fixed-width or delimited text files in an event-driven model. Simply specify the file location, the format type (fixed vs. delimited) and add TextField members to the strongly-typed TextFieldCollection.

Introduction

XML files have become very common, but many systems still require flat-file interfaces, especially in the financial industry. Many, many people have written code many, many times to read a line from a file, pick out the individual fields, and act on the retrieved data. Faced with the prospect of doing this once again, I decided to find a better approach, one that would be generic and reusable.

Summary

TextFieldParser is a utility class that allows you to parse fixed-width or delimited text files in an event-driven model. Simply specify the file location and the format type (fixed vs. delimited), then add TextField members to the strongly-typed TextFieldCollection. The TextFieldParser class extracts the desired fields from each line in the target file. When you are processing a delimited file and mark a field as quoted, it ignores any delimiters it finds inside that quoted field.

Each line in the text file raises either a RecordFound or a RecordFailed event, with details appropriate to the event. If the record matches, the Value property of each TextField member of the TextFieldCollection is updated with the information parsed from the text file before the RecordFound event is raised. If the record does not match the requested pattern, the RecordFailed event provides the line number and text of the offending record, an error message, and a ByRef Boolean parameter that you can set to abort or continue processing. In addition to verifying that each TextField object has a corresponding field in the text file's record, the TextFieldParser explicitly converts the text file's value to the .NET CLR data type specified in the TextField object. Any conversion error is trapped and returned to the caller through the RecordFailed event.
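
To make the event flow and the conversion step concrete, here is a minimal, self-contained sketch. It is not the library's code: MiniParser, LineParsed, LineFailed and keepGoing are hypothetical names, and treating "abort" as the default when no handler answers is my assumption. It only illustrates how a value can be converted with Convert.ChangeType and how a ByRef Boolean lets the handler decide whether parsing continues.

```vb
Imports System
Imports System.Globalization

' Hypothetical miniature of the event flow; these names are illustrative
' and are NOT the TextFieldParser library's API.
Public Class MiniParser
    Public Event LineParsed(ByVal lineNumber As Integer, ByVal value As Object)
    Public Event LineFailed(ByVal lineNumber As Integer, ByVal lineText As String, _
                            ByVal errorMessage As String, ByRef keepGoing As Boolean)

    ' Converts each line to the requested type and returns the number of lines visited.
    Public Function Parse(ByVal lines As String(), ByVal target As TypeCode) As Integer
        Dim visited As Integer = 0
        For Each line As String In lines
            visited += 1
            Dim value As Object = Nothing
            Dim errorMessage As String = Nothing
            Try
                ' Trimming before conversion is an assumption of this sketch.
                value = Convert.ChangeType(line.Trim(), target, CultureInfo.InvariantCulture)
            Catch ex As Exception When TypeOf ex Is FormatException _
                                  OrElse TypeOf ex Is OverflowException _
                                  OrElse TypeOf ex Is InvalidCastException
                errorMessage = ex.Message
            End Try

            If errorMessage Is Nothing Then
                RaiseEvent LineParsed(visited, value)
            Else
                ' Assumption: abort unless a handler sets the flag to True.
                Dim keepGoing As Boolean = False
                RaiseEvent LineFailed(visited, line, errorMessage, keepGoing)
                If Not keepGoing Then Exit For
            End If
        Next
        Return visited
    End Function
End Class

Module EventSketch
    Private Sub OnParsed(ByVal lineNumber As Integer, ByVal value As Object)
        Console.WriteLine("Line " & lineNumber.ToString() & " = " & value.ToString())
    End Sub

    Private Sub OnFailed(ByVal lineNumber As Integer, ByVal lineText As String, _
                         ByVal errorMessage As String, ByRef keepGoing As Boolean)
        Console.WriteLine("Line " & lineNumber.ToString() & " failed: " & errorMessage)
        keepGoing = True ' carry on with the next line
    End Sub

    Sub Main()
        Dim parser As New MiniParser()
        AddHandler parser.LineParsed, AddressOf OnParsed
        AddHandler parser.LineFailed, AddressOf OnFailed
        parser.Parse(New String() {"10", "ten", "30"}, TypeCode.Int32)
    End Sub
End Module
```

With both handlers attached, all three lines are visited because the failure handler sets keepGoing to True. If no handler is attached, the sketch stops at the first line that fails to convert.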

The Interesting Bits

There is really nothing in this class library that is new or revolutionary. I just took some time to put it together in a way that would be useful to more than one project and more than one person. There are a lot of areas that are covered by this project, however, and they are worth pointing out so that anyone looking for simple examples of such implementations will know what to expect.

  • Enumerations
  • Delegates & Events
  • Overloaded Constructors and Methods
  • File Stream I/O
  • Strongly-Typed Collections

Usage Example

I haven't included any of the code itself because, quite frankly, there is little there that would require any sort of explanation and all of it is well commented. I have, however, included below an example of how you would instantiate, configure, call and handle call-backs from the TextFieldParser (included in the demo project is a text file that this code will process).

One thing to note is that the TextField object has properties for Length and Quoted. The Length property has meaning only if your FileFormat is FixedWidth and the Quoted property has meaning only if the FileFormat is Delimited. Otherwise, these values are ignored.

VB
Imports Utilities.Text.Parsing
Module Module1

    Dim WithEvents DataFile As New Utilities.Text.Parsing.TextFieldParser

    Sub Main()

        ' **********************
        ' Delimited File Parsing
        ' **********************

        ' Configure the base object properties
        DataFile.FileType = TextFieldParser.FileFormat.Delimited
        DataFile.FileName = "D:\DelimTestFile.txt"

        ' Add the TextField objects to the collection
        Dim quote As Boolean
        For i As Int32 = 1 To 80
            ' LineOne,"two",three,"four",five,"six" ...
            quote = (i Mod 2 = 0)
            DataFile.TextFields.Add(New TextField("Field" + _
                 i.ToString(), TypeCode.String, quote))
        Next i

        ' Parse the file
        DataFile.ParseFile()

        ' ************************
        ' Fixed Width File Parsing
        ' ************************

        ' Configure the base object properties
        DataFile.FileType = TextFieldParser.FileFormat.FixedWidth
        DataFile.FileName = "D:\FixedTestFile.txt"
        ' get rid of the old field definitions
        DataFile.TextFields.Clear()

        ' Add the TextField objects to the collection
        ' LineOne  onetwothreefourfivesixseveneightnine10 ...
        DataFile.TextFields.Add(New TextField("LineNumber", TypeCode.String, 9))
        For i As Int32 = 1 To 8
            DataFile.TextFields.Add(New TextField("Field1", TypeCode.String, 3))
            DataFile.TextFields.Add(New TextField("Field2", TypeCode.String, 3))
            DataFile.TextFields.Add(New TextField("Field3", TypeCode.String, 5))
            DataFile.TextFields.Add(New TextField("Field4", TypeCode.String, 4))
            DataFile.TextFields.Add(New TextField("Field5", TypeCode.String, 4))
            DataFile.TextFields.Add(New TextField("Field6", TypeCode.String, 3))
            DataFile.TextFields.Add(New TextField("Field7", TypeCode.String, 5))
            DataFile.TextFields.Add(New TextField("Field8", TypeCode.String, 5))
            DataFile.TextFields.Add(New TextField("Field9", TypeCode.String, 4))
            DataFile.TextFields.Add(New TextField("Field10", TypeCode.Int32, 2))
        Next i
        ' Parse the file
        DataFile.ParseFile()
    End Sub

    Public Sub RecordFoundHandler(ByRef CurrentLineNumber As Int32, _
                                  ByVal TextFields As TextFieldCollection) _
                                  Handles DataFile.RecordFound
        ' Do something with the field data for each record successfully matched
        For Each field As TextField In TextFields
            Console.WriteLine(field.Name + " = " + CType(field.Value, String))
        Next
        ' only process every other line in the file
        CurrentLineNumber += 2
    End Sub

    Public Sub RecordFailedHandler(ByRef CurrentLineNumber As Int32, _
                                   ByVal LineText As String, _
                                   ByVal ErrorMessage As String, _
                                   ByRef [Continue] As Boolean) _
                                   Handles DataFile.RecordFailed
        ' Do something with the details of each record that fails to match
        Console.WriteLine("Num = " + CType(CurrentLineNumber, String) + _
                " : Text = " + LineText + " : Msg = " + ErrorMessage)
        ' Continue is a reserved word in VB 2005 and later, so it is escaped
        ' with brackets here. Set it to False to abort processing.
        [Continue] = True
    End Sub

End Module

Areas for Expansion

The most obvious and significant area to improve this class would be to incorporate the ability to pass in an XML stream or file that would populate and configure the TextFieldParser and TextFieldCollection automatically. This would allow you to store your configuration information in an external file that is more easily maintained. You can do this yourself, of course, but adding it into the TextFieldParser class would be ideal.
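
As a rough idea of what such a configuration loader might look like, here is a hedged sketch. Everything in it is an assumption made for illustration: the XML layout (a parser element containing field elements), the attribute names, and the FieldSpec class, which stands in for the library's TextField so that the example compiles on its own. It uses only XmlDocument from System.Xml.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module XmlConfigSketch

    ' Hypothetical stand-in for the library's TextField class.
    Public Class FieldSpec
        Public Name As String
        Public DataType As TypeCode
        Public Length As Integer   ' meaningful for fixed-width files only
        Public Quoted As Boolean   ' meaningful for delimited files only
    End Class

    ' Reads <field> elements from an assumed <parser> root element.
    Function LoadFieldSpecs(ByVal xml As String) As List(Of FieldSpec)
        Dim doc As New XmlDocument()
        doc.LoadXml(xml)

        Dim specs As New List(Of FieldSpec)
        For Each node As XmlNode In doc.SelectNodes("/parser/field")
            Dim element As XmlElement = CType(node, XmlElement)
            Dim spec As New FieldSpec()
            spec.Name = element.GetAttribute("name")
            ' The type attribute is assumed to hold a TypeCode member name.
            spec.DataType = CType([Enum].Parse(GetType(TypeCode), _
                                  element.GetAttribute("type")), TypeCode)
            If element.HasAttribute("length") Then
                spec.Length = Integer.Parse(element.GetAttribute("length"))
            End If
            If element.HasAttribute("quoted") Then
                spec.Quoted = Boolean.Parse(element.GetAttribute("quoted"))
            End If
            specs.Add(spec)
        Next
        Return specs
    End Function

    Sub Main()
        ' Single-quoted attributes keep the VB string literal readable.
        Dim xml As String = _
            "<parser format='FixedWidth'>" & _
            "<field name='LineNumber' type='String' length='9'/>" & _
            "<field name='Field10' type='Int32' length='2'/>" & _
            "</parser>"

        For Each spec As FieldSpec In LoadFieldSpecs(xml)
            Console.WriteLine(spec.Name & " : " & spec.DataType.ToString() & _
                              " : " & spec.Length.ToString())
        Next
    End Sub

End Module
```

Each FieldSpec could then be turned into a TextField and added to the parser's TextFields collection, which keeps the file layout out of the compiled code.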

There may be other areas for improvement as well, but for now it meets my needs. If you happen to make use of this utility and have any improvements to offer, please forward them back to me. I'll be happy to incorporate those that make sense into the solution and update the article.

Update - Feb 15th, 2005

Okay, the more I looked at the issues at hand and spoke to other developers, the more convinced I became that the Regular Expression method of text file parsing was never going to perform well for wide text files. I have rewritten the methods that handle the actual parsing to use more conventional techniques: String.Split() for delimited files and String.Substring() for fixed-width files. The external interfaces remain unchanged (apart from some defensive argument checking and better error handling). The library still handles quoted strings in delimited files and now lets you specify the character used for quotes.
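
For readers who want to see the general shape of these two techniques, here is a small sketch. It is not the code shipped in the download: SplitDelimited, SliceFixedWidth and Unquote are hypothetical helpers. The delimited version splits on the delimiter and then rejoins the pieces that belong to one quoted field. It detects quotes by inspecting the text rather than by a per-field Quoted setting, and it does not handle doubled (escaped) quote characters.

```vb
Imports System
Imports System.Collections.Generic

Module LineSplitSketch

    ' Strips one pair of enclosing quote characters, if present.
    Private Function Unquote(ByVal text As String, ByVal quote As Char) As String
        If text.Length >= 2 AndAlso text(0) = quote _
           AndAlso text(text.Length - 1) = quote Then
            Return text.Substring(1, text.Length - 2)
        End If
        Return text
    End Function

    ' Splits on the delimiter, then glues back together any pieces that
    ' were part of a single quoted field.
    Function SplitDelimited(ByVal line As String, ByVal delimiter As Char, _
                            ByVal quote As Char) As List(Of String)
        Dim fields As New List(Of String)
        Dim pending As String = Nothing ' quoted field still waiting for its closing quote
        For Each piece As String In line.Split(delimiter)
            If pending Is Nothing Then
                Dim opens As Boolean = piece.Length > 0 AndAlso piece(0) = quote
                Dim closes As Boolean = piece.Length > 1 _
                                        AndAlso piece(piece.Length - 1) = quote
                If opens AndAlso Not closes Then
                    pending = piece
                Else
                    fields.Add(Unquote(piece, quote))
                End If
            Else
                pending = pending & delimiter & piece
                If piece.Length > 0 AndAlso piece(piece.Length - 1) = quote Then
                    fields.Add(Unquote(pending, quote))
                    pending = Nothing
                End If
            End If
        Next
        ' Unterminated quote: keep the raw text rather than lose it.
        If pending IsNot Nothing Then fields.Add(pending)
        Return fields
    End Function

    ' Cuts consecutive fields of the given widths out of the line.
    Function SliceFixedWidth(ByVal line As String, _
                             ByVal widths As Integer()) As List(Of String)
        Dim fields As New List(Of String)
        Dim offset As Integer = 0
        For Each width As Integer In widths
            If offset + width > line.Length Then
                Throw New FormatException("Line is shorter than the declared field widths.")
            End If
            fields.Add(line.Substring(offset, width))
            offset += width
        Next
        Return fields
    End Function

    Sub Main()
        For Each field As String In SplitDelimited("LineOne,""two, too"",three", ","c, """"c)
            Console.WriteLine("[" & field & "]")
        Next
        For Each field As String In SliceFixedWidth("LineOne  onetwothree", _
                                                    New Integer() {9, 3, 3, 5})
            Console.WriteLine("[" & field & "]")
        Next
    End Sub

End Module
```

The first call yields three fields, with the embedded comma preserved inside the second one. The second call yields four fields, and the first keeps its trailing padding, so the caller decides whether to trim.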

The new demo project and source code include two test files that are 80 fields wide. One is fixed-width and the other is delimited with a couple of delimiters embedded inside of quotes for good measure.

As always, please let me know what you think!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Architect, Wyvern Software
United States
Tony Selke is an independent consultant who has spent the last 20 years working with Microsoft technologies (VB, VC++, ASP, J++, C#, VB.NET, SQL Server, etc.) to develop solutions used in all kinds of market verticals (industrial, pharmaceutical, financial, marketing, multimedia, educational, telecommunications, etc.). He obtained his first MCSD certification in 1998 and his second in 2004, with an MCDBA in 2005. In addition, he has taught courses for MCSD certification students as well as programming classes at Penn State University.

Comments and Discussions

 
Re: New features added, but not yet released (m3ntat_, 23-Nov-06 5:40)
Re: New features added, but not yet released (Tony Selke, 24-Nov-06 2:43)
Poor performance on files with more than 10 fields (DrewM, 7-Feb-05 10:25)
Re: Poor performance on files with more than 10 fields (Tony Selke, 7-Feb-05 11:04)
Re: Poor performance on files with more than 10 fields (Tony Selke, 8-Feb-05 3:02)
Re: Poor performance on files with more than 10 fields (DrewM, 8-Feb-05 13:54)
Re: Poor performance on files with more than 10 fields (Tony Selke, 15-Feb-05 5:34)
Re: Poor performance on files with more than 10 fields (DrewM, 18-Feb-05 17:09)
Tony,
It worked great. Thanks for the assistance. I was just reading a blog that was tackling the same problem but was using Regular Expressions. Here is a link if you are interested.

http://blogs.regexadvice.com/wayneking/archive/2004/01/12/318.aspx

Drew
no header row? (jyjohnson, 2-Feb-05 2:57)
Re: no header row? (Tony Selke, 4-Feb-05 1:31)
Re: no header row? (Tony Selke, 19-Feb-05 2:56)
