XML files have become very common these days, but there are still many systems out there that require flat-file interfaces (especially in the financial industry). Many people have written code many times over to read a line from a file, pick out the individual fields, and perform some action with the retrieved data. Faced with the prospect of doing this once again, I set out to find a better way, one that would be generic and reusable.
TextFieldParser is a utility class that lets you parse fixed-width or delimited text files using an event-driven model. Simply specify the file location and the format type (fixed-width vs. delimited), then add TextField members to the strongly-typed TextFieldCollection. The TextFieldParser class will extract the desired fields from each line in the target file. If you are processing a delimited file and you specify that a field is quoted, it will ignore any delimiters it finds within that quoted field.
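To make the quoted-field behavior concrete, here is a minimal standalone sketch of quote-aware splitting. It is not the library's code: the `SplitQuoted` helper and its parameters are hypothetical names chosen for illustration, and it assumes quotes are simply toggled and dropped from the output.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text

Module QuotedSplitSketch

    ' Hypothetical helper (not part of the library): splits one line on a
    ' delimiter, treating delimiters that appear inside quotes as literal text.
    Function SplitQuoted(ByVal line As String, ByVal delimiter As Char, _
                         ByVal quoteChar As Char) As List(Of String)
        Dim fields As New List(Of String)
        Dim current As New StringBuilder()
        Dim inQuotes As Boolean = False

        For Each c As Char In line
            If c = quoteChar Then
                ' Toggle the quoted state; the quote character itself is dropped.
                inQuotes = Not inQuotes
            ElseIf c = delimiter AndAlso Not inQuotes Then
                ' An unquoted delimiter closes the current field.
                fields.Add(current.ToString())
                current.Length = 0
            Else
                current.Append(c)
            End If
        Next

        ' The last field has no trailing delimiter.
        fields.Add(current.ToString())
        Return fields
    End Function

    Sub Main()
        ' The comma inside "Smith, John" is kept as part of the second field.
        For Each field As String In SplitQuoted("1,""Smith, John"",42", ","c, """"c)
            Console.WriteLine(field)
        Next
    End Sub

End Module
```

Run as a console program, this prints three fields: `1`, `Smith, John` and `42`.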
Each line in the text file will raise either a RecordFound or a RecordFailed event, providing details appropriate to the event. If the record is a match, each TextField member of the TextFieldCollection will have its Value property updated to reflect the information parsed from the text file prior to raising the RecordFound event. If the record does not match the requested pattern, the RecordFailed event provides the line number and text of the offending record along with an error message and a Boolean reference variable that you can use to abort or continue processing. In addition to verifying that each TextField object has a corresponding field in the text file's record, the TextFieldParser will also explicitly convert the text file's value to the .NET CLR data type specified in the TextField object. If there is a conversion error, it is trapped and returned to the caller through the RecordFailed event.
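The conversion step can be illustrated with a small standalone sketch. It assumes the conversion is driven by a `TypeCode`, as in the `TextField` constructor, and uses the framework's `Convert.ChangeType`. The `TryConvertField` helper is a hypothetical name, and the library's own implementation may well differ.

```vb
Imports System

Module ConversionSketch

    ' Hypothetical helper (not part of the library): converts raw text to the
    ' CLR type named by targetType, reporting failure through errorMessage
    ' instead of letting the exception escape.
    Function TryConvertField(ByVal raw As String, ByVal targetType As TypeCode, _
                             ByRef value As Object, ByRef errorMessage As String) As Boolean
        Try
            value = Convert.ChangeType(raw.Trim(), targetType)
            Return True
        Catch ex As Exception
            ' FormatException, InvalidCastException and OverflowException all land here.
            value = Nothing
            errorMessage = "Cannot convert '" & raw & "' to " & _
                           targetType.ToString() & ": " & ex.Message
            Return False
        End Try
    End Function

    Sub Main()
        Dim value As Object = Nothing
        Dim message As String = Nothing

        ' A well-formed number converts to a boxed Int32.
        If TryConvertField(" 42", TypeCode.Int32, value, message) Then
            Console.WriteLine("Converted to " & value.GetType().Name)
        End If

        ' A malformed number is trapped and reported rather than thrown.
        If Not TryConvertField("4x", TypeCode.Int32, value, message) Then
            Console.WriteLine(message)
        End If
    End Sub

End Module
```

A message built this way is the kind of detail that could be passed along as the error message of a failed record.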
The Interesting Bits
There is really nothing in this class library that is new or revolutionary. I just took some time to put it together in a way that would be useful to more than one project and more than one person. There are a lot of areas that are covered by this project, however, and they are worth pointing out so that anyone looking for simple examples of such implementations will know what to expect.
- Delegates & Events
- Overloaded Constructors and Methods
- File Stream I/O
- Strongly-Typed Collections
I haven't included any of the code itself because, quite frankly, there is little there that would require any sort of explanation and all of it is well commented. I have, however, included below an example of how you would instantiate, configure, call and handle call-backs from the TextFieldParser (included in the demo project is a text file that this code will process).
One thing to note is that the TextField object has properties for Length and Quoted. The Length property has meaning only if your FileType is FixedWidth, and the Quoted property has meaning only if the FileType is Delimited. Otherwise, these values are ignored.
' Declared at class level so the handlers below can use the Handles clause.
Dim WithEvents DataFile As New Utilities.Text.Parsing.TextFieldParser

' Option 1: a delimited file, 80 fields wide, with every second field quoted.
DataFile.FileType = TextFieldParser.FileFormat.Delimited
DataFile.FileName = "D:\DelimTestFile.txt"
Dim quote As Boolean
For i As Int32 = 1 To 80
    quote = (i Mod 2 = 0)
    DataFile.TextFields.Add(New TextField("Field" + _
        i.ToString(), TypeCode.String, quote))
Next

' Option 2: a fixed-width file; the last argument is the field length.
DataFile.FileType = TextFieldParser.FileFormat.FixedWidth
DataFile.FileName = "D:\FixedTestFile.txt"
DataFile.TextFields.Add(New TextField("LineNumber", TypeCode.String, 9))
' Repeat the same ten-field group eight times.
For i As Int32 = 1 To 8
    DataFile.TextFields.Add(New TextField("Field1", TypeCode.String, 3))
    DataFile.TextFields.Add(New TextField("Field2", TypeCode.String, 3))
    DataFile.TextFields.Add(New TextField("Field3", TypeCode.String, 5))
    DataFile.TextFields.Add(New TextField("Field4", TypeCode.String, 4))
    DataFile.TextFields.Add(New TextField("Field5", TypeCode.String, 4))
    DataFile.TextFields.Add(New TextField("Field6", TypeCode.String, 3))
    DataFile.TextFields.Add(New TextField("Field7", TypeCode.String, 5))
    DataFile.TextFields.Add(New TextField("Field8", TypeCode.String, 5))
    DataFile.TextFields.Add(New TextField("Field9", TypeCode.String, 4))
    DataFile.TextFields.Add(New TextField("Field10", TypeCode.Int32, 2))
Next

' Raised for each line that matches; every TextField.Value is already populated.
Public Sub RecordFoundHandler(ByRef CurrentLineNumber As Int32, _
        ByVal TextFields As TextFieldCollection) _
        Handles DataFile.RecordFound
    For Each field As TextField In TextFields
        Console.WriteLine(field.Name + " = " + CType(field.Value, String))
    Next
    ' CurrentLineNumber is passed ByRef, so the handler can change it.
    CurrentLineNumber += 2
End Sub

' Raised for each line that fails to match or convert.
Public Sub RecordFailedHandler(ByRef CurrentLineNumber As Int32, _
        ByVal LineText As String, _
        ByVal ErrorMessage As String, _
        ByRef Continue As Boolean) _
        Handles DataFile.RecordFailed
    Console.WriteLine("Num = " + CType(CurrentLineNumber, String) + _
        " : Text = " + LineText + " : Msg = " + ErrorMessage)
    ' Keep processing the rest of the file after a bad record.
    Continue = True
End Sub
Areas for Expansion
The most obvious and significant area to improve this class would be to incorporate the ability to pass in an XML stream or file that would populate and configure the TextFieldCollection automatically. This would allow you to store your configuration information in an external file that is more easily maintained. You can do this yourself, of course, but adding it into the TextFieldParser class would be ideal.
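As a rough idea of what doing this yourself might look like, the sketch below reads field definitions from XML with `XmlDocument`. The `<fields>`/`<field>` layout and its attribute names are assumptions made up for this example, not a format the library defines. The sketch only prints what it reads, so it runs without the library.

```vb
Imports System
Imports System.Xml

Module XmlConfigSketch

    Sub Main()
        ' Hypothetical layout: one <field> element per column.
        Dim xml As String = _
            "<fields>" & _
            "<field name=""LineNumber"" type=""String"" length=""9"" />" & _
            "<field name=""Amount"" type=""Int32"" length=""2"" />" & _
            "</fields>"

        Dim doc As New XmlDocument()
        doc.LoadXml(xml)

        For Each node As XmlNode In doc.SelectNodes("/fields/field")
            Dim name As String = node.Attributes("name").Value
            ' Map the type attribute onto the TypeCode enumeration by name.
            Dim fieldType As TypeCode = _
                CType([Enum].Parse(GetType(TypeCode), node.Attributes("type").Value), TypeCode)
            Dim length As Integer = Integer.Parse(node.Attributes("length").Value)

            ' With the library referenced, each definition could instead be added via
            ' DataFile.TextFields.Add(New TextField(name, fieldType, length)).
            Console.WriteLine(name & " : " & fieldType.ToString() & " : " & length.ToString())
        Next
    End Sub

End Module
```

For a configuration kept in an external file, `doc.Load(path)` would replace `doc.LoadXml(xml)`.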
There may be other areas for improvement as well, but for now it meets my needs. If you happen to make use of this utility and have any improvements to offer, please forward them back to me. I'll be happy to incorporate those that make sense into the solution and update the article.
Update - Feb 15th, 2005
Okay, the more I looked at the issues at hand and spoke to some other developers, the more I decided that the Regular Expression method of text file parsing just wasn't going to ever perform well for wide text files. I have re-written the methods that handle the actual parsing to use more conventional methods. I use a String.Split() methodology for delimited files and a String.Substring() methodology for fixed-width files. The external interfaces remain unchanged (with the exception of some defensive argument checking and better error handling). The library still handles quoted strings in delimited files and now allows you to specify the character that you use for your quotes.
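For readers who have not used the Substring approach, here is a minimal standalone sketch of slicing a fixed-width line by a list of field lengths. The `SliceFixedWidth` helper is a hypothetical name, and the library's actual parsing code may be organized differently.

```vb
Imports System
Imports System.Collections.Generic

Module FixedWidthSketch

    ' Hypothetical helper (not part of the library): cuts a line into
    ' consecutive fields using the supplied field lengths.
    Function SliceFixedWidth(ByVal line As String, ByVal lengths As Integer()) As List(Of String)
        Dim fields As New List(Of String)
        Dim position As Integer = 0

        For Each fieldLength As Integer In lengths
            ' Guard against records shorter than the declared layout.
            If position + fieldLength > line.Length Then
                Throw New ArgumentException("Line is shorter than the declared field lengths.")
            End If
            fields.Add(line.Substring(position, fieldLength))
            position += fieldLength
        Next

        Return fields
    End Function

    Sub Main()
        ' Layout: a 9-character line number followed by fields of 3, 3 and 5 characters.
        Dim lengths As Integer() = {9, 3, 3, 5}
        For Each field As String In SliceFixedWidth("000000001ABCDEF12345", lengths)
            Console.WriteLine(field)
        Next
    End Sub

End Module
```

Run as a console program, this prints `000000001`, `ABC`, `DEF` and `12345`, one per line.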
The new demo project and source code include two test files that are 80 fields wide. One is fixed-width and the other is delimited with a couple of delimiters embedded inside of quotes for good measure.
As always, please let me know what you think!