Wrapper Class for Parsing Fixed-Width, Multiple Section Files
An article describing a wrapper class to import very large multiple section reports, typically from a legacy system, into the modern SQL Server or other RDBMS.
Introduction
I recently had a requirement to develop an import parser for a series of mainframe reports, all of which 133 bytes wide, with a multi-line header on each page, followed by n number of detail rows, and a summary section of possibly more than one line, essentially, the raw ASCII data sent to the printer. I needed to develop a library that was capable of importing this into a database, and configurable such that it could be adapted to a variety of similar import requirements.
As with any project, I started by Googling and CodeProjecting, and found an excellent article by Tony Selke: 'Wrapper Class for Parsing Fixed-Width or Delimited Text Files'. A further search brought me to WendellH's 'A Modified C# Implementation of Tony Selke's TextFieldParser', in which he had already taken the time to port the code to C#, and most importantly, adding two very important features I leveraged extensively.
- Ability to put the schema for the text file in an XML file.
- Ability to parse the text file directly to a
DataTable
.
I made the following modifications:
- Removed support for all but fixed width data.
- Configuration can only be loaded in an XML schema file.
- Ability to import variable length master/detail/summary data. Another open source C# project, FileHelpers, includes the ability to import master/detail data, but I needed something that could do one step further. This tool can import multiple hierarchies of information, carrying over any foreign key fields with a single line in the configuration file.
- Sample console application demonstrating one use case for the library.
- Removed the
RecordFound
andRecordFailed
events. - Import procedure includes the use of transactions, and demonstrates how to use transactions with
SQLCommandBuilder
generatedUpdateCommand
s!
There are many great things about this code, and I had a lots of fun working on it. Most notable of the great things about this code is how it loads its object model from a configuration file, and then uses that object model to do useful work. It also demonstrates how to use collection classes to enable a developer to iterate their custom objects using the foreach
syntax, a real time saver and a more elegant method of programming.
Enhancements: A Slightly Better Approach
After writing much of this article, and the adjoining, ever-important functional sample, I ended having to rewrite much of the work. Although the sample may be sufficient for many cases, I have also included a non-functional sample of the second version of the code. The 2.0 version uses a DataRelation
object to maintain relationships in the imported dataset. The DataReleation
version programmatically creates a DataRelation
object based upon the configuration stored in the schema XML file, and uses that to maintain relationships on the imported data.
Although non-functional, I have included most of the code you would need to exercise this code in your own application, in a file entitled Sample Import Class (Non-Functional).cs. I would imagine anyone finding a use for this code should be able to use the included code and the explanation within the Schema.xml file, to implement this in their own application. The ReportParser 2.0 is in a folder in the download, named ReportParser2.0.
What can ReportParser do?
The library can import fixed width files into a multi-table DataSet
for further processing. Furthermore, the sample included with this article writes the imported DataSet
to XML, and includes SQL scripts to create tables in your MS SQL engine of choice, and rudimentary logic to both remove any existing entries from the database contained in the import file (in the case the same file is imported more than once) and import the new data from the DataSet
into a sample database. Additionally, in my case, many report detail sections span multiple sections, and a case has been written into the library to account for multi-line detail sections.
How does it work?
The developer sets up a schema either with code or in an XML schema file. The XML schema file defines everything about the dataset to be imported. This determines the data types that will be used in each DataTable
, the location of the field in the file, the line number in a given section of the file, the data type of the field, and based on the schema, the text values are parsed and converted to the respective data types, and either put in a DataTable
or simply passed to the calling object as an event.
Using the code
The first thing to do is, add a reference to the library in your Visual Studio project. Then add the using
statement at the top of your source file.
using ReportParser;
Create an instance of the FieldParser
object.
FileParser oFP = new FileParser(filePath, schemaPath);
The following is an example of how the XML schema file would look:
<RecordDefinition>
<SECTION Name="ImportedHeader" Length="5"
StartString="1\n" SectionFormat="Header">
<FIELD Name="RunDate" StartIndex="77"
Length="8" DataType="DateTime" LineNumber="0" />
<FIELD Name="OtherDate" StartIndex="105"
Length="8" DataType="DateTime" LineNumber="0" />
<FIELD Name="Manufacturer" StartIndex="11"
Length="30" DataType="String" LineNumber="1" />
<FIELD Name="ManCode" StartIndex="52" Length="3"
DataType="String" LineNumber="1" />
<FIELD Name="Product" StartIndex="7" Length="27"
DataType="String" LineNumber="2" />
<FIELD Name="OtherCode" StartIndex="47"
Length="5" DataType="String" LineNumber="2" />
<FIELD Name="CheckDigit" StartIndex="53"
Length="1" DataType="Int32" LineNumber="2" />
</SECTION>
<SECTION Name="ImportedDetails" Length="0"
EndString="0" SectionFormat="FixedWidth">
<FIELD Name="ManCode" StartIndex="-1" Length="5"
DataType="String" LineNumber="0" />
<FIELD Name="OtherCode" StartIndex="-1"
Length="5" DataType="String" LineNumber="0" />
<FIELD Name="LocationCode" StartIndex="2"
Length="6" LineNumber="0" DataType="String" />
<FIELD Name="LocationName" StartIndex="12"
Length="27" LineNumber="0" DataType="String" />
<FIELD Name="Quantity" StartIndex="62"
Length="11" LineNumber="0" DataType="Int32" />
<FIELD Name="MoreDetail1" StartIndex="73"
Length="11" LineNumber="0" DataType="Decimal" />
<FIELD Name="MoreDetail2" StartIndex="83"
Length="10" LineNumber="0" DataType="Int32" />
</SECTION>
<SECTION Name="ImportedFooter" Length="4" SectionFormat="Footer">
<FIELD Name="ManCode" StartIndex="-1"
Length="5" DataType="String" LineNumber="0" />
<FIELD Name="OtherCode" StartIndex="-1"
Length="5" DataType="String" LineNumber="0" />
<FIELD Name="MoreDetail1" StartIndex="73"
Length="11" LineNumber="0" DataType="Decimal" />
<FIELD Name="MoreDetail2" StartIndex="83"
Length="10" LineNumber="0" DataType="Int32" />
<FIELD Name="FooterLine2Detail" StartIndex="111"
Length="11" LineNumber="1" DataType="Decimal" />
<FIELD Name="FooterLine3Detail" StartIndex="111"
Length="11" LineNumber="2" DataType="Decimal" />
</SECTION>
</RecordDefinition>
Section
The attributes of the Section
section are:
Name
: (mandatory) name of the section - this will be the name given to theDataTable
object, and most likely should map to the table being imported to. If all you need is aDataSet
, this won't be that important as you can reference each by theDataTable
ordinal.Length
: (mandatory) the number of lines contained in the section. A zero indicates multi-line.StartString
: (optional) this says what to look for in the code, and will be removed in a future update of this article. The importer assumes that the sections of hierarchical data appear in the order defined in the schema file, and the first line is a new line if you have worked with these reports before.EndString
: (mandatory with detailed sections) this says what to look for in the input file to determine the last line.SectionFormat
: (optional) this will be deprecated in the future release left over from the options. Originally this was used in this implementation to denote if the section is a header, footer, or details. Default is details.
Field
The attributes of the Field section:
Name
: (mandatory) name of the field. This name only appears in theDataTable
and maps to the name of the field in the database, and most likely should map to the table being imported to. If all you need is aDataSet
, this won't be that important as you can reference each by theDataTable
ordinal.Length
: (mandatory) the number of spaces to copy from theStartIndex
(argument forstring.SubString
).StartIndex
: (mandatory) the starting position in a line where the field starts.DataType
: (optional)DataType
of the field, default isstring
.LineNumber
: (mandatory) when the section is multi-line, specifies which line to read from. When the section is a details section, this needs to be zero.
Exceptions are very common when tweaking these settings. When an exception occurs, all data is disregarded. An exception will log to the machine's application log the error message, the stack trace, the full text of the line being processed, and the name of the field being processed. All other exceptions are logged to the application event log as well.
Now, you can call ParseToDataSet
to get the results in a DataSet
. The DataSet
will contain a DataTable
for each section
defined in the configuration file. I have only tested this in a Master/Detail/Footer scenario (like the included sample), although if it doesn't already support multiple different sections, the code should be easily adapted to that purpose.
DataSet oDS = oFP.ParseToDataSet();
oDS.WriteXml(sXMLFile);
Using the demo
Once you have downloaded the source, unzip it somewhere in the file system. If for some reason there is no default project set, set the ReportParserDemo project as the default project. Application.cs contains the Main
method, and contains the logic to enable you to run the demo to your heart's content. It also demonstrates some nice little console programming logic. Open Application.cs and App.config to change the file paths to point to the correct location for the schema file, and import the files included with the sample. I didn't feel like including the convoluted logic to determine the debug/run directory, and couldn't remember how to have the VS.NET IDE copy files from the project directory to the debug or release directories (if anyone knows, please post in the comments section of this forum, that would be great). Once paths are set, you can optionally create the necessary database tables from the included SQL script, and fire up the application.
That's it. I look forward to comments and suggestions from you all.
Similar Projects
FileHelpers v 1.3.5
The FileHelpers is an easy to use library to import/export data from fixed length or delimited files. If you want to start using the library, go directly to the Quick Start Guide.
Also has support to import/export data from different data storages like Excel, Access, SQL Server, etc.
Wrapper Class for Parsing Fixed-Width or Delimited Text Files
Original project from which this article was derived: Wrapper Class for Parsing Fixed-Width or Delimited Text Files.
A Modified C# Implementation of Tony Selke's TextFieldParser
Project from which this article was directly derived: A Modified C# Implementation of Tony Selke's TextFieldParser.
History
- 04/22/2006: Revised to include ReportParser 2.0.
- 03/02/2006: Initial revision.