
Convert PBS Legacy Files to XML

22 Jun 2009 · CPOL · 4 min read

Introduction

Legacy file formats, such as UN-EDIFACT with a record per line and fixed-length fields, still exist and are widely used for B2B transactions. A tool that can convert legacy files to human-readable XML might come in handy. The tool I present here converts files that are similar, but not identical, to UN-EDIFACT. The file format in question is used by PBS (Payment Business Services) in Denmark, see http://www.pbs.dk/en/. The tool might not be terribly relevant outside Denmark, but it does show a fairly general way of validating, searching and converting legacy files larger than 100 MB to XML. So I have decided to place it on CodeProject in spite of its strong coupling to PBS in Denmark. The tool uses the Arguments class from the article C#/.NET Command Line Arguments Parser, thanks to R. LOPES.

Using the Tool

The tool works like this:

pbs2Xml.exe -s InfoService.xml -i Leverance.txt -o Leverance.xml -f "John Schmidt"
  • -s is the specification file, which must follow the schema in PbsSpecification.xsd.
  • -i is the input file in the legacy format.
  • -o is the output file in XML format. This argument is optional; leave it out when all you want is to validate the legacy file.
  • -f is a search filter. This argument is optional but handy for very large files: if you are looking for information about a specific SSN, use it to convert only the records containing that SSN.
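The option handling described above can be sketched as follows. This is a hypothetical stand-in written for illustration, not R. LOPES's Arguments class that the tool actually uses; the class name PbsOptions and its members are assumptions. It treats -s and -i as required, -o and -f as optional, and reports validate-only mode when -o is absent.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical option holder for illustration; the real tool uses R. LOPES's Arguments class.
public sealed class PbsOptions
{
    public string Spec { get; private set; }    // -s: specification file (required)
    public string Input { get; private set; }   // -i: legacy input file (required)
    public string Output { get; private set; }  // -o: XML output file (optional)
    public string Filter { get; private set; }  // -f: search filter (optional)

    // Without an output file the tool only validates the input.
    public bool ValidateOnly => Output == null;

    public static PbsOptions Parse(string[] args)
    {
        var values = new Dictionary<string, string>();
        for (int i = 0; i < args.Length; i += 2)
        {
            // Options come in "-x value" pairs.
            if (!args[i].StartsWith("-", StringComparison.Ordinal) || i + 1 >= args.Length)
                throw new ArgumentException("Expected an option followed by a value at: " + args[i]);
            values[args[i].Substring(1)] = args[i + 1];
        }

        var options = new PbsOptions
        {
            Spec = Lookup(values, "s"),
            Input = Lookup(values, "i"),
            Output = Lookup(values, "o"),
            Filter = Lookup(values, "f")
        };
        if (options.Spec == null || options.Input == null)
            throw new ArgumentException("Both -s (specification) and -i (input) are required.");
        return options;
    }

    private static string Lookup(Dictionary<string, string> values, string key)
    {
        return values.TryGetValue(key, out string value) ? value : null;
    }
}
```

Leaving out -o, as in pbs2Xml.exe -s InfoService.xml -i Leverance.txt, would put this sketch in validate-only mode.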

Specification Files

  • Information Service, information types 100 and 150 (Pension) and 700 (LetLøn): InformationsService.zip
  • Payment Service Invoicing, 601 section 112: BetalingsService601-0112.zip
  • Payment Service Invoicing, 601 section 117: BetalingsService601-0117.zip
  • Payment Service Payments, 602: BetalingsService602.zip

Using the Code

I needed a tool to validate files used for business transactions in banking, pension and life insurance and convert them to XML. I also needed a general approach because the business rules for validating data were unclear. Basically I wanted a general parser that could read a legacy file with a record per line, fixed-length fields and a hierarchical record structure like the one in UN-EDIFACT documents. The parser must not know the specifics of the records, fields and validation rules. The specifics must be provided in a specification file so that changing parsing details does not require code changes, but merely changes to an XML file containing the parsing rules.

pbs2xml is just a parser, and a parser for a specific B2B legacy file format that is only used in Denmark at that. This sounds like application-specific code, hardly suited for CodeProject!

Well, maybe not. It does, however, demonstrate an interesting technique: moving all of the business rules for parsing and validating a specific file format out of the code and into an XML specification file.
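To illustrate the idea, the sketch below reads field rules from an XML document with XmlDocument.SelectNodes. The element and attribute names (Field, name, start, length, pattern, key, optional) and the FieldRule and SpecLoader types are hypothetical; the real layout is defined by PbsSpecification.xsd, which is not reproduced here. The point is that adding or changing a field means editing XML, not recompiling.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical rule holder; the article's Field class and schema may differ.
public sealed class FieldRule
{
    public string Name { get; set; }
    public int Start { get; set; }       // 1-based column where the field begins (assumption)
    public int Length { get; set; }
    public string Pattern { get; set; }  // validation rule as a regular expression (assumption)
    public bool Key { get; set; }
    public bool Optional { get; set; }
}

public static class SpecLoader
{
    // Reads every <Field> element in the document. Element and attribute names
    // are illustrative only, not the actual PbsSpecification.xsd schema.
    public static List<FieldRule> LoadFields(XmlDocument spec)
    {
        var rules = new List<FieldRule>();
        foreach (XmlElement field in spec.SelectNodes("//Field"))
        {
            rules.Add(new FieldRule
            {
                Name = field.GetAttribute("name"),
                Start = int.Parse(field.GetAttribute("start")),
                Length = int.Parse(field.GetAttribute("length")),
                Pattern = field.GetAttribute("pattern"),
                // Missing attributes come back as "" and therefore mean false.
                Key = field.GetAttribute("key") == "true",
                Optional = field.GetAttribute("optional") == "true"
            });
        }
        return rules;
    }
}
```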

The specification file must follow some ground rules that are common for all B2B files used by Payment Business Services (PBS); these rules are represented by the schema in PbsSpecification.xsd. The overall format is similar to UN-EDIFACT: one record per line with fixed-length fields and a hierarchy of record types.
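One way to check the record hierarchy is a small state machine over the record types. This sketch assumes that each record type can be recognized from a fixed-length prefix of the line, and that a section consists of a start record, some data records and an end record, which is the Section structure this article describes. The SectionChecker name and the codes passed to it are placeholders, not values taken from a PBS specification.

```csharp
using System.Collections.Generic;

// Hypothetical sketch: verifies that data records only appear between a section's
// start record and end record. Record types are read from a fixed-length line prefix.
public static class SectionChecker
{
    public static List<string> Check(IEnumerable<string> lines, string startCode, string endCode, int typeLength)
    {
        var errors = new List<string>();
        bool inSection = false;
        int lineNo = 0;
        foreach (string line in lines)
        {
            lineNo++;
            string type = line.Length >= typeLength ? line.Substring(0, typeLength) : line;
            if (type == startCode)
            {
                if (inSection) errors.Add($"Line {lineNo}: section started before the previous one ended");
                inSection = true;
            }
            else if (type == endCode)
            {
                if (!inSection) errors.Add($"Line {lineNo}: end record without a start record");
                inSection = false;
            }
            else if (!inSection)
            {
                errors.Add($"Line {lineNo}: data record outside a section");
            }
        }
        if (inSection) errors.Add("End of file: the last section has no end record");
        return errors;
    }
}
```

Because the checker only keeps one flag of state, it handles any number of sections in a row and never needs the whole file in memory.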

The following classes model the entities in the specification schema:

  • Field – Specifies the position, length and validation rule of a field in a record of fixed-length fields.
      ◦ Set Field.Key to true if the field is part of what identifies the record.
      ◦ Set Field.Optional to true if the field is not always supplied in the input file.
  • Record – Contains fields
  • Section – Contains a start record, some data records and an end record
  • Leverance – Contains sections
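Applying field definitions like these to a single line might look like the sketch below. FieldDef and RecordParser are hypothetical types, not the article's Field and Record classes; positions are assumed to be 1-based and validation rules are assumed to be regular expressions. Blank optional fields are skipped, and the Key fields are joined to identify the record.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical field definition mirroring the Field entity (position, length,
// validation rule, Key, Optional); the article's actual Field class may differ.
public sealed class FieldDef
{
    public string Name { get; set; }
    public int Start { get; set; }       // 1-based column (assumption)
    public int Length { get; set; }
    public string Pattern { get; set; }  // validation rule as a regular expression (assumption)
    public bool Key { get; set; }
    public bool Optional { get; set; }
}

public static class RecordParser
{
    // Slices one fixed-length line into named values and appends validation errors.
    public static Dictionary<string, string> Parse(string line, IList<FieldDef> fields, List<string> errors)
    {
        var values = new Dictionary<string, string>();
        foreach (FieldDef f in fields)
        {
            int index = f.Start - 1;
            if (index + f.Length > line.Length)
            {
                // The line ends before this field; only an error if the field is required.
                if (!f.Optional) errors.Add($"{f.Name}: line too short");
                continue;
            }
            string value = line.Substring(index, f.Length);
            if (f.Optional && value.Trim().Length == 0) continue;  // optional field left blank
            if (!Regex.IsMatch(value, "^(?:" + f.Pattern + ")$"))
                errors.Add($"{f.Name}: '{value}' does not match {f.Pattern}");
            values[f.Name] = value;
        }
        return values;
    }

    // Joins the values of the Key fields, i.e. the part that identifies the record.
    public static string KeyOf(Dictionary<string, string> values, IList<FieldDef> fields)
    {
        var parts = new List<string>();
        foreach (FieldDef f in fields)
            if (f.Key && values.TryGetValue(f.Name, out string v)) parts.Add(v);
        return string.Join("|", parts);
    }
}
```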

The class PbsReader can read and validate an input file given a valid specification:

C#
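using System;
using System.Xml;

// Load the specification that describes the records, fields and validation rules
XmlDocument spec = new XmlDocument();
spec.Load("InformationsService.xml");
Leverance leverance = new Leverance(spec);

// Read and validate the legacy input file against the specification
PbsReader target = new PbsReader();
target.Read("Leverance.txt", leverance);

// ErrorCount covers all field format errors; Errors holds at most the first 100
Console.WriteLine("Errors: {0}", target.ErrorCount);
foreach (Error error in target.Errors)
{
  Console.WriteLine(error);
}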

If the input file does not honor the ground rules, a PbsFormatException is thrown. Fields with format errors are counted in PbsReader.ErrorCount, and the first 100 of these errors are collected in PbsReader.Errors.
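The error policy described above (count every format error, keep only the first 100) can be sketched with a small collector. ErrorLog and its members are hypothetical and not taken from PbsReader; the cap keeps memory bounded when a large file is full of errors, while the count still reports how bad the file is.

```csharp
using System.Collections.Generic;

// Hypothetical collector: every error is counted, but only the first MaxErrors are stored.
public sealed class ErrorLog
{
    public const int MaxErrors = 100;
    private readonly List<string> errors = new List<string>();

    public int ErrorCount { get; private set; }
    public IReadOnlyList<string> Errors => errors;

    public void Add(int lineNumber, string message)
    {
        ErrorCount++;
        if (errors.Count < MaxErrors)
            errors.Add($"Line {lineNumber}: {message}");
    }
}
```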

PbsWriter derives from PbsReader and converts the input file to XML.

PbsSearcher also derives from PbsReader; it converts only the records that match a search filter to XML.
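For large inputs, a writer or searcher of this kind benefits from streaming. The sketch below is a hypothetical illustration, not the article's PbsWriter or PbsSearcher: it reads lines lazily, skips lines that do not contain the filter text, and writes each remaining line as a Record element with XmlWriter, so memory use does not grow with file size. A real converter would emit one element per field from the specification rather than the raw line; the element names Leverance and Record are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical streaming converter with an optional search filter.
public static class LegacyToXml
{
    // Returns the number of records written. A null filter converts every line.
    public static int Convert(IEnumerable<string> lines, TextWriter output, string filter)
    {
        int written = 0;
        var settings = new XmlWriterSettings { Indent = true };  // CloseOutput defaults to false
        using (XmlWriter xml = XmlWriter.Create(output, settings))
        {
            xml.WriteStartElement("Leverance");
            int lineNo = 0;
            foreach (string line in lines)
            {
                lineNo++;
                // With a filter, keep only records that contain the search text.
                if (filter != null && line.IndexOf(filter, StringComparison.Ordinal) < 0)
                    continue;
                xml.WriteStartElement("Record");
                xml.WriteAttributeString("line", XmlConvert.ToString(lineNo));
                xml.WriteString(line);
                xml.WriteEndElement();
                written++;
            }
            xml.WriteEndElement();
        }
        return written;
    }
}
```

Passing File.ReadLines("Leverance.txt") as the input makes the file be read one line at a time instead of being loaded into memory.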

Points of Interest

This tool was developed by my colleague Lotte Jensen and me during a programming course with Kent Beck. I learned at least two important things during that course:

  • I used to write tests after coding for a while, waiting for the design to stabilize. Now I write the tests before writing the code.
  • Curly braces go on a new line after the method name, not on the same line. This follows Kent Beck's principle of symmetry; I wish he would take his own medicine!

History

  • March 2008: Version 1.0
  • June 2009: Bug fix - Introduced support for reading an arbitrary number of sections

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
Denmark
