
ASC2XXX - Two classes for parsing delimited text files

8 Mar 2003
Convert delimited text files to XML file or DataSet object

Turn a delimited text file into a DataSet or an XML file.

Introduction

Two classes that illustrate one way to read a delimited text file, parse the "fields" of data using regular expressions, and move the data into either an XML file or a DataSet object for direct use.

.NET Framework namespaces used:

  • System; // For strings and things
  • System.IO; // For reading and writing streams and files
  • System.Xml; // For creating and writing the XML file
  • System.Text.RegularExpressions; // For parsing the text file
  • System.Data; // For generating a DataSet

Concepts illustrated

  • Reading and writing files through stream objects
  • Parsing text using regular expressions
  • Generating a DataSet in memory from code and using it to fill a DataGrid control
  • Generating an XML file from code
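
The concepts above can be sketched in a few lines. This is a minimal illustration, not the code from the download: the class name DelimitedSketch, the \S+ field pattern, and the column names are assumptions made for the example, and XmlWriter.Create is used here although the original classes may write their XML differently.

```csharp
using System;
using System.Data;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml;

public static class DelimitedSketch
{
    // Assumed field pattern: a field is a run of non-whitespace characters.
    private static readonly Regex FieldPattern = new Regex(@"\S+");

    // Reads lines from any TextReader and adds one DataRow per non-blank line.
    public static DataSet ToDataSet(TextReader reader, string[] columnNames)
    {
        DataSet dataSet = new DataSet("Records");
        DataTable table = dataSet.Tables.Add("Record");
        foreach (string name in columnNames)
            table.Columns.Add(name, typeof(string));

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            MatchCollection matches = FieldPattern.Matches(line);
            if (matches.Count == 0) continue; // skip blank lines

            DataRow row = table.NewRow();
            // Extra fields are ignored; missing fields stay DBNull.
            for (int i = 0; i < matches.Count && i < columnNames.Length; i++)
                row[i] = matches[i].Value;
            table.Rows.Add(row);
        }
        return dataSet;
    }

    // Writes each row as a <Record> element with one child element per column.
    // Column names must be valid XML element names for this to work.
    public static string ToXml(DataTable table)
    {
        StringBuilder output = new StringBuilder();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.OmitXmlDeclaration = true;
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("Records");
            foreach (DataRow row in table.Rows)
            {
                writer.WriteStartElement("Record");
                foreach (DataColumn column in table.Columns)
                    writer.WriteElementString(column.ColumnName, Convert.ToString(row[column]));
                writer.WriteEndElement();
            }
            writer.WriteEndElement();
        }
        return output.ToString();
    }

    public static void Main()
    {
        string sample = "alpha 1 x\nbeta 2 y\n";
        DataSet ds = ToDataSet(new StringReader(sample), new[] { "Name", "Count", "Flag" });
        Console.WriteLine(ToXml(ds.Tables["Record"]));
    }
}
```

To read from a real file, pass a StreamReader opened on that file instead of the StringReader. The resulting DataSet can then be assigned to a grid control's DataSource property.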

Background

The reason I wrote these classes is twofold:

  • I needed to write an application that would parse a web server log file (in W3C common log format) and put that data into a SQL Server database.
  • I needed a class that I could re-use in other applications where it was necessary to move data from a CSV text file into a database.

Using the code

Although very short, the code is heavily commented throughout. Where relevant, the comments include links to the MSDN articles that explain in more detail the .NET class being used at that point in the code.

This code is set up to parse a web server log file; however, it can easily be modified to parse any delimited text file, and I've indicated in the comments where to do so. I've also included a commented-out line with an alternate regular expression that can be used to parse comma-delimited text files.
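
As a rough idea of what a regular expression for comma-delimited lines can look like, here is a sketch. The pattern is an assumption written for this example and is not the commented-out expression from the download; the class name CsvSketch and the method SplitLine are also hypothetical. It handles quoted fields, doubled quotes inside them, and empty fields, but not quoted fields that span several lines.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CsvSketch
{
    // Hypothetical pattern (not the one shipped with the article):
    //   (?<=^|,)           a field starts at the line start or right after a comma
    //   "((?:[^"]|"")*)"   quoted field; "" inside it stands for one literal quote
    //   ([^,]*)            unquoted field, possibly empty
    private static readonly Regex CsvField = new Regex(
        "(?<=^|,)(?:\"((?:[^\"]|\"\")*)\"|([^,]*))");

    public static List<string> SplitLine(string line)
    {
        List<string> fields = new List<string>();
        foreach (Match m in CsvField.Matches(line))
        {
            if (m.Groups[1].Success)
                // Quoted field: drop the outer quotes and unescape doubled quotes.
                fields.Add(m.Groups[1].Value.Replace("\"\"", "\""));
            else
                fields.Add(m.Groups[2].Value);
        }
        return fields;
    }

    public static void Main()
    {
        // The second field contains a comma and the third field is empty.
        foreach (string field in SplitLine("a,\"b,c\",,d"))
            Console.WriteLine("[" + field + "]");
    }
}
```

The lookbehind keeps the separator out of the match, so each match is exactly one field and empty fields are still reported.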

A file, samplelog.txt, is provided with the demo; it contains a test web server log file. I have mangled the IP addresses for privacy, but otherwise the data is straight out of an Apache server log from our web server.

I've recently started using C# after many years of working in C++, so any constructive criticism is welcome.

History

  • Original version: Feb.26.2003

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

Member 96

Canada

Comments and Discussions

 
Re: tab delimited files
John Cardinal, 19-Jan-04 22:40
juliusPH wrote:
why isn't \t recognized?

It *is* recognized; the problem is that it's matching the \t's instead of anything that is NOT a \t. You could use the technique in the article with that regex, because the article shows a method that works by matching the separator character rather than just extracting the text you want.

That's why the S is capitalized: \S matches any character that is not whitespace, whereas lowercase \s matches whitespace. Because there is no capital-T equivalent for tabs, you need to get a bit more complex.

What you want is this:
[^\t\n\r]+
I.e. match one or more (+) occurrences of any character except \t, \n or \r (assuming your file has CR/LF at the end of each record). The ^ goes only once, right after the opening bracket; repeated inside the brackets it would be treated as a literal ^ character to exclude.

In code format:
Regex regex = new Regex(
@"[^\t\n\r]+",
RegexOptions.None
);
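
A short usage sketch for a tab-delimited record, assuming the character class is written as [^\t\n\r]+ (anything except tab, line feed, and carriage return). The class name TabFieldsSketch and the sample record are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TabFieldsSketch
{
    // A field is one or more characters that are not tab, LF, or CR.
    private static readonly Regex Field = new Regex(@"[^\t\n\r]+", RegexOptions.None);

    // Returns the fields of one record in the order they appear.
    public static string[] Extract(string record)
    {
        MatchCollection matches = Field.Matches(record);
        string[] fields = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            fields[i] = matches[i].Value;
        return fields;
    }

    public static void Main()
    {
        // Spaces inside a field are kept; only tab, CR, and LF end a field.
        foreach (string field in Extract("John Smith\t42\tVictoria BC\r\n"))
            Console.WriteLine(field);
    }
}
```

Because the pattern requires at least one character, two tabs in a row do not produce an empty field; the empty column is simply skipped. If empty columns matter, splitting on the tab character is one alternative.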







Last Updated 9 Mar 2003
Article Copyright 2003 by Member 96