Click here to Skip to main content
Click here to Skip to main content

Read Text File (txt, csv, log, tab, fixed length)

, 9 Jun 2008 CPOL
Rate this:
Please Sign up or sign in to vote.
This article is mainly focused on reading text files efficiently. It includes log, csv, tab delimited, fixed length files, etc. Instead of using StreamReader(.NET)/FileSystemObject (VB 6.0), we consider the file as a database table and read the data by querying it.

Introduction

We come across many situations where we need to read the data and process it from the text files (.txt, .CSV, .tab). The most common way is to use the StreamReader (.NET) / FileSystemObject (VB 6.0) and read the file line by line.

Suppose we are able to read the file as a database table and process the data by querying the table, we find many disadvantages with the above approach.

Some of the disadvantages are:

  1. Connected environment. Locks the file until the process is complete.
  2. We have to split each line separately to get individual columns of data. It is difficult to handle the rows which have comma as part of its data.
    Example: “Yes, comma is here”,2,”Hello, another comma”,4,5,6
  3. We do not have the option to filter the data until we read the line and start processing it.
  4. Counting the number of records in the file or counting the number of records of a particular type involves reading the whole file.

We can list out many such disadvantages. All the advantages of using the database table querying can be counted for the disadvantages of reading line by line.

Now, can we read the text file as a database table?
We wouldn't have discussed this topic if the answer is no. YES, we can read the file as a database table and we will overcome all the disadvantages mentioned above easily.

Read the Data

It is as simple as connecting to a database and querying the data from the database table. In this case, we consider the text file as table and the containing folder as database.
The steps to read the data are as follows:

  1. Open a database connection
    Note: Connection string is very important here. It varies depending on the type of the file we are trying to read. We will discuss this later in this article.
  2. Get the result set using the basic query language.
  3. Loop through the records and read the fields.

C# Code

DataSet myData = new DataSet();
string myXML;
string strFilePath = "C:\\";
string mySelectQuery = "SELECT * FROM SampleFile.CSV";
OleDbConnection myConnection = new OleDbConnection
		("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\\;" + 
                   "Extended Properties=\"text;HDR=YES;FMT=Delimited\"");
OleDbDataAdapter dsCmd = new OleDbDataAdapter(mySelectQuery, myConnection);
//Fill the DataSet object
dsCmd.Fill(myData, "CustomerOwners");
//Create a XML document with the table data
myData.WriteXml("D:\\TestXML.xml");
myConnection.Close();

VB 6.0 Code

'Set the database connection
objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
                   "Data Source=" & strFilePath & ";" & _
                   "Extended Properties=""text;HDR=YES;FMT=Delimited"""

'Query from the file assuming it as a database table
objRecordset.Open "SELECT * FROM " & fileName, _
          objConnection, adOpenStatic, adLockOptimistic, adCmdText

'Loop through the records and insert in to the table
Do Until objRecordset.EOF
    Debug.Print objRecordset.Fields.Item("Number")
    Debug.Print objRecordset.Fields.Item("Name")
objRecordset.MoveNext
Loop

We read the text file as a database table and upload it in the database. Let us see the advantages with this approach:

  1. This process is fast.
  2. The first thing that we need to observe is the way in which it handles the data row and splits the data. Suppose you have a comma as part of text in one of the columns (point 2 of disadvantages listed above), this process automatically handles this case. We need not handle it separately.
  3. We can select specific set of rows from the file if we give selection criteria
    Example: "SELECT * FROM SampleFile1.CSV WHERE Number >= 3"
  4. By this time, it can be clearly understood that we can have WHERE, HAVING or GROUP BY clauses in the queries which can be very helpful at times.
    Ex: "SELECT Number, Count(Name) FROM SampleFile1.CSV GROUP BY Number HAVING Count(Name) >= 1"
  5. We can filter the data based on the requirements once we read the whole data in the file. This can be done using the RowFilter/Filter property or DataView/RecordSet. As I said, once we read the data in to the Dataset/RecordSet, we can do all the operations that can be done with them. This includes filtering, sorting, etc.

Other File Formats

We discussed about reading a comma separated values (CSV) file. How does this approach handle the tab delimited files and fixed length files?
Everything lies in the connection string that we provide. Let us see the details of the connection string here.

Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\FolderName;
Extended Properties="text;HDR=YES;FMT=Delimited

Provider – This represents the type of database. We use the OLEDB connection type.
Data Source – Folder is considered as database.
Extended Properties – These properties will define the way we want to read the file.

  • The first part of it defines the file type. In our case, though the file format can be classified as comma delimited, tab delimited or fixed length data, it is all simple text. If we are reading an Excel file, we use Excel x.x, where x.x is version number.
  • HDR (Header) – Used to specify whether the header is available or not. YES – First line of the input file is considered as header and the remaining lines are considered as data. NO – Considered as data from the first line.
  • FMT (Format) – Specify the formatting type. It can have the following values:
    Delimited File is considered as a comma delimited file. Comma is the default delimited character.
    Delimited(x) File is considered as a delimited file with delimited character ‘x’.
    TabDelimited File is considered as a tab delimited file.
    FixedLength Reads the data with the fixed length of the field specified. You have to specify the widths and types of the columns using properties like Col1, Col2, etc. More at MSDN.

If the format specified is 'Delimited', the default character is comma (,). It is stored in the registry. You can change it in the registry at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Text\Format.
As everybody says, do not mess up with the registry. Microsoft provided an alternative to this by providing us the Schema.ini file. If we are reading a FixedLength file, we must use the Schema.ini file. We specify the length of the fields in the schema file.

Reading from Multiple Files

If you have multiple files and data needs to be merged or filtered based on the common columns, we do the same way as we do in SQL. We can join tables and get the merged data. Remember that the output will be CROSS JOIN of the rows in both the files. Make sure you filter the data based on the common columns.

Examples:

SampleFile1.CSV – (EmpID, Name, Address)
SampleFile2.CSV – (EmpID, Salary, Month)

//Where clause is used to get ‘Natural Join’
string mySelectQuery = "SELECT * FROM SampleFile.CSV As Sample1, _
			SampleFile2.CSV As Sample2 " + _
                       	"Where Sample1.Number=Sample2.Number";

'Where clause is used to get ‘Natural Join’
objRecordset.Open "SELECT * FROM SampleFile.CSV As Sample1, _
		SampleFile2.CSV As Sample2 " & _ 
                  "Where Sample1.Number=Sample2.Number", _
                  objConnection, adOpenStatic, adLockOptimistic, adCmdText

Note that aliasing (SampleFile.CSV As Sample1) is required in join queries, especially when tables have columns with same names. In our case, table names (filenames) have dot (.) in between. So the query parser will be misguided by the dot in the table and may consider ‘CSV.Number’ as column name.

History

  • 9th June, 2008: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Pradeep KV
Software Developer (Senior)
United States United States
I did MS in Computers from BITS, Pilani, India. Have been working as a .NET developer for more than 5 years. I have been working on Web Applications for some major banks in United States. I like to upgrade myself with latest technologies in .NET and also like to work with them.

Comments and Discussions

 
QuestionA good job Pinmemberyushan_guo6-Nov-12 16:49 
GeneralMy vote of 5 Pinmemberyushan_guo6-Nov-12 16:48 
GeneralPulling Information From a .txt file and creating a .csv file PinmemberAllen Hook13-Jul-12 11:35 
GeneralRe: Pulling Information From a .txt file and creating a .csv file PinmemberPradeep KV13-Jul-12 11:52 
QuestionConfused JET engine PinmemberMHKhan14-Feb-12 19:38 
QuestionYou also dont mention that this is only applicable to the JET driver PinmemberMember 806065114-Dec-11 10:22 
QuestionValidating Header and Trailer records for Fixed Width file using VB script. Pinmembershakeel7313-Nov-11 8:50 
QuestionData missing While importing Text file (.txt) to sql server 2008 using ASP Pinmemberpriyaahh9-Aug-11 21:32 
Answeranswer from www.abobjects.com Pinmemberparvez1213-May-11 7:38 
GeneralRe: answer from www.abobjects.com PinmemberDaniel Vlasceanu25-May-11 22:04 
GeneralAll in one column PinmemberGerson Abdala20-Mar-11 18:24 
GeneralText File with tab delimited to print Pinmemberpriyaahh15-Mar-11 1:46 
GeneralRe: Text File with tab delimited to print PinmemberPradeep KV15-Mar-11 4:03 
GeneralRe: Text File with tab delimited to print Pinmemberpriyaahh15-Mar-11 19:25 
GeneralRe: Text File with tab delimited to print PinmemberPradeep KV16-Mar-11 3:11 
GeneralRe: Text File with tab delimited to print Pinmemberpriyaahh16-Mar-11 23:37 
GeneralPHP Pinmemberszmehta28-Oct-10 8:09 
GeneralRe: PHP PinmemberPradeep KV28-Oct-10 9:31 
GeneralMy vote of 4 PinmemberNJK28-Sep-10 23:38 
QuestionOnly getting the first column PinmemberMatthewRyan20-Sep-10 2:48 
AnswerRe: Only getting the first column PinmemberMatthewRyan23-Sep-10 3:20 
GeneralCustom SQL Perser Design PinmembertheAxE15-Jan-10 3:31 
QuestionCannot read text files using other than a comma delimiter PinmemberMember 420977024-Jun-09 3:42 
QuestionText File(Fixed-width) with extension different from TXT or CSV (say *.001 or *.012 etc) [modified] PinmemberMohammad Rahman4-May-09 8:22 
QuestionI have some error, please reply... Pinmemberbigdaddy98i9-Apr-09 14:52 
AnswerRe: I have some error, please reply... PinmemberPradeep KV9-Apr-09 18:43 
GeneralRe: I have some error, please reply... Pinmemberbigdaddy98i10-Apr-09 3:00 
Questioncould you implemment it in vc++?? Pinmembergenliu77721-Mar-09 6:43 
GeneralCould not find ISAM Pinmemberparthi intel5-Aug-08 13:34 
GeneralRe: Could not find ISAM PinmemberPradeep KV6-Aug-08 6:46 
GeneralRe: Could not find ISAM Pinmemberparthi intel6-Aug-08 11:07 
GeneralRe: Could not find ISAM PinmemberPradeep KV6-Aug-08 12:34 
GeneralRe: Could not find ISAM Pinmemberparthi intel6-Aug-08 13:27 
GeneralHiii Mr. Rajasthani Pinmember~Khatri Mitesh~18-Jul-08 21:40 
GeneralNeed ur hlp in reading this file and adding it to the tables. Pinmembernaveennk21-Jun-08 22:17 
GeneralRe: Need ur hlp in reading this file and adding it to the tables. PinmemberPradeep KV22-Jun-08 0:07 
GeneralExcellent Pinmemberdnpro10-Jun-08 2:05 
AnswerRe: Excellent PinmemberPradeep KV11-Jun-08 6:19 
GeneralGood.. Pinmembersuresh suthar10-Jun-08 1:29 
GeneralRe: Good.. PinmemberPradeep KV11-Jun-08 6:08 
QuestionOh, so this isn't code you wrote? PinmemberPIEBALDconsult9-Jun-08 19:32 
AnswerRe: Oh, so this isn't code you wrote? PinmemberPradeep KV9-Jun-08 20:09 
GeneralRe: Oh, so this isn't code you wrote? PinmemberPIEBALDconsult9-Jun-08 20:42 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web04 | 2.8.141223.1 | Last Updated 9 Jun 2008
Article Copyright 2008 by Pradeep KV
Everything else Copyright © CodeProject, 1999-2014
Layout: fixed | fluid