|
|||||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||||
|
Announcements
Chapters
Services
Feature Zones
|
Note: This is an unedited contribution. If this article is inappropriate,
needs attention or copies someone else's work without reference then please
Report This Article
Introduction
We come across many situations where we need to read the data and process it from the text files (.txt, .CSV, .tab). The most common way is to use the StreamReader (.NET) / FileSystemObject (VB 6.0) and read the file line by line.
We can list out many such disadvantages. All the advantages of using the database table querying can be counted for the disadvantages of reading line by line. Read the data
It is as simple as connecting to a database and querying the data from the database table. In this case, we consider the text file as table and the containing folder as database.
C# code DataSet myData = new DataSet();
string myXML;
string strFilePath = "C:\\";
string mySelectQuery = "SELECT * FROM SampleFile.CSV";
OleDbConnection myConnection = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\\;" +
"Extended Properties=\"text;HDR=YES;FMT=Delimited\"");
OleDbDataAdapter dsCmd = new OleDbDataAdapter(mySelectQuery, myConnection);
//Fill the DataSet object
dsCmd.Fill(myData, "CustomerOwners");
//Create a XML document with the table data
myData.WriteXml("D:\\TestXML.xml");
myConnection.Close();
VB 6.0 code 'Set the database connection
objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=" & strFilePath & ";" & _
"Extended Properties=""text;HDR=YES;FMT=Delimited"""
'Query from the file assuming it as a database table
objRecordset.Open "SELECT * FROM " & fileName, _
objConnection, adOpenStatic, adLockOptimistic, adCmdText
'Loop through the records and insert in to the table
Do Until objRecordset.EOF
Debug.Print objRecordset.Fields.Item("Number")
Debug.Print objRecordset.Fields.Item("Name")
objRecordset.MoveNext
Loop
We read the text file as a database table and uploaded in the database. Let us see the advantages with this approach
Other file formats
We discussed about reading a comma separated values (CSV) file. How does this approach handle the tab delimited files and fixed length files?
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\FolderName;Extended Properties="text;HDR=YES;FMT=Delimited
Provider – This represents the type of database. We use the OLEDB connection type.
If the format specified is 'Delimited', the default character is comma (,) It is stored in the registry. You can change it in the registry at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Text\Format. Reading from multiple files
If you have multiple files and data needs to be merged or filtered based on the common columns, we do the same way as we do in SQL. We can join tables and get the merged data. Remember that the output will be CROSS JOIN of the rows in both the files. Make sure you filter the data based on the common columns.
//Where clause is used to get ‘Natural Join’
string mySelectQuery = "SELECT * FROM SampleFile.CSV As Sample1, SampleFile2.CSV As Sample2 " +
"Where Sample1.Number=Sample2.Number";
'Where clause is used to get ‘Natural Join’
objRecordset.Open "SELECT * FROM SampleFile.CSV As Sample1, SampleFile2.CSV As Sample2 " & _
"Where Sample1.Number=Sample2.Number", _
objConnection, adOpenStatic, adLockOptimistic, adCmdText
Note that aliasing (SampleFile.CSV As Sample1) is required in join queries, especially when tables have columns with same names. In our case, table names (filenames) have dot (.) in between. So the query parser will be misguided by the dot in the table and may consider ‘CSV.Number’ as column name.
|
||||||||||||||||||||||||||||||