|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
|
Announcements
Chapters
Services
Feature Zones
|
IntroductionThe AnyDataFileToXmlConverter utility is a Windows application that reads and converts data-files of various formats like Access databases, Excel spreadsheets, CSV files, tabbed-delimited files, etc., to the standard BackgroundA few months ago, I had a project that required data-collection, validation, and processing of data-files from a wide variety of formats. Due to the frequency of how the validation/processing of the data-files would be changing over time, and the need for standardization of the processing format, I decided to use the .NET Using the ApplicationThe AnyDataFileToXmlConverter utility is quick and easy to use - just open/load the file in the utility and the file will be automatically processed to the .NET XML format. For Excel spreadsheets, the worksheets in the workbook are enumerated, and a worksheet must be selected for processing to XML. In Microsoft Access databases, a query for the specific data to convert to XML must be specified before processing. Several sample data-files are provided with the utility, which demonstrate the various common data-file formats that can be re-processed to XML. There are additional file-formats that can be processed (like pipe-delimited files), and the class can be easily modified to handle additional formats also. XML files can be loaded into the utility, but no processing is performed since they're already in the end-result format. Also, there is an optional data-cleanup function that automatically removes any "junk" XML-nodes that are commonly created as a result of processing an Excel spreadsheet that contains empty or cleared rows at the end of a spreadsheet. You can also change the output/results display from XML to a grid, for easier viewing and sorting. Data-File Processing ExamplesComma-Separated File
Tab-Delimited File
Excel Spreadsheet
Microsoft Access Database
The AnyDataFileToXmlConverter Engine - How It WorksThe main engine/processor of the AnyDataFileToXmlConverter utility is the For text files, the file contents are evaluated to find the likely character delimiter between the data columns, and then each line of the file is split into columns according to the character delimiter. A /// <summary>
/// Converts the specified TEXT file to it's equivalent XmlDocument
/// </summary>
/// <PARAM name="sFilePath"></PARAM>
/// <returns></returns>
private static XmlDocument ConvertTextFile(string sFilePath)
{
XmlDocument xmlRaw = null;
StreamReader oSR = null;
try
{
DataSet dsTextFile = new DataSet();
DataTable dtTextFile = new DataTable();
DataRow drRows = null;
// check and pre-process the text file if it's a non-standard text file
sFilePath = PreprocessNonStandardFiles(sFilePath);
// find the correct delimiter for the file
// (some files have multiple delimiting chars, but only one is correct)
char chrDelimiter = GetDelimiterCharacter(sFilePath);
//Open the file and go to the top of the file
oSR = new StreamReader(sFilePath);
oSR.BaseStream.Seek(0, SeekOrigin.Begin);
// read the first line
string sFirstLine = oSR.ReadLine();
bool bHeaderIsDataRow = false;
// init the columns if the file has a valid, parsible header
string[] sColumns = sFirstLine.Split(chrDelimiter);
if(sColumns.Length > MINIMUM_NUMBER_CSV_COLUMNS)
{
bHeaderIsDataRow = InitializeTableColumns(sColumns,
ref dtTextFile, true);
if (bHeaderIsDataRow == true)
{
oSR.BaseStream.Seek(0, SeekOrigin.Begin);
oSR.Close();
oSR = new StreamReader(sFilePath);
oSR.BaseStream.Seek(0, SeekOrigin.Begin);
}
}
// add in the Rows for the datatable/file
dsTextFile.DataSetName = "NewDataSet";
dsTextFile.Tables.Clear();
dtTextFile.TableName = "Table";
dsTextFile.Tables.Add(dtTextFile);
// iterate thru the file and process each line
while (oSR.Peek() > -1)
{
int iFieldIndex = 0;
string sLine = oSR.ReadLine();
string sLineTrimmed = sLine.Trim();
string[] sLineFields = sLine.Split(chrDelimiter);
if ((sLineFields.Length <= 0) ||
(sLineTrimmed.Length < MINIMUM_NUMBER_CSV_COLUMNS))
{
continue;
}
// if the number of fields is less that the minimum, skip the field
if (sLineFields.Length <= MINIMUM_NUMBER_CSV_COLUMNS)
{
continue;
}
// if we suddenly have more fields than columns,
// we're in a header or something, so re-init the columns
if ((sLineFields.Length > dtTextFile.Columns.Count) &&
(sLineFields.Length > MINIMUM_NUMBER_CSV_COLUMNS))
{
//note: bad data?! - header/inconsistent delimiting problems?
if (dtTextFile.Rows.Count <= 0)
{
InitializeTableColumns(sLineFields, ref dtTextFile, false);
}
}
drRows = dtTextFile.NewRow();
foreach (string strField in sLineFields)
{
string sField = strField.Trim();
sField = sField.Replace("\"", "");
sField = sField.Replace("'", "");
sField = sField.Replace("$", "");
sField = sField.Replace("%", "");
sField = sField.Replace("-0-", "0");
sField = sField.Replace("&", "and");
// header/inconsistent file delimiting problems?
if (dtTextFile.Columns.Count <= iFieldIndex)
break;
drRows[iFieldIndex] = sField;
iFieldIndex = iFieldIndex + 1;
}
dtTextFile.Rows.Add(drRows);
}
// load the dataset to an xmldocument
xmlRaw = CleanupRawXml(dsTextFile.GetXml());
}
catch (Exception ex)
{
throw new Exception("Error: ConvertTextFile", ex);
}
finally
{
oSR.Close();
}
return xmlRaw;
}
ConclusionI hope you find this article and utility useful - it has come in quite handy a number of times, and the core class was recently incorporated into a generic
|
||||||||||||||||||||||