Click here to Skip to main content
15,881,898 members
Articles / Programming Languages / C# 4.0
Tip/Trick

How to Parse Delimited Files Having the Delimiter Symbol as Part of Data Too

Rate me:
Please Sign up or sign in to vote.
4.20/5 (5 votes)
13 Jan 2016CPOL3 min read 18.1K   5   7
Parse delimited files having the delimiter being part of actual data, apart from acting as delimiter

Introduction

In this tip, we will see how to parse delimited files, which may have the delimiter as valid data.

Background

In my recent project, I had to parse data from pipe(|) delimited files and insert that into database table. The code was completed, deployed and everything was fine until one day an exception was thrown saying the number of columns in the table does not match with number of columns we try to insert (not the exact error message). During analysis, I found that there was data in the file which had pipe(|) as valid data, which is supposed to be considered as input for a single column, but the file parser logic split that data, which is:

Do you want to know how I solved this issue and parsed the file? Read on.

Using the Code

Before jumping to the code, let's learn about the class that I used for parsing and some of the important methods and properties of it.

The class that is used for parsing is TextFieldParser. This class is found Microsoft.VisualBasic.FileIO.

  • SetDelimiters(params string[] delimiters) - This method sets the delimiter for the file.
  • LineNumber - This property returns the current line number in the file
  • TrimWhiteSpace - This property if set to true trims leading and trailing white spaces
  • ReadFields() - This method reads all fields on the current line, returns them as an array of strings, and advances the cursor to the next line containing data.
  • EndOfData - This property indicates the end of the file.

For the purpose of this tip, I am not going to insert data into DB. Instead, I will be displaying the parsed data in a datagrid view.

Let us first see a working example, then a file causing the exception and see how to solve the exception.

Working Example

Here is my sample file data:

C#
ID|FirstName|LastName|Phone|Address
1|John|John|951425|"Hollywood"
2|Eli|Eli|725486|"New York"

Here is the code for parsing and displaying the data from the file in a datagrid view.

C#
private void btnParse_Click(object sender, EventArgs e)
{
    try
    {
        string[] delimiter = new string[] {"|"};
        DataTable fileData = new DataTable();
        using (TextFieldParser pipeParser = new TextFieldParser(@"<filepath with file name>"))
        {
            pipeParser.SetDelimiters(delimiter);
            string[] columnHeaders = pipeParser.ReadFields();
            foreach (string column in columnHeaders)
            {
                if (!string.IsNullOrEmpty(column))
                {
                    DataColumn datecolumn = new DataColumn(column);
                    datecolumn.AllowDBNull = true;
                    fileData.Columns.Add(datecolumn);
                }
            }
            while (!pipeParser.EndOfData)
            {
                string[] dataFields = pipeParser.ReadFields();
                fileData.Rows.Add(dataFields);
            }
        }
        dataGridView1.DataSource = fileData;
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}

Initial screen when the application is run.

Image 1

Click the "Parse File and Display Data" button to fire the event. Now we can see the data displayed in the data grid view.

Image 2

Exception Causing Scenario Example

Now, I am adding a third row to the input file.

C#
ID|FirstName|LastName|Phone|Address
1|John|John|951425|"Hollywood"
2|Eli|Eli|725486|"New York"
3|Tom|Tom|235046|"123 street|Downtown|LA"

You can see that the address column of the third row has pipes(|), which is the actual delimiter. Let's see how the code works now. Run the application and click the button to parse and load data. The application throws the following exception.

Image 3

Surprisingly, the code that worked before, now fails after adding the third row. Let us debug and see what is causing the exception.

Image 4

In the above screenshot, we can see that the address column has been parsed into 3 different columns instead of a single column, thus causing the exception.

Now the question is how to properly parse this column. Before addressing how to parse, we have to figure out how to distinguish this column from the rest of the columns. There is a hint in the file.

If we look at the Address column closely, we can find that the data for this column alone is enclosed within double quotes (""). This is what will help us distinguish it from other columns and parse properly.

Solution for the Issue

There is a property called HasFieldsEnclosedInQuotes, in TextFieldParser class, which provides the ability to parse the fields within double quotes properly, when set to TRUE.

  • HasFieldsEnclosedInQuotes - This property denotes whether fields are enclosed in quotation marks.

Add this line of code to the button click handler.

C#
pipeParser.HasFieldsEnclosedInQuotes = true;

Now the code will look like this (highlighted the line added).

C#
private void btnParse_Click(object sender, EventArgs e)
{
    try
    {
        string[] delimiter = new string[] {"|"};
        DataTable fileData = new DataTable();
        using (TextFieldParser pipeParser = new TextFieldParser(@"<filepath with file name>"))
        {
            pipeParser.SetDelimiters(delimiter);
        pipeParser.HasFieldsEnclosedInQuotes = true;
            string[] columnHeaders = pipeParser.ReadFields();
            foreach (string column in columnHeaders)
            {
                if (!string.IsNullOrEmpty(column))
                {
                    DataColumn datecolumn = new DataColumn(column);
                    datecolumn.AllowDBNull = true;
                    fileData.Columns.Add(datecolumn);
                }
            }
            while (!pipeParser.EndOfData)
            {
                string[] dataFields = pipeParser.ReadFields();
                fileData.Rows.Add(dataFields);
            }
        }
        dataGridView1.DataSource = fileData;
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}

Just build and run the application. Click the button to parse and display data. Now you can the application is able to parse the data successfully.

Image 5

Here is a screenshot of parsed data from debug mode.

Image 6

Hope this helps you. If you have any other ideas or feedback, do let me know in the comments section.

Reference

History

  • 01/14/2016

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
United States United States
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

 
QuestionWhat about asking user for verification when a suspect line of data is found? Pin
AORD15-Jan-16 7:59
AORD15-Jan-16 7:59 
GeneralThoughts Pin
PIEBALDconsult14-Jan-16 13:15
mvePIEBALDconsult14-Jan-16 13:15 
GeneralRe: Thoughts Pin
Mathi Mani14-Jan-16 14:43
Mathi Mani14-Jan-16 14:43 
GeneralRe: Thoughts Pin
PIEBALDconsult14-Jan-16 14:50
mvePIEBALDconsult14-Jan-16 14:50 
Questiongood Pin
Garth J Lancaster14-Jan-16 13:01
professionalGarth J Lancaster14-Jan-16 13:01 
PraiseRe: good Pin
PIEBALDconsult14-Jan-16 13:30
mvePIEBALDconsult14-Jan-16 13:30 
GeneralRe: good Pin
Garth J Lancaster14-Jan-16 13:42
professionalGarth J Lancaster14-Jan-16 13:42 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.