
Simple CSV Parser/Reader Function Written in C#

14 Oct 2008 | GPL3
This function parses a line read from a CSV file and returns its values in an ArrayList object.

Introduction

This is a simple parser function named CSVParser for parsing CSV (comma-separated values) files. It takes a string containing a single line of a CSV file as input and returns the values of that line in an ArrayList.

CSV follows these rules:

  1. All values are separated by commas.
  2. If a comma is part of a value, the value is enclosed in double quotes,
    e.g. a,b,"12,000",c
  3. If a double quote is part of a value, it is replaced by two double quotes and the value is enclosed in double quotes,
    e.g. a,b,"He said ""Hi""",c
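To make the three rules concrete, here is a minimal sketch of the reverse direction: encoding values so that a parser following these rules can read them back. The class CsvEncoding and its methods EscapeCsvValue and ToCsvLine are hypothetical names introduced for illustration; they are not part of the article's code.

```csharp
public static class CsvEncoding
{
    // Hypothetical helper: encodes a single value according to the rules above.
    public static string EscapeCsvValue(string value)
    {
        bool hasQuote = value.IndexOf('"') >= 0; // rule 3 applies
        bool hasComma = value.IndexOf(',') >= 0; // rule 2 applies
        if (!hasQuote && !hasComma)
        {
            // Plain values are written as-is.
            return value;
        }
        // Double every embedded quote (rule 3), then enclose the whole
        // value in double quotes (rules 2 and 3).
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }

    // Hypothetical helper: joins encoded values with commas (rule 1).
    public static string ToCsvLine(params string[] values)
    {
        string[] encoded = new string[values.Length];
        for (int i = 0; i < values.Length; i++)
        {
            encoded[i] = EscapeCsvValue(values[i]);
        }
        return string.Join(",", encoded);
    }
}
```

For example, ToCsvLine("a", "b", "He said \"Hi\"", "c") produces the line from rule 3: a,b,"He said ""Hi""",c.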

Usage

C#
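// Requires: using System.Collections; using System.IO;
ArrayList alResult;
string strLineText;
using (StreamReader objReader = new StreamReader(@"C:\Testfile.csv"))
{
    while ((strLineText = objReader.ReadLine()) != null)
    {
        alResult = CSVParser(strLineText);
        //an empty alResult means the line was rejected as malformed
        //do processing
    }
}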

How Does It Work?

This function is based on the concept of a finite state automaton. A finite state automaton has a current state and reads one input symbol at a time. For each input, a transition table determines the next state, and the automaton performs the action associated with that state.
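As a warm-up, the sketch below applies the same table-driven idea to a much smaller automaton: two states track whether we are inside double quotes, and the action (counting a value separator) is chosen after the transition, based on the new state, much as CSVParser does. The QuoteTracker class and its CountValues method are hypothetical and only illustrate the concept; unlike CSVParser, this sketch does not reject malformed lines.

```csharp
public static class QuoteTracker
{
    // Illustrative two-state automaton (not the article's parser):
    // state 0 = outside double quotes, state 1 = inside double quotes.
    // Input IDs: 0 = double quote, 1 = comma, 2 = any other character.
    private static readonly int[][] Transitions =
    {
        new[] { 1, 0, 0 }, // outside: a quote opens a quoted section
        new[] { 0, 1, 1 }, // inside: a quote closes it, commas are data
    };

    private static int InputId(char c) => c == '"' ? 0 : (c == ',' ? 1 : 2);

    // Counts the values on one CSV line by counting separator commas.
    public static int CountValues(string line)
    {
        int state = 0;
        int separators = 0;
        foreach (char c in line)
        {
            int input = InputId(c);
            // Look up the next state in the transition table...
            state = Transitions[state][input];
            // ...then perform the action tied to the new state and input:
            // a comma that leaves us outside quotes separates two values.
            if (input == 1 && state == 0)
            {
                separators++;
            }
        }
        return separators + 1;
    }
}
```

Because a doubled quote toggles the state twice, "" inside a quoted value leaves the automaton inside the quotes, so both example lines from the rules above count as four values.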

The state diagram is as follows:

[Figure: state diagram of the parser (CSVParser.jpg)]

The parser function maintains two objects: a StringBuilder, strElem in the code (hereafter TEMPSTR), that temporarily stores the characters of the current value, and an ArrayList, alParsedCsv in the code (hereafter ARRLIST), that collects the completed values.

INPUT

" (double quote): input ID 0
, (comma): input ID 1
O (any character other than ", comma and N): input ID 2
N (newline, i.e. the end of the line): input ID 3

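TRANSITION TABLE (next state for each current state and input)

CUR_STATE   " (0)   , (1)   O (2)   N (3)
0           2       0       1       5
1           6       0       1       5
2           4       3       3       6
3           4       3       3       6
4           2       8       6       7
5           5       5       5       5
6           6       6       6       6
7           5       5       5       5
8           0       0       0       0

State meanings, as implemented by PerformAction:

S0: start of a new value; entering it on a comma adds TEMPSTR to ARRLIST
S1: inside an unquoted value; the character is appended to TEMPSTR
S2: just after an opening quote or after an escaped "" pair; nothing is stored
S3: inside a quoted value; the character is appended to TEMPSTR
S4: a quote inside a quoted value (closing quote or first half of ""); the quote is appended to TEMPSTR
S5: end of line; the last value is added to ARRLIST
S6: error; the line is malformed and ARRLIST is cleared
S7: closing quote at end of line; the trailing quote is removed, the value is added, and the state becomes S5
S8: closing quote followed by a comma; the trailing quote is removed, the value is added, and the state becomes S0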
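To see how the table is read, the hypothetical helper below (TransitionTrace.Trace, not part of the article) walks a line through the same next-state values and records each state entered. It mirrors one detail of PerformAction: state 8 is transient and hands over immediately to state 0 (state 7 likewise hands over to state 5, but it only occurs at the end of the line).

```csharp
using System.Collections.Generic;

public static class TransitionTrace
{
    // Same values as the transition table above: Next[state][inputID].
    private static readonly int[][] Next =
    {
        new[] { 2, 0, 1, 5 }, new[] { 6, 0, 1, 5 }, new[] { 4, 3, 3, 6 },
        new[] { 4, 3, 3, 6 }, new[] { 2, 8, 6, 7 }, new[] { 5, 5, 5, 5 },
        new[] { 6, 6, 6, 6 }, new[] { 5, 5, 5, 5 }, new[] { 0, 0, 0, 0 },
    };

    // Input IDs as in GetInputID: 0 = double quote, 1 = comma, 2 = other.
    private static int InputId(char c) => c == '"' ? 0 : (c == ',' ? 1 : 2);

    // Returns every state entered while reading the line, ending with the
    // state entered for the end-of-line input (ID 3).
    public static List<int> Trace(string line)
    {
        var visited = new List<int>();
        int state = 0;
        foreach (char c in line)
        {
            state = Next[state][InputId(c)];
            visited.Add(state);
            // Like PerformAction, treat state 8 as transient: continue from state 0.
            if (state == 8)
            {
                state = 0;
            }
        }
        // End of line: look up the column for input ID 3.
        visited.Add(Next[state][3]);
        return visited;
    }
}
```

For the line "a",b the trace is 2, 3, 4, 8, 1 and finally 5 for the end of line, while a"b goes 1, 6 and then stays in the error state 6.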

The 9x4 jagged array aActionDecider encodes the transition table above. The first index is the current state (S0 to S8) and the second index is the input ID (0 to 3); the element at that position is the next state. For example, if the current state is 3 and the next input character is a double quote (0), the next state is aActionDecider[3][0], i.e. 4. GetInputID maps each character to its input ID, and PerformAction carries out the action associated with the state just entered.

The code is as follows:

C#
// Requires: using System.Collections; using System.Text;
private static ArrayList CSVParser(string strInputString)
{
    int intCounter = 0, intLength;
    StringBuilder strElem = new StringBuilder();
    ArrayList alParsedCsv = new ArrayList();
    intLength = strInputString.Length;
    int intCurrState = 0;
    int[][] aActionDecider = new int[9][];
    //Build the state array: aActionDecider[current state][input ID] = next state
    aActionDecider[0] = new int[4] { 2, 0, 1, 5 };
    aActionDecider[1] = new int[4] { 6, 0, 1, 5 };
    aActionDecider[2] = new int[4] { 4, 3, 3, 6 };
    aActionDecider[3] = new int[4] { 4, 3, 3, 6 };
    aActionDecider[4] = new int[4] { 2, 8, 6, 7 };
    aActionDecider[5] = new int[4] { 5, 5, 5, 5 };
    aActionDecider[6] = new int[4] { 6, 6, 6, 6 };
    aActionDecider[7] = new int[4] { 5, 5, 5, 5 };
    aActionDecider[8] = new int[4] { 0, 0, 0, 0 };
    for (intCounter = 0; intCounter < intLength; intCounter++)
    {
        //move to the next state for this character's input ID
        intCurrState = aActionDecider[intCurrState]
                                  [GetInputID(strInputString[intCounter])];
        //take the necessary action depending upon the new state
        PerformAction(ref intCurrState, strInputString[intCounter],
                     ref strElem, ref alParsedCsv);
    }
    //End of line reached, hence input ID is 3
    intCurrState = aActionDecider[intCurrState][3];
    PerformAction(ref intCurrState, '\0', ref strElem, ref alParsedCsv);
    return alParsedCsv;
}

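//Maps a character to its input ID. End of line (ID 3) is never returned here;
//CSVParser feeds it explicitly after the last character.
private static int GetInputID(char chrInput)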
{
    if (chrInput == '"')
    {
        return 0;
    }
    else if (chrInput == ',')
    {
        return 1;
    }
    else
    {
        return 2;
    }
}

private static void PerformAction(ref int intCurrState, char chrInputChar, 
                    ref StringBuilder strElem, ref ArrayList alParsedCsv)
{
    string strTemp = null;
    switch (intCurrState)
    {
    case 0:
        //Separate out value to array list
        strTemp = strElem.ToString();
        alParsedCsv.Add(strTemp);
        strElem = new StringBuilder();
        break;
    case 1:
    case 3:
    case 4:
        //accumulate the character
        strElem.Append(chrInputChar);
        break;
    case 5:
        //End of line reached. Separate out value to array list
        strTemp = strElem.ToString();
        alParsedCsv.Add(strTemp);
        break;
    case 6:
        //Erroneous input. Reject line.
        alParsedCsv.Clear();
        break;
    case 7:
        //wipe ending " and Separate out value to array list
        strElem.Remove(strElem.Length - 1, 1);
        strTemp = strElem.ToString();
        alParsedCsv.Add(strTemp);
        strElem = new StringBuilder();
        intCurrState = 5;
        break;
    case 8:
        //wipe ending " and Separate out value to array list
        strElem.Remove(strElem.Length - 1, 1);
        strTemp = strElem.ToString();
        alParsedCsv.Add(strTemp);
        strElem = new StringBuilder();
        //goto state 0
        intCurrState = 0;
        break;
    }
}
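For experimenting with the parser in isolation, here is a condensed, self-contained restatement of the same state machine in one class. It is a sketch, not the author's code: CsvLineParser and Parse are hypothetical names, it uses List<string> instead of ArrayList, and it stops at the first error (state 6) instead of scanning the rest of the line, which yields the same empty result.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvLineParser
{
    // Same transition table as aActionDecider: Next[state][inputID].
    private static readonly int[][] Next =
    {
        new[] { 2, 0, 1, 5 }, new[] { 6, 0, 1, 5 }, new[] { 4, 3, 3, 6 },
        new[] { 4, 3, 3, 6 }, new[] { 2, 8, 6, 7 }, new[] { 5, 5, 5, 5 },
        new[] { 6, 6, 6, 6 }, new[] { 5, 5, 5, 5 }, new[] { 0, 0, 0, 0 },
    };

    private static int InputId(char c) => c == '"' ? 0 : (c == ',' ? 1 : 2);

    // Returns the values on one CSV line, or an empty list if it is malformed.
    public static List<string> Parse(string line)
    {
        var values = new List<string>();
        var current = new StringBuilder();
        int state = 0;
        // Feed every character, then one extra step for end of line (ID 3).
        for (int i = 0; i <= line.Length; i++)
        {
            int input = i < line.Length ? InputId(line[i]) : 3;
            char c = i < line.Length ? line[i] : '\0';
            state = Next[state][input];
            switch (state)
            {
                case 0: // comma after an unquoted (or empty) value
                case 5: // end of line: emit the last value
                    values.Add(current.ToString());
                    current.Clear();
                    break;
                case 1: case 3: case 4: // accumulate the character
                    current.Append(c);
                    break;
                case 2: // opening quote or second half of "": nothing to store
                    break;
                case 6: // malformed line: reject it
                    values.Clear();
                    return values;
                case 7: // closing quote at end of line
                case 8: // closing quote followed by a comma
                    current.Length--; // drop the closing quote stored in state 4
                    values.Add(current.ToString());
                    current.Clear();
                    state = state == 7 ? 5 : 0;
                    break;
            }
        }
        return values;
    }
}
```

On the two examples from the introduction it returns a, b, 12,000, c and a, b, He said "Hi", c, and a malformed line such as a"b gives an empty list.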

About the Demo

The demo program reads a CSV file with four columns and displays the data in a DataGrid. It only demonstrates how to use the parser function and has no other practical use.

Download the zip file and run the Windows application. Select the "temp.csv" file in the "data" folder and click Parse. The DataGrid shows the contents of the CSV file.

License

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)


Written By
Web Developer
United States

Comments and Discussions

 
How about escapes? (PIEBALDconsult, 2-Mar-07)

Re: How about escapes? (Mandar Ranjit Date, 4-Mar-07)
As far as I understand, CSV follows these rules:
1) All values are separated by commas.
2) If a comma is part of a value, the value is enclosed in double quotes,
e.g.
a,b,"12,000",c
3) If a double quote is part of a value, it is replaced by two double quotes and the value is enclosed in double quotes,
e.g.
a,b,"He said ""Hi""",c

The parser function handles all of the above rules. I am not sure that \" is a valid way to escape a double quote in CSV; as far as I understand, rule 3 is how double quotes are escaped.

Copy the example in point 3 above, paste it into Notepad and save it as a .csv file. When you open the file in Excel, the values are separated properly.
If you repeat the same with "He said \"Hi\"", Excel cannot separate the values.

-- modified at 22:38 Wednesday 7th March, 2007

Regards,
Mandar Date
