CSV Parser (C#)

13 Mar 2014, CPOL
A simple parser for comma-separated values (CSV) files, implemented in C#

Introduction

This is a very simple CSV file parser written in C# that successfully parses files saved from MS Excel. You could write your own in two or three hours, or download this one. Because CSV files have a very simple structure, this implementation can also serve as a starting point for more complex parsers.

Background

The comma-separated values (CSV) format represents tabular data as text. Usually a comma separates the individual values in a row, and a line break (CR+LF) separates rows. If a value contains a special character (a comma, a double quote, or a line break), it must be enclosed in double quote characters ("); a double quote inside a quoted value is written twice.
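
To make the quoting rule concrete, here is a minimal sketch of the writing side. It assumes that the special characters are the comma, the double quote, CR and LF, and that an embedded quote is doubled, which is what Excel produces. The class and method names (CsvQuoting, Escape) are hypothetical and are not part of the downloadable parser.

```csharp
using System;

public static class CsvQuoting
{
    // Assumption: these are the characters that force a value to be quoted.
    private static readonly char[] SpecialChars = { ',', '"', '\r', '\n' };

    // Returns the value as it would be written into a CSV row.
    public static string Escape(string value)
    {
        if (value.IndexOfAny(SpecialChars) < 0)
        {
            return value; // no special characters: written as-is
        }
        // Double every embedded quote, then wrap the whole value in quotes.
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }
}
```

For example, Escape("a,b") yields "a,b" including the surrounding quotes, while Escape("plain") returns the value unchanged.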

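The state transition table for the algorithm implemented in this demo is shown below. It has only five states and four classes of input characters. Each cell gives the number of the next state, followed by the actions to perform (see the footnotes).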

State            Any character   Comma (,)   Quote (")   EOL
0 LineStart      2C              1V          3           0L
1 ValueStart     2C              1V          3           0VL
2 Value          2C              1V          2C          0VL
3 QuotedValue    3C              3C          4           3C
4 Quote          3C (?)          1V          3C          0VL

Footnotes:

C - add character to the current value
V - add current value to the current line
L - finish parsing current line

(?) - marks an error in the input: a single quote character inside a quoted value, followed by an ordinary character. Rather than throwing an exception, the parser ignores the stray quote and continues the quoted value.
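
The table can also be encoded directly as data. The following is only an illustrative sketch, not the implementation used in this demo (which uses one class per state, as described in the next section). The names TableDrivenCsv, Act and Step are hypothetical, and storing a line break inside a quoted value as '\n' is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class TableDrivenCsv
{
    // Actions from the footnotes: C (add character), V (add value), L (finish line).
    [Flags]
    private enum Act { None = 0, C = 1, V = 2, L = 4 }

    // Rows are states 0..4; columns are input classes: any character, comma, quote, EOL.
    private static readonly (int Next, Act Actions)[,] Table =
    {
        { (2, Act.C), (1, Act.V), (3, Act.None), (0, Act.L) },         // 0 LineStart
        { (2, Act.C), (1, Act.V), (3, Act.None), (0, Act.V | Act.L) }, // 1 ValueStart
        { (2, Act.C), (1, Act.V), (2, Act.C),    (0, Act.V | Act.L) }, // 2 Value
        { (3, Act.C), (3, Act.C), (4, Act.None), (3, Act.C) },         // 3 QuotedValue
        { (3, Act.C), (1, Act.V), (3, Act.C),    (0, Act.V | Act.L) }, // 4 Quote
    };

    public static List<string[]> Parse(string text)
    {
        var value = new StringBuilder();
        var line = new List<string>();
        var lines = new List<string[]>();
        int state = 0; // LineStart

        // Looks up the cell for (state, input), performs its actions, then moves on.
        void Step(int input, char ch)
        {
            var (next, actions) = Table[state, input];
            if ((actions & Act.C) != 0) value.Append(ch);
            if ((actions & Act.V) != 0) { line.Add(value.ToString()); value.Clear(); }
            if ((actions & Act.L) != 0) { lines.Add(line.ToArray()); line.Clear(); }
            state = next;
        }

        using (var reader = new StringReader(text))
        {
            string row;
            while ((row = reader.ReadLine()) != null)
            {
                foreach (char ch in row)
                {
                    Step(ch == ',' ? 1 : ch == '"' ? 2 : 0, ch);
                }
                Step(3, '\n'); // assumption: a line break inside quotes is stored as '\n'
            }
        }

        // Flush whatever is left, e.g. after an unterminated quoted value.
        if (value.Length > 0) { line.Add(value.ToString()); }
        if (line.Count > 0) { lines.Add(line.ToArray()); }
        return lines;
    }
}
```

Calling TableDrivenCsv.Parse on the text a,"b,c",d returns a single row with the three values a, b,c and d. The table form is compact, but the class-per-state form used in this demo keeps each rule in a named method, which is easier to extend.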

Using the code

Each state in the table above is represented by a class, and each class of input characters is represented by a method. All state classes derive from a common class named ParserState, declared as follows.

C#
private abstract class ParserState
{
    // Shared, stateless instances: one per row of the transition table.
    public static readonly LineStartState LineStartState = new LineStartState();
    public static readonly ValueStartState ValueStartState = new ValueStartState();
    public static readonly ValueState ValueState = new ValueState();
    public static readonly QuotedValueState QuotedValueState = new QuotedValueState();
    public static readonly QuoteState QuoteState = new QuoteState();

    // One method per class of input characters; each returns the next state.
    public abstract ParserState AnyChar(char ch, ParserContext context);
    public abstract ParserState Comma(ParserContext context);
    public abstract ParserState Quote(ParserContext context);
    public abstract ParserState EndOfLine(ParserContext context);
}

The Flyweight pattern is used to reuse state instances. Instead of instantiating a state object on every transition, the mutable data is extracted into a separate class, ParserContext, an instance of which is passed to every transition method. Each method performs a transition by returning the next state. The implementations of ParserState are straightforward and simply mirror the rules in the table above.

Below is the implementation of the ParserContext class.

C#
private class ParserContext
{
    private readonly StringBuilder _currentValue = new StringBuilder();
    private readonly List<string[]> _lines = new List<string[]>();
    private readonly List<string> _currentLine = new List<string>();

    // C: add a character to the current value.
    public void AddChar(char ch)
    {
        _currentValue.Append(ch);
    }

    // V: add the current value to the current line and start a new value.
    public void AddValue()
    {
        _currentLine.Add(_currentValue.ToString());
        _currentValue.Clear();
    }

    // L: finish the current line and start a new one.
    public void AddLine()
    {
        _lines.Add(_currentLine.ToArray());
        _currentLine.Clear();
    }

    // Returns all parsed lines, first flushing any value or line still in progress.
    public List<string[]> GetAllLines()
    {
        if (_currentValue.Length > 0)
        {
            AddValue();
        }
        if (_currentLine.Count > 0)
        {
            AddLine();
        }
        return _lines;
    }
}

Basically, ParserContext implements the actions from the footnotes above: it adds a character to the current value, a value to the current line, and a line to the results, and it returns the results at the end.
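
The state classes themselves are not listed in this article. The following is a minimal sketch of how they might look if each one simply follows its row of the transition table. It is not the downloadable code: the classes are top-level instead of nested, ParserContext is reduced to a stand-in with a hypothetical Lines property, and appending '\n' for a line break inside a quoted value is an assumption.

```csharp
using System.Collections.Generic;
using System.Text;

// Reduced stand-in for ParserContext: only what the states need.
public class ParserContext
{
    private readonly StringBuilder _value = new StringBuilder();
    private readonly List<string> _line = new List<string>();
    public List<string[]> Lines { get; } = new List<string[]>();

    public void AddChar(char ch) { _value.Append(ch); }
    public void AddValue() { _line.Add(_value.ToString()); _value.Clear(); }
    public void AddLine() { Lines.Add(_line.ToArray()); _line.Clear(); }
}

public abstract class ParserState
{
    public static readonly LineStartState LineStartState = new LineStartState();
    public static readonly ValueStartState ValueStartState = new ValueStartState();
    public static readonly ValueState ValueState = new ValueState();
    public static readonly QuotedValueState QuotedValueState = new QuotedValueState();
    public static readonly QuoteState QuoteState = new QuoteState();

    public abstract ParserState AnyChar(char ch, ParserContext c);
    public abstract ParserState Comma(ParserContext c);
    public abstract ParserState Quote(ParserContext c);
    public abstract ParserState EndOfLine(ParserContext c);
}

public class LineStartState : ParserState // row 0: 2C, 1V, 3, 0L
{
    public override ParserState AnyChar(char ch, ParserContext c) { c.AddChar(ch); return ValueState; }
    public override ParserState Comma(ParserContext c) { c.AddValue(); return ValueStartState; }
    public override ParserState Quote(ParserContext c) { return QuotedValueState; }
    public override ParserState EndOfLine(ParserContext c) { c.AddLine(); return LineStartState; }
}

public class ValueStartState : ParserState // row 1: 2C, 1V, 3, 0VL
{
    public override ParserState AnyChar(char ch, ParserContext c) { c.AddChar(ch); return ValueState; }
    public override ParserState Comma(ParserContext c) { c.AddValue(); return ValueStartState; }
    public override ParserState Quote(ParserContext c) { return QuotedValueState; }
    public override ParserState EndOfLine(ParserContext c) { c.AddValue(); c.AddLine(); return LineStartState; }
}

public class ValueState : ParserState // row 2: 2C, 1V, 2C, 0VL
{
    public override ParserState AnyChar(char ch, ParserContext c) { c.AddChar(ch); return ValueState; }
    public override ParserState Comma(ParserContext c) { c.AddValue(); return ValueStartState; }
    public override ParserState Quote(ParserContext c) { c.AddChar('"'); return ValueState; }
    public override ParserState EndOfLine(ParserContext c) { c.AddValue(); c.AddLine(); return LineStartState; }
}

public class QuotedValueState : ParserState // row 3: 3C, 3C, 4, 3C
{
    public override ParserState AnyChar(char ch, ParserContext c) { c.AddChar(ch); return QuotedValueState; }
    public override ParserState Comma(ParserContext c) { c.AddChar(','); return QuotedValueState; }
    public override ParserState Quote(ParserContext c) { return QuoteState; }
    // Assumption: a line break inside quotes is stored as '\n'.
    public override ParserState EndOfLine(ParserContext c) { c.AddChar('\n'); return QuotedValueState; }
}

public class QuoteState : ParserState // row 4: 3C (?), 1V, 3C, 0VL
{
    // (?) case: the stray quote is dropped and the quoted value continues.
    public override ParserState AnyChar(char ch, ParserContext c) { c.AddChar(ch); return QuotedValueState; }
    public override ParserState Comma(ParserContext c) { c.AddValue(); return ValueStartState; }
    public override ParserState Quote(ParserContext c) { c.AddChar('"'); return QuotedValueState; }
    public override ParserState EndOfLine(ParserContext c) { c.AddValue(); c.AddLine(); return LineStartState; }
}
```

Because the state objects hold no data of their own, the five static instances can be shared by every parse; all mutable data lives in the context that is passed in.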

The parser itself accepts a TextReader, which can wrap a file, an in-memory string, or any other source of text, and returns an array of string arrays (one per line). It creates a ParserContext and starts parsing in LineStartState. Here is the main method:

C#
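public string[][] Parse(TextReader reader)
{
    var context = new ParserContext();

    ParserState currentState = ParserState.LineStartState;
    string next;
    // ReadLine strips the line terminator, so the end of each line is signalled explicitly below.
    while ((next = reader.ReadLine()) != null)
    {
        foreach (char ch in next)
        {
            // Dispatch on the class of the input character.
            // CommaCharacter and QuoteCharacter are constants not shown in this listing.
            switch (ch)
            {
                case CommaCharacter:
                    currentState = currentState.Comma(context);
                    break;
                case QuoteCharacter:
                    currentState = currentState.Quote(context);
                    break;
                default:
                    currentState = currentState.AnyChar(ch, context);
                    break;
            }
        }
        currentState = currentState.EndOfLine(context);
    }
    // Collect the results, including any value or line still in progress.
    List<string[]> allLines = context.GetAllLines();
    return allLines.ToArray();
}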

Note: all other classes are nested inside CsvParser for brevity and simplicity.

Other files available for download include unit tests and an extended implementation of the parser that supports additional options: reading only a certain number of columns and trimming trailing empty lines (useful for CSV files saved from Excel with a long invisible column on the right).
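
The extended implementation is not shown here. As a rough illustration of what the two options do, the same effect can be approximated by post-processing the string[][] result. This sketch is not the downloadable code; the names CsvPostProcessing, TrimTrailingEmptyLines and TakeColumns are hypothetical, and treating a row as empty when it has no values or only empty strings is an assumption.

```csharp
using System;
using System.Linq;

public static class CsvPostProcessing
{
    // Assumption: a row counts as empty when it has no values or only empty strings.
    private static bool IsEmpty(string[] row)
    {
        return row.All(string.IsNullOrEmpty);
    }

    // Drops empty rows from the end of the result; empty rows in the middle are kept.
    public static string[][] TrimTrailingEmptyLines(string[][] rows)
    {
        int count = rows.Length;
        while (count > 0 && IsEmpty(rows[count - 1]))
        {
            count--;
        }
        return rows.Take(count).ToArray();
    }

    // Keeps at most maxColumns values from each row.
    public static string[][] TakeColumns(string[][] rows, int maxColumns)
    {
        return rows.Select(r => r.Take(maxColumns).ToArray()).ToArray();
    }
}
```

Doing this inside the parser, as the extended implementation does, avoids building values that are thrown away afterwards; the post-processing form is shown only because it works with the Parse method as listed above.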

Points of Interest

This demo illustrates the State and Flyweight design patterns.


License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Team Leader
Canada