CSV Parser (C#)

ideafixxxer, 13 Mar 2014
Simple implementation of a parser of comma-separated values (CSV) files in C#

Introduction

This is a very simple CSV file parser written in C# that successfully parses files saved from MS Excel. You can write your own in two or three hours, or download this one. Because CSV files have a very simple structure, this implementation can serve as a starting point for more complex parsers.

Background

The comma-separated values (CSV) format represents tabular data as text. Usually a comma is the delimiter between individual values in a row, and a new line (CR+LF) separates rows. If a value includes a special character (a comma, a double quote, or a line break), it must be enclosed in double quote characters ("). A double quote inside a quoted value is written as two double quotes ("").
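
To make the quoting rule concrete, here is a minimal sketch of the writing side of the format. It is not part of the article's download: the names CsvFormatExample, Escape and ToCsvRow are made up for illustration, and the set of special characters (comma, double quote, CR, LF) is an assumption based on the usual CSV conventions.

```csharp
using System;

static class CsvFormatExample
{
    // Assumption: comma, double quote, CR and LF are the special characters.
    private static readonly char[] SpecialChars = { ',', '"', '\r', '\n' };

    // Wraps a value in double quotes only when it contains a special character;
    // embedded double quotes are doubled.
    public static string Escape(string value)
    {
        if (value.IndexOfAny(SpecialChars) < 0)
        {
            return value;
        }
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }

    // Escapes each value and joins them with commas to form one CSV row.
    public static string ToCsvRow(params string[] values)
    {
        return string.Join(",", Array.ConvertAll<string, string>(values, Escape));
    }
}
```

For example, ToCsvRow("a", "b,c", "say \"hi\"") produces the row a,"b,c","say ""hi""", which is the kind of input the parser has to undo.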

The state transition table for the algorithm implemented in this demo is below. It has only five states and four classes of input characters. Each cell gives the number of the next state, followed by the actions to perform (see the footnotes).

State            Any character   Comma (,)   Quote (")   EOL
0 LineStart      2C              1V          3           0L
1 ValueStart     2C              1V          3           0VL
2 Value          2C              1V          2C          0VL
3 QuotedValue    3C              3C          4           3C
4 Quote          3C (?)          1V          3C          0VL

Footnotes:

C - add character to the current value
V - add current value to the current line
L - finish parsing current line

(?) - marks an error in the source sequence. Rather than throwing an exception, the parser ignores a single quote character inside a quoted value.
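
The table and footnotes can also be read as data. The following is a rough, table-driven sketch of the same rules; it is an alternative formulation for illustration, not the class-based implementation this article describes. The name TransitionTableSketch is hypothetical, and the sketch assumes that the EOL character appended inside a quoted value is '\n'.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

static class TransitionTableSketch
{
    // Rows are states 0..4; columns are AnyChar, Comma, Quote, EOL.
    // Each cell is the next state digit followed by zero or more actions.
    private static readonly string[,] Table =
    {
        { "2C", "1V", "3",  "0L"  }, // 0 LineStart
        { "2C", "1V", "3",  "0VL" }, // 1 ValueStart
        { "2C", "1V", "2C", "0VL" }, // 2 Value
        { "3C", "3C", "4",  "3C"  }, // 3 QuotedValue
        { "3C", "1V", "3C", "0VL" }, // 4 Quote (AnyChar is the (?) case)
    };

    public static List<string[]> Parse(string text)
    {
        var lines = new List<string[]>();
        var line = new List<string>();
        var value = new StringBuilder();
        int state = 0;

        using (var reader = new StringReader(text))
        {
            string row;
            while ((row = reader.ReadLine()) != null)
            {
                // Assumption: every row ends with EOL, represented here as '\n'.
                foreach (char ch in row + "\n")
                {
                    int column = ch == ',' ? 1 : ch == '"' ? 2 : ch == '\n' ? 3 : 0;
                    string cell = Table[state, column];
                    state = cell[0] - '0';
                    foreach (char action in cell.Substring(1))
                    {
                        switch (action)
                        {
                            case 'C': value.Append(ch); break;
                            case 'V': line.Add(value.ToString()); value.Clear(); break;
                            case 'L': lines.Add(line.ToArray()); line.Clear(); break;
                        }
                    }
                }
            }
        }

        // Flush whatever is still pending, e.g. after an unterminated quoted value.
        if (value.Length > 0) { line.Add(value.ToString()); }
        if (line.Count > 0) { lines.Add(line.ToArray()); }
        return lines;
    }
}
```

Keeping the rules in one array makes them easy to compare with the table, at the cost of the type safety that the class-per-state design provides.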

Using the code

Each state in the table above is represented by a class. Each character class is represented by a method. All state classes derive from the common class named ParserState, declared as follows.

        private abstract class ParserState
        {
            public static readonly LineStartState LineStartState = new LineStartState();
            public static readonly ValueStartState ValueStartState = new ValueStartState();
            public static readonly ValueState ValueState = new ValueState();
            public static readonly QuotedValueState QuotedValueState = new QuotedValueState();
            public static readonly QuoteState QuoteState = new QuoteState();

            public abstract ParserState AnyChar(char ch, ParserContext context);
            public abstract ParserState Comma(ParserContext context);
            public abstract ParserState Quote(ParserContext context);
            public abstract ParserState EndOfLine(ParserContext context);
        }

The Flyweight pattern is used to reuse state class instances. Instead of instantiating a state object on every transition, the mutable parsing data is extracted into a separate class, ParserContext, an instance of which is passed to every transition method. Each method performs the transition by returning the next state. The implementations of ParserState are straightforward and simply mirror the rules in the table above.
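
The state implementations are not listed in this article, so the following is only a sketch of how they might look when written directly from the transition table. All names here (SketchContext, SketchState and the five derived classes) are stand-ins rather than the classes from the download, and appending '\n' for an EOL inside a quoted value is an assumption.

```csharp
using System.Collections.Generic;
using System.Text;

// Stand-in for ParserContext, reduced to the three actions C, V and L.
class SketchContext
{
    private readonly StringBuilder _value = new StringBuilder();
    private readonly List<string> _line = new List<string>();
    public readonly List<string[]> Lines = new List<string[]>();

    public void AddChar(char ch) { _value.Append(ch); }
    public void AddValue() { _line.Add(_value.ToString()); _value.Clear(); }
    public void AddLine() { Lines.Add(_line.ToArray()); _line.Clear(); }
}

abstract class SketchState
{
    // One shared, stateless instance per state (the flyweights).
    public static readonly SketchState LineStart = new LineStartSketch();
    public static readonly SketchState ValueStart = new ValueStartSketch();
    public static readonly SketchState Value = new ValueSketch();
    public static readonly SketchState QuotedValue = new QuotedValueSketch();
    public static readonly SketchState QuoteSeen = new QuoteSketch();

    public abstract SketchState AnyChar(char ch, SketchContext c);
    public abstract SketchState Comma(SketchContext c);
    public abstract SketchState Quote(SketchContext c);
    public abstract SketchState EndOfLine(SketchContext c);
}

sealed class LineStartSketch : SketchState
{
    public override SketchState AnyChar(char ch, SketchContext c) { c.AddChar(ch); return Value; }        // 2C
    public override SketchState Comma(SketchContext c) { c.AddValue(); return ValueStart; }               // 1V
    public override SketchState Quote(SketchContext c) { return QuotedValue; }                            // 3
    public override SketchState EndOfLine(SketchContext c) { c.AddLine(); return LineStart; }             // 0L
}

sealed class ValueStartSketch : SketchState
{
    public override SketchState AnyChar(char ch, SketchContext c) { c.AddChar(ch); return Value; }        // 2C
    public override SketchState Comma(SketchContext c) { c.AddValue(); return ValueStart; }               // 1V
    public override SketchState Quote(SketchContext c) { return QuotedValue; }                            // 3
    public override SketchState EndOfLine(SketchContext c) { c.AddValue(); c.AddLine(); return LineStart; } // 0VL
}

sealed class ValueSketch : SketchState
{
    public override SketchState AnyChar(char ch, SketchContext c) { c.AddChar(ch); return Value; }        // 2C
    public override SketchState Comma(SketchContext c) { c.AddValue(); return ValueStart; }               // 1V
    public override SketchState Quote(SketchContext c) { c.AddChar('"'); return Value; }                  // 2C
    public override SketchState EndOfLine(SketchContext c) { c.AddValue(); c.AddLine(); return LineStart; } // 0VL
}

sealed class QuotedValueSketch : SketchState
{
    public override SketchState AnyChar(char ch, SketchContext c) { c.AddChar(ch); return QuotedValue; }  // 3C
    public override SketchState Comma(SketchContext c) { c.AddChar(','); return QuotedValue; }            // 3C
    public override SketchState Quote(SketchContext c) { return QuoteSeen; }                              // 4
    // Assumption: a line break inside quotes is stored as '\n'.
    public override SketchState EndOfLine(SketchContext c) { c.AddChar('\n'); return QuotedValue; }       // 3C
}

sealed class QuoteSketch : SketchState
{
    // The (?) case: the stray quote is dropped and parsing continues inside the quoted value.
    public override SketchState AnyChar(char ch, SketchContext c) { c.AddChar(ch); return QuotedValue; }  // 3C (?)
    public override SketchState Comma(SketchContext c) { c.AddValue(); return ValueStart; }               // 1V
    public override SketchState Quote(SketchContext c) { c.AddChar('"'); return QuotedValue; }            // 3C
    public override SketchState EndOfLine(SketchContext c) { c.AddValue(); c.AddLine(); return LineStart; } // 0VL
}
```

Feeding the sequence quote, a, quote, b, quote, EOL through these states yields a single line containing the value ab, which is the (?) recovery rule in action. Because the state objects hold no fields of their own, the five shared instances can be reused by any number of parsing runs.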

Below is the implementation of the ParserContext class.

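        private class ParserContext
        {
            // Characters of the value currently being read.
            private readonly StringBuilder _currentValue = new StringBuilder();
            // Completed lines, each stored as an array of values.
            private readonly List<string[]> _lines = new List<string[]>();
            // Values collected so far for the current line.
            private readonly List<string> _currentLine = new List<string>();

            // Action C: add a character to the current value.
            public void AddChar(char ch)
            {
                _currentValue.Append(ch);
            }

            // Action V: add the current value to the current line and reset the buffer.
            public void AddValue()
            {
                _currentLine.Add(_currentValue.ToString());
                _currentValue.Remove(0, _currentValue.Length);
            }

            // Action L: finish the current line and start a new one.
            public void AddLine()
            {
                _lines.Add(_currentLine.ToArray());
                _currentLine.Clear();
            }

            // Flushes any value and line still pending when the input ends,
            // then returns everything parsed so far.
            public List<string[]> GetAllLines()
            {
                if (_currentValue.Length > 0)
                {
                    AddValue();
                }
                if (_currentLine.Count > 0)
                {
                    AddLine();
                }
                return _lines;
            }
        }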

Basically, ParserContext implements the "footnotes" above: it adds a character, a value, or a line to the results, and it provides the results at the end.

The parser itself accepts a TextReader instance, which can provide access to a file, an in-memory string, or any other stream of text data, and returns an array of string arrays, one per line. It instantiates ParserContext and starts parsing from LineStartState. Here is the main method:

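        public string[][] Parse(TextReader reader)
        {
            var context = new ParserContext();

            ParserState currentState = ParserState.LineStartState;
            string next;
            // ReadLine strips the line terminator, so end of line is signalled
            // explicitly after the characters of each line have been processed.
            while ((next = reader.ReadLine()) != null)
            {
                foreach (char ch in next)
                {
                    // CommaCharacter and QuoteCharacter are constants declared
                    // elsewhere in the parser (not shown here).
                    switch (ch)
                    {
                        case CommaCharacter:
                            currentState = currentState.Comma(context);
                            break;
                        case QuoteCharacter:
                            currentState = currentState.Quote(context);
                            break;
                        default:
                            currentState = currentState.AnyChar(ch, context);
                            break;
                    }
                }
                currentState = currentState.EndOfLine(context);
            }
            // Collect the result, including anything left unfinished at end of input.
            List<string[]> allLines = context.GetAllLines();
            return allLines.ToArray();
        }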
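
Since Parse depends only on TextReader, the same loop works for different sources of text. The sketch below is not part of the download and the names TextReaderSketch, ReadLines, FromString and FromFile are hypothetical. It shows two standard ways to obtain a TextReader and reads lines with the same ReadLine loop as above.

```csharp
using System.Collections.Generic;
using System.IO;

static class TextReaderSketch
{
    // Reads every line the same way the Parse loop does, via TextReader.ReadLine.
    public static List<string> ReadLines(TextReader reader)
    {
        var result = new List<string>();
        string next;
        while ((next = reader.ReadLine()) != null)
        {
            result.Add(next);
        }
        return result;
    }

    // In-memory text: StringReader derives from TextReader.
    public static List<string> FromString(string text)
    {
        using (var reader = new StringReader(text))
        {
            return ReadLines(reader);
        }
    }

    // A file on disk: File.OpenText returns a StreamReader, which is also a TextReader.
    public static List<string> FromFile(string path)
    {
        using (StreamReader reader = File.OpenText(path))
        {
            return ReadLines(reader);
        }
    }
}
```

ReadLine treats CR+LF, a lone LF and a lone CR as line terminators and removes them, so FromString("a,b\r\nc,d\ne,f") gives the three strings a,b and c,d and e,f. This also means the parser cannot tell which terminator appeared inside a multi-line quoted value.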

Note: all the other classes are nested inside CsvParser for brevity and simplicity.

Other files available for download include unit tests and an extended implementation of the parser, which supports additional options: reading only a certain number of columns and trimming trailing empty lines (useful for parsing CSV files saved in Excel with a long invisible column on the right).
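
How the extended parser implements these options is not shown in this article. As a rough illustration of the idea only, the two options could be approximated as a post-processing step over the result of Parse. The names CsvPostProcessing and TrimAndLimit, and the maxColumns parameter, are hypothetical.

```csharp
using System;

static class CsvPostProcessing
{
    // Drops trailing rows whose values are all empty and keeps at most
    // maxColumns values in each remaining row.
    public static string[][] TrimAndLimit(string[][] rows, int maxColumns)
    {
        int end = rows.Length;
        while (end > 0 && IsEmpty(rows[end - 1]))
        {
            end--;
        }

        var result = new string[end][];
        for (int i = 0; i < end; i++)
        {
            int count = Math.Min(maxColumns, rows[i].Length);
            result[i] = new string[count];
            Array.Copy(rows[i], result[i], count);
        }
        return result;
    }

    // A row counts as empty when it has no values or only empty strings.
    private static bool IsEmpty(string[] row)
    {
        foreach (string value in row)
        {
            if (!string.IsNullOrEmpty(value))
            {
                return false;
            }
        }
        return true;
    }
}
```

Empty rows in the middle of the data are kept; only the empty tail is removed. A parser that applies the limits while reading avoids building values it will throw away, which is presumably why the extended implementation offers them as parser options.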

Points of Interest

This demo shows the use of the State and Flyweight design patterns.


License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

ideafixxxer
Software Developer (Senior), EPAM Systems
Canada

Last Updated 13 Mar 2014
Article Copyright 2014 by ideafixxxer