Tip/Trick

C# - Light and Fast CSV Parser

4.83/5 (14 votes)
27 Sep 2014, CPOL
A light yet functional CSV parser with custom delimiters and qualifiers that returns records via yield return.

Introduction

Parsing CSV files may sound like an easy task, but in practice it is not trivial: values can contain delimiters, quotes and line breaks. Below is the CsvParser class that I use in my own projects. It supports the following features, which I find critical:

  • Custom delimiter and qualifier characters
  • Quoting notation (the delimiter character and line breaks can be part of a quoted value)
  • Quote escaping (a doubled qualifier inside a quoted value stands for one literal qualifier)
  • Both '\n' and '\r\n' line endings (a lone '\r' is not treated as a line break)
  • Leading whitespace of a value is skipped outside quotes
  • Records are returned lazily as IEnumerable via yield return, so a TextReader source is never buffered in memory as a whole
  • The header and the remaining records can be returned separately (as a Tuple)
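To make the two quoting rules above concrete, here is a minimal single-line sketch that applies them in isolation: a delimiter inside qualifiers belongs to the value, and a doubled qualifier inside qualifiers is one literal qualifier. The `CsvLineSketch.SplitLine` name is hypothetical and not part of CsvParser; the sketch deliberately ignores line breaks and the leading-whitespace skipping that CsvParser performs.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvLineSketch
{
    // Hypothetical helper (not part of CsvParser): splits ONE line into fields.
    // Only the quoting rules are modelled; no line-break or whitespace handling.
    public static List<string> SplitLine(string line, char delimiter, char qualifier)
    {
        var fields = new List<string>();
        var sb = new StringBuilder();
        var inQuote = false;

        for (var i = 0; i < line.Length; i++)
        {
            var c = line[i];
            if (inQuote)
            {
                if (c != qualifier)
                    sb.Append(c);                 // delimiters are literal inside quotes
                else if (i + 1 < line.Length && line[i + 1] == qualifier)
                {
                    sb.Append(qualifier);         // doubled qualifier -> one literal qualifier
                    i++;                          // skip the second qualifier
                }
                else
                    inQuote = false;              // single qualifier closes the quoted value
            }
            else if (c == qualifier && sb.Length == 0)
                inQuote = true;                   // qualifier at the start of a field opens quotes
            else if (c == delimiter)
            {
                fields.Add(sb.ToString());        // delimiter outside quotes ends the field
                sb.Length = 0;
            }
            else
                sb.Append(c);
        }

        fields.Add(sb.ToString());                // the last field has no trailing delimiter
        return fields;
    }
}
```

For example, `SplitLine("a,\"b,c\",\"say \"\"hi\"\"\"", ',', '"')` returns the three fields `a`, `b,c` and `say "hi"`.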

Source Code

C#
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class CsvParser
{
    private static Tuple<T, IEnumerable<T>> HeadAndTail<T>(this IEnumerable<T> source)
    {
        if (source == null)
            throw new ArgumentNullException("source");
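        var en = source.GetEnumerator();
        // For an empty source MoveNext() returns false and Current is undefined,
        // so return default(T) (null for IList<string>) and an empty tail.
        var hasHead = en.MoveNext();
        return Tuple.Create(hasHead ? en.Current : default(T), EnumerateTail(en));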
    }

    private static IEnumerable<T> EnumerateTail<T>(IEnumerator<T> en)
    {
        while (en.MoveNext()) yield return en.Current;
    }

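    public static IEnumerable<IList<string>> 
           Parse(string content, char delimiter, char qualifier)
    {
        // Parse(TextReader) is lazy, so the reader must stay open while the
        // caller enumerates. Returning it directly from inside the using block
        // would dispose the StringReader before the first record is read.
        using (var reader = new StringReader(content))
            foreach (var record in Parse(reader, delimiter, qualifier))
                yield return record;
    }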

    public static Tuple<IList<string>, IEnumerable<IList<string>>> 
           ParseHeadAndTail(TextReader reader, char delimiter, char qualifier)
    {
        return HeadAndTail(Parse(reader, delimiter, qualifier));
    }

    public static IEnumerable<IList<string>> 
           Parse(TextReader reader, char delimiter, char qualifier)
    {
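        var inQuote = false;                // inside a qualified (quoted) value?
        var record = new List<string>();    // fields of the current record
        var sb = new StringBuilder();       // characters of the current field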

        while (reader.Peek() != -1)
        {
            var readChar = (char) reader.Read();

            if (readChar == '\n' || (readChar == '\r' && reader.Peek() == '\n'))
            {
                // If it's a \r\n combo consume the \n part and throw it away.
                if (readChar == '\r')
                    reader.Read();

                if (inQuote)
                {
                    if (readChar == '\r')
                        sb.Append('\r');
                    sb.Append('\n');
                }
                else
                {
                    if (record.Count > 0 || sb.Length > 0)
                    {
                        record.Add(sb.ToString());
                        sb.Clear();
                    }

                    if (record.Count > 0)
                        yield return record;

                    record = new List<string>(record.Count);
                }
            }
            // Start of a field: nothing buffered yet and not inside quotes.
            else if (sb.Length == 0 && !inQuote)
            {
                if (readChar == qualifier)
                    inQuote = true;
                else if (readChar == delimiter)
                {
                    record.Add(sb.ToString());
                    sb.Clear();
                }
                else if (char.IsWhiteSpace(readChar))
                {
                    // Ignore leading whitespace
                }
                else
                    sb.Append(readChar);
            }
            else if (readChar == delimiter)
            {
                if (inQuote)
                    sb.Append(delimiter);
                else
                {
                    record.Add(sb.ToString());
                    sb.Clear();
                }
            }
            else if (readChar == qualifier)
            {
                if (inQuote)
                {
                    // A doubled qualifier is an escaped literal qualifier;
                    // a single one closes the quoted value.
                    if (reader.Peek() == qualifier)
                    {
                        reader.Read();
                        sb.Append(qualifier);
                    }
                    else
                        inQuote = false;
                }
                else
                    sb.Append(readChar);
            }
            else
                sb.Append(readChar);
        }

        if (record.Count > 0 || sb.Length > 0)
            record.Add(sb.ToString());

        if (record.Count > 0)
            yield return record;
    }
}
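Because Parse(TextReader) is an iterator, nothing is read until the caller asks for the next record, and ParseHeadAndTail pulls only the header eagerly. The sketch below demonstrates that deferred behavior with a counting stand-in source. `LazySplitSketch`, `Counter` and `Records` are hypothetical names, not part of CsvParser; `HeadAndTail` follows the same single-enumerator pattern as CsvParser.HeadAndTail.

```csharp
using System;
using System.Collections.Generic;

public static class LazySplitSketch
{
    // Hypothetical counter so the effect of laziness can be observed.
    public sealed class Counter { public int Value; }

    // Hypothetical stand-in for Parse(TextReader): produces records on demand
    // and counts how many have been produced so far.
    public static IEnumerable<string> Records(int total, Counter counter)
    {
        for (var i = 0; i < total; i++)
        {
            counter.Value++;
            yield return "record " + i;
        }
    }

    // Same pattern as CsvParser.HeadAndTail: the head is pulled eagerly from a
    // single enumerator, and the tail continues lazily from that same enumerator.
    public static Tuple<T, IEnumerable<T>> HeadAndTail<T>(IEnumerable<T> source)
    {
        var en = source.GetEnumerator();
        var hasHead = en.MoveNext();
        return Tuple.Create(hasHead ? en.Current : default(T), Tail(en));
    }

    private static IEnumerable<T> Tail<T>(IEnumerator<T> en)
    {
        while (en.MoveNext())
            yield return en.Current;
    }
}
```

After `HeadAndTail(Records(1000, counter))`, only the head has been produced (`counter.Value` is 1); taking two records from the tail raises it to 3, and the remaining records are never generated. This is why `lines.Take(5)` in the usage example stops reading the file after five records.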

Using the Code

Here is an example of reading a CSV file. The following snippet reads the header and the first 5 records and prints each record to the console as header=value pairs, skipping empty values:

C#
using System;
using System.IO;
using System.Linq;

const string fileName = @"C:\Temp\file.csv";
using (var stream = File.OpenRead(fileName))
using (var reader = new StreamReader(stream))
{
    var data = CsvParser.ParseHeadAndTail(reader, ',', '"');

    var header = data.Item1;
    var lines = data.Item2;

    foreach (var line in lines.Take(5))
    {
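        // A record may have fewer fields than the header; stop at the shorter
        // of the two to avoid an ArgumentOutOfRangeException.
        for (var i = 0; i < header.Count && i < line.Count; i++)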
            if (!string.IsNullOrEmpty(line[i]))
                Console.WriteLine("{0}={1}", header[i], line[i]);
        Console.WriteLine();
    }
}
Console.ReadLine();
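The example above pairs header names and values by index. If you would rather look values up by column name, a small helper can turn each record into a dictionary. This is a hypothetical sketch (`RecordMapping.ToDictionary` is not part of CsvParser): missing trailing values map to null, values beyond the header are ignored, and a duplicated header name keeps the last value.

```csharp
using System.Collections.Generic;

public static class RecordMapping
{
    // Hypothetical helper (not part of CsvParser): pairs header names with the
    // values of one record, tolerating records shorter or longer than the header.
    public static Dictionary<string, string> ToDictionary(
        IList<string> header, IList<string> record)
    {
        var map = new Dictionary<string, string>(header.Count);
        for (var i = 0; i < header.Count; i++)
            map[header[i]] = i < record.Count ? record[i] : null;  // null = missing value
        return map;
    }
}
```

Inside the loop of the usage example, `RecordMapping.ToDictionary(header, line)` would then allow lookups such as `map["City"]`, assuming the file has a `City` column.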

History

  • 27th September, 2014: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).


Written By
United States

Comments and Discussions

 
  • Question: About the fields in each line (Ing. Cristian Marucci, 19-Oct-18 2:18)
  • Question: Do not ignore leading whitespace at all (seblon, 26-May-18 6:20)
  • Question: Yikes (Jeremy Stafford 1, 4-May-17 15:52)
  • Answer: Re: Yikes (Yuriy Magurdumov, 5-May-17 5:22)
  • General: Re: Yikes (Jeremy Stafford 1, 5-May-17 7:01)
  • General: Re: Yikes (Yuriy Magurdumov, 5-May-17 8:51)
  • General: Re: Yikes (Jeremy Stafford 1, 5-May-17 11:07)
  • Question: CR only line breaks (ssdred, 15-Feb-17 8:22)
  • Question: Short Question (david123@codeproject, 4-Sep-15 2:59)
  • Answer: Re: Short Question (Yuriy Magurdumov, 4-Sep-15 3:55)
  • Question: Yuramag! Very good article! (Volynsky Alex, 29-Sep-14 12:19)
  • Question: Very nice. Just a few suggestions (irneb, 29-Sep-14 4:15)
  • Answer: Re: Very nice. Just a few suggestions (Yuriy Magurdumov, 30-Sep-14 4:44)
  • General: Re: Very nice. Just a few suggestions (irneb, 30-Sep-14 19:25)
  • General: Re: Very nice. Just a few suggestions (irneb, 30-Sep-14 19:58)
  • General: Re: Very nice. Just a few suggestions (Yuriy Magurdumov, 1-Oct-14 4:11)
  • General: Re: Very nice. Just a few suggestions (PIEBALDconsult, 30-Sep-14 4:48)
    "Clear is a convenience method that is equivalent to setting the Length property of the current instance to 0 (zero)."
    And it was added in .NET 4; setting the Length works on all versions. Hence, don't use Clear.
  • General: Re: Very nice. Just a few suggestions (irneb, 30-Sep-14 19:17)
  • General: Re: Very nice. Just a few suggestions (PIEBALDconsult, 1-Oct-14 4:42)
