Parsing CSV embedded in double quotes - RFC 4180

3 May 2007
Simple ways to parse CSV documents when values are embedded in double quotes

Introduction

This article shows one way to parse a line of CSV whose values are enclosed in double quotes.

Background

This article is based on RFC 4180, which describes the common CSV format: values are separated by commas (COMMA) and may be enclosed in double quotes (DOUBLEQUOTE).

Take a look at the RFC: http://tools.ietf.org/html/rfc4180
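
To make the format concrete, here is a minimal sketch that builds one such record. The names `Rfc4180Sample`, `QuoteField` and `BuildRecord` are hypothetical and exist only for this illustration. The rule it applies, doubling any double quote that appears inside a quoted field, is taken from section 2 of the RFC.

```csharp
using System;

public static class Rfc4180Sample
{
    // Hypothetical helper: wraps a value in double quotes and doubles
    // any double quote inside it, as RFC 4180 describes.
    public static string QuoteField(string value)
    {
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }

    // Hypothetical helper: joins the quoted fields with commas into one record.
    public static string BuildRecord(params string[] values)
    {
        string[] quoted = new string[values.Length];
        for (int i = 0; i < values.Length; i++)
            quoted[i] = QuoteField(values[i]);
        return string.Join(",", quoted);
    }
}
```

For example, `Rfc4180Sample.BuildRecord("Smith, John", "42")` produces the line `"Smith, John","42"`. The comma inside the first value is data, not a separator, and that is the case the parser has to handle.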

Using the Code

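using System;
using System.Collections;

/// <summary>
/// Splits one RFC 4180 (Comma Separated Values) line whose values are all enclosed in double quotes.
/// http://tools.ietf.org/html/rfc4180
/// </summary>
/// <param name="line">Single line from CSV file</param>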
public string[] SplitOnDoubleQuotes(string line)
{
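    int i;
    ArrayList occurs = new ArrayList();
    // Drop any trailing line break so only the record itself is scanned.
    line = line.TrimEnd('\r', '\n');
    // Record the index of every comma that sits between a closing and an
    // opening quote; any other comma is treated as data.
    // Starting at 1 and stopping before the last character keeps
    // line[i - 1] and line[i + 1] in range.
    for (i = 1; i < line.Length - 1; i++)
    {
        if (line[i] == ',' && line[i - 1] == '"' && line[i + 1] == '"')
            occurs.Add(i);
    }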
    ArrayList tokens = new ArrayList();
    int startIdx = 0;
    for (int t = 0; t <= occurs.Count; t++)
    {
        string token;
        if (t != occurs.Count)
        {
            // Token runs up to (not including) the next separator.
            token = line.Substring(startIdx, (int)occurs[t] - startIdx);
            startIdx = (int)occurs[t];
        }
        else
        {
            // Last token runs to the end of the line.
            token = line.Substring(startIdx);
        }
        // Each token after the first still begins with its separator comma.
        tokens.Add(token.StartsWith(",") ? token.Remove(0, 1) : token);
    }
    // Strip the enclosing double quotes from each token.
    for (i = 0; i < tokens.Count; i++)
    {
        string str = tokens[i].ToString();
        if (str.StartsWith("\"", StringComparison.Ordinal))
            str = str.Remove(0, 1);
        if (str.EndsWith("\"", StringComparison.Ordinal))
            str = str.Remove(str.Length - 1);
        tokens[i] = str;
    }
    return (string[])tokens.ToArray(typeof(string));
}

Points of Interest

There are many possible ways to do this, and the code above is just one of them.

Here are two other possible approaches:

  1. Define a finite state automaton to parse the line character by character.
  2. Use String.Replace("\",\"", SpecialChar) to replace the "," pattern with a special character that cannot occur in the normal set of values in your CSV file, then split on that character.
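
Both approaches can be sketched roughly as follows. This is an illustration under stated assumptions, not a complete RFC 4180 parser. The class and method names (`CsvAlternatives`, `ParseWithStateMachine`, `SplitWithSentinel`) are hypothetical. The state machine assumes the input is a single record with no embedded line breaks. The sentinel version assumes every field is quoted, that the chosen sentinel never occurs in the data, and that no field contains doubled quotes.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvAlternatives
{
    // Approach 1: a two-state automaton (inside or outside quotes).
    public static string[] ParseWithStateMachine(string line)
    {
        List<string> fields = new List<string>();
        StringBuilder current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // "" inside quotes is a literal quote
                    i++;
                }
                else if (c == '"')
                    inQuotes = false;    // closing quote
                else
                    current.Append(c);   // commas in here are plain data
            }
            else if (c == '"')
                inQuotes = true;         // opening quote
            else if (c == ',')
            {
                fields.Add(current.ToString()); // separator outside quotes
                current.Length = 0;
            }
            else
                current.Append(c);
        }
        fields.Add(current.ToString());  // last field has no trailing comma
        return fields.ToArray();
    }

    // Approach 2: swap the "," pattern for a sentinel, then split on it.
    // Assumes every field is quoted and the sentinel never occurs in the data.
    public static string[] SplitWithSentinel(string line, char sentinel)
    {
        string marked = line.Replace("\",\"", sentinel.ToString());
        // Remove the opening quote of the first field and the closing quote of the last.
        if (marked.StartsWith("\"", StringComparison.Ordinal))
            marked = marked.Remove(0, 1);
        if (marked.EndsWith("\"", StringComparison.Ordinal))
            marked = marked.Remove(marked.Length - 1);
        return marked.Split(sentinel);
    }
}
```

Given the line `"a,b","c","d"`, both methods return the three values `a,b`, `c` and `d`. Only the state machine handles a doubled quote inside a field, so it is the safer choice when the data is not fully under your control.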

I hope this code is useful for beginners.

-Vaibhav Gaikwad

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
