Licence CDDL
First Posted 3 May 2007

Parsing CSV embedded in double quotes - RFC 4180

By vrg786 | 3 May 2007 | Article
Simple ways to parse CSV documents when values are embedded in double quotes.

Introduction

This article shows one way to split a line of CSV into its values when every value is enclosed in double quotes.

Background

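This article is based on RFC 4180, which specifies that the fields of a CSV record are separated by commas and that each field may be enclosed in double quotes. A field that itself contains a comma must be enclosed in double quotes, so a parser cannot simply split the line on every comma.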

Take a look at the RFC here: http://tools.ietf.org/html/rfc4180.

The Code

using System;
using System.Collections.Generic;

/// <summary>
/// Splits one line of RFC 4180 Comma Separated Values in which every
/// value is enclosed in double quotes.
/// http://tools.ietf.org/html/rfc4180
/// </summary>
/// <param name="line">Single line from CSV file</param>
/// <returns>The values with their enclosing double quotes removed</returns>
public string[] SplitOnDoubleQuotes(string line)
{
    // Only the text before the first line break is parsed.
    int newline = line.IndexOf('\n');
    if (newline >= 0)
        line = line.Substring(0, newline);
    line = line.TrimEnd('\r');

    // Record every comma that sits directly between a closing and an
    // opening double quote. Any other comma is treated as part of a value.
    // Limitations: a value that is exactly "," is split incorrectly, and
    // doubled quotes ("") inside a value are not unescaped.
    List<int> occurs = new List<int>();
    for (int i = 1; i < line.Length - 1; i++)
    {
        if (line[i] == ',' && line[i - 1] == '"' && line[i + 1] == '"')
            occurs.Add(i);
    }

    // Cut the line at the recorded separators.
    List<string> tokens = new List<string>();
    int startIdx = 0;
    foreach (int idx in occurs)
    {
        tokens.Add(line.Substring(startIdx, idx - startIdx));
        startIdx = idx + 1; // skip the separator itself
    }
    tokens.Add(line.Substring(startIdx));

    // Strip one enclosing double quote from each end of every token.
    for (int i = 0; i < tokens.Count; i++)
    {
        string str = tokens[i];
        if (str.StartsWith("\"", StringComparison.Ordinal))
            str = str.Remove(0, 1);
        if (str.EndsWith("\"", StringComparison.Ordinal))
            str = str.Remove(str.Length - 1);
        tokens[i] = str;
    }
    return tokens.ToArray();
}

Points of Interest

There are many possible ways to do this, and this is just one of them.

Here are two other possible ways:

  1. Define a finite state automaton to parse the line character by character.
  2. Use String.Replace("\",\"", SpecialChar) to replace the "," pattern with a special character that cannot occur in the values of your CSV file, then split the line on that character.
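
The two alternatives listed above could be sketched roughly as follows. This is only an illustration, not code from the original article: the class name QuotedCsvSketch, the method names SplitWithStateMachine and SplitWithReplace, and the placeholder parameter are all hypothetical. The state machine also treats a doubled quote ("") inside a quoted value as one literal quote, as RFC 4180 describes; the replace-based version assumes the caller picks a placeholder character that never appears in the data.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class QuotedCsvSketch
{
    // Alternative 1: a two-state machine (inside / outside quotes)
    // that reads the line character by character.
    public static string[] SplitWithStateMachine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // "" inside quotes is an escaped quote
                    i++;                 // skip the second quote of the pair
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote
                }
                else
                {
                    current.Append(c);   // ordinary character, commas included
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString()); // separator outside quotes
                current.Clear();
            }
            else
            {
                current.Append(c);       // character of an unquoted value
            }
        }
        fields.Add(current.ToString());  // the last value has no trailing comma
        return fields.ToArray();
    }

    // Alternative 2: replace the "," pattern with a placeholder character
    // (assumed never to occur in the data), then split on that character.
    public static string[] SplitWithReplace(string line, char placeholder)
    {
        string[] parts = line.Replace("\",\"", placeholder.ToString())
                             .Split(placeholder);

        // The first part keeps its opening quote, the last its closing quote.
        if (parts[0].StartsWith("\"", StringComparison.Ordinal))
            parts[0] = parts[0].Substring(1);
        int last = parts.Length - 1;
        if (parts[last].EndsWith("\"", StringComparison.Ordinal))
            parts[last] = parts[last].Substring(0, parts[last].Length - 1);
        return parts;
    }
}
```

The state machine costs a few more lines but copes with commas and escaped quotes inside a value. The replace-based version is shorter, but it does not unescape doubled quotes and it depends entirely on the placeholder being absent from the input.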

I hope this code is useful for beginners.

License

This article, along with any associated source code and files, is licensed under The Common Development and Distribution License (CDDL).

About the Author

vrg786
India

Last Updated 3 May 2007
Article Copyright 2007 by vrg786