Parsing CSV embedded in double quotes - RFC 4180

By vrg786

Simple ways to parse CSV documents when values are embedded in double quotes
C# 2.0, C#, Windows, .NET, .NET 2.0, VS2005, VS, Dev

Posted: 3 May 2007
Updated: 3 May 2007
Views: 4,131

Introduction

This article shows how to parse CSV data whose values are enclosed in double quotes.

Background

This article is based on RFC 4180, which states that standard CSV may contain values separated by a COMMA and enclosed in DOUBLE QUOTEs.

Take a look at the RFC: http://tools.ietf.org/html/rfc4180
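
To make the format concrete, here is a minimal sketch that produces a line in the quoted form described above. It is only an illustration of the format, not part of the article's parser; the names Rfc4180Format, QuoteField and ToCsvLine are hypothetical. It assumes the RFC 4180 rule that a double quote inside a quoted value is written as two double quotes.

```csharp
using System;

// Hypothetical helper names, used only to illustrate the quoted CSV format.
public static class Rfc4180Format
{
    // Encloses one value in double quotes. A double quote inside the value
    // is escaped by doubling it.
    public static string QuoteField(string value)
    {
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }

    // Quotes every value and joins them with commas to form one CSV line.
    public static string ToCsvLine(string[] values)
    {
        string[] quoted = new string[values.Length];
        for (int i = 0; i < values.Length; i++)
            quoted[i] = QuoteField(values[i]);
        return string.Join(",", quoted);
    }
}
```

For the values Smith, John and 42 this gives the line "Smith, John","42". The comma inside the first value is not a separator, because it sits inside the quotes.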

Using the Code

The following method splits a single line of a CSV file in which every value is enclosed in double quotes, and returns the values with the quotes removed:

using System;
using System.Collections.Generic;

/// <summary>
/// RFC 4180 Comma Separated Values
/// http://tools.ietf.org/html/rfc4180
/// Splits a line in which every value is enclosed in double quotes.
/// A doubled quote ("") inside a value is left as-is.
/// </summary>
/// <param name="line">Single line from CSV file</param>
/// <returns>The values with the enclosing double quotes removed</returns>
public string[] SplitOnDoubleQuotes(string line)
{
    // Drop a trailing line break, if there is one.
    line = line.TrimEnd('\r', '\n');

    // Record every comma that sits between two double quotes (the pattern
    // ","). Only those commas separate values. Starting at 1 and stopping
    // before the last character keeps line[i - 1] and line[i + 1] in range.
    List<int> occurs = new List<int>();
    for (int i = 1; i < line.Length - 1; i++)
    {
        if (line[i] == ',' && line[i - 1] == '"' && line[i + 1] == '"')
            occurs.Add(i);
    }

    // Cut the line at the recorded commas.
    List<string> tokens = new List<string>();
    int startIdx = 0;
    foreach (int commaIdx in occurs)
    {
        tokens.Add(line.Substring(startIdx, commaIdx - startIdx));
        startIdx = commaIdx + 1; // skip the separating comma
    }
    tokens.Add(line.Substring(startIdx)); // last value (or the only one)

    // Strip the enclosing double quotes from each value.
    for (int i = 0; i < tokens.Count; i++)
    {
        string str = tokens[i];
        if (str.StartsWith("\"", StringComparison.Ordinal))
            str = str.Remove(0, 1);
        if (str.EndsWith("\"", StringComparison.Ordinal))
            str = str.Remove(str.Length - 1);
        tokens[i] = str;
    }
    return tokens.ToArray();
}

Points of Interest

There are many possible ways to do this, and the method above is just one of them.

Here are two other possible ways:

  1. Define a finite state automaton to parse the line character by character.
  2. Using String.Replace("\",\"", SpecialChar), replace the pattern "," with a special character that cannot occur in the normal set of values in your CSV file, then split the line on that character.
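
The two approaches in the list above could be sketched roughly as follows. This is a minimal illustration, not a complete RFC 4180 parser: the class and method names are hypothetical, line breaks inside quoted values are not handled, and the second method assumes the caller supplies a placeholder character that never appears in the data.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical names; a sketch of the two approaches, not a full parser.
public static class QuotedCsvSketch
{
    // Approach 1: a small state machine that reads the line one character
    // at a time. Its two states are "inside quotes" and "outside quotes".
    public static string[] SplitWithStateMachine(string line)
    {
        List<string> fields = new List<string>();
        StringBuilder current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // "" inside quotes is an escaped quote
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote
                }
                else
                {
                    current.Append(c);   // ordinary character, commas included
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString()); // a separator ends the value
                current.Length = 0;
            }
            else if (c != '\r' && c != '\n')
            {
                current.Append(c);       // character of an unquoted value
            }
        }
        fields.Add(current.ToString());
        return fields.ToArray();
    }

    // Approach 2: replace every "," pattern with a placeholder character,
    // then split the line on that placeholder.
    public static string[] SplitWithReplace(string line, char placeholder)
    {
        string trimmed = line.TrimEnd('\r', '\n');
        string[] parts = trimmed.Replace("\",\"", placeholder.ToString())
                                .Split(placeholder);

        // The first value still has its opening quote and the last value
        // still has its closing quote.
        if (parts[0].StartsWith("\"", StringComparison.Ordinal))
            parts[0] = parts[0].Substring(1);
        int last = parts.Length - 1;
        if (parts[last].EndsWith("\"", StringComparison.Ordinal))
            parts[last] = parts[last].Substring(0, parts[last].Length - 1);
        return parts;
    }
}
```

The state machine is longer, but it copes with commas and doubled quotes inside a value. The replace-based version is shorter, but it only works when every value is quoted and the placeholder (for example '\u0001') is guaranteed not to occur in the file.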

I hope this code is useful for beginners.

-Vaibhav Gaikwad

License

This article, along with any associated source code and files, is licensed under The Common Development and Distribution License (CDDL)

About the Author

vrg786



Location: India

Copyright 2007 by vrg786