Click here to Skip to main content
15,881,757 members
Articles / Programming Languages / C#
Article

LINQ to Text or CSV Files - Part 1

Rate me:
Please Sign up or sign in to vote.
4.71/5 (42 votes)
1 Jan 2009CPOL2 min read 112.3K   1.1K   92   23
A simple utility to LINQ against CSV files

Introduction

With the introduction of LINQ in C# 3.0, the life of a developer has become very easy when it comes to querying collections. In this article, I will show you a quick and easy way to query large CSV files.

Background

Imagine you have a very large CSV file. Loading the entire file into memory (e.g. DataSet) and querying it with LINQ would be a huge overhead. So I thought why not use StreamReader on the file instead.

Using the Code

I have implemented IEnumerable<string[]> and for every call to GetEnumerator, a line is read from CSV file. This class returns an enumerable string[].

C#
public class TextFileReader : IEnumerable<string[]>
{
    private string _fileName = string.Empty;
    private string _delimiter = string.Empty;

    public TextFileReader(string fileName, string delimiter)
    {
        this._fileName = fileName;
        this._delimiter = delimiter;
    }

    #region IEnumerable<string[]> Members

    IEnumerator<string[]> IEnumerable<string[]>.GetEnumerator()
    {
        using (StreamReader streamReader = new StreamReader(this._fileName))
        {
            while (!streamReader.EndOfStream)
            {
                yield return streamReader.ReadLine().Split(new char[] { ',' });
            }
        }
    }

    #endregion

    #region IEnumerable Members

    IEnumerator IEnumerable.GetEnumerator()
    {
        return ((IEnumerable)((IEnumerator<t />)this)).GetEnumerator();
    }

    #endregion
}

Using this class is very simple. Just initialize the class with the CSV file name and you are all set to LINQ it using standard operator. The following code shows how to query all the rows in the CSV file with the first column text starting with 'a'.

C#
TextFileReader reader1 = new TextFileReader(@"Sample.txt", ",");

            var query1 = from it1 in reader1
                        where it1[0].StartsWith("a")
                        select new { Name = it1[0], Age = it1[1], 
			EmailAddress = it1[2] };

            foreach (var x1 in query1)
            {
                Console.WriteLine(String.Format("Name={0} Age={1} 
		EmailAddress = {2}", x1.Name, x1.Age, x1.EmailAddress));
            }
}

Let's make things a little bit more interesting. Suppose you know the format of the CSV file and have defined business classes corresponding to the CSV files. In this case, the above class is not very useful. Let's modify the 'TextFilereader' to have generic support.

C#
public class TextFileReader<T> : IEnumerable<T> where T : new()
{
    private string _fileName = string.Empty;
    private string _delimiter = string.Empty;
    private Dictionary<String, PropertyInfo> _headerPropertyInfos = 
		new Dictionary<string, PropertyInfo>();
    private Dictionary<String, Type> _headerDaytaTypes = new Dictionary<string, Type>();

    public TextFileReader(string fileName, string delimiter)
    {
        this._fileName = fileName;
        this._delimiter = delimiter;
    }

    #region IEnumerable<string[]> Members

    IEnumerator<T> IEnumerable<T>.GetEnumerator()
    {
        using (StreamReader streamReader = new StreamReader(this._fileName))
        {
            string[] headers = streamReader.ReadLine().Split(new String[] 
		{ this._delimiter }, StringSplitOptions.None);
            this.ReadHeader(headers);

            while (!streamReader.EndOfStream)
            {
                T item = new T();

                string[] rowData = streamReader.ReadLine().Split(new String[] 
		{ this._delimiter }, StringSplitOptions.None);

                for (int index = 0; index < headers.Length; index++)
                {
                    string header = headers[index];
                    this._headerPropertyInfos[header].SetValue
			(item, Convert.ChangeType(rowData[index], 
			this._headerDaytaTypes[header]), null);
                }
                yield return item;
            }
        }
    }

    #endregion

    #region IEnumerable Members

    IEnumerator IEnumerable.GetEnumerator()
    {
        return ((IEnumerable)((IEnumerator<T>)this)).GetEnumerator();
    }

    #endregion

    private void ReadHeader(string[] headers)
    {
        foreach (String header in headers)
        {
            foreach (PropertyInfo propertyInfo in (typeof(T)).GetProperties())
            {
                foreach (object attribute in propertyInfo.GetCustomAttributes(true))
                {
                    if ( attribute is ColumnAttribute )
                    {
                        ColumnAttribute columnAttribute = attribute as ColumnAttribute;
                        if (columnAttribute.Name == header)
                        {
                            this._headerPropertyInfos[header] = propertyInfo;
                            this._headerDaytaTypes[header] = columnAttribute.DataType;
                            break;
                        }
                    }
                }
            }
        }
    }
}
}

Now let's define a business class 'Person'. This class will hold the information stored in the CSV file. Its properties are marked with 'ColumnAttribute' to map the columns with CSV columns. The code is self explanatory.

C#
public class Person
{
    private string _name;

    [Column(Name = "Name", DataType = typeof(String))]
    public string Name
    {
        get { return _name; }
        set { _name = value; }
    }
    private int _age;

    [Column(Name = "Age", DataType = typeof(Int32))]
    public int Age
    {
        get { return _age; }
        set { _age = value; }
    }
    private string _emailAddress;

    [Column(Name = "EmailId", DataType = typeof(String))]
    public string EmailAddress
    {
        get { return _emailAddress; }
        set { _emailAddress = value; }
    }
}

[AttributeUsage(AttributeTargets.Property)]
public class ColumnAttribute : Attribute
{
    private string _name;

    public string Name
    {
        get { return _name; }
        set { _name = value; }
    }
    private Type _dataType;

    public Type DataType
    {
        get { return _dataType; }
        set { _dataType = value; }
    }
}

Using this class is again very simple. Just initialize the class with the CSV file name and the delimiter and LINQ it. The following code shows how to query all the rows in the CSV file with the first column Age greater than 2.

C#
TextFileReader<Person> reader2 =
new TextFileReader<Person>(@"SampleWithHeader.txt", ",");
var query2 = from it2 in reader2
             where it2.Age > 2
             select it2;

foreach (var x2 in query2)
{
    Console.WriteLine(String.Format("Name={0} Age={1}
    EmailAddress = {2}", x2.Name, x2.Age, x2.EmailAddress));
}

Points of Interest

Well, this may not be the best way to implement this. I am also into the process of learning LINQ. Please feel free to give your comments or suggestions. In my next article, I will show you how to write a custom query provider for CSV files. Please stay tuned.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
United States United States
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

 
Questionthousand separator issue Pin
Cyclone5520-Jun-12 17:27
Cyclone5520-Jun-12 17:27 
GeneralMy vote of 5 Pin
RJPalkar26-Oct-11 8:55
RJPalkar26-Oct-11 8:55 
QuestionOne question about headers Pin
anderdw27-Aug-11 5:55
anderdw27-Aug-11 5:55 
AnswerRe: One question about headers Pin
Alomgir Miah A7-Aug-11 6:11
Alomgir Miah A7-Aug-11 6:11 
GeneralRe: One question about headers Pin
anderdw28-Aug-11 5:21
anderdw28-Aug-11 5:21 
GeneralReally, really amazing utility. Pin
smt5217-Mar-11 12:27
smt5217-Mar-11 12:27 
GeneralRe: Really, really amazing utility. Pin
Alomgir Miah A24-Mar-11 13:04
Alomgir Miah A24-Mar-11 13:04 
GeneralAwesome - thanks. Pin
cheartwell1-Feb-10 7:32
cheartwell1-Feb-10 7:32 
GeneralRe: Awesome - thanks. Pin
Alomgir Miah A1-Mar-10 4:21
Alomgir Miah A1-Mar-10 4:21 
GeneralGood to see you here! Pin
Digvijay Chauhan11-Sep-09 4:33
Digvijay Chauhan11-Sep-09 4:33 
GeneralThanks for the codez Pin
Mycroft Holmes20-Dec-08 14:56
professionalMycroft Holmes20-Dec-08 14:56 
GeneralRe: Thanks for the codez [modified] Pin
Alomgir Miah A20-Dec-08 20:56
Alomgir Miah A20-Dec-08 20:56 
GeneralCSV and field separator Pin
TobiasP14-Oct-08 1:52
TobiasP14-Oct-08 1:52 
GeneralElegant! Pin
Pat Tormey14-Oct-08 1:06
Pat Tormey14-Oct-08 1:06 
GeneralJust a comment... Pin
Seth Morris13-Oct-08 22:17
Seth Morris13-Oct-08 22:17 
GeneralRe: Just a comment... Pin
TobiasP14-Oct-08 3:07
TobiasP14-Oct-08 3:07 
GeneralRe: Just a comment... Pin
Alomgir Miah A14-Oct-08 4:00
Alomgir Miah A14-Oct-08 4:00 
GeneralRe: Just a comment... Pin
Mycroft Holmes20-Dec-08 14:51
professionalMycroft Holmes20-Dec-08 14:51 
GeneralExcellent, but how about... Pin
John O'Halloran13-Oct-08 17:23
John O'Halloran13-Oct-08 17:23 
GeneralRe: Excellent, but how about... Pin
Alomgir Miah A13-Oct-08 17:55
Alomgir Miah A13-Oct-08 17:55 
GeneralGreat article!!! Pin
Achintya Jha12-Oct-08 14:16
Achintya Jha12-Oct-08 14:16 
GeneralGreat article! Pin
IamBond00712-Oct-08 2:25
IamBond00712-Oct-08 2:25 
GeneralNice! Pin
sdesciencelover12-Oct-08 1:32
sdesciencelover12-Oct-08 1:32 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.