Click here to Skip to main content
13,144,309 members (37,354 online)
Click here to Skip to main content
Add your own
alternative version

Stats

23K views
810 downloads
45 bookmarked
Posted 18 Aug 2011

Strongly typed CSV reader (CsvToObj) and Code First initialization

, 21 Sep 2011
Rate this:
Please Sign up or sign in to vote.
Loading a strongly typed list from a CSV via Reflection and initializing tables using EF Code First.

Introduction

Sometimes there's the need to import and handle sets of data coming from a CVS file. The usual steps are to read each line of the flat file as an array of strings and map them to the corresponding entity.

Scope

The aim of the article is to provide a strongly typed access to flat files; this is done by associating, with a lambda expression, the property of our entity with the position on the file. This can be handy specially in these days where newnesses like EF Code First are coming out and there's the need to initialize the table with more than a few records created manually.

Using the code

The class responsible to parse and cast the file into a list of the desired objects is CsvManager. Once we define the target type, we can associate the columns we want to get and their position, and the unmapped fields will be ignored. We can also specify if the first line of the CSV contains the column names in order to skip it. The SetField method only supports string properties but I guess it can be easily extended.

Update (26/08/2011)

I've updated the SetField method, it now supports:

The Solution contains three projects:

  • GM.CsvToObj.Lib contains the CsvToObj library responsible for the CSV handling.
  • GM.CsvToObj.Models contains a few entities used to run the code sample. It also contains the initializer for Code First and the Entity Framework Context.
  • GM.CsvToObj.ConsoleApp is responsible for creating the database and calling the initializer. It also triggers the parsing of the CSVs and populates the database.
//the boolean specify if the first line of the file contains the column names
CsvManager<Postcode> postcodeManager = 
  new CsvManager<Postcode>(@"c:\temp\postcodes.csv", true);
postcodeManager.SetField(x => x.Name, 0);
postcodeManager.SetField(x => x.Latitute, 1);
postcodeManager.SetField(x => x.Longitudine, 2);
postcodeManager.SetField(x => x.Easting, 3);
postcodeManager.SetField(x => x.Northing, 4);
postcodeManager.SetField(x => x.Grid, 5);
postcodeManager.SetField(x => x.Area, 6);
postcodeManager.SetField(x => x.Region, 7);
List<Postcode> postcodesResult = postcodeManager.GetObjectList();

Here are the CsvManager and the CsvReader:

public class CsvManager<T> where T: new()
{
    public char DefaultSeparator { get; set; }
    public List<CsvFieldTarget> CsvFieldsToMap { get; set; }
    public string FilePath { get; set; }
    public CsvReader Reader { get; set; }
    public bool IsFirstLineColumnName { get; private set; }

    public CsvManager(string filePath, bool isFirstLineColumnName, char separator)
    {
        CsvFieldsToMap = new List<CsvFieldTarget>();
        //Setting default separator 
        DefaultSeparator = separator;
        IsFirstLineColumnName = isFirstLineColumnName;
        FilePath = filePath;
    }


    public CsvManager(string filePath, bool isFirstLineColumnName):
           this(filePath, isFirstLineColumnName,','){}

    #region Public methods

    public void SetField(Expression<Func<T, string>> expression, int position)
    {
        GetAndSetFieldTarget(expression, position);
    }

    public void SetField(Expression<Func<T,ValueType>> expression, int position)
    {
        GetAndSetFieldTarget(expression, position);
    }

    public void SetField(Expression<Func<T, DateTime>> expression, int position)
    {
        GetAndSetFieldTarget(expression, position);
    }

    public List<T> GetObjectList()
    {
        Reader = new CsvReader(FilePath, CsvFieldsToMap, 
                     DefaultSeparator, IsFirstLineColumnName);
        var csvRows = GetRowsFromFile();
        var resultList = new List<T>(csvRows.Count);
        foreach (var csvRow in csvRows)
        {
            var destinationObject = new T();
            var createdObj = SetPropertyViaReflection(destinationObject, csvRow);
            resultList.Add(createdObj);
        }
        return resultList;
    }

    #endregion

    #region Private methods

    private void GetAndSetFieldTarget(Expression expression, int position)
    {
        var property = GetMemberInfo(expression);
        CsvFieldsToMap.Add(new CsvFieldTarget()
        {
            FieldName = property.Name,
            Position = position
        });
    }

    private List<CsvRow> GetRowsFromFile()
    {
        var csvRows = Reader.ReadCsvRows();
        return csvRows;

    }

    private T SetPropertyViaReflection(T destinationObj, CsvRow csvRow)
    {
        Type type = destinationObj.GetType();
        foreach (var csvFieldResult in csvRow.CsvFieldsResult)
        {
            PropertyInfo prop = type.GetProperty(csvFieldResult.FieldName);
            var propertyType = prop.PropertyType;
            var convertedValue = 
                Convert.ChangeType(csvFieldResult.FieldValue, propertyType);
            prop.SetValue(destinationObj, convertedValue, null);
        }
        return destinationObj;
    }

    private static MemberInfo GetMemberInfo(Expression method)
    {
        var lambda = method as LambdaExpression;
        if (lambda == null)
            throw new InvalidCastException("Invalid lambda expression");

        MemberExpression memberExpression = null;

        switch (lambda.Body.NodeType)
        {
            case ExpressionType.Convert:
                memberExpression =
                    ((UnaryExpression) lambda.Body).Operand as MemberExpression;
                break;
            case ExpressionType.MemberAccess:
                memberExpression = lambda.Body as MemberExpression;
                break;
        }

        if (memberExpression == null)
            throw new ArgumentException("Invalid expression");

        return memberExpression.Member;
    }

    #endregion
}

public class CsvReader
{
    #region Public members
    public List<CsvFieldTarget> CsvFieldTargets { get; private set; }
    public List<CsvRow> CsvRowsResult { get; private set; }
    public char Separator { get; private set; }
    public bool IsFirstLineColumnName { get; private set; }
    public string FilePath { get; private set; }
        
    #endregion
    #region Constructors
    public CsvReader(string filePath, List<CsvFieldTarget> 
           csvFieldTargets, char separator, bool isFirstLineColumnName)
    {
        FilePath = filePath;
        CsvFieldTargets = csvFieldTargets;
        Separator = separator;
        IsFirstLineColumnName = isFirstLineColumnName;
        CsvRowsResult = new List<CsvRow>();
    }
    #endregion
    #region Public methods
    public List<CsvRow> ReadCsvRows()
    {
        CsvRowsResult = IsFirstLineColumnName ? 
          GetCsvRowIterator().Skip(1).ToList() : GetCsvRowIterator().ToList();
        return CsvRowsResult;
    }
    #endregion
    #region Private methods
    private IEnumerable<CsvRow> GetCsvRowIterator()
    {
        using (StreamReader readFile = new StreamReader(FilePath))
        {
            string line;
            string[] row;
            while ((line = readFile.ReadLine()) != null)
            {
                row = line.Split(Separator);
                var resultRow = BuildCsvRow(row);
                yield return resultRow;
            }
        }
    }
    private CsvRow BuildCsvRow(string[] row)
    {
        var resultRow = new CsvRow(CsvFieldTargets.Count);
        foreach (var csvField in CsvFieldTargets)
        {
            var field = new CsvFieldResult()
            {
                FieldName = csvField.FieldName,
                Position = csvField.Position,
                FieldValue = row[csvField.Position]
            };
            resultRow.CsvFieldsResult.Add(field);
        }
        return resultRow;
          
    }
    #endregion
}

Usage example

The sample attached contains a console application that triggers the creation of three tables with EF Code First that gets then initialized using the CsvToObj library with two flat files, one for countries and one for the UK postcodes; the result can be seen through Management Studio:

Update (26/08/2011)

The sample file USPresident_Wikipedia.csv has been added. The corresponding class UsPresident has a few properties as DateTimes and integers.

public class UsPresident
{
    [Key]
    public int PresidencyId { get; set; }
    public string President { get; set; }
    public string WikipediaEntry { get; set; }
    public DateTime TookOffice { get; set; }
    public DateTime LeftOffice { get; set; }
    public string Party { get; set; }
    public string Portrait { get; set; }
    public string Thumbnail { get; set; }
    public string HomeState { get; set; }
}

Mapping:

private static List<UsPresident> LoadUsPresidents()
{
    CsvManager<UsPresident> presidentsManager = 
      new CsvManager<UsPresident>(
      @"c:\temp\USPresident_Wikipedia.csv", true);
    presidentsManager.SetField(x=>x.PresidencyId, 0);
    presidentsManager.SetField(x => x.President, 1);
    presidentsManager.SetField(x => x.WikipediaEntry, 2);
    presidentsManager.SetField(x => x.TookOffice, 3);
    presidentsManager.SetField(x => x.LeftOffice, 4);
    presidentsManager.SetField(x => x.Party, 5);
    presidentsManager.SetField(x => x.Portrait, 6);
    presidentsManager.SetField(x => x.Thumbnail, 7);
    presidentsManager.SetField(x => x.HomeState, 8);
    List<UsPresident> presidentsResult = presidentsManager.GetObjectList();

    return presidentsResult;
}

Here's the corresponding table created by EF:

Dependencies

In order to run the sample, you will need the latest version of Entity Framework. This is only needed if you want to use the Code First feature in the sample: http://www.microsoft.com/download/en/details.aspx?id=8363.

Last thing: create the c:\temp folder and add the CSV sample files included in the sample.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Giorgio Minardi
Software Developer Minalabs
United Kingdom United Kingdom
Giorgio Minardi is a .Net consultant actually working in UK after diffent years spent in Italy working on enteprise clients.

You may also be interested in...

Pro
Pro

Comments and Discussions

 
SuggestionGreat Tool. Expanded it with an AutoSetFIelds-Method Pin
Thomas Fröhle17-Apr-15 14:21
memberThomas Fröhle17-Apr-15 14:21 
GeneralMy vote of 5 Pin
Simon Hughes22-Aug-11 23:51
memberSimon Hughes22-Aug-11 23:51 
GeneralRe: My vote of 5 Pin
george2giga26-Aug-11 5:51
membergeorge2giga26-Aug-11 5:51 
GeneralI love it! Pin
zshun22-Aug-11 12:52
memberzshun22-Aug-11 12:52 
GeneralRe: I love it! Pin
george2giga26-Aug-11 5:47
membergeorge2giga26-Aug-11 5:47 
QuestionLINQ to CSV library Pin
Matt Perdeck21-Aug-11 19:07
memberMatt Perdeck21-Aug-11 19:07 
AnswerRe: LINQ to CSV library [modified] Pin
george2giga26-Aug-11 6:14
membergeorge2giga26-Aug-11 6:14 
Questionwell done! Pin
steliodibello18-Aug-11 23:59
membersteliodibello18-Aug-11 23:59 
AnswerRe: well done! Pin
george2giga19-Aug-11 0:03
membergeorge2giga19-Aug-11 0:03 
QuestionSniff a 5 - Very good start Pin
Dewey18-Aug-11 21:10
memberDewey18-Aug-11 21:10 
AnswerRe: Sniff a 5 - Very good start Pin
george2giga18-Aug-11 22:56
membergeorge2giga18-Aug-11 22:56 
AnswerRe: Sniff a 5 - Very good start Pin
george2giga26-Aug-11 5:48
membergeorge2giga26-Aug-11 5:48 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Terms of Use | Mobile
Web02 | 2.8.170915.1 | Last Updated 21 Sep 2011
Article Copyright 2011 by Giorgio Minardi
Everything else Copyright © CodeProject, 1999-2017
Layout: fixed | fluid