Click here to Skip to main content
13,250,182 members (58,107 online)
Click here to Skip to main content
Add your own
alternative version

Stats

73.9K views
106 bookmarked
Posted 24 Oct 2016

Cinchoo ETL - CSV Reader

, 28 Oct 2017
Rate this:
Please Sign up or sign in to vote.
Simple CSV file reader for .NET

Contents

1. Introduction

ChoETL is an open source ETL (extract, transform and load) framework for .NET. It is a code based library for extracting data from multiple sources, transforming, and loading into your very own data warehouse in .NET environment. You can have data in your data warehouse in no time.

This article talks about using CSVReader component offered by ChoETL framework. It is a simple utility class to extract CSV data from file / source.

UPDATE: Corresponding CSVWriter article can be found here.

Features:

  • Ultra-fast CSV Reader, parses CSV file with 15 columns / 500,000 records in 15 seconds.
  • Stream based parsers allow for ultimate performance, low resource usage, and nearly unlimited versatility scalable to any size data file, even tens or hundreds of gigabytes.
  • Event based data manipulation and validation allows total control over the flow of data during the bulk insert process.
  • Follows CSV standard file rules. Gracefully handles data fields that contain commas and line breaks.
  • In addition to comma, most delimiting characters can be used, including tab delimited fields.
  • Exposes IEnumarable list of objects - which is often used with LINQ query for projection, aggregation and filtration etc.
  • Supports deferred reading.
  • Supports processing files with culture specific date, currency and number formats.
  • Supports different character encoding.
  • Recognizes a wide variety of date, currency, enum, boolean and number formats when reading files.
  • Provides fine control of date, currency, enum, boolean, number formats when writing files.
  • Detailed and robust error handling, allowing you to quickly find and fix problems.

Performance:

This framework was tested with the test files containing 15 fields with different number of records (250,000, 500,000, 1M, 1.5M, 2M, 2.5M etc). The graph below shows the total time taken to load each file. The test was performed in Intel Core i5 / 8GB RAM machine. 

No of Records Run time (secs)
250,000 7
500,000 15
1,000,000 29
1,500,000 44
2,000,000 58
2,500,000 72

 

2. Requirement

This framework library is written in C# using .NET 4.5 Framework.

3. "Hello World!" Sample

  • Open VS.NET 2013 or higher
  • Create a sample VS.NET (.NET Framework 4.5) Console Application project
  • Install ChoETL via Package Manager Console using Nuget Command: Install-Package ChoETL
  • Use the ChoETL namespace

Let's begin by looking into a simple example of reading CSV file having 2 columns

Listing 3.1 Sample CSV data file

Id,Name
1,Tom
2,Carl
3,Mark

There are number of ways you can get the CSV file parsing started with minimal setup

3.1. Quick load - Data First Approach

It is the zero config, quick way to load a CSV file in no time. No POCO object is required. Sample code below shows how to load the file

Listing 3.1.1 Load CSV file using iterator

foreach (dynamic e in new ChoCSVReader("Emp.csv").WithFirstLineHeader())
{            
    Console.WriteLine(String.Format("Id: {0}", e.Id));
    Console.WriteLine(String.Format("Name: {0}", e.Name));
}

Listing 3.1.2 Load CSV file using loop

var reader = new ChoCSVReader("Emp.csv").WithFirstLineHeader();
dynamic rec;
 
while ((rec = reader.Read()) != null)
{            
    Console.WriteLine(String.Format("Id: {0}", rec.Id));
    Console.WriteLine(String.Format("Name: {0}", rec.Name));
}

This model uses generaric CSVReader object for parsing the file. CSVReader auto name the columns as Column1, Column2 ... in the dynamic object.

UPDATE:

With the latest Cinchoo ETL library, you can access members by index. Indexes are start by zero.

Listing 3.1.3 Access by index

foreach (dynamic e in new ChoCSVReader("Emp.csv").WithFirstLineHeader())
{            
    Console.WriteLine(String.Format("Id: {0}", e[0]));
    Console.WriteLine(String.Format("Name: {0}", e[1]));
}

3.2. Code First Approach

This is another zero config way to parse and load CSV file using POCO class. First define a simple data class to match the underlying CSV file layout

Listing 3.2.1 Simple POCO entity class

public partial class EmployeeRec
{
    public int Id { get; set; }
    public string Name { get; set; } 
}

In above, the class defines two properties matching the sample CSV file template.

Listing 3.2.2 Load CSV file

foreach (var e in new ChoCSVReader<EmployeeRec>("Emp.csv").WithFirstLineHeader())
{            
    Console.WriteLine(String.Format("Id: {0}", e.Id));
    Console.WriteLine(String.Format("Name: {0}", e.Name));
}

3.3. Configuration First Approach

In this model, we define the CSV configuration with all the necessary parsing parameters along with CSV columns matching with the underlying CSV file. 

Listing 3.3.1 Define CSV configuration

ChoCSVRecordConfiguration config = new ChoCSVRecordConfiguration();
config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Id", 1));
config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Name", 2));

In above, the class defines two properties matching the sample CSV file template.

Listing 3.3.2 Load CSV file without POCO object

foreach (dynamic e in new ChoCSVReader("Emp.csv", config).WithFirstLineHeader())
{            
    Console.WriteLine(String.Format("Id: {0}", e.Id));
    Console.WriteLine(String.Format("Name: {0}", e.Name));
}

Listing 3.3.3 Load CSV file with POCO object

foreach (dynamic e in new ChoCSVReader<EmployeeRec>("Emp.csv", config).WithFirstLineHeader())
{            
    Console.WriteLine(String.Format("Id: {0}", e.Id));
    Console.WriteLine(String.Format("Name: {0}", e.Name));
}

3.4. Code First with declarative configuration

This is the combined approach to define POCO entity class along with CSV configuration parameters decorated declaratively. id is required column and name is optional value column with default value "XXXX". If name is not present, it will take the default value.

Listing 3.4.1 Define POCO Object

public class EmployeeRec
{
    [ChoCSVRecordField(1)]
    [Required]
    public int Id
    {
        get;
        set;
    }
    [ChoCSVRecordField(2)]
    [DefaultValue("XXXX")]
    public string Name
    {
        get;
        set;
    }

    public override string ToString()
    {
        return "{0}. {1}.".FormatString(Id, Name);
    }
}

The code above illustrates about defining POCO object to carry the values of each record line in the input file. First thing defines property for each record field with ChoCSVRecordFieldAttribute to qualify for CSV record mapping. Each property must specify position in order to be mapped to CSV column. Position is 1 based. Id is a required property. We decorated it with RequiredAttribute. Name is given default value using DefaultValueAttribute. It means that if the Name CSV column contains empty value in the file, it will be defaulted to 'XXXX' value.

It is very simple and ready to extract CSV data in no time.

Listing 3.4.2 Main Method

class Program
{
    static void Main(string[] args)
    {
        using (var stream = new MemoryStream())
        using (var reader = new StreamReader(stream))
        using (var writer = new StreamWriter(stream))
        using (var parser = new ChoCSVReader<EmployeeRec>(reader))
        {
            writer.WriteLine("1,Carl");
            writer.WriteLine("2,Mark");
            writer.WriteLine("3,");

            writer.Flush();
            stream.Position = 0;
 
            object row = null;
 
            while ((row = parser.Read()) != null)
            {
                Console.WriteLine(String.Format("Id: {0}", row.Id));
                Console.WriteLine(String.Format("Name: {0}", row.Name));
            }
        }
    }
}

We start by creating a new instance of ChoCSVReader object. That's all. All the heavy lifting of parsing and loading CSV data stream into the objects is done by the parser under the hood.

By default, CSVReader discovers and uses default configuration parameters while loading CSV file. These can be overridable according to your needs. The following sections will give details about each configuration attributes.

4. Reading All Records

It is as easy as setting up POCO object match up with CSV file structure, you can read the whole file as enumerable pattern. It is a deferred execution mode, but take care while making any aggregate operation on them. This will load the entire file records into memory.

Listing 4.1 Read CSV File

foreach (var e in new ChoCSVReader<EmployeeRec>("Emp.csv").WithFirstLineHeader())
{            
    Console.WriteLine(String.Format("Id: {0}", e.Id));
    Console.WriteLine(String.Format("Name: {0}", e.Name));
}

or:

Listing 4.2 Read CSV file stream

foreach (var e in new ChoCSVReader<EmployeeRec>(textReader).WithFirstLineHeader())
{            
    Console.WriteLine(String.Format("Id: {0}", e.Id));
    Console.WriteLine(String.Format("Name: {0}", e.Name));
}

This model keeps your code elegant, clean, easy to read and maintain. Also leverages LINQ extension methods to to perform grouping, joining, projection, aggregation etc.

Listing 4.3 Using LINQ

var list = (from o in new ChoCSVReader<EmployeeRec>("Emp.csv").WithFirstLineHeader()
           where o.Name != null && o.Name.StartsWith("R")
           select o).ToArray();
 
foreach (var e in list)
{            
    Console.WriteLine(String.Format("Id: {0}", e.Id));
    Console.WriteLine(String.Format("Name: {0}", e.Name));
}

5. Read Records Manually

It is as easy as setting up POCO object match up with CSV file structure, you can read the whole file as enumerable pattern

Listing 5.1 Read CSV file

var reader = new ChoCSVReader<EmployeeRec>("Emp.csv").WithFirstLineHeader();
var rec = (object)null;
 
while ((rec = reader.Read()) != null)
{            
    Console.WriteLine(String.Format("Id: {0}", rec.Id));
    Console.WriteLine(String.Format("Name: {0}", rec.Name));
}

6. Customize CSV Record

Using ChoCSVRecordObjectAttribute, you can customize the POCO entity object declaratively.

Listing 6.1 Customizing POCO object for each record

[ChoCSVRecordObject(Encoding = "Encoding.UTF32", 
ErrorMode = ChoErrorMode.IgnoreAndContinue, IgnoreFieldValueMode = ChoIgnoreFieldValueMode.All)]
public class EmployeeRec
{
    [ChoCSVRecordField(1, FieldName = "id")]
    public int Id { get; set; }
    [ChoCSVRecordField(2, FieldName ="Name", QuoteField = true)]
    [Required]
    [DefaultValue("ZZZ")]
    public string Name { get; set; }
}

Here are the available attributes to carry out customization of CSV load operation on a file.

  • Delimiter - The value used to separate the fields in a CSV row. Default is Culture.TextInfo.ListSeparator used.
  • EOLDelimiter - The value used to separate CSV rows. Default is \r\n (NewLine).
  • CultureName - The culture name (ex. en-US, en-GB) used to read and write CSV data.
  • IgnoreEmptyLine - A flag to let the reader know if a record should be skipped when reading if it's empty. A record is considered empty if all fields are empty.
  • Comments - The value used to denote a line that is commented out. Multiple comments can be specified. Must be separated by comma.
  • QuoteChar - The value used to escape fields that contain a delimiter, quote, or line ending.
  • QuoteAllFields - N/A for reader.
  • Encoding - The encoding of the CSV file.
  • HasExcelSeperator - N/A for reader. Reader seamlessly recognize the excel separator if specified in the CSV file and use them for parsing.
  • ColumnCountStrict - This flag indicates if an exception should be thrown if reading an expected field is missing.
  • ColumnOrderStrict - This flag indicates if an exception should be thrown if reading an expected field is in wrong position in the file. This check will be performed only when ColumnCountStrict is true.
  • BufferSize - The size of the internal buffer that is used when reader is from the StreamReader.
  • ErrorMode - This flag indicates if an exception should be thrown if reading and an expected field is failed to load. This can be overridden per property. Possible values are:
    • IgnoreAndContinue - Ignore the error, record will be skipped and continue with next.
    • ReportAndContinue - Report the error to POCO entity if it is of IChoNotifyRecordRead type
    • ThrowAndStop - Throw the error and stop the execution
  • IgnoreFieldValueMode - A flag to let the reader know if a record should be skipped when reading if it's empty / null. This can be overridden per property. Possible values are:
    • Null - N/A
    • DBNull - N/A
    • Empty - skipped if the record value is empty
    • WhiteSpace - skipped if the record value contains only whitespaces
  • ObjectValidationMode - A flag to let the reader know about the type of validation to be performed with record object. Possible values are:
    • Off - No object validation performed.
    • MemberLevel - Validation performed at the time of each CSV property gets loaded with value.
    • ObjectLevel - Validation performed after all the properties are loaded to the POCO object.

7. Customize CSV Header

If the CSV file has header, you can instruct the POCO entity by using ChoCSVFileHeaderAttribute.

Listing 6.1 Customizing POCO object for file header

[__strong__][ChoCSVFileHeader]
public class EmployeeRec
{
    [ChoCSVRecordField(1, FieldName = "id")]
    public int Id { get; set; }
    [ChoCSVRecordField(2, FieldName ="Name", QuoteField = true)]
    [Required]
    [DefaultValue("ZZZ")]
    public string Name { get; set; }
}

Here are the available members to add some customization to it according to your need.

  • FillChar - N/A for reader
  • Justification - N/A for reader
  • IgnoreCase - true, CSV column matching are done with case insensitive. Otherwise false.
  • TrimOption - This flag tells the reader to trim whitespace from the beginning and ending of the CSV column header when reading. Possible values are Trim, TrimStart, TrimEnd.
  • Truncate - N/A for reader

8. Customize CSV Fields

For each CSV column, you can specify the mapping in POCO entity property using ChoCSVRecordFieldAttribute.

Listing 6.1 Customizing POCO object for CSV columns

[ChoCSVFileHeader]
public class EmployeeRec
{
    [ChoCSVRecordField(1, FieldName = "id")]
    public int Id { get; set; }
    [ChoCSVRecordField(2, FieldName ="Name", QuoteField = true)]
    [Required]
    [DefaultValue("ZZZ")]
    public string Name { get; set; }
}

Here are the available members to add some customization to it for each property:

  • FieldPosition - When mapping by position, you specify the index of the CSV column that you want to use for that property. It is 1 based.
  • FieldName - When mapping by name, you specify the name of the CSV column that you want to use for that property. For this to work, the CSV file must have a header record. The name you specify must match with the name of the header record.
  • FillChar - N/A for reader.
  • FieldValueJustification - N/A for reader.
  • FieldValueTrimOption - This flag tells the reader to trim whitespace from the beginning and ending of the field value when reading. Possible values are Trim, TrimStart, TrimEnd.
  • Truncate - N/A for reader.
  • Size - N/A for reader.
  • QuoteField - A flag that tells the reader that the CSV column value is surrounded by quotes.
  • ErrorMode - This flag indicates if an exception should be thrown if reading and an expected field failed to load. Possible values are:
    • IgnoreAndContinue - Ignore the error and continue to load other properties of the record.
    • ReportAndContinue - Report the error to POCO entity if it is of IChoRecord type.
    • ThrowAndStop - Throw the error and stop the execution.
  • IgnoreFieldValueMode - A flag to let the reader know if a record should be skipped when reading if it's empty / null. Possible values are:
    • Null - N/A
    • DBNull - N/A
    • Empty - skipped if the record value is empty.
    • WhiteSpace - skipped if the record value contains only whitespaces.

8.1. DefaultValue

It is the value used and set to the property when the CSV value is empty or whitespace (controlled via IgnoreFieldValueMode).

Any POCO entity property can be specified with default value using System.ComponentModel.DefaultValueAttribute.

8.2. ChoFallbackValue

It is the value used and set to the property when the CSV value failed to set. Fallback value only set when ErrorMode is either IgnoreAndContinue or ReportAndContinue.

Any POCO entity property can be specified with fallback value using ChoETL.ChoFallbackValueAttribute.

8.3. Type Converters

Most of the primitive types are automatically converted and set them to the properties. If the value of the CSV field can't automatically be converted into the type of the property, you can specify a custom / built-in .NET converters to convert the value. These can be either IValueConverter or TypeConverter converters.

There are couple of ways you can specify the converters for each field

  • Declarative Approach
  • Configuration Approach

8.3.1. Declarative Approach

This model is applicable to POCO entity object only. If you have POCO class, you can specify the converters to each property to carry out necessary conversion on them. Samples below shows the way to do it.

Listing 8.3.1.1 Specifying type converters

[ChoCSVFileHeader]
public class EmployeeRec
{
    [ChoCSVRecordField(1, FieldName = "id")]
    [ChoTypeConverter(typeof(IntConverter))]
    public int Id { get; set; }
    [ChoCSVRecordField(2, FieldName ="Name", QuoteField = true)]
    [Required]
    [DefaultValue("ZZZ")]
    public string Name { get; set; }
}

Listing 8.3.1.2 IntConverter implementation

public class IntConverter : IValueConverter
{
    public object Convert(object value, Type targetType, object parameter, CultureInfo culture)
    {
        return value;
    }
 
    public object ConvertBack(object value, Type targetType, object parameter, CultureInfo culture)
    {
        return value;
    }
}

In the example above, we defined custom IntConverter class. And showed how to use it with 'Id' CSV property.

8.3.2. Configuration Approach

This model is applicable to both dynamic and POCO entity object. This gives freedom to attach the converters to each property at runtime. This takes the precedence over the declarative converters on POCO classes.

Listing 8.3.2.2 Specifying TypeConverters

ChoCSVRecordConfiguration config = new ChoCSVRecordConfiguration();
config.FileHeaderConfiguration.HasHeaderRecord = true;
config.ThrowAndStopOnMissingField = false;

ChoCSVRecordFieldConfiguration idConfig = new ChoCSVRecordFieldConfiguration("Id", 1);
idConfig.AddConverter(new IntConverter());
config.CSVRecordFieldConfigurations.Add(idConfig);

config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Name", 2));
config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Name1", 2));

In above, we construct and attach the IntConverter to 'Id' field using AddConverter helper method in ChoCSVRecordFieldConfiguration object.

Likewise, if you want to remove any converter from it, you can use RemoveConverter on ChoCSVRecordFieldConfiguration object.

8.4. Validations

CSVReader leverages both System.ComponentModel.DataAnnotations and Validation Block validation attributes to specify validation rules for individual fields of POCO entity. Refer to the MSDN site for a list of available DataAnnotations validation attributes.

Listing 8.4.1 Using validation attributes in POCO entity

[ChoCSVFileHeader]
[ChoCSVRecordObject(Encoding = "Encoding.UTF32", ErrorMode = ChoErrorMode.IgnoreAndContinue,
       IgnoreFieldValueMode = ChoIgnoreFieldValueMode.All, ThrowAndStopOnMissingField = false)]
public partial class EmployeeRec
{
    [ChoCSVRecordField(1, FieldName = "id")]
    [ChoTypeConverter(typeof(IntConverter))]
    [Range(1, int.MaxValue, ErrorMessage = "Id must be > 0.")]
    [ChoFallbackValue(1)]
    public int Id { get; set; }
 
    [ChoCSVRecordField(2, FieldName = "Name")]
    [Required]
    [DefaultValue("ZZZ")]
    [ChoFallbackValue("XXX")]
    public string Name { get; set; }
}

In example above, used Range validation attribute for Id property. Required validation attribute to Name property. CSVReader performs validation on them during load based on Configuration.ObjectValidationMode is set to ChoObjectValidationMode.MemberLevel or ChoObjectValidationMode.ObjectLevel.

Sometime you may want override the defined declarative validation behaviors comes with POCO class, you can do with Cinchoo ETL via configuration approach. The sample below shows the way to override them.

static void ValidationOverridePOCOTest()
{
    ChoCSVRecordConfiguration config = new ChoCSVRecordConfiguration();
    var idConfig = new ChoCSVRecordFieldConfiguration("Id", 1);
    idConfig.Validators = new ValidationAttribute[] { new RequiredAttribute() };
    config.CSVRecordFieldConfigurations.Add(idConfig);
    config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Name", 2));
    config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Salary", 3) { FieldType = typeof(ChoCurrency) });
 
    using (var stream = new MemoryStream())
    using (var reader = new StreamReader(stream))
    using (var writer = new StreamWriter(stream))
    using (var parser = new ChoCSVReader<EmployeeRecWithCurrency>(reader, config))
    {
        writer.WriteLine("1,Carl,$100000");
        writer.WriteLine("2,Mark,$50000");
        writer.WriteLine("3,Tom,1000");
 
        writer.Flush();
        stream.Position = 0;
 
        object rec;
        while ((rec = parser.Read()) != null)
        {
            Console.WriteLine(String.Format("Id: {0}", rec.Id));
            Console.WriteLine(String.Format("Name: {0}", rec.Name));
            Console.WriteLine(String.Format("Salary: {0}", rec.Salary));
        }
    }
}
 
public class EmployeeRecWithCurrency
{
    public int? Id { get; set; }
    public string Name { get; set; }
    public ChoCurrency Salary { get; set; }
}

Some cases, you may want to take control and perform manual self validation within the POCO entity class. This can be achieved by inheriting POCO object from IChoValidatable interface.

Listing 8.4.2 Manual validation on POCO entity

[ChoCSVFileHeader]
[ChoCSVRecordObject(Encoding = "Encoding.UTF32", ErrorMode = ChoErrorMode.IgnoreAndContinue,
       IgnoreFieldValueMode = ChoIgnoreFieldValueMode.All, ThrowAndStopOnMissingField = false)]
public partial class EmployeeRec : IChoValidatable
{
    [ChoCSVRecordField(1, FieldName = "id")]
    [ChoTypeConverter(typeof(IntConverter))]
    [Range(1, int.MaxValue, ErrorMessage = "Id must be > 0.")]
    [ChoFallbackValue(1)]
    public int Id { get; set; }
 
    [ChoCSVRecordField(2, FieldName = "Name")]
    [Required]
    [DefaultValue("ZZZ")]
    [ChoFallbackValue("XXX")]
    public string Name { get; set; }
 
    public bool TryValidate(object target, ICollection<ValidationResult> validationResults)
    {
        return true;
    }
 
    public bool TryValidateFor(object target, string memberName, ICollection<ValidationResult> validationResults)
    {
        return true;
    }
 
    public void Validate(object target)
    {
    }
 
    public void ValidateFor(object target, string memberName)
    {
    }
}

Sample above shows how to implement custom self-validation in POCO object.

IChoValidatable interface exposes below methods

  • TryValidate - Validate entire object, return true if all validation passed. Otherwise return false.
  • Validate - Validate entire object, throw exception if validation is not passed.
  • TryValidateFor - Validate specific property of the object, return true if all validation passed. Otherwise return false.
  • ValidateFor - Validate specific property of the object, throw exception if validation is not passed.

9. Excel Field Separator

If the input CSV file contains Excel field separator, the reader automatically detects and use them as field separator.

Listing 9.1 Sample CSV file with Excel field separator

sep=,
1,"Eldon Base for stackable storage shelf, platinum"
2,"1.7 Cubic Foot Compact ""Cube"" Office Refrigerators"
3,"Cardinal Slant-D® Ring Binder, Heavy Gauge Vinyl"
4,R380
5,Holmes HEPA Air Purifier

10. Callback Mechanism

CSVReader offers industry standard CSV parsing out of the box to handle most of the parsing needs. If the parsing is not handling any of the needs, you can use the callback mechanism offered by CSVReader to handle such situations. In order to participate in the callback mechanism, you can use either of the following models

  • Using event handlers exposed by CSVReader via IChoReader interface.
  • Inheriting POCO entity object from IChoNotifyRecordRead interface
  • Inheriting DataAnnotation's MetadataType type object by IChoNotifyRecordRead interface.

Note: Any exceptions raised out of these interface methods will be ignored.

Both IChoNotifyRecordRead and IChoRecorder exposes the below methods:

  • BeginLoad - Invoked at the begin of the CSV file load
  • EndLoad - Invoked at the end of the CSV file load
  • BeforeRecordLoad - Raised before the CSV record load
  • AfterRecordLoad - Raised after CSV record load
  • RecordLoadError - Raised when CSV record load errors out
  • BeforeRecordFieldLoad - Raised before CSV column value load
  • AfterRecordFieldLoad - Raised after CSV column value load
  • RecordFieldLoadError - Raised when CSV column value errors out

10.1 Using CSVReader events

This is more direct and simplest way to subscribe to the callback events and handle your odd situations in parsing CSV files. Downside is that code can't be reusable as you do by implementing IChoNotifyRecordRead with POCO record object.

Sample below shows how to use the BeforeRecordLoad callback method to skip lines stating with '%' characters.

Listing 10.1.1 Using CSVReader callback events

static void IgnoreLineTest()
{
    using (var parser = new ChoCSVReader("IgnoreLineFile.csv").WithFirstLineHeader())
    {
        parser.Configuration.Encoding = Encoding.BigEndianUnicode;
 
        parser.BeforeRecordLoad += (o, e) =>
        {
            if (e.Source != null)
            {
                e.Skip = ((string)e.Source).StartsWith("%");
            }
        };
        foreach (var e in parser)
            Console.WriteLine(e.ToStringEx());
    }
}

Likewise you can use other callback methods as well with CSVReader.

10.2 Implementing IChoNotifyRecordRead interface

Listing 10.2.1 Direct POCO callback mechanism implementation

[ChoCSVFileHeader]
[ChoCSVRecordObject(Encoding = "Encoding.UTF32", ErrorMode = ChoErrorMode.IgnoreAndContinue,
       IgnoreFieldValueMode = ChoIgnoreFieldValueMode.All, ThrowAndStopOnMissingField = false)]
public partial class EmployeeRec : IChoNotifyRecordRead
{
    [ChoCSVRecordField(1, FieldName = "id")]
    [ChoTypeConverter(typeof(IntConverter))]
    [Range(1, int.MaxValue, ErrorMessage = "Id must be > 0.")]
    [ChoFallbackValue(1)]
    public int Id { get; set; }
    
    [ChoCSVRecordField(2, FieldName = "Name", QuoteField = true)]
    [Required]
    [DefaultValue("ZZZ")]
    [ChoFallbackValue("XXX")]
    public string Name { get; set; }
 
    public bool AfterRecordFieldLoad(object target, int index, string propName, object value)
    {
        throw new NotImplementedException();
    }
 
    public bool AfterRecordLoad(object target, int index, object source)
    {
        throw new NotImplementedException();
    }
 
    public bool BeforeRecordFieldLoad(object target, int index, string propName, ref object value)
    {
        throw new NotImplementedException();
    }
 
    public bool BeforeRecordLoad(object target, int index, ref object source)
    {
        throw new NotImplementedException();
    }
 
    public bool BeginLoad(object source)
    {
        throw new NotImplementedException();
    }
 
    public void EndLoad(object source)
    {
        throw new NotImplementedException();
    }
 
    public bool RecordFieldLoadError(object target, int index, string propName, object value, Exception ex)
    {
        throw new NotImplementedException();
    }
 
    public bool RecordLoadError(object target, int index, object source, Exception ex)
    {
        throw new NotImplementedException();
    }
}

Listing 10.2.2 MetaDataType based callback mechanism implementation

[ChoCSVFileHeader]
[ChoCSVRecordObject(Encoding = "Encoding.UTF32", ErrorMode = ChoErrorMode.IgnoreAndContinue,
       IgnoreFieldValueMode = ChoIgnoreFieldValueMode.All, ThrowAndStopOnMissingField = false)]
public class EmployeeRecMeta : IChoNotifyRecordRead
{
    [ChoCSVRecordField(1, FieldName = "id")]
    [ChoTypeConverter(typeof(IntConverter))]
    [Range(1, int.MaxValue, ErrorMessage = "Id must be > 0.")]
    [ChoFallbackValue(1)]
    public int Id { get; set; }

    [ChoCSVRecordField(2, FieldName = "Name", QuoteField = true)]
    [Required]
    [DefaultValue("ZZZ")]
    [ChoFallbackValue("XXX")]
    public string Name { get; set; }
 
    public bool AfterRecordFieldLoad(object target, int index, string propName, object value)
    {
        throw new NotImplementedException();
    }
 
    public bool AfterRecordLoad(object target, int index, object source)
    {
        throw new NotImplementedException();
    }
 
    public bool BeforeRecordFieldLoad(object target, int index, string propName, ref object value)
    {
        throw new NotImplementedException();
    }
 
    public bool BeforeRecordLoad(object target, int index, ref object source)
    {
        throw new NotImplementedException();
    }
 
    public bool BeginLoad(object source)
    {
        throw new NotImplementedException();
    }
 
    public void EndLoad(object source)
    {
        throw new NotImplementedException();
    }
 
    public bool RecordFieldLoadError(object target, int index, string propName, object value, Exception ex)
    {
        throw new NotImplementedException();
    }
 
    public bool RecordLoadError(object target, int index, object source, Exception ex)
    {
        throw new NotImplementedException();
    }
} 

[MetadataType(typeof(EmployeeRecMeta))]
public partial class EmployeeRec
{
    [ChoCSVRecordField(1, FieldName = "id")]
    [ChoTypeConverter(typeof(IntConverter))]
    [Range(1, int.MaxValue, ErrorMessage = "Id must be > 0.")]
    [ChoFallbackValue(1)]
    public int Id { get; set; }
    
    [ChoCSVRecordField(2, FieldName = "Name", QuoteField = true)]
    [Required]
    [DefaultValue("ZZZ")]
    [ChoFallbackValue("XXX")]
    public string Name { get; set; }
}

10.3 BeginLoad

This callback invoked once at the beginning of the CSV file load. source is the CSV file stream object. In here you have chance to inspect the stream, return true to continue the CSV load. Return false to stop the parsing.

Listing 10.3.1 BeginLoad Callback Sample

public bool BeginLoad(object source)
{
    StreamReader sr = source as StreamReader;
    return true;
}

10.4 EndLoad

This callback invoked once at the end of the CSV file load. source is the CSV file stream object. In here you have chance to inspect the stream, do any post steps to be performed on the stream.

Listing 10.4.1 EndLoad Callback Sample

public void EndLoad(object source)
{
    StreamReader sr = source as StreamReader;
}

10.5 BeforeRecordLoad

This callback invoked before each record line in the CSV file is loaded. target is the instance of the POCO record object. index is the line index in the file. source is the CSV record line. In here you have chance to inspect the line, and override it with new line if want to.

TIP: If you want to skip the line from loading, set the source to null.

TIP: If you want to take control of parsing and loading the record properties by yourself, set the source to String.Empty. 

Return true to continue the load process, otherwise return false to stop the process.

Listing 10.5.1 BeforeRecordLoad Callback Sample

public bool BeforeRecordLoad(object target, int index, ref object source)
{
    string line = source as string;
    return true;
}

10.6 AfterRecordLoad

This callback invoked after each record line in the CSV file is loaded. target is the instance of the POCO record object. index is the line index in the file. source is the CSV record line. In here you have chance to do any post step operation with the record line.

Return true to continue the load process, otherwise return false to stop the process.

Listing 10.6.1 AfterRecordLoad Callback Sample

public bool AfterRecordLoad(object target, int index, object source)
{
    string line = source as string;
    return true;
}

10.7 RecordLoadError

This callback invoked if error encountered while loading record line. target is the instance of the POCO record object. index is the line index in the file. source is the CSV record line. ex is the exception object. In here you have chance to handle the exception. This method invoked only when Configuration.ErrorMode is ReportAndContinue.

Return true to continue the load process, otherwise return false to stop the process.

Listing 10.7.1 RecordLoadError Callback Sample

public bool RecordLoadError(object target, int index, object source, Exception ex)
{
    string line = source as string;
    return true;
}

10.8 BeforeRecordFieldLoad

This callback invoked before each CSV record column is loaded. target is the instance of the POCO record object. index is the line index in the file. propName is the CSV record property name. value is the CSV column value. In here you have chance to inspect the CSV record property value and perform any custom validations etc.

Return true to continue the load process, otherwise return false to stop the process.

Listing 10.8.1 BeforeRecordFieldLoad Callback Sample

public bool BeforeRecordFieldLoad(object target, int index, string propName, ref object value)
{
    return true;
}

10.9 AfterRecordFieldLoad

This callback invoked after each CSV record column is loaded. target is the instance of the POCO record object. index is the line index in the file. propName is the CSV record property name. value is the CSV column value. Any post field operation can be performed here, like computing other properties, validations etc.

Return true to continue the load process, otherwise return false to stop the process.

Listing 10.9.1 AfterRecordFieldLoad Callback Sample

public bool AfterRecordFieldLoad(object target, int index, string propName, object value)
{
    return true;
}

10.10 RecordLoadFieldError

This callback invoked when error encountered while loading CSV record column value. target is the instance of the POCO record object. index is the line index in the file. propName is the CSV record property name. value is the CSV column value. ex is the exception object. In here you have chance to handle the exception. This method invoked only after the below two sequences of steps performed by the CSVReader

  • CSVReader looks for FallbackValue value of each CSV property. If present, it tries to assign its value to it.
  • If the FallbackValue value not present and the Configuration.ErrorMode is specified as ReportAndContinue., this callback will be executed.

Return true to continue the load process, otherwise return false to stop the process.

Listing 10.10.1 RecordFieldLoadError Callback Sample

public bool RecordFieldLoadError(object target, int index, string propName, object value, Exception ex)
{
    return true;
}

10. Customization

CSVReader automatically detects and loads the configured settings from POCO entity. At runtime, you can customize and tweak these parameters before CSV parsing. CSVReader exposes Configuration property, it is of ChoCSVRecordConfiguration object. Using this property, you can customize them.

Listing 10.1 Customizing CSVReader at run-time

class Program
{
    static void Main(string[] args)
    {
        using (var stream = new MemoryStream())
        using (var reader = new StreamReader(stream))
        using (var writer = new StreamWriter(stream))
        using (var parser = new ChoCSVReader<EmployeeRec>(reader))
        {
            writer.WriteLine("1,Carl");
            writer.WriteLine("2,Mark");
            writer.WriteLine("3,");

            writer.Flush();
            stream.Position = 0;
 
            object row = null;
  
            parser.Configuration.ColumnCountStrict = true;
            while ((row = parser.Read()) != null)
            {
                Console.WriteLine(String.Format("Id: {0}", row.Id));
                Console.WriteLine(String.Format("Name: {0}", row.Name));
            }
        }
    }

11. AsDataReader Helper Method

CSVReader exposes AsDataReader helper method to retrieve the CSV records in .NET datareader object. DataReader are fast-forward streams of data. This datareader can be used in few places like bulk coping data to database using SqlBulkCopy, loading disconnected DataTable, etc.

Listing 11.1 Reading as DataReader sample

static void AsDataReaderTest()
{
    using (var stream = new MemoryStream())
    using (var reader = new StreamReader(stream))
    using (var writer = new StreamWriter(stream))
    using (var parser = new ChoCSVReader<EmployeeRec>(reader))
    {
        writer.WriteLine("1,Carl");
        writer.WriteLine("2,Mark");
        writer.WriteLine("3,Tom");
 
        writer.Flush();
        stream.Position = 0;
 
        IDataReader dr = parser.AsDataReader();
        while (dr.Read())
        {
            Console.WriteLine("Id: {0}, Name: {1}", dr[0], dr[1]);
        }
    }
}

12. AsDataTable Helper Method

CSVReader exposes AsDataTable helper method to retrieve the CSV records in .NET DataTable object. It then can be persisted to disk, displayed in grid/controls or stored in memory like any other object.

Listing 12.1 Reading as DataTable sample

static void AsDataTableTest()
{
    using (var stream = new MemoryStream())
    using (var reader = new StreamReader(stream))
    using (var writer = new StreamWriter(stream))
    using (var parser = new ChoCSVReader<EmployeeRec>(reader))
    {
        writer.WriteLine("1,Carl");
        writer.WriteLine("2,Mark");
        writer.WriteLine("3,Tom");
 
        writer.Flush();
        stream.Position = 0;
 
        DataTable dt = parser.AsDataTable();
        foreach (DataRow dr in dt.Rows)
        {
            Console.WriteLine("Id: {0}, Name: {1}", dr[0], dr[1]);
        }
    }
}

13. Using Dynamic Object

So far, the article explained about using CSVReader with POCO object. CSVReader also supports loading CSV file without POCO object. It leverages .NET dynamic feature. The sample below shows how to read CSV stream without POCO object.

If you have CSV file, you can parse and load the file with minimal/zero configuration. If the CSV file does not have header record line, the parser automatically names the columns as Column1, Column2, etc.

The sample below shows it:

Listing 13.1 Loading CSV file without header sample

class Program
{
    static void Main(string[] args)
    {
        dynamic row;
        using (var stream = new MemoryStream())
        using (var reader = new StreamReader(stream))
        using (var writer = new StreamWriter(stream))
        using (var parser = new ChoCSVReader(reader))
        {
            writer.WriteLine("1,Carl");
            writer.WriteLine("2,Mark");
            writer.Flush();
            stream.Position = 0;
 
            while ((row = parser.Read()) != null)
            {
                Console.WriteLine(row.Column1);
            }
        }
    }
}

If the CSV file has a header, you can state that in the configuration as HasHeaderRecord is true and parse the file as simple as below:

Listing 13.2 Loading CSV file with header sample

class Program
{
    static void Main(string[] args)
    {
        ChoCSVRecordConfiguration config = new ChoCSVRecordConfiguration();
        config.CSVFileHeaderConfiguration.HasHeaderRecord = true;

        dynamic row;
        using (var stream = new MemoryStream())
        using (var reader = new StreamReader(stream))
        using (var writer = new StreamWriter(stream))
        using (var parser = new ChoCSVReader(reader. config).WithFirstLineHeader())
        {
            writer.WriteLine("Id,Name");
            writer.WriteLine("1,Carl");
            writer.WriteLine("2,Mark");
            writer.Flush();
            stream.Position = 0;
 
            while ((row = parser.Read()) != null)
            {
                Console.WriteLine(row.Name);
            }
        }
    }
}

The above example automatically discovers the CSV columns from the header and parses the file.

You can override the default behavior of discovering columns automatically by adding field configurations manually and pass it to CSVReader for parsing file.

Sample shows how to do it:

Listing 13.3 Loading CSV file with configuration

class Program
{
    static void Main(string[] args)
    {
        ChoCSVRecordConfiguration config = new ChoCSVRecordConfiguration();
        config.CSVFileHeaderConfiguration.HasHeaderRecord = true;
        config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Id", 1));
        config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Name", 2));

        dynamic row;
        using (var stream = new MemoryStream())
        using (var reader = new StreamReader(stream))
        using (var writer = new StreamWriter(stream))
        using (var parser = new ChoCSVReader(reader. config).WithFirstLineHeader())
        {
            writer.WriteLine("Id,Name");
            writer.WriteLine("1,Carl");
            writer.WriteLine("2,Mark");
            writer.Flush();
            stream.Position = 0;
 
            while ((row = parser.Read()) != null)
            {
                Console.WriteLine(row.Name);
            }
        }
    }
}

To completely turn off the auto column discovery, you will have to set ChoCSVRecordConfiguration.AutoDiscoverColumns to false.

13.1. DefaultValue

It is the value used and set to the property when the CSV value is empty or whitespace (controlled via IgnoreFieldValueMode).

Any POCO entity property can be specified with default value using System.ComponentModel.DefaultValueAttribute.

For dynamic object members or to override the declarative POCO object member's default value specification, you can do so through configuration as shown below.

ChoCSVRecordConfiguration config = new ChoCSVRecordConfiguration();
config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Id", 1));
config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Name", 2) { DefaultValue = "NoName" })

13.2. ChoFallbackValue

It is the value used and set to the property when the CSV value failed to set. Fallback value only set when ErrorMode is either IgnoreAndContinue or ReportAndContinue.

Any POCO entity property can be specified with fallback value using ChoETL.ChoFallbackValueAttribute.

For dynamic object members or to override the declarative POCO object member's fallback values, you can do through configuration as shown below.

ChoCSVRecordConfiguration config = new ChoCSVRecordConfiguration();
config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Id", 1));
config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Name", 2) { FallbackValue = "Tom" });

13.3. FieldType

In the type less dynamic object model, the reader reads individual field value and populate them to dynamic object members in 'string' value. If you want to enforce the type and do extra type checking during load, you can do so by declaring the field type at the field configuration.

Listing 8.5.1 Defining FieldType

ChoCSVRecordConfiguration config = new ChoCSVRecordConfiguration();
config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Id", 1) { FieldType = typeof(int) });
config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Name", 2));

In above sample shows to define field type as 'int' to 'Id' field. This instruct the CSVReader to parse and convert the value to integer before assigning to it. This extra type safety alleviate the incorrect values being loaded to object while parsing.

13.4. Type Converters

Most of the primitive types are automatically converted and set them to the properties by CSVReader. If the value of the CSV field can't automatically be converted into the type of the property, you can specify a custom / built-in .NET converters to convert the value. These can be either IValueConverter or TypeConverter converters.

In the dynamic object model, you can specify these converters via configuration. See below example on the approach taken to specify type converters for CSV columns

Listing 13.4.1 Specifying TypeConverters

ChoCSVRecordConfiguration config = new ChoCSVRecordConfiguration();
config.FileHeaderConfiguration.HasHeaderRecord = true;
config.ThrowAndStopOnMissingField = false;

ChoCSVRecordFieldConfiguration idConfig = new ChoCSVRecordFieldConfiguration("Id", 1);
idConfig.AddConverter(new IntConverter());
config.CSVRecordFieldConfigurations.Add(idConfig);

config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Name", 2));
config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Name1", 2));

In above, we construct and attach the IntConverter to 'Id' field using AddConverter helper method in ChoCSVRecordFieldConfiguration object.

Likewise, if you want to remove any converter from it, you can use RemoveConverter on ChoCSVRecordFieldConfiguration object.

13.5. Validations

CSVReader leverages both System.ComponentModel.DataAnnotations and Validation Block validation attributes to specify validation rules for individual CSV fields. Refer to the MSDN site for a list of available DataAnnotations validation attributes.

Listing 13.5.1 Specifying Validations

ChoCSVRecordConfiguration config = new ChoCSVRecordConfiguration();
config.FileHeaderConfiguration.HasHeaderRecord = true;
config.ThrowAndStopOnMissingField = false;

ChoCSVRecordFieldConfiguration idConfig = new ChoCSVRecordFieldConfiguration("Id", 1);
idConfig.Validators = new ValidationAttribute[] { new RangeAttribute(0, 100) };
config.CSVRecordFieldConfigurations.Add(idConfig);

config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Name", 2));
config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Name1", 2));

In example above, we used Range validation attribute for Id property. CSVReader performs validation on them during load based on Configuration.ObjectValidationMode is set to ChoObjectValidationMode.MemberLevel or ChoObjectValidationMode.ObjectLevel.

PS: Self validation NOT supported in Dynamic object model

14. Working with sealed POCO object

If you already have existing sealed POCO object or the object is in 3rd party library, we can use them with CSVReader.  All you need is the CSV file with header in it.

Listing 14.1 Exisiting sealed POCO Object

public sealed class ThirdPartyRec
{
    public int Id
    {
        get;
        set;
    }
    public string Name
    {
        get;
        set;
    }
}

Listing 14.2 Consuming CSV file

class Program
{
    static void Main(string[] args)
    {
        using (var stream = new MemoryStream())
        using (var reader = new StreamReader(stream))
        using (var writer = new StreamWriter(stream))
        using (var parser = new ChoCSVReader<ThirdPartyRec>(reader).WithFirstLineHeader())
        {
            writer.WriteLine("Id,Name"); 
            writer.WriteLine("1,Carl");
            writer.WriteLine("2,Mark");
            writer.WriteLine("3,Tom");

            writer.Flush();
            stream.Position = 0;
 
            object row = null;
 
            while ((row = parser.Read()) != null)
            {
                Console.WriteLine(String.Format("Id: {0}", row.Id));
                Console.WriteLine(String.Format("Name: {0}", row.Name));
            }
        }
    }
}

In this case, CSVReader reverse discover the CSV columns from the CSV file and load the data into POCO object. If the CSV file structure and POCO object matches, the load will success with populating all corresponding data to its properties. In case the property is missing for any CSV column, CSVReader silently ignores them and continue on with rest.

You can override this behavior by setting ChoCSVRecordConfiguration.ThrowAndStopOnMissingField property to false. In this case, the CSVReader will throw ChoMissingRecordFieldException exception if a property is missing for a CSV column.

15. Exceptions

CSVReader throws different types of exceptions in different situations.

  • ChoParserException - CSV file is bad and parser not able to recover.
  • ChoRecordConfigurationException - Any invalid configuration settings are specified, this exception will be raised.
  • ChoMissingRecordFieldException - A property is missing for a CSV column, this exception will be raised.

16. Tips

16.1 Multiline CSV column value

If CSV file contains column values with newline characters in it, ChoCSVReader can handle it by specifing Configuration.MayContainEOLInData = true.

Listing 16.1.1 Multiline column values in CSV file

Id,Name
1,"Tom
Cassawaw"
2,"Carl"
3,"Mark"

In above, the Id (1) has name in multiline with surrounded quotes. Sample below show on how to handle it.

Listing 16.1.2 Reading multiline column values in CSV file

static void IgnoreUnwantedLines()
{
    using (var parser = new ChoCSVReader("Emp.csv").WithFirstLineHeader())
    {
        parser.Configuration.MayContainEOLInData = true;
 
        foreach (var e in parser)
            Console.WriteLine(e.ToStringEx());
    }
}

16.2 CSV column value with field separator

If CSV file contains column values with field separator (,) in it, ChoCSVReader can handle it with surrounded quotes.

Listing 16.2.1 CSV column value with separators

Id,Name
1,"Tom Cassawaw"
2,"Carl, Malcolm"
3,"Mark"

In above, the Id (2) has name with separator (,) in it. In order for ChoCSVReader recognize this situation, it must be surrounded by quotes.

16.3 CSV column value with single quotes

ChoCSVReader can read CSV column values with single quotes in it seamlessly. No surrounded quotes required.

Listing 16.3.1 CSV column value with single quotes

Id,Name
1,Tom Cassawaw
2,Carl'Malcolm
3,Mark

In above, the Id (2) has name with single quotes (') in it. ChoCSVReader recognize this situation, load these values successfully.

17. Using MetadataType Annotation

Cinchoo ETL works better with data annotation's MetadataType model. It is way to attach MetaData class to data model class. In this associated class, you provide additional metadata information that is not in the data model. It roles is to add attribute to a class without having to modify this one. You can add this attribute that takes a single parameter to a class that will have all the attributes. This is useful when the POCO classes are auto generated (by Entity Framework, MVC etc) by an automatic tools. This is why second class come into play. You can add new stuffs without touching the generated file. Also this promotes modularization by separating the concerns into multiple classes.

For more information about it, please search in MSDN.

Listing 17.1 MetadataType annotation usage sample

[MetadataType(typeof(EmployeeRecMeta))]
public class EmployeeRec
{
    public int Id { get; set; }
    public string Name { get; set; }
}

[ChoCSVFileHeader]
[ChoCSVRecordObject(Encoding = "Encoding.UTF32", ErrorMode = ChoErrorMode.ThrowAndStop,
IgnoreFieldValueMode = ChoIgnoreFieldValueMode.All, ThrowAndStopOnMissingField = false, 
    ObjectValidationMode = ChoObjectValidationMode.MemberLevel)]
public class EmployeeRecMeta : IChoNotifyRecordRead, IChoValidatable
{
    [ChoCSVRecordField(1, FieldName = "id", ErrorMode = ChoErrorMode.ReportAndContinue )]
    [ChoTypeConverter(typeof(IntConverter))]
    [Range(1, 1, ErrorMessage = "Id must be > 0.")]
    [ChoFallbackValue(1)]
    public int Id { get; set; }

    [ChoCSVRecordField(2, FieldName = "Name", QuoteField = true)]
    [StringLength(1)]
    [DefaultValue("ZZZ")]
    [ChoFallbackValue("XXX")]
    public string Name { get; set; }
 
    public bool AfterRecordFieldLoad(object target, int index, string propName, object value)
    {
        throw new NotImplementedException();
    }
 
    public bool AfterRecordLoad(object target, int index, object source)
    {
        throw new NotImplementedException();
    }
 
    public bool BeforeRecordFieldLoad(object target, int index, string propName, ref object value)
    {
        throw new NotImplementedException();
    }
 
    public bool BeforeRecordLoad(object target, int index, ref object source)
    {
        throw new NotImplementedException();
    }
 
    public bool BeginLoad(object source)
    {
        throw new NotImplementedException();
    }
 
    public void EndLoad(object source)
    {
        throw new NotImplementedException();
    }
 
    public bool RecordFieldLoadError(object target, int index, string propName, object value, Exception ex)
    {
        throw new NotImplementedException();
    }
 
    public bool RecordLoadError(object target, int index, object source, Exception ex)
    {
        throw new NotImplementedException();
    }
 
    public bool TryValidate(object target, ICollection<ValidationResult> validationResults)
    {
        return true;
    }
 
    public bool TryValidateFor(object target, string memberName, ICollection<ValidationResult> validationResults)
    {
        return true;
    }
 
    public void Validate(object target)
    {
    }
 
    public void ValidateFor(object target, string memberName)
    {
    }
}

In above EmployeeRec is the data class. Contains only domain specific properties and operations. Mark it very simple class to look at it.

We separate the validation, callback mechanism, configuration etc into metadata type class, EmployeeRecMeta.

18. Configuration Choices

If the POCO entity class is an auto-generated class or exposed via library or it is a sealed class, it limits you to attach CSV schema definition to it declaratively. In such case, you can choose one of the options below to specify CSV layout configuration

  • Manual Configuration
  • Auto Map Configuration
  • Attaching MetadataType class 

I'm going to show you how to configure the below POCO entity class on each approach

Listing 18.1 Sealed POCO entity class

public sealed class EmployeeRec
{
    public int Id { get; set; }
    public string Name { get; set; }
}

18.1 Manual Configuration

Define a brand new configuration object from scratch and add all the necessary CSV fields to the ChoCSVConfiguration.CSVRecordFieldConfigurations collection property. This option gives you greater flexibility to control the configuration of CSV parsing. But the downside is that possibility of making mistakes and hard to manage them if the CSV file layout is large,

Listing 18.1.1 Manual Configuration

ChoCSVRecordConfiguration config = new ChoCSVRecordConfiguration();
config.CSVFileHeaderConfiguration.HasHeaderRecord = true;
config.ThrowAndStopOnMissingField = true;
config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Id", 1));
config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Name", 2));

18.2 Auto Map Configuration

This is an alternative approach and very less error-prone method to auto map the CSV columns for the POCO entity class.

First define a schema class for EmployeeRec POCO entity class as below

Listing 18.2.1 Auto Map class

public class EmployeeRecMap
{
    [ChoCSVRecordField(1, FieldName = "id")]
    public int Id { get; set; }
 
    [ChoCSVRecordField(2, FieldName = "Name")]
    public string Name { get; set; } 
}

Then you can use it to auto map CSV columns by using ChoCSVRecordConfiguration.MapRecordFields method

Listing 18.2.2 Using Auto Map configuration

ChoCSVRecordConfiguration config = new ChoCSVRecordConfiguration();
config.MapRecordFields<EmployeeRecMap>();

foreach (var e in new ChoCSVReader<EmployeeRec>("Emp.csv", config).WithFirstLineHeader()) 
{            
    Console.WriteLine(String.Format("Id: {0}", e.Id));
    Console.WriteLine(String.Format("Name: {0}", e.Name));
}

18.3 Attaching MetadataType class

This is one another approach to attach MetadataType class for POCO entity object. Previous approach simple care for auto mapping of CSV columns only. Other configuration properties like property converters, parser parameters, default/fallback values etc. are not considered.

This model, accounts for everything by defining MetadataType class and specifying the CSV configuration parameters declaratively. This is useful when your POCO entity is sealed and not partial class. Also it is one of favorable and less error-prone approach to configure CSV parsing of POCO entity.

Listing 18.3.1 Define MetadataType class

[ChoCSVFileHeader()]
[ChoCSVRecordObject(Encoding = "Encoding.UTF32", ErrorMode = ChoErrorMode.ReportAndContinue,
IgnoreFieldValueMode = ChoIgnoreFieldValueMode.All, ThrowAndStopOnMissingField = false, 
    ObjectValidationMode = ChoObjectValidationMode.MemberLevel)]
public class EmployeeRecMeta : IChoNotifyRecordRead, IChoValidatable
{
    [ChoCSVRecordField(1, FieldName = "id", ErrorMode = ChoErrorMode.ReportAndContinue )]
    [ChoTypeConverter(typeof(IntConverter))]
    [Range(1, 1, ErrorMessage = "Id must be > 0.")]
    public int Id { get; set; }

    [ChoCSVRecordField(2, FieldName = "Name", QuoteField = true)]
    [StringLength(1)]
    [DefaultValue("ZZZ")]
    [ChoFallbackValue("XXX")]
    public string Name { get; set; }
 
    public bool AfterRecordFieldLoad(object target, int index, string propName, object value)
    {
        throw new NotImplementedException();
    }
 
    public bool AfterRecordLoad(object target, int index, object source)
    {
        throw new NotImplementedException();
    }
 
    public bool BeforeRecordFieldLoad(object target, int index, string propName, ref object value)
    {
        throw new NotImplementedException();
    }
 
    public bool BeforeRecordLoad(object target, int index, ref object source)
    {
        throw new NotImplementedException();
    }
 
    public bool BeginLoad(object source)
    {
        throw new NotImplementedException();
    }
 
    public void EndLoad(object source)
    {
        throw new NotImplementedException();
    }
 
    public bool RecordFieldLoadError(object target, int index, string propName, object value, Exception ex)
    {
        return true;
    }
 
    public bool RecordLoadError(object target, int index, object source, Exception ex)
    {
        throw new NotImplementedException();
    }
 
    public bool TryValidate(object target, ICollection<ValidationResult> validationResults)
    {
        return true;
    }
 
    public bool TryValidateFor(object target, string memberName, ICollection<ValidationResult> validationResults)
    {
        return true;
    }
 
    public void Validate(object target)
    {
    }
 
    public void ValidateFor(object target, string memberName)
    {
    }
}

Listing 18.3.2 Attaching MetadataType class

//Attach metadata 
ChoMetadataObjectCache.Default.Attach<EmployeeRec>(new EmployeeRecMeta());

foreach (var e in new ChoCSVReader<EmployeeRec>("Emp.csv").WithFirstLineHeader()) 
{            
    Console.WriteLine(String.Format("Id: {0}", e.Id));
    Console.WriteLine(String.Format("Name: {0}", e.Name));
}

19. LoadText Helper Method

This is little nifty helper method to parse and load CSV text string into objects.

Listing 19.1 Using LoadText method

string txt = "Id, Name\r\n1, Mark";
foreach (dynamic e in ChoCSVReader.LoadText(txt).WithFirstLineHeader())
{            
    Console.WriteLine(String.Format("Id: {0}", e.Id));
    Console.WriteLine(String.Format("Name: {0}", e.Name));
}

20. Advanced Topics

20.1 Override Converters Format Specs

Cinchoo ETL automatically parses and converts each CSV column values to the corresponding CSV column's underlying data type seamlessly. Most of the basic .NET types are handled automatically without any setup needed.

This is achieved through two key settings in the ETL system

  1. ChoCSVRecordConfiguration.CultureInfo - Represents information about a specific culture including the names of the culture, the writing system, and the calendar used, as well as access to culture-specific objects that provide information for common operations, such as formatting dates and sorting strings. Default is 'en-US'.
  2. ChoTypeConverterFormatSpec - It is global format specifier class holds all the intrinsic .NET types formatting specs.

In this section, I'm going to talk about changing the default format specs for each .NET intrinsic data types according to parsing needs.

ChoTypeConverterFormatSpec is singleton class, the instance is exposed via 'Instance' static member. It is thread local, means that there will be separate instance copy kept on each thread.

There are 2 sets of format specs members given to each intrinsic type, one for loading and another one for writing the value, except for Boolean, Enum, DataTime types. These types have only one member for both loading and writing operations.

Specifying each intrinsic data type format specs through ChoTypeConverterFormatSpec will impact system wide. ie. By setting ChoTypeConverterFormatSpec.IntNumberStyle = NumberStyles.AllowParentheses, will impact all integer members of CSV objects to allow parentheses. If you want to override this behavior and take control of specific CSV data member to handle its own unique parsing of CSV value from global system wide setting, it can be done by specifying TypeConverter at the CSV field member level. Refer section 13.4 for more information.

NumberStyles (optional) used for loading values from CSV stream and Format string are used for writing values to CSV stream.

In this article I'll brief about using NumberStyles for loading CSV data from stream. These values are optional. It determines the styles permitted for each type during parsing of CSV file. System automatically figures out the way to parse and load the values from underlying Culture. In odd situation, you may want to override and set the styles the way you want in order to successfully load the file. Refer the MSDN for more about NumberStyles and its values.

Listing 20.1.1 ChoTypeConverterFormatSpec Members

public class ChoTypeConverterFormatSpec
{
    public static readonly ThreadLocal<ChoTypeConverterFormatSpec> Instance = new ThreadLocal<ChoTypeConverterFormatSpec>(() => new ChoTypeConverterFormatSpec());
 
    public string DateTimeFormat { get; set; }
    public ChoBooleanFormatSpec BooleanFormat { get; set; }
    public ChoEnumFormatSpec EnumFormat { get; set; }
 
    public NumberStyles? CurrencyNumberStyle { get; set; }
    public string CurrencyFormat { get; set; }
 
    public NumberStyles? BigIntegerNumberStyle { get; set; }
    public string BigIntegerFormat { get; set; }
 
    public NumberStyles? ByteNumberStyle { get; set; }
    public string ByteFormat { get; set; }
 
    public NumberStyles? SByteNumberStyle { get; set; }
    public string SByteFormat { get; set; }
 
    public NumberStyles? DecimalNumberStyle { get; set; }
    public string DecimalFormat { get; set; }
 
    public NumberStyles? DoubleNumberStyle { get; set; }
    public string DoubleFormat { get; set; }
 
    public NumberStyles? FloatNumberStyle { get; set; }
    public string FloatFormat { get; set; }
 
    public string IntFormat { get; set; }
    public NumberStyles? IntNumberStyle { get; set; }
 
    public string UIntFormat { get; set; }
    public NumberStyles? UIntNumberStyle { get; set; }
 
    public NumberStyles? LongNumberStyle { get; set; }
    public string LongFormat { get; set; }
 
    public NumberStyles? ULongNumberStyle { get; set; }
    public string ULongFormat { get; set; }
 
    public NumberStyles? ShortNumberStyle { get; set; }
    public string ShortFormat { get; set; }
 
    public NumberStyles? UShortNumberStyle { get; set; }
    public string UShortFormat { get; set; }
}

Sample below shows how to load CSV data stream having 'se-SE' (Swedish) culture specific data using CSVReader. Also the input feed comes with 'EmployeeNo' values containing parentheses. In order to make the load successful, we have to set the ChoTypeConverterFormatSpec.IntNumberStyle to NumberStyles.AllowParenthesis.

Listing 20.1.2 Using ChoTypeConverterFormatSpec in code

static void UsingFormatSpecs()
{
    ChoCSVRecordConfiguration config = new ChoCSVRecordConfiguration();
    config.Culture = new System.Globalization.CultureInfo("se-SE");
    config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Id", 1) { FieldType = typeof(int) });
    config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Name", 2));
    config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Salary", 3) { FieldType = typeof(ChoCurrency) });
    config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("JoinedDate", 4) { FieldType = typeof(DateTime) });
    config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("EmployeeNo", 5) { FieldType = typeof(int) });
 
    ChoTypeConverterFormatSpec.Instance.IntNumberStyle = NumberStyles.AllowParentheses;
 
    using (var stream = new MemoryStream())
    using (var reader = new StreamReader(stream))
    using (var writer = new StreamWriter(stream))
    using (var parser = new ChoCSVReader(reader, config))
    {
        writer.WriteLine(@"1,Carl,12.345679 kr,2017-10-10,  (5)    ");
        writer.WriteLine("2,Markl,50000 kr,2001-10-01,  6    ");
        writer.WriteLine("3,Toml,150000 kr,1996-01-25,  9    ");
 
        writer.Flush();
        stream.Position = 0;
 
        dynamic row;
 
        while ((row = parser.Read()) != null)
        {
            Console.WriteLine(String.Format("Id: {0}", row.Id));
            Console.WriteLine(String.Format("Name: {0}", row.Name));
            Console.WriteLine(String.Format("Salary: {0}", row.Salary));
            Console.WriteLine(String.Format("JoinedDate: {0}", row.JoinedDate));
            Console.WriteLine(String.Format("EmployeeNo: {0}", row.EmployeeNo));
        }
    }
}

20.2 Currency Support

Cinchoo ETL provides ChoCurrency object to read and write currency values in CSV files. ChoCurrency is a wrapper class to hold the currency value in decimal type along with support of serializing them in text format during CSV load. 

Listing 20.2.1 Using Currency members in dynamic model

static void CurrencyDynamicTest()
{
    ChoCSVRecordConfiguration config = new ChoCSVRecordConfiguration();
    config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Id", 1));
    config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Name", 2));
    config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Salary", 3) { FieldType = typeof(ChoCurrency) });
 
    using (var stream = new MemoryStream())
    using (var reader = new StreamReader(stream))
    using (var writer = new StreamWriter(stream))
    using (var parser = new ChoCSVReader(reader, config))
    {
        writer.WriteLine("1,Carl,$100000");
        writer.WriteLine("2,Mark,$50000");
        writer.WriteLine("3,Tom,$1000");
 
        writer.Flush();
        stream.Position = 0;
 
        dynamic rec;
        while ((rec = parser.Read()) != null)
        {
            Console.WriteLine(String.Format("Id: {0}", row.Id));
            Console.WriteLine(String.Format("Name: {0}", row.Name));
            Console.WriteLine(String.Format("Salary: {0}", row.Salary));
        }
    }
}

Sample above shows how to load currency values using dynamic object model. By default, all the members of dynamic object are treated as string type, unless specified explicitly via ChoCSVFieldConfiguration.FieldType. By specifying the field type as ChoCurrency to the 'Sa;lary' CSV field, CSVReader loads them as currency object.

PS: The format of the currency value is figured by CSVReader through ChoRecordConfiguration.Culture and ChoTypeConverterFormatSpec.CurrencyNumberStyle.

Sample below shows how to use ChoCurrency CSV field in POCO entity class.

Listing 20.2.2 Using Currency members in POCO model

public class EmployeeRecWithCurrency
{
    public int Id { get; set; }
    public string Name { get; set; }
    public ChoCurrency Salary { get; set; }
}
 
static void CurrencyTest()
{
    using (var stream = new MemoryStream())
    using (var reader = new StreamReader(stream))
    using (var writer = new StreamWriter(stream))
    using (var parser = new ChoCSVReader<EmployeeRecWithCurrency>(reader))
    {
        writer.WriteLine("1,Carl,$100000");
        writer.WriteLine("2,Mark,$50000");
        writer.WriteLine("3,Tom,$1000");
 
        writer.Flush();
        stream.Position = 0;
 
        object rec;
        while ((rec = parser.Read()) != null)
        {
            Console.WriteLine(String.Format("Id: {0}", row.Id));
            Console.WriteLine(String.Format("Name: {0}", row.Name));
            Console.WriteLine(String.Format("Salary: {0}", row.Salary));
        }
    }
}

20.3 Enum Support

Cinchoo ETL implicitly handles parsing of enum column values from CSV files. If you want to fine control the parsing of these values, you can specify them globally via ChoTypeConverterFormatSpec.EnumFormat. Default is ChoEnumFormatSpec.Value

FYI, changing this value will impact system wide.

There are 3 possible values can be used

  1. ChoEnumFormatSpec.Value - Enum value is used for parsing.
  2. ChoEnumFormatSpec.Name - Enum key name is used for parsing.
  3. ChoEnumFormatSpec.Description - If each enum key is decorated with DescriptionAttribute, its value will be use for parsing.

Listing 20.3.1 Specifying Enum format specs during parsing

public enum EmployeeType
{
    [Description("Full Time Employee")]
    Permanent = 0,
    [Description("Temporary Employee")]
    Temporary = 1,
    [Description("Contract Employee")]
    Contract = 2
}

static void EnumTest()
{
    ChoTypeConverterFormatSpec.Instance.EnumFormat = ChoEnumFormatSpec.Description;
 
    ChoCSVRecordConfiguration config = new ChoCSVRecordConfiguration();
    config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Id", 1) { FieldType = typeof(int) });
    config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Name", 2));
    config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Salary", 3) { FieldType = typeof(ChoCurrency) });
    config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("JoinedDate", 4) { FieldType = typeof(DateTime) });
    config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("EmployeeType", 5) { FieldType = typeof(EmployeeType) });
 
    ChoTypeConverterFormatSpec.Instance.IntNumberStyle = NumberStyles.AllowParentheses;
 
    using (var stream = new MemoryStream())
    using (var reader = new StreamReader(stream))
    using (var writer = new StreamWriter(stream))
    using (var parser = new ChoCSVReader(reader, config))
    {
        writer.WriteLine("1,Carl,12345679,01/10/2016,Full Time Employee");
        writer.WriteLine("2,Mark,50000,10/01/1995,Temporary Employee");
        writer.WriteLine("3,Tom,150000,01/01/1940,Contract Employee");
 
        writer.Flush();
        stream.Position = 0;
 
        dynamic row;
 
        while ((row = parser.Read()) != null)
        {
            Console.WriteLine(String.Format("Id: {0}", row.Id));
            Console.WriteLine(String.Format("Name: {0}", row.Name));
            Console.WriteLine(String.Format("Salary: {0}", row.Salary));
            Console.WriteLine(String.Format("JoinedDate: {0}", row.JoinedDate));
            Console.WriteLine(String.Format("EmployeeType: {0}", row.EmployeeType));
        }
    }
}

20.4 Boolean Support

Cinchoo ETL implicitly handles parsing of boolean CSV column values from CSV files. If you want to fine control the parsing of these values, you can specify them globally via ChoTypeConverterFormatSpec.BooleanFormat. Default value is ChoBooleanFormatSpec.ZeroOrOne

FYI, changing this value will impact system wide.

There are 4 possible values can be used

  1. ChoBooleanFormatSpec.ZeroOrOne - '0' for false. '1' for true.
  2. ChoBooleanFormatSpec.YOrN - 'Y' for true, 'N' for false.
  3. ChoBooleanFormatSpec.TrueOrFalse - 'True' for true, 'False' for false.
  4. ChoBooleanFormatSpec.YesOrNo - 'Yes' for true, 'No' for false.

Listing 20.4.1 Specifying boolean format specs during parsing

static void BoolTest()
{
    ChoTypeConverterFormatSpec.Instance.BooleanFormat = ChoBooleanFormatSpec.ZeroOrOne;
 
    ChoCSVRecordConfiguration config = new ChoCSVRecordConfiguration();
    config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Id", 1) { FieldType = typeof(int) });
    config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Name", 2));
    config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Salary", 3) { FieldType = typeof(ChoCurrency) });
    config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("JoinedDate", 4) { FieldType = typeof(DateTime) });
    config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Active", 5) { FieldType = typeof(bool) });
 
    ChoTypeConverterFormatSpec.Instance.IntNumberStyle = NumberStyles.AllowParentheses;
 
    using (var stream = new MemoryStream())
    using (var reader = new StreamReader(stream))
    using (var writer = new StreamWriter(stream))
    using (var parser = new ChoCSVReader(reader, config))
    {
        writer.WriteLine("1,Carl,12345679,01/10/2016,0");
        writer.WriteLine("2,Mark,50000,10/01/1995,1");
        writer.WriteLine("3,Tom,150000,01/01/1940,1");
 
        writer.Flush();
        stream.Position = 0;
 
        dynamic row;
 
        while ((row = parser.Read()) != null)
        {
            Console.WriteLine(String.Format("Id: {0}", row.Id));
            Console.WriteLine(String.Format("Name: {0}", row.Name));
            Console.WriteLine(String.Format("Salary: {0}", row.Salary));
            Console.WriteLine(String.Format("JoinedDate: {0}", row.JoinedDate));
            Console.WriteLine(String.Format("Active: {0}", row.Active));
        }
    }
}

20.5 DateTime Support

Cinchoo ETL implicitly handles parsing of datetime CSV column values from CSV files using system Culture or custom set culture. If you want to fine control the parsing of these values, you can specify them globally via ChoTypeConverterFormatSpec.DateTimeFormat. Default value is 'd'.

FYI, changing this value will impact system wide.

You can use any valid standard or custom datetime .NET format specification to parse the datetime CSV values from the file.

Listing 20.5.1 Specifying datetime format specs during parsing

static void DateTimeTest()
{
    ChoTypeConverterFormatSpec.Instance.DateTimeFormat = "MMM dd, yyyy";
 
    ChoCSVRecordConfiguration config = new ChoCSVRecordConfiguration();
    config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Id", 1) { FieldType = typeof(int) });
    config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Name", 2));
    config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Salary", 3) { FieldType = typeof(ChoCurrency) });
    config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("JoinedDate", 4) { FieldType = typeof(DateTime) });
    config.CSVRecordFieldConfigurations.Add(new ChoCSVRecordFieldConfiguration("Active", 5) { FieldType = typeof(bool) });
 
    ChoTypeConverterFormatSpec.Instance.IntNumberStyle = NumberStyles.AllowParentheses;
 
    using (var stream = new MemoryStream())
    using (var reader = new StreamReader(stream))
    using (var writer = new StreamWriter(stream))
    using (var parser = new ChoCSVReader(reader, config))
    {
        writer.WriteLine(@"1,Carl,12345679,""Jan 01, 2011"",0");
        writer.WriteLine(@"2,Mark,50000,""Sep 23, 1995"",1");
        writer.WriteLine(@"3,Tom,150000,""Apr 10, 1999"",1");
 
        writer.Flush();
        stream.Position = 0;
 
        dynamic row;
 
        while ((row = parser.Read()) != null)
        {
            Console.WriteLine(String.Format("Id: {0}", row.Id));
            Console.WriteLine(String.Format("Name: {0}", row.Name));
            Console.WriteLine(String.Format("Salary: {0}", row.Salary));
            Console.WriteLine(String.Format("JoinedDate: {0}", row.JoinedDate));
            Console.WriteLine(String.Format("Active: {0}", row.Active));
        }
    }
}

Sample above shows how to parse custom datetime CSV values from CSV file. 

Note: As the datetime values contains CSV separator, it is given with double quotes to pass the parsing. 

21. Fluent API

CSVReader exposes few frequent to use configuration parameters via fluent API methods. This will make the programming of parsing of CSV files quicker.

21.1 WithDelimiter

This API method sets the CSV field separator on CSVReader.

static void QuickDynamicTest()
{
    using (var stream = new MemoryStream())
    using (var reader = new StreamReader(stream))
    using (var writer = new StreamWriter(stream))
    using (var parser = new ChoCSVReader(reader).WithDelimiter(",").WithFirstLineHeader())
    {
        writer.WriteLine("Id,Name,Salary");
        writer.WriteLine("1,Carl,1000");
        writer.WriteLine("2,Mark,2000");
        writer.WriteLine("3,Tom,3000");
 
        writer.Flush();
        stream.Position = 0;
 
        dynamic rec;
        while ((rec = parser.Read()) != null)
        {
            Console.WriteLine(String.Format("Id: {0}", row.Id));
            Console.WriteLine(String.Format("Name: {0}", row.Name));
            Console.WriteLine(String.Format("Salary: {0}", row.Salary));
        }
    }
}

21.2 WithFirstLineHeader

This API method flags the CSV file contains first row as header or not. Optional bool parameter specifies the first row header or not. Default is true. 

static void QuickDynamicTest()
{
    using (var stream = new MemoryStream())
    using (var reader = new StreamReader(stream))
    using (var writer = new StreamWriter(stream))
    using (var parser = new ChoCSVReader(reader).WithDelimiter(",").WithFirstLineHeader())
    {
        writer.WriteLine("Id,Name,Salary");
        writer.WriteLine("1,Carl,1000");
        writer.WriteLine("2,Mark,2000");
        writer.WriteLine("3,Tom,3000");
 
        writer.Flush();
        stream.Position = 0;
 
        dynamic rec;
        while ((rec = parser.Read()) != null)
        {
            Console.WriteLine(String.Format("Id: {0}", rec.Id));
            Console.WriteLine(String.Format("Name: {0}", rec.Name));
            Console.WriteLine(String.Format("Salary: {0}", rec.Salary));
        }
    }
}

21.3 WithFields

This API method specifies the list of CSV fields to be considered for parsing and loading. Other fields in the CSV file will be discarded. This call re-initializes with the specified columns. 

In dynamic object model, all the CSV columns will be created and parsed as string type.

static void QuickDynamicTest()
{
    using (var stream = new MemoryStream())
    using (var reader = new StreamReader(stream))
    using (var writer = new StreamWriter(stream))
    using (var parser = new ChoCSVReader(reader).WithFirstLineHeader().WithFields("Id", "Name"))
    {
        writer.WriteLine("Id,Name,Salary");
        writer.WriteLine("1,Carl,1000");
        writer.WriteLine("2,Mark,2000");
        writer.WriteLine("3,Tom,3000");
 
        writer.Flush();
        stream.Position = 0;
 
        dynamic rec;
        while ((rec = parser.Read()) != null)
        {
            Console.WriteLine(String.Format("Id: {0}", rec.Id));
            Console.WriteLine(String.Format("Name: {0}", rec.Name));
        }
    }
}

21.4 WithField

This API method used to add CSV column with specific date type. This method helpful in dynamic object model, by specifying each and individual CSV column with appropriate datatype.  

static void QuickDynamicTest()
{
    using (var stream = new MemoryStream())
    using (var reader = new StreamReader(stream))
    using (var writer = new StreamWriter(stream))
    using (var parser = new ChoCSVReader(reader).WithFirstLineHeader().WithField("Id", typeof(ini).WithField("Name", typeof(string))
    {
        writer.WriteLine("Id,Name,Salary");
        writer.WriteLine("1,Carl,1000");
        writer.WriteLine("2,Mark,2000");
        writer.WriteLine("3,Tom,3000");
 
        writer.Flush();
        stream.Position = 0;
 
        dynamic rec;
        while ((rec = parser.Read()) != null)
        {
            Console.WriteLine(String.Format("Id: {0}", rec.Id));
            Console.WriteLine(String.Format("Name: {0}", rec.Name));
            Console.WriteLine(String.Format("Salary: {0}", rec.Salary));
        }
    }
}

21.5 QuoteAllFields

This API method used to specify whether all fields are to be surrounded by quotes or not.  

static void QuickDynamicTest()
{
    using (var stream = new MemoryStream())
    using (var reader = new StreamReader(stream))
    using (var writer = new StreamWriter(stream))
    using (var parser = new ChoCSVReader(reader).WithFirstLineHeader().QuoteAllFields())
    {
        writer.WriteLine("Id,Name,Salary");
        writer.WriteLine("1,Carl,1000");
        writer.WriteLine("2,Mark,2000");
        writer.WriteLine("3,Tom,3000");
 
        writer.Flush();
        stream.Position = 0;
 
        dynamic rec;
        while ((rec = parser.Read()) != null)
        {
            Console.WriteLine(String.Format("Id: {0}", rec.Id));
            Console.WriteLine(String.Format("Name: {0}", rec.Name));
            Console.WriteLine(String.Format("Salary: {0}", rec.Salary));
        }
    }
}

21.6 ColumnCountStrict

This API method used to set the CSVReader to perform column count check before loading each line in CSV file.

static void QuickDynamicTest()
{
    using (var stream = new MemoryStream())
    using (var reader = new StreamReader(stream))
    using (var writer = new StreamWriter(stream))
    using (var parser = new ChoCSVReader(reader).WithFirstLineHeader().ColumnCountStrict())
    {
        writer.WriteLine("Id,Name,Salary");
        writer.WriteLine("1,Carl,1000");
        writer.WriteLine("2,Mark,2000");
        writer.WriteLine("3,Tom,3000");

        writer.Flush();
        stream.Position = 0;

        object rec;
        while ((rec = parser.Read()) != null)
        {
            Console.WriteLine(rec.ToStringEx());
        }
    }
}

21.6 ColumnOrderStrict

This API method used to set the CSVReader to perform column orderness check before loading each line in CSV file. If there is any order mis-match found in any line, an error will be reported. This option is effective only if the CSV file contains first line as header. So it must be combined with WithFirstLineHeader().

static void QuickDynamicTest()
{
    using (var stream = new MemoryStream())
    using (var reader = new StreamReader(stream))
    using (var writer = new StreamWriter(stream))
    using (var parser = new ChoCSVReader(reader).WithFirstLineHeader().ColumnOrderStrict())
    {
        writer.WriteLine("Id,Name,Salary");
        writer.WriteLine("1,Carl,1000");
        writer.WriteLine("2,Mark,2000");
        writer.WriteLine("3,Tom,3000");
 
        writer.Flush();
        stream.Position = 0;
 
        object rec;
        while ((rec = parser.Read()) != null)
        {
            Console.WriteLine(rec.ToStringEx());
        }
    }
}

21.8 NotifyAfter

This API method used to define the number of rows to be processed before generating a notification event. This property is designed for user interface components that illustrate the CSV loading progress. Notification are sent to subscribers who subscribed to RowsLoaded event.

static void NotifyAfterTest()
{
    using (var stream = new MemoryStream())
    using (var reader = new StreamReader(stream))
    using (var writer = new StreamWriter(stream))
    using (var parser = new ChoCSVReader(reader).WithFirstLineHeader().NotifyAfter(1000))
    {
        writer.WriteLine("Id,Name,Salary");
        writer.WriteLine("1,Carl,1000");
        writer.WriteLine("2,Mark,2000");
        writer.WriteLine("3,Tom,3000");
 
        writer.Flush();
        stream.Position = 0;
 
        parser.RowsLoaded += (o, e) => Console.WriteLine(e.RowsLoaded);

        object rec;
        while ((rec = parser.Read()) != null)
        {
            Console.WriteLine(rec.ToStringEx());
        }
    }
}

21.9 Configure

This API method used to configure all configuration parameters which are not exposed via fluent API. 

static void NotifyAfterTest()
{
    using (var stream = new MemoryStream())
    using (var reader = new StreamReader(stream))
    using (var writer = new StreamWriter(stream))
    using (var parser = new ChoCSVReader(reader).WithFirstLineHeader().Configure(c => c.ErrorMode = ChoErrorMode.ThrowAndStop))
    {
        writer.WriteLine("Id,Name,Salary");
        writer.WriteLine("1,Carl,1000");
        writer.WriteLine("2,Mark,2000");
        writer.WriteLine("3,Tom,3000");
 
        writer.Flush();
        stream.Position = 0;
 
        parser.RowsLoaded += (o, e) => Console.WriteLine(e.RowsLoaded);

        object rec;
        while ((rec = parser.Read()) != null)
        {
            Console.WriteLine(rec.ToStringEx());
        }
    }
}

21.10 Setup

This API method used to setup the reader's parameters / events via fluent API. 

static void NotifyAfterTest()
{
    using (var stream = new MemoryStream())
    using (var reader = new StreamReader(stream))
    using (var writer = new StreamWriter(stream))
    using (var parser = new ChoCSVReader(reader).WithFirstLineHeader().Configure(c => c.ErrorMode = ChoErrorMode.ThrowAndStop)
        .Setup(r => r.BeforeRecordLoad += (o, e) =>
        {
            if (e.Source.CastTo<string>().StartsWith("//"))
                e.Skip = true;
        })
    {
        writer.WriteLine("Id,Name,Salary");
        writer.WriteLine("1,Carl,1000");
        writer.WriteLine("2,Mark,2000");
        writer.WriteLine("3,Tom,3000");
 
        writer.Flush();
        stream.Position = 0;
 
        parser.RowsLoaded += (o, e) => Console.WriteLine(e.RowsLoaded);

        object rec;
        while ((rec = parser.Read()) != null)
        {
            Console.WriteLine(rec.ToStringEx());
        }
    }
}

21. FAQ

21.1 I want to write file with headers

You can use WithFirstLineHeader() fluent API method as below

foreach (dynamic e in new ChoCSVReader("Emp.csv").WithFirstLineHeader())
{            
    Console.WriteLine(String.Format("Id: {0}", e.Id));
    Console.WriteLine(String.Format("Name: {0}", e.Name));
}

Or use Configuration.FirstLineHeader member

foreach (dynamic e in new ChoCSVReader("Emp.csv").WithFirstLineHeader())
{            
    parser.Configuration.FileHeaderConfiguration.HasHeaderRecord = true;

    Console.WriteLine(String.Format("Id: {0}", e.Id));
    Console.WriteLine(String.Format("Name: {0}", e.Name));
}

 

21.1 File has lines I want to skip them from loading

You can subscribe to parser's BeforeRecordLoad event to observe the lines and ignore unwanted lines as below

using (var parser = new ChoCSVReader("Emp.csv").WithFirstLineHeader())
{
    parser.BeforeRecordLoad += (o, e) =>
    {
        if (e.Source != null)
        {
            e.Skip = ((string)e.Source).StartsWith("_");
        }
    };

    foreach (var e in parser)
    {
        Console.WriteLine(String.Format("Id: {0}", e.Id));
        Console.WriteLine(String.Format("Name: {0}", e.Name));
    }
}

21.2 Not all characters are read from data file

CSVReader smart enough to determine the encoding of the CSV file automatically. In rare cases, you may want to specify the encoding explicitly to CSVReader to read out all the characters from the CSV file.

Sample below shows how to specify the encoding to CSVReader configuration object to let the reader handles the opening of the file with passed encoding value.

foreach (dynamic e in new ChoCSVReader("Emp.csv").WithFirstLineHeader())
{            
    parser.Configuration.Encoding = Encoding.BigEndianUnicode;

    Console.WriteLine(String.Format("Id: {0}", e.Id));
    Console.WriteLine(String.Format("Name: {0}", e.Name));
}

21.3 How to handle multiline records

CSVReader can handle multiline records in data file. You can turn on this option via Configuration.MayContainEOLInData option.

Sample below shows how to read such file

using (var parser = new ChoCSVReader("MultiLineValue.csv").WithFirstLineHeader())
{
    parser.Configuration.MayContainEOLInData = true;
 
    foreach (var e in parser)
        Console.WriteLine(e.ToStringEx());
}

21.4 How to handle errors and log them

CSVReader exposes callback machanism by which you can subscribe to the events and handle odd situation gracefully. Sample below shows how to handle field level errors and log them to custom log object

using (var parser = new ChoCSVReader<EmployeeRec>("empwithsalary.csv").WithFirstLineHeader())
{
    parser.RecordFieldLoadError += (o, e) =>
    {
        //Log the error message to log object
        Console.Write(e.Exception.Message);
        //Set the handled flag as true to continue loading of CSV  file
        e.Handled = true;
    };
    foreach (var i in parser)
        Console.WriteLine(i.ToStringEx());
}

In above, we subscribe to RecordFieldLoadError event and capture any errors in the custom log object.

21.5 How to handle column headers having spaces or special characters in them?

CSVReader can handle this situation gracefully, load the CSV file without any issues. When you use dynamic parser, the column names with special characters are replaced with _ character.

ex.

   COMPANY NAME -> COMPANY_NAME

  COMPANY TYPE -> COMPANY_TYPE

   FIRST NAME$ -> FIRST_NAME_

COMPANY NAME, Email, Job Title,COMPANY TYPE, CITY, FIRST NAME$, LAST NAME
"Bbc Worldwide Labs, Bounce Republic Ltd",hidden @hidden.com,"Broadcaster, Voice Over & Founder Of Passion Pods", Broadcast Media, London, Emily, Chiswell

Loading file using dynamic parser, the field names are normalized automtically with _ character as below

foreach (dynamic rec in new ChoCSVReader("Emp.csv").WithFirstLineHeader())
{
    Console.WriteLine(rec.COMPANY_NAME);
    Console.WriteLine(rec.COMPANY_TYPE);
    Console.WriteLine(rec.FIRST_NAME_);
}

UPDATE:

Now you can access members by original names through indexers as shown below

foreach (dynamic rec in new ChoCSVReader("Emp.csv").WithFirstLineHeader())
{
    Console.WriteLine(rec["COMPANY_NAME");
    Console.WriteLine(rec["COMPANY TYPE");
    Console.WriteLine(rec["FIRST NAME$"]);
}

 

21.6 Failed to format DateTime during load?

CSVReader uses default culture as en-US. If the data file contains datetimes in differenct culture, CSVReader may failed to load those values during parsing. This can be handled by settings culture to the matching data file culture. Sample below shows how to load data file containing datetimes in en-GB culture.

CSV File (Emp.csv):

Id,Date,Account,Amount,Subcategory,Memo
1,09/05/2017,XXX XXXXXX,-29.00,FT,Sample string
2,09/05/2017,XXX XXXXXX,-20.00,FT,Sample string
3,25/05/2017,XXX XXXXXX,-6.30,PAYMENT,Sample string

POCO class:

class Transaction
{
    public string Id { get; set; }
    public DateTime Date { get; set; }
    public string Account { get; set; }
    public decimal Amount { get; set; }
    public string Subcategory { get; set; }
    public string Memo { get; set; }
}

Parsing Code:

using (var parser = new ChoCSVReader<Transaction>("emp.csv").WithFirstLineHeader())
{
    csv.Configuration.Culture = CultureInfo.GetCultureInfo("en-GB");
    foreach (var t in parser)
        Console.WriteLine(string.Format("{0:dd-MMM-yyyy}  {1}  {2,6}  {3,-7}  {4}", t.Date, t.Account, t.Amount, t.Subcategory, t.Memo));
}

21.7 How to handle duplicate CSV columns?

CSVReader can't handle duplicate CSV columns out of the box. If found, it will error out and stop the parsing of the CSV file. But there is a workaround to handle such situation. 

CSV File (Emp.csv):

Id,Date,Account,Amount,Subcategory,Memo,Memo
1,09/05/2017,XXX XXXXXX,-29.00,FT,Memo1,DupMemo1
2,09/05/2017,XXX XXXXXX,-20.00,FT,Memo2,DupMemo2
3,25/05/2017,XXX XXXXXX,-6.30,PAYMENT,Memo3,DupMemo2

In above, the file has duplicate 'Memo' csv column. In order to load it using dynamic mode, you should tell the CSVReader to ignore the header line using WithFirstLineHeader() overload as below

using (var parser = new ChoCSVReader("emp.csv").WithFirstLineHeader(true))
{
    foreach (var t in parser)
    {
        Console.WriteLine(t[1]);
        Console.WriteLine(t["Column1"]);
    }
}

In above, you have choices to access the CSV columns by indexes or by system generated names Column1, Column2 etc.

Alternatively, you can provide column name overrides via configuration as below to access them by name

using (var parser = new ChoCSVReader("emp.csv").WithFirstLineHeader(true)
     .WithColumns("Id,Date,Account,Amount,Subcategory,Memo,Memo1"))
{
    foreach (var t in parser)
    {
        Console.WriteLine(t.Memo);
        Console.WriteLine(t.Memo1);
    }
}

POCO class:

When using POCO class for parsing, define the members with CSV column by position using ChoCSVRecordFieldAttribute. In below, we defined Memo1 object member to map to duplicate Memo column.

class Transaction
{
    [ChoCSVRecordField(1)]
    public string Id { get; set; }
    [ChoCSVRecordField(2)]
    public DateTime Date { get; set; }
    [ChoCSVRecordField(3)]
    public string Account { get; set; }
    [ChoCSVRecordField(4)]
    public decimal Amount { get; set; }
    [ChoCSVRecordField(5)]
    public string Subcategory { get; set; }
    [ChoCSVRecordField(6)]
    public string Memo { get; set; }
    [ChoCSVRecordField(7)]
    public string Memo1 { get; set; }
}

Then use this POCO class to ChoCSVReader to load the file as below

using (var parser = new ChoCSVReader<Tranaction>("emp.csv").WithFirstLineHeader(true))
{
    foreach (var t in parser)
    {
        Console.WriteLine(t.Memo);
        Console.WriteLine(t.Memo1);
    }
}

21.8 What if CSV header is found other than first line?

Rarely some CSV files may comes with header line at some place other than first line. If the headerline is always comes at fixed line, this sample shows you how to parse the file using WithHeaderLineAt() method. 

CSV File (Emp.csv):

# This file is produced by XYZ systems
# Id - id column
# Date - Date of the account opened
Id,Date,Account,Amount,Subcategory,Memo,Memo1
1,09/05/2017,XXX XXXXXX,-29.00,FT,Memo1,DupMemo1
2,09/05/2017,XXX XXXXXX,-20.00,FT,Memo2,DupMemo2
3,25/05/2017,XXX XXXXXX,-6.30,PAYMENT,Memo3,DupMemo2

In above, the CSV file has the header at line 4. (PS. Index starts with 1). Here is how you can load them with header is present at line 4.

using (var parser = new ChoCSVReader("emp.csv").WithHeaderLineAt(4))
{
    foreach (var t in parser)
    {
        Console.WriteLine(t.Id);
        Console.WriteLine(t.Date);
    }
}

What if the CSV file header is comes at random line each time when you receive the file, but preceded by it with only comment lines? This can be handled by standard mechanism as shown below

using (var parser = new ChoCSVReader("emp.csv")
      .Configure(c => c.Comments = new string[] { "#" })
      .WithFirstLineHeader()
)
{
    foreach (var t in parser)
    {
        Console.WriteLine(t.Id);
        Console.WriteLine(t.Date);
    }
}

21.9 How to turn off 'Records loaded' or other trace messages?

You can turn off or control the Cinchoo ETL trace messages by setting ChoETLBootstrap.TraceLevel property at the application startup

ChoETLFrxBootstrap.TraceLevel = System.Diagnostics.TraceLevel.Off;

Possible values are

System.Diagnostics.TraceLevel.Off
System.Diagnostics.TraceLevel.Info;
System.Diagnostics.TraceLevel.Error;
System.Diagnostics.TraceLevel.Verbose;

21.10 How to specify default values to CSV columns?

Sample shows how to specify defaults to CSV columns in dynamic model

using (var cr = new ChoCSVReader("sample.csv")
    .WithFirstLineHeader()
    .WithField("firstName", defaultValue)
    .WithField("lastName")
    .WithField("salary", defaultValue: 100, fieldType: typeof(double))
    )
)
{
    foreach (var x in cr)
        Console.WriteLine(ChoUtility.ToStringEx(x));
}

In above sample, the default value is specified to 'Salary' CSV colum as 100 using WithField fluent API.

In POCO model, the default values can be specified using System.ComponentModel.DefaultValueAttribute as shown below

public class Employee
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    [DefaultValue(100)]
    public double Salary { get; set; }
}

21.11 Does Cinchoo ETL support hierarchical  object support?

Yes, it does. If your CSV file comes with header containing CSV column names are specified with hierarchical separator, Cinchoo ETL automatically construct them into the nested object.

PS: This feature is currently supported in dynamic object model only.

Nested CSV:

id,name,category/id,category/name,category/subcategory/id,category/subcategory/name,description
1,Test123,15,Cat123,10,SubCat123,Desc123

The above sample CSV file comes with nested column names, category/name, category/subcategory/id etc. The column name separater is '/'. In order to treat them as nested columns and load them as nested object, you must specify the 'NestedColumnSepatator' property of ChoCSVReader engine. Sample below shows how to do it

using (var csv = new ChoCSVReader("nested.csv").WithFirstLineHeader()
    .Configure(c => c.NestedColumnSeparator = '/')
    )
{
    foreach (var x in csv) 
         Console.WriteLine(x.DumpAsJson());
}

22. References

Here are the list of other modules exposed by Cinchoo ETL for your reference to take look at it

  1. CSV Reader / Writer

  2. Fixed Length Reader / Writer

  3. Xml Reader / Writer

  4. JSON Reader / Writer

  • JSON Reader - Coming soon
  • JSON Writer - Coming soon

  5. Key-Value Reader / Writer

  • Key-Value Reader - Coming soon
  • Key-Value Writer  - Coming soon

  6. Manifold Reader / Writer

  • Manifold Reader - Coming soon
  • Manifold Writer - Coming soon

 

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Cinchoo
United States United States
No Biography provided

You may also be interested in...

Comments and Discussions

 
QuestionVery Functional Application - How can I turn off the messages about how many records loaded? Pin
Kimball Johnson16-Oct-17 1:21
memberKimball Johnson16-Oct-17 1:21 
AnswerRe: Very Functional Application - How can I turn off the messages about how many records loaded? Pin
Cinchoo17-Oct-17 14:27
memberCinchoo17-Oct-17 14:27 
GeneralColumn name matching in dynamic mode. Pin
SomeIvan10-Sep-17 3:50
memberSomeIvan10-Sep-17 3:50 
GeneralRe: Column name matching in dynamic mode. Pin
Cinchoo13-Sep-17 12:05
memberCinchoo13-Sep-17 12:05 
SuggestionGood presentation of program, but did you consider what was already out there? Pin
Thomas Nielsen - MIRSK21-Jun-17 3:09
memberThomas Nielsen - MIRSK21-Jun-17 3:09 
QuestionSave in a Log Record or Firld error Pin
Sql Para Todos25-May-17 9:13
memberSql Para Todos25-May-17 9:13 
AnswerRe: Save in a Log Record or Firld error Pin
Cinchoo26-May-17 8:57
memberCinchoo26-May-17 8:57 
QuestionReal sample apps Pin
kiquenet.com28-Mar-17 23:37
professionalkiquenet.com28-Mar-17 23:37 
GeneralMy vote of 5 Pin
_Vitor Garcia_22-Mar-17 3:40
member_Vitor Garcia_22-Mar-17 3:40 
GeneralRe: My vote of 5 Pin
Cinchoo22-Mar-17 5:41
memberCinchoo22-Mar-17 5:41 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Terms of Use | Mobile
Web04 | 2.8.171114.1 | Last Updated 28 Oct 2017
Article Copyright 2016 by Cinchoo
Everything else Copyright © CodeProject, 1999-2017
Layout: fixed | fluid