Cinchoo ETL - CSV Lite Reader
Simple fast lite CSV file reader for .NET
- Download Cinchoo ETL source
- Download Cinchoo ETL binary (.NET Core)
- Download Cinchoo ETL binary (.NET Framework)
1. Introduction
ChoETL
is an open source ETL (extract, transform and load) framework for .NET. It is a code based library for extracting data from multiple sources, transforming, and loading into your very own data warehouse in .NET environment. You can have data in your data warehouse in no time. This article talks about using CSVReader
component offered by ChoETL
framework. It is a simple utility class to extract CSV data from file / source.
Features
- Ultra-fast CSV Reader, parses CSV file quickly (1 million rows, 20 columns taking about ~7secs)
- Stream based parsers allow for ultimate performance, low resource usage, and nearly unlimited versatility scalable to any size data file, even tens or hundreds of gigabytes
- Follows CSV standard file rules (Multi-line, quoted columns, etc.)
- In addition to comma, most delimiting characters can be used, including tab delimited fields.
- Exposes
IEnumarable
list of objects - which is often used with LINQ query for projection, aggregation and filtration, etc. - Supports deferred reading
- Ability to get typed list of objects from CSV file
2. Requirement
This framework library is written in C# using .NET 4.5 Framework / .NET Core 2.x.
3. "Hello World!" Sample
- Open VS.NET 2013 or higher
- Create a sample VS.NET (.NET Framework 4.5) Console Application project
- Install ChoETL via Package Manager Console using Nuget Command based on the .NET environment:
Install-Package ChoETL
Install-Package ChoETL.NETStandard
- Use the
ChoETL
namespace - Use
ChoCSVLiteReader
class to parse CSV file
Let's begin by looking into a simple example of reading CSV file having two columns.
Listing 3.1 Sample CSV Data File (Emp.csv)
Id,Name
1,Tom
2,Carl
3,Mark
There are number of ways you can get the CSV file parsing started with minimal setup.
3.1. Quick load - Data First Approach
It is the zero config, quick way to load a CSV file in no time. No POCO object is required. The sample code below shows how to load the file.
Listing 3.1.1 Load CSV file using iterator (fastest)
using (var r = new ChoCSVLiteReader())
{
//Open the reader, skip the header
foreach (var cols in r.ReadFile("emp.csv").Skip(1))
{
Console.WriteLine($"Id: {cols[0]}");
Console.WriteLine($"Name: {cols[1]}");
}
}
Sample fiddle: https://dotnetfiddle.net/kWhr27
Listing 3.1.2 Load CSV file using loop (fastest)
using (var r = new ChoCSVLiteReader())
{
//Open the reader, skip the header
var recNum = r.ReadFile("emp.csv").Skip(1).GetEnumerator();
while (recNum.MoveNext())
{
var cols = recNum.Current;
Console.WriteLine($"Id: {cols[0]}");
Console.WriteLine($"Name: {cols[1]}");
}
}
Sample fiddle: https://dotnetfiddle.net/bV7nq5
You can also access csv
fields by names as well. The sample below shows how to access them by names.
Using ChoDynamicObject (special type of expando object)
Listing 3.1.3 Load CSV file using column names (using ChoDynamicObject)
using (var r = new ChoCSVLiteReader())
{
foreach (dynamic rec in r.ReadFile<ChoDynamicObject>("emp.csv", true))
{
Console.WriteLine($"Id: {rec.Id}");
Console.WriteLine($"Name: {rec.Name}");
}
}
Sample fiddle: https://dotnetfiddle.net/PTnx2L
Using ExpandoObject
Listing 3.1.4 Load CSV file using column names (using ExpandoObject)
using (var r = new ChoCSVLiteReader())
{
foreach (var rec in r.ReadFile<ExpandoObject>("emp.csv", true))
{
Console.WriteLine($"Id: {rec.Id}");
Console.WriteLine($"Name: {rec.Name}");
}
}
If the CSV file does not comes with header, CSVReader
auto name the columns as Column1
, Column2
... in the dynamic object.
3.2. Code First Approach
This is another zero config way to parse and load CSV file using POCO class. First, define a simple data class to match the underlying CSV file layout.
Listing 3.2.1 Simple POCO entity class
public partial class EmployeeRec
{
public int Id { get; set; }
public string Name { get; set; }
}
In the above, the class defines two properties matching the sample CSV file template.
3.2.1 Using User Defined Mapper
The sample below shows how to load CSV using custom user defined mapper.
Listing 3.2.1.1 Load CSV file with custom user defined mapper
foreach (var rec in r.ReadFile<EmployeeRec>("emp.csv", true,
mapper: (lineno, cols, rec) =>
{
rec.Id = cols[0].CastTo<int>();
rec.Name = cols[1];
}))
{
Console.WriteLine($"Id: {rec.Id}");
Console.WriteLine($"Name: {rec.Name}");
}
In the above sample, we take control of loading CSV values to object members using mapper function.
Sample fiddle: https://dotnetfiddle.net/NZZ5EK
3.2.2 Using default built-in Mapper
The sample below shows how to load the CSV file using default built-in mapped comes with CSV reader.
Listing 3.2.2.1 Load CSV file with built-in mapper (default map)
foreach (var rec in r.ReadFile<EmployeeRec>("emp.csv", true))
{
Console.WriteLine($"Id: {rec.Id}");
Console.WriteLine($"Name: {rec.Name}");
}
In the above sample, we let the parse use the built-in mapper feature to load the CSV values to object members. By default, the built-in mapper simply maps the CSV columns to object members by index (first column map to first object member, second one maps to second object member and so on).
Sample fiddle: https://dotnetfiddle.net/IZRKWT
3.2.3 Using Positional Built-in Mapper
If the CSV files comes with different order from the defined POCO model object, but wanted to load them by positional mapping, you can do so by decorating the object members with ColumnAttribute
to specify the mapping order to CSV columns.
Listing 3.2.3.1 POCO entity class with OrderAttribute
public partial class EmployeeRec
{
[Column(Order=1)]
public string Name { get; set; }
[Column(Order=0)]
public int Id { get; set; }
}
Listing 3.2.3.2 Load CSV file with built-in mapper (positional map)
foreach (var rec in r.ReadFile<EmployeeRec>("emp.csv", true))
{
Console.WriteLine($"Id: {rec.Id}");
Console.WriteLine($"Name: {rec.Name}");
}
In the above sample, parser used order attribute to map the CSV columns to corresponding object members during parsing.
Sample fiddle: https://dotnetfiddle.net/fwd3j5
3.2.4 Using naming built-in Mapper
If the CSV files comes with column headers not matching with defined POCO model object members, you can match them by using DisplayNameAttribute / DisplayAttribute
to specify the CSV column names to object members.
Listing 3.2.4 POCO entity class with DisplayNameAttribute
public partial class EmployeeRec
{
[DisplayName("Id")]
public int Identifier { get; set; }
[DisplayName("Name")]
public string GivenName { get; set; }
}
Listing 3.2.5 Load CSV file with built-in mapper (name map)
foreach (var rec in r.ReadFile<EmployeeRec>("emp.csv", true))
{
Console.WriteLine($"Id: {rec.Id}");
Console.WriteLine($"Name: {rec.Name}");
}
In the above sample, parser uses the display attributes to map the CSV columns to corresponding object members during the parsing.
Sample fiddle: https://dotnetfiddle.net/K65Ywq
3.2. Other Reader Methods
Non-generic overloads:
ReadText
- Parses csv text, returnsstring[]
ReadFile
- Parses csv file, returnsstring[]
Read
- Parses csv stream, returnsstring[]
ReadLines
- Parses csv lines, returnsstring[]
Generic overloads:
ReadText<T>
- Parses csv text, returnsT
ReadFile<T>
- Parses csv file, returnsT
Read<T>
- Parses csv stream, returnsT
ReadLines<T>
- Parses csv lines, returnsT
3.3. CSV to DataTable
Cinchoo ETL library offers helper method to convert any enumerable of objects to Datatable
. This conversion is helpful for binding CSV data to grid / other controls / process them in memory, etc.
The sample below shows how to convert CSV to datatable
.
Listing 3.3.1 Convert CSV to DataTable
using (var r = new ChoCSVLiteReader())
{
var recs = r.ReadFile<EmployeeRec>("emp.csv", true);
var dt = recs.AsDataTable();
}
3.4. CSV to DataReader
Cinchoo ETL library offers helper method to convert any enumerable of objects to DataReader
. This conversion helpful to process (export) large CSV file to any databases via ADO.NET/EF.
The sample below shows how to convert CSV to datareader
.
Listing 3.4.1 Convert CSV to DataReader
using (var r = new ChoCSVLiteReader())
{
var recs = r.ReadFile<EmployeeRec>("emp.csv", true);
var dr = recs.AsDataReader();
}
For more information about Cinchoo ETL, please visit the other CodeProject articles:
History
- 30th December, 2021: Initial version