Click here to Skip to main content
15,868,016 members
Articles / Programming Languages / C#

CSV Serializer for .NET

Rate me:
Please Sign up or sign in to vote.
4.65/5 (10 votes)
10 May 2013CPOL7 min read 105.5K   2.9K   40   29
General solution to Serializing and Deserializing CSV files.

Introduction

Where is a general CSV serializer when you need one? Not there, right?

This article builds a general CSV serialization solution.

Design

We would expect such a serializer to deliver the following abilities:

  1. Serialize lists of a specified Type to a flat CSV file.
  2. Reconstitute objects from the same (or a similarly constructed) file.
  3. Mimic the usage of the .NET XML Serializer, or Binary Formatter, in writing-to and reading-from a stream object.
  4. Nominate properties to ignore from the target Type to be serialized using a CsvIgnore attribute.
  5. Allow the user to specify a delimiter character other than the default comma character.

Concerns

The following are concerns that represent risks and may not be apparent at the outset:

  1. The separator character may appear in the value of a property being serialized. This will cause the resulting row to appear to have more columns than expected.
  2. Likewise, a NewLine may appear in the value of a property. This will cause the resulting row to span multiple rows.

Both of these problems will render the row (if not the entire CSV file) unreadable, and so must be dealt with.

We solve both these problems with the same approach: replace the offending character sequence with some User-Specified string. On serialization each occurrence of the problem characters is to be replaced with a dummy character, and then on deserialization, the original character replaces the dummy characters, and the original string is restored.

The user needs to be aware that in each of these cases the replacement string must not possibly appear in the normal data to be serialized. Otherwise deserialization will replace valid occurrences of the replacement string with either the Separator or NewLine characters.

CSV Serialization

Given that CSV is a flat-file structure, we do not have to consider deep object serialization. Thus, associations will be rendered as a single text value.

The CSV structure also implies that the data to be serialized should be a list of objects, so that the Serialization method is to receive an IList of the target Type. The Deserialization method will return such a list.

The Class

From Design Requirements 1, 2, 3, and 5, and the discussion so far, we can define some of the main requirements for the class.

The class will:

  • Expose a delimiter character (comma by default).
  • Expose a replacement string
  • Reference the target Type to be serialized/deserialized by the generic parameter T.
  • Build and maintain a list of PropertyInfo for the properties to be serialized/deserialized.
  • Expose a Serialization method that accepts a Stream object and an IList of data.
  • Expose a Deserialization method that accepts a Stream object and returns an IList of data.

We can now define this rough outline as follows:

C#
public class CsvSerializer<T> where T : class, new()
{
   public char Separator { get; set; }

   public string Replacement { get; set; }

   private List<PropertyInfo> _properties;   

   public void Serialize(Stream stream, IList<T> data) { }

   public IList<T> Deserialize(Stream stream) { } 

   .
   .
   .
}

List of PropertyInfo

The first step carried-out by the class will be to assemble a list of properties that are to be rendered to CSV (in the form of a List of PropertyInfo). This step will be executed on construction.

This list of properties is used for three purposes:

  1. To create the CSV header.
  2. To get instance values by reflection on serialization.
  3. To set instance values by reflection on deserialization.
C#
public CsvSerializer()
{
   var type = typeof(T);

   var properties = type.GetProperties(BindingFlags.Public | BindingFlags.Instance 
		| BindingFlags.GetProperty | BindingFlags.SetProperty);

   _properties = (from a in properties
                  where a.GetCustomAttribute<CsvIgnoreAttribute>() == null
                  orderby a.Name
                  select a).ToList();
}

CsvIgnoreAttribute

As with XmlSerializer, we want to be able to nominate properties to exclude from serialization by decorating the property with a CsvIgnore attribute.

The CsvIgnore attribute is defined simply as follows:

C#
public class CsvIgnoreAttribute : Attribute    { }

CSV Header

The header row for the file will be a comma-separated string of the names of the properties. The list of properties will already be sorted by name, so we can be confident that the column order will match the order of values when it comes time to render each row.

We create a private GetHeader method that returns the header row as a string:

C#
private string GetHeader()
{
    var columns = Properties.Select(a => a.Name).ToArray();
    var header = string.Join(Separator.ToString(), columns);
    return header;
}

The Serialization Method

Once the CSV header row is created, the serializer iterates through the enumerable list of data items, and for each item retrieves the values for all properties in the list of PropertyInfo (discussed above). The values are added to a string array which is finally joined by the separator character.

The serialization method itself accepts a stream object and an IList of objects of Type T.

After iterating over the list and generating each row, it uses a stream writer to write the final CSV text to the stream object.

C#
public void Serialize(Stream stream, IList<T> data)
{
    var sb = new StringBuilder();
    var values = new List<string>();

    sb.AppendLine(GetHeader());

    var row = 1;
    foreach (var item in data)
    {
        values.Clear();

        foreach (var p in _properties)
        {
            var raw = p.GetValue(item);
            var value = raw == null ? 
                        "" :
                        raw.ToString().Replace(Separator.ToString(), Replacement);
            values.Add(value);
        }
        sb.AppendLine(string.Join(Separator.ToString(), values.ToArray()));
    }

    using (var sw = new StreamWriter(stream))
    {
        sw.Write(sb.ToString().Trim());
    }
}

The Deserialization Method

The deserialization method accepts a stream object that represents a CSV text file.

We read the first line as the CSV header row. The header is split with the separator character, then stored in a columns array, and later indexed to reference the properties by name.

The remaining text is split by NewLine and stored in 'rows'. Each row is then split using the separator character. The resulting parts then represent the values to be deserialized to a new instance of the Type T. The order of these values will appear in the same order as the columns array so that we can retrieve the column name by index, and use the column name to retrieve the corresponding PropertyInfo from the PropertyInfo list.

Each string part can then be converted to the correct value by using the .NET TypeConverter for the property and stored in the new instance of the target object using the PropertyInfo SetValue method.

C#
public IList<T> Deserialize(Stream stream)
{
    string[] columns;
    string[] rows;

    try
    {
        using (var sr = new StreamReader(stream))
        {
            columns = sr.ReadLine().Split(Separator);
            rows = sr.ReadToEnd().Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
        }
    }
    catch (Exception ex)
    {
        throw new InvalidCsvFormatException(
                "The CSV File is Invalid. See Inner Exception for more inoformation.", ex);
    }

    var data = new List<T>();
    for (int row = 0; row < rows.Length; row++)
    {
        var line = rows[row];
        if (string.IsNullOrWhiteSpace(line))
        {
            throw new InvalidCsvFormatException(string.Format(
                    @"Error: Empty line at line number: {0}", row));
        }

        var parts = line.Split(Separator);

        var datum = new T();
        for (int i = 0; i < parts.Length; i++)
        {
            var value = parts[i];
            var column = columns[i];

            value = value.Replace(Replacement, Separator.ToString());

            var p = _properties.First(a => a.Name == column);

            var converter = TypeDescriptor.GetConverter(p.PropertyType);
            var convertedvalue = converter.ConvertFrom(value);

            p.SetValue(datum, convertedvalue);
        }
        data.Add(datum);
    }
    return data;
}

Further Considerations and Enhancements

Here we discuss some useful settings that can be used to customize serialization for specific situations.

Syntactical and Functional Options

  • Separator
        The CSV Separator character
        default: comma
  • Replacement
        Replacement string if the Separator character appears in a field value.
        default:   (char)255
  • NewLineReplacement
        Replacement string if a NewLine appears in a field value
        default:   (char)254
  • UseLineNumbers
        An additional column can be inserted into the CSV for Row Number
        default: true
  • UseEofLiteral
        An EOF literal can be used to mark the end-of-file
        default: false
  • IgnoreEmptyLines
        If not true then exceptions will be thrown on encountering blank lines during deserialization
        default: true
  • IgnoreReferenceTypesExceptString
        Exclude reference type properties other than string.
        (default: true).
  • RowNumberColumnTitle
        Title of the Row-Number column if UseLineNumbers is true
        default: "RowNumber"
  • UseTextQualifier
    Place double inverted commas around all values. Default is false.

The Separator and Replacement options are already considered by exposing the Separator and Replacement properties, so that the consumer is able to use any other character for the delimiter and set the replacement string.

It is important that the consumer understand the range of values for the types being serialized such that they specify a replacement string that cannot occur in the data range, otherwise valid instances of the replacement string appearing in normal data will be erroneously converted to the separator character on deserialization.

RowNumberColumnTitle will work in conjunction with the UseLineNumbers option and introduce an additional first-column into the CSV file containing the row number.

It was found that CsvIgnore was consistently applied to Reference Types such that Reference Types make little sense in most CSV serialization contexts. This statement does not apply to the String type however. So it appeared to be sensible to include this property to allow the automatic exclusion of all Reference Types other than String.

It was specified in the requirements that prompted the creating of this class, that the last line of the CSV file contain the string literal "EOF". This is not a common CSV practice and so an option is introduced to allow for this possibility. The implementation simply adds an additional line to the CSV text during serialization. And allows for the possibility of an EOF line during deserialization.

The IgnoreEmptyLines option allows the deserializer to ignore empty or corrupt lines of text. Otherwise an InvalidCsvFormatException is thrown.

Usage

As with the XmlSerializer, it is a requirement that the target Type exposes a default paramaterless constructor.

Usage is then a two stage process. Firstly, create a CsvSerializer instance of the target Type. Then secondly, call either the Serialization or Deserialization methods with a valid stream object.

The Serialization method will of course require the list of items to be serialized:

C#
var data = GetPeople();
using (var stream = new FileStream("persons.csv", FileMode.Create, FileAccess.Write))
{    
    var cs = new CsvSerializer<Person>();
    cs.Serialize(stream, data);
}

While the Deserialization method will return a list of objects of the target Type:

C#
IList<Person> data = null;
using (var stream = new FileStream("persons.csv", FileMode.Open, FileAccess.Read))
{
    var cs = new CsvSerializer<Person>();
    data = cs.Deserialize(stream);
}

Note

Although the code provided here is geared towards a general solution it was prompted by the criteria for a particular use-case.

Please test and modify the code for other particular cases not considered here.

Further Work

Here is a list of ideas indicating possible further work and extension.

  • Performance: the solution may not be sufficiently optimized for speed so that further work will be required to address any performance issues.
  • ColumnAttribute: Introduce a Column attribute to name columns other than by property names.
  • IgnoreList: Rather than decorate properties with the CsvIgnore attribute, it may be desirable, rather, to reference a list of property names to ignore.
  • Use a character array to specify a list of separator characters for deserialization where the stream may contain various delimiters.

Updates

  • Add UseTextQualifier Property, by user Gogowater.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Axiom Manifold
Australia Australia
codes
is vegetarian
likes cats

Comments and Discussions

 
QuestionWorks great Pin
John Holliday, Jr.7-Feb-20 11:38
John Holliday, Jr.7-Feb-20 11:38 
QuestionHandling Line Break in CSV Pin
MickReid5-May-15 2:06
MickReid5-May-15 2:06 
AnswerRe: Handling Line Break in CSV Pin
CognitiveFeedback24-May-15 11:05
CognitiveFeedback24-May-15 11:05 
QuestionGetCustomAttribute Pin
edokt30-Dec-14 18:15
edokt30-Dec-14 18:15 
AnswerRe: GetCustomAttribute Pin
CognitiveFeedback11-Jan-15 23:52
CognitiveFeedback11-Jan-15 23:52 
GeneralRe: GetCustomAttribute Pin
yeorote25-Jan-15 17:49
yeorote25-Jan-15 17:49 
GeneralRe: GetCustomAttribute Pin
CognitiveFeedback25-Jan-15 19:27
CognitiveFeedback25-Jan-15 19:27 
QuestionA big help! Pin
DRH_K5-Nov-14 21:26
DRH_K5-Nov-14 21:26 
AnswerRe: A big help! Pin
CognitiveFeedback10-Nov-14 9:04
CognitiveFeedback10-Nov-14 9:04 
QuestionLoss of data Pin
mortensp12-Mar-14 23:34
mortensp12-Mar-14 23:34 
AnswerRe: Loss of data Pin
CognitiveFeedback29-Jul-14 14:33
CognitiveFeedback29-Jul-14 14:33 
QuestionGreat job! Pin
johngaffey12-Dec-13 4:54
johngaffey12-Dec-13 4:54 
AnswerRe: Great job! Pin
CognitiveFeedback29-Jul-14 14:34
CognitiveFeedback29-Jul-14 14:34 
QuestionOrder Class Properties Pin
Cristian Campos27-Aug-13 10:21
Cristian Campos27-Aug-13 10:21 
AnswerRe: Order Class Properties Pin
CognitiveFeedback20-Sep-13 14:43
CognitiveFeedback20-Sep-13 14:43 
QuestionConverted to class library, .Net 4.0 Support, TextQualifier support, xml comments, optimisations Pin
blueacorn4-Jul-13 15:30
professionalblueacorn4-Jul-13 15:30 
AnswerRe: Converted to class library, .Net 4.0 Support, TextQualifier support, xml comments, optimisations Pin
CognitiveFeedback11-Jul-13 18:38
CognitiveFeedback11-Jul-13 18:38 
QuestionA couple of suggestions. Pin
George Swan12-May-13 1:38
mveGeorge Swan12-May-13 1:38 
AnswerRe: A couple of suggestions. Pin
CognitiveFeedback12-May-13 21:33
CognitiveFeedback12-May-13 21:33 
General.NET 4.0 compatible and add TextQualifier property Pin
gogowater10-May-13 12:37
gogowater10-May-13 12:37 
GeneralRe: .NET 4.0 compatible and add TextQualifier property Pin
CognitiveFeedback10-May-13 14:37
CognitiveFeedback10-May-13 14:37 
SuggestionReflection Pin
Andrew Rissing26-Mar-13 11:48
Andrew Rissing26-Mar-13 11:48 
GeneralRe: Reflection Pin
CognitiveFeedback26-Mar-13 12:36
CognitiveFeedback26-Mar-13 12:36 
That's an awesome feature Andrew!

Thanks...

... at the moment it's serializing and deserializing over 10,000 rows per second on my machine,

modified 26-Mar-13 18:58pm.

QuestionQuoted Fields? Pin
Member 379640325-Mar-13 21:04
Member 379640325-Mar-13 21:04 
AnswerRe: Quoted Fields? Pin
CognitiveFeedback25-Mar-13 21:28
CognitiveFeedback25-Mar-13 21:28 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.