Implementing a generic CSV file importer using Reflection and attribute based programming in C#.NET

Benzi K. Ahamed, 1 Aug 2007, CPOL
The purpose of this article is to discuss the design and development of a generic CSV file importer using .NET features such as Reflection and attributes. It also serves as an example of how Reflection and attribute-based programming can be combined to create very powerful constructs.

Attributes and Reflection

Simply put, an attribute is a mechanism by which it is possible to add metadata to various program elements. Reflection is a means by which this metadata can be accessed. The metadata information gets stored in the assembly.
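As a minimal sketch of the two ideas working together, the snippet below attaches metadata to a property and reads it back through Reflection. LabelAttribute, Sample, and AttributeDemo are hypothetical names used only for illustration; they are not part of the importer developed in this article.

```csharp
using System;
using System.Reflection;

// Hypothetical attribute used only for this illustration.
[AttributeUsage(AttributeTargets.Property)]
public class LabelAttribute : Attribute
{
    public string Text { get; }
    public LabelAttribute(string text) { Text = text; }
}

public class Sample
{
    [Label("Full name")]
    public string Name { get; set; }
}

public static class AttributeDemo
{
    // Reads the Label metadata attached to Sample.Name via Reflection.
    public static string ReadLabel()
    {
        PropertyInfo property = typeof(Sample).GetProperty("Name");
        LabelAttribute label = (LabelAttribute)Attribute.GetCustomAttribute(
            property, typeof(LabelAttribute));
        return label == null ? null : label.Text;
    }

    public static void Main()
    {
        Console.WriteLine(ReadLabel()); // prints "Full name"
    }
}
```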

Using these features, we shall strive to develop a generic, reusable class structure that allows us to import any type of CSV file. For this article, we shall use a sample CSV file, Employees.csv, which has the following information in each line.

<Name>, <Role>, <Date of Birth>

Our class design gives us enough flexibility that, even if this structure changes drastically, we can incorporate the change with minimal effort. It should be simple to define and create, and easy to maintain. In addition, basic validation support should be provided for the imported fields.

One class to define it all

And, how do we go about this? In a nutshell, we develop custom attributes and apply them to the properties in the Employee class we model, and then using Reflection, we enable ourselves to transform the file data into a list of Employee objects. First, let us have a peek at the final Employee class we will be using.

/// <summary>
/// Represents an Employee entity
/// </summary>

[ImportFile(FileType = ImportFileType.CSV)]
public class Employee
{
    #region Properties and fields

    public const int NAME_INDEX = 0;
    public const int ROLE_INDEX = 1;
    public const int DOB_INDEX = 2;

    private string _name;
    /// <summary>
    /// The name of the employee
    /// </summary>

    [ImportField(
        NAME_INDEX, 
        EnableTrimming=true, 
        EnableValidation=true, 
        ValidationPattern=@"^[ 'a-zA-Z-]+$")]
    public string Name
    {
        get { return _name; }
        set { _name = value; }
    }
        
    private string _role;
    /// <summary>
    /// The role of the employee
    /// </summary>        

    [ImportField(ROLE_INDEX)]
    public string Role
    {
        get { return _role; }
        set { _role = value; }
    }

    private DateTime _dob;
    /// <summary>
    /// Employee date of birth
    /// </summary>

    [ImportField(
        DOB_INDEX, 
        EnableTrimming=true, 
        DataType=DataType.DateTime)]
    public DateTime Dob
    {
        get { return _dob; }
        set { _dob = value; }
    }

    #endregion

    #region Ctor
 
    public Employee(string name)
    {
        _name = name;
    }

    public Employee()
    {
        _name = "";
        _role = "";
        _dob = new DateTime();
    }
 
    #endregion

    #region Methods

    public override string ToString()
    {
        return String.Format(
            "Name:'{0}'\nRole:'{1}'\nBorn:{2:dd-MM-yyyy}\n", 
            _name, _role, _dob);
    }

    #endregion
}

If you look at this class, you will notice that it has enough information within it (metadata) to completely describe the CSV file record it is trying to model. Let us examine this class in some detail. First, take a look at the Name property.

private string _name;
/// <summary>
/// The name of the employee
/// </summary>

[ImportField(
    NAME_INDEX, 
    EnableTrimming=true, 
    EnableValidation=true, 
    ValidationPattern=@"^[ 'a-zA-Z-]+$")]
public string Name
{
    get { return _name; }
    set { _name = value; }
}

It has an attribute ImportField, which has some parameters following it. The first one is NAME_INDEX, and corresponds to the position of this field in the import file. The next one, EnableTrimming, is set to true (indicating that the value in this field will be trimmed). The third parameter, EnableValidation, is set to true and a validation pattern Regular Expression is provided to validate the Name field. Thus, the ImportField attribute associated with the Name property adds on enough information for the field to be properly fetched from a CSV file. Using Reflection, this attribute will be used to perform the actual operation of loading the fields in the file for a record into an object of Employee.

Looking at this class, we can see how easy it is if a new field gets added to the CSV file. A new property has to be defined in the Employee class, with the proper ImportFieldAttribute assigned to it. If fields get swapped around, only the position parameter needs to get changed, which is really simple to do.
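To make this concrete, here is a rough sketch of what adding a fourth column might look like. The Department column, the DEPARTMENT_INDEX constant, and the EmployeeWithDepartment class are hypothetical, and the ImportFieldAttribute shown is a trimmed-down stand-in included only so that the snippet compiles on its own.

```csharp
using System;

// Trimmed-down stand-in for the article's ImportFieldAttribute,
// included only so this sketch compiles on its own.
[AttributeUsage(AttributeTargets.Property, AllowMultiple = false)]
public class ImportFieldAttribute : Attribute
{
    public int Position { get; }
    public bool EnableTrimming { get; set; }
    public ImportFieldAttribute(int position) { Position = position; }
}

public class EmployeeWithDepartment
{
    public const int NAME_INDEX = 0;
    public const int ROLE_INDEX = 1;
    public const int DOB_INDEX = 2;
    public const int DEPARTMENT_INDEX = 3; // hypothetical new column

    [ImportField(NAME_INDEX, EnableTrimming = true)]
    public string Name { get; set; }

    [ImportField(ROLE_INDEX)]
    public string Role { get; set; }

    [ImportField(DOB_INDEX, EnableTrimming = true)]
    public DateTime Dob { get; set; }

    // The only change needed to pick up the new column.
    [ImportField(DEPARTMENT_INDEX, EnableTrimming = true)]
    public string Department { get; set; }
}
```

If the Role and Department columns were later swapped in the file, only the two index constants would need to change.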

Getting to the root of it

Now, ImportField is a custom attribute that specifies that a property has an associated field in the CSV file. It also specifies whether the data is to be trimmed on loading, whether it needs to be validated, and what a valid format looks like. ImportField is implemented as a subclass of System.Attribute. (Note that the name of the class is ImportFieldAttribute and not ImportField; the compiler automatically appends Attribute if it does not find a definition for ImportField.) This attribute class defines the properties that describe a field in the CSV file.

[AttributeUsage(AttributeTargets.Property, AllowMultiple = false)]
public class ImportFieldAttribute : Attribute
{
    #region Properties and fields
    /// <summary>
    /// The position of the field
    /// </summary>

    private int _position;

    public int Position
    {
        get { return _position; }
        set { _position = value; }
    }

    private string _validationPattern;
    /// <summary>
    /// The regexp validation pattern for the field
    /// Validation happens only if EnableValidation is set to true
    /// </summary>

    public string ValidationPattern
    {
        get { return _validationPattern; }
        set { _validationPattern = value; }
    }

    private bool _enableValidation;
    /// <summary>
    /// Set to true if validation is required
    /// </summary>

    public bool EnableValidation
    {
        get { return _enableValidation; }
        set { _enableValidation = value; }
    }
 
    private bool _enableTrimming;
    /// <summary>
    /// Determines whether input should be trimmed
    /// </summary>

    public bool EnableTrimming
    {
        get { return _enableTrimming; }
        set { _enableTrimming = value; }
    }

    private DataType _dataType;
    /// <summary>
    /// The type of data for the field
    /// </summary>

    public DataType DataType
    {
        get { return _dataType; }
        set { _dataType = value; }
    }
        
    #endregion

    #region Ctor
    public ImportFieldAttribute(int position)
    {
        this._position = position;
        this.DataType = DataType.String;
    }
    #endregion
}

You will notice that the parameters we set earlier are actually properties of the attribute class we have defined. The position parameter is passed positionally because the class constructor requires it. All the other parameters are optional and, if specified, must be given by property name.

[ImportField(
    NAME_INDEX,          // required, and notice no property name
    EnableTrimming=true, // optional, so provide the name also
    EnableValidation=true, 
    ValidationPattern=@"^[ 'a-zA-Z-]+$")]

Once we compile our Employee and ImportFieldAttribute classes, the compiler stores the attribute data as metadata alongside the properties of the Employee class in the assembly. Later on, using Reflection, we will be able to get the extended information provided by these attributes for each of the properties in the Employee class.
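The following sketch shows, under simplifying assumptions, how that extended information can be read back. Record and FieldInspector are hypothetical names, and the attribute is reduced to two settings so that the snippet stands alone.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Reduced stand-in for the article's ImportFieldAttribute.
[AttributeUsage(AttributeTargets.Property, AllowMultiple = false)]
public class ImportFieldAttribute : Attribute
{
    public int Position { get; }
    public bool EnableTrimming { get; set; }
    public ImportFieldAttribute(int position) { Position = position; }
}

public class Record
{
    [ImportField(0, EnableTrimming = true)]
    public string Name { get; set; }

    [ImportField(1)]
    public string Role { get; set; }
}

public static class FieldInspector
{
    // Returns one "Property: position=N, trim=B" line per attributed property.
    // The order of the lines follows GetProperties(), which is not guaranteed.
    public static List<string> Describe(Type type)
    {
        var lines = new List<string>();
        foreach (PropertyInfo property in type.GetProperties())
        {
            object[] found = property.GetCustomAttributes(
                typeof(ImportFieldAttribute), false);
            foreach (ImportFieldAttribute field in found)
            {
                lines.Add(string.Format("{0}: position={1}, trim={2}",
                    property.Name, field.Position, field.EnableTrimming));
            }
        }
        return lines;
    }
}
```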

Another key line to note is the very first line in the attribute class definition.

[AttributeUsage(AttributeTargets.Property, AllowMultiple = false)] 

This line is itself an attribute, stating that our custom attribute (ImportFieldAttribute) can only be applied to properties and that it cannot be applied more than once to the same property. The element to which an attribute is applied is called its target; attributes can have many types of targets, including classes, methods, parameters, fields, return values from methods, the entire assembly, and so on. The full list is given by the AttributeTargets enumeration in the .NET documentation.
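For contrast, here is a small sketch of an attribute that allows several targets and repeated use. TagAttribute, TaggedRecord, and UsageDemo are hypothetical and exist only to illustrate how AttributeUsage changes what the compiler accepts.

```csharp
using System;

// Hypothetical attribute: allowed on classes and properties, repeatable.
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Property, AllowMultiple = true)]
public class TagAttribute : Attribute
{
    public string Value { get; }
    public TagAttribute(string value) { Value = value; }
}

[Tag("entity")]
[Tag("importable")] // legal only because AllowMultiple = true
public class TaggedRecord
{
    [Tag("key")]
    public int Id { get; set; }

    // Putting [Tag("x")] on this method would be a compile-time error,
    // because AttributeTargets.Method is not among the declared targets.
    public void Save() { }
}

public static class UsageDemo
{
    // Counts the Tag attributes applied directly to TaggedRecord.
    public static int CountClassTags()
    {
        return typeof(TaggedRecord)
            .GetCustomAttributes(typeof(TagAttribute), false).Length;
    }
}
```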

Another attribute that was used on the Employee class, as you may have noticed, was the class-level attribute called ImportFile.

[ImportFile(FileType = ImportFileType.CSV)]

This is used to specify that the class contains data to be filled up from a CSV file. This is specified in this manner so that later on, when a new type of file needs to be imported, we can reuse much of our existing code and implement only very specific classes and/or methods to do the actual import. For the moment, the import file type can only be CSV.
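The ImportFileAttribute class and the ImportFileType enumeration are not listed in this article, so the following is only a guess at their shape, based on the two members the other listings read (FileType and FieldDelimiter). The default delimiter, EmployeeStub, and ImportFileDemo are assumptions made for the sake of a self-contained sketch.

```csharp
using System;

// Assumed shape: the article only shows the CSV member being used.
public enum ImportFileType
{
    CSV
}

// Assumed definition; FileType and FieldDelimiter are the two members
// that the article's other listings read from this attribute.
[AttributeUsage(AttributeTargets.Class, AllowMultiple = false)]
public class ImportFileAttribute : Attribute
{
    public ImportFileType FileType { get; set; }
    public string FieldDelimiter { get; set; }

    public ImportFileAttribute()
    {
        FileType = ImportFileType.CSV;
        FieldDelimiter = ","; // assumed default
    }
}

[ImportFile(FileType = ImportFileType.CSV)]
public class EmployeeStub { }

public static class ImportFileDemo
{
    // Returns the delimiter configured on the type, or null if the
    // type carries no ImportFile attribute.
    public static string DelimiterFor(Type type)
    {
        ImportFileAttribute settings =
            (ImportFileAttribute)Attribute.GetCustomAttribute(
                type, typeof(ImportFileAttribute));
        return settings == null ? null : settings.FieldDelimiter;
    }
}
```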

That said, we are at a stage where we have a class that represents an Employee and specifies how the employee data present in a CSV file relates to properties in the Employee class. All we need to do now is find a mechanism to do the actual import. Before we get into the core details of the implementation, note that we have a main class, ImportFileManager, that handles all the nitty-gritty details of the importing process. A sample snippet shows how easy it finally becomes to process CSV files:

ImportFileManager<Employee> fileImporter = 
    new ImportFileManager<Employee>("Employees.csv");

List<Employee> list = fileImporter.Import();

foreach (Employee employee in list)
{
    Console.WriteLine(employee);
}

You create a new ImportFileManager object, passing in the name of the file to import; since it is a generic class, the Employee type has to be specified as well. ImportFileManager uses Reflection to determine what type of file we are processing, based on the generic type's attributes. In this case, the type is Employee, and the ImportFile attribute on it marks its data source as a CSV file; ImportFileManager picks up this information to select the appropriate import process.

The details

Now, let us proceed to get into more details about each of these classes.

public class ImportFileManager<EntityClass>
            where EntityClass : class, new()
{
    #region Properties and fields
    
    private EntityClass entity;
    private ImportFileAttribute importFileSettings;
    
    private FileImporter<EntityClass> _importer;
    /// <summary>
    /// The file importer
    /// </summary>

    public FileImporter<EntityClass> Importer
    {
        get { return _importer; }
        set { _importer = value; }
    }
   
    /// <summary>
    /// The name of the import file
    /// </summary>

    private string _fileName;
    public string FileName
    {
        get { return _fileName; }
        set { _fileName = value; }
    }

    #endregion

    #region Ctor

    public ImportFileManager(string fileName)
    {
        _fileName = fileName;
        entity = new EntityClass();

        // Get the import file attribute from the entity class
        importFileSettings = 
            ReflectionHelper.GetImportFileAttribute(entity);

        // Get the correct importer class instance
        // to import the file
        _importer = 
            FileImporterFactory<EntityClass>.
                CreateFileImporter(
                    _fileName, 
                    importFileSettings.FileType);
    }

    #endregion

    #region Methods

    public List<EntityClass> Import()
    {
        return _importer.Import();
    }

    #endregion
}

The ImportFileManager class listing is shown above. Of interest is the constructor, which does two main things:

  • Gets the ImportFile attribute settings we have specified for the generic type (Employee class, in our case). For this, it calls on a static method in the ReflectionHelper class.
    /// <summary>
    /// Gets the import file attribute settings 
    /// that have been marked for a class
    /// </summary>
    /// <param name="entity">The object
    /// whose attribute will be returned</param>
    /// <returns></returns>
    
    public static ImportFileAttribute 
        GetImportFileAttribute(object entity)
    {
        object[] attributes = 
            entity.GetType().GetCustomAttributes(false);
        foreach (object attribute in attributes)
        {
            if (attribute is ImportFileAttribute)
            {
                return (ImportFileAttribute)attribute;
            }
        }
        return null;
    }
  • Next, it determines which FileImporter sub-class to use to perform the actual import.
    // Get the correct importer class instance
    // to import the file
    
    _importer = 
        FileImporterFactory<EntityClass>.
            CreateFileImporter(
                _fileName, 
                importFileSettings.FileType);
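Note that GetImportFileAttribute returns null when the entity class carries no ImportFile attribute, in which case the constructor above would fail with a NullReferenceException when it reads FileType. One possible variation, sketched below, looks the attribute up from the type alone and fails with a descriptive message. TypeLookup, Marked, and Unmarked are hypothetical names, and the attribute is a reduced stand-in so the snippet compiles on its own.

```csharp
using System;

public enum ImportFileType { CSV }

// Reduced stand-in for the article's ImportFileAttribute.
[AttributeUsage(AttributeTargets.Class)]
public class ImportFileAttribute : Attribute
{
    public ImportFileType FileType { get; set; }
}

[ImportFile(FileType = ImportFileType.CSV)]
public class Marked { }

public class Unmarked { }

public static class TypeLookup
{
    // Hypothetical variant of GetImportFileAttribute that works from the
    // type alone and fails loudly when the attribute is missing.
    public static ImportFileAttribute GetImportFileAttribute<T>()
    {
        object[] attributes =
            typeof(T).GetCustomAttributes(typeof(ImportFileAttribute), false);
        if (attributes.Length == 0)
        {
            throw new InvalidOperationException(
                typeof(T).Name + " is not marked with [ImportFile].");
        }
        return (ImportFileAttribute)attributes[0];
    }
}
```

Working from typeof(T) also avoids constructing a throwaway entity instance just to inspect its type.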

The listing for FileImporterFactory is shown below:

class FileImporterFactory<EntityClass>
        where EntityClass : class, new()
{
    #region Methods

    public static FileImporter<EntityClass> 
        CreateFileImporter(
            string fileName, 
            ImportFileType fileType)
    {
        switch (fileType)
        {
            case ImportFileType.CSV:
                return 
                    new CsvFileImporter<EntityClass>(fileName);
            default:
                throw 
                    new ArgumentException(
                        "The import file type is not supported."
                        );
        }
    }

    #endregion
}

The purpose of the FileImporterFactory class is to determine what type of import mechanism we need to use, and this decision is based on the file type enumeration that was specified in the ImportFile attribute of the Employee class.
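To illustrate where a second file type would plug in, here is a rough, self-contained sketch. The TabDelimited enumeration member and the TabFileImporter class are hypothetical, and both importers are stubs that return an empty list rather than parsing anything.

```csharp
using System;
using System.Collections.Generic;

public enum ImportFileType { CSV, TabDelimited } // TabDelimited is hypothetical

public abstract class FileImporter<T> where T : class, new()
{
    protected FileImporter(string fileName) { FileName = fileName; }
    public string FileName { get; }
    public abstract List<T> Import();
}

public class CsvFileImporter<T> : FileImporter<T> where T : class, new()
{
    public CsvFileImporter(string fileName) : base(fileName) { }
    // Parsing is omitted here; the stub returns an empty list.
    public override List<T> Import() { return new List<T>(); }
}

// Hypothetical importer for the new file type.
public class TabFileImporter<T> : FileImporter<T> where T : class, new()
{
    public TabFileImporter(string fileName) : base(fileName) { }
    // Parsing is omitted here; the stub returns an empty list.
    public override List<T> Import() { return new List<T>(); }
}

public static class FileImporterFactory<T> where T : class, new()
{
    public static FileImporter<T> CreateFileImporter(
        string fileName, ImportFileType fileType)
    {
        switch (fileType)
        {
            case ImportFileType.CSV:
                return new CsvFileImporter<T>(fileName);
            case ImportFileType.TabDelimited: // the one new branch
                return new TabFileImporter<T>(fileName);
            default:
                throw new ArgumentException(
                    "The import file type is not supported.");
        }
    }
}
```

Under this arrangement, supporting a new format means adding one enumeration member, one FileImporter subclass, and one case in the factory; callers of ImportFileManager stay unchanged.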

You may argue that all this abstraction adds a great deal of complexity and overhead and could lead to performance degradation. I agree. Compared to the more direct approach of opening the file and processing each line by hand, the approach presented here will be slower, because attributes are looked up through Reflection for every field. However, the gains in flexibility, ease of use, and maintainability are significant. In the end, it all comes down to the actual performance requirements and implementation details, and a trade-off will have to be made.
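One common way to reduce the Reflection cost is to scan the attributes once per entity type and cache the result. The sketch below is not part of the article's sample code; FieldMap and Person are hypothetical names, and the attribute is reduced to its position so the snippet stands alone.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Reduced stand-in for the article's ImportFieldAttribute.
[AttributeUsage(AttributeTargets.Property)]
public class ImportFieldAttribute : Attribute
{
    public int Position { get; }
    public ImportFieldAttribute(int position) { Position = position; }
}

public class Person
{
    [ImportField(0)] public string Name { get; set; }
    [ImportField(1)] public string Role { get; set; }
    public string NotImported { get; set; }
}

// Hypothetical helper: scans the attributes once per entity type and
// remembers which property belongs to which column position.
public static class FieldMap<T>
{
    private static readonly Dictionary<int, PropertyInfo> byPosition = Build();

    private static Dictionary<int, PropertyInfo> Build()
    {
        var map = new Dictionary<int, PropertyInfo>();
        foreach (PropertyInfo property in typeof(T).GetProperties())
        {
            var field = (ImportFieldAttribute)Attribute.GetCustomAttribute(
                property, typeof(ImportFieldAttribute));
            if (field != null)
            {
                map[field.Position] = property;
            }
        }
        return map;
    }

    // Returns false when no property is mapped to the given column.
    public static bool TryGetProperty(int position, out PropertyInfo property)
    {
        return byPosition.TryGetValue(position, out property);
    }
}
```

With a map like this, setting a field becomes a dictionary lookup followed by PropertyInfo.SetValue, instead of a scan over every property and attribute for each value in each line.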

The FileImporter class is listed below.

public abstract class FileImporter<EntityClass>
        where EntityClass : class, new()
{
    #region Properties and fields
    
    private string _fileName;
    /// <summary>
    /// The file name
    /// </summary>

    public string FileName
    {
        get { return _fileName; }
        set { _fileName = value; }
    }

    private List<string> _errorRecords = 
        new List<string>();
    /// <summary>
    /// The list of failed records
    /// </summary>

    public List<string> ErrorRecords
    {
        get { return _errorRecords; }
        set { _errorRecords = value; }
    }

    /// <summary>
    /// Determines if an import was successful
    /// </summary>

    public bool ImportSuccess
    {
        get
        {
            return (_errorRecords.Count == 0);
        }
    }

    #endregion

    #region Ctor

    public FileImporter(string fileName)
    {
        this._fileName = fileName;
    }
 
    #endregion

    #region Methods

    public abstract List<EntityClass> Import();

    #endregion
}

A concrete implementation of FileImporter in the form of CsvFileImporter is listed below:

using System.Collections.Generic;
using System.IO;

class CsvFileImporter<EntityClass> : FileImporter<EntityClass>
        where EntityClass : class, new()
{
    #region Properties and fields

    private ImportFileAttribute importFileSettings;

    #endregion

    #region Ctor

    public CsvFileImporter(string fileName)
        : base(fileName)
    {
        EntityClass fileRecord = new EntityClass();
        importFileSettings = 
            ReflectionHelper.GetImportFileAttribute(fileRecord);
    }

    #endregion

    #region Methods
    /// <summary>
    /// Imports the CSV record and returns a list of objects
    /// </summary>
    /// <returns></returns>

    public override List<EntityClass> Import()
    {
        List<EntityClass> theList = new List<EntityClass>();
        char[] delimiters = 
            importFileSettings.FieldDelimiter.ToCharArray();

        // The using block closes the file even if an 
        // unexpected exception escapes the loop
        using (StreamReader streamReader = File.OpenText(FileName))
        {
            while (!streamReader.EndOfStream)
            {
                // Read in a line of the record data
                string recordData = streamReader.ReadLine();

                try
                {
                    // Split the record data based on the column delimiter
                    string[] dataElements = recordData.Split(delimiters);

                    // Populate the record data elements into the object
                    // and add it to the list
                    EntityClass fileRecord = new EntityClass();
                    for (int i = 0; i < dataElements.Length; i++)
                    {
                        ReflectionHelper.SetPropertyValue(
                            fileRecord, 
                            dataElements[i], 
                            i);
                    }
                    theList.Add(fileRecord);
                }
                catch (FieldValidationException)
                {
                    // Keep the raw line so the caller can inspect failures
                    ErrorRecords.Add(recordData);
                }
            }
        }

        return theList;
    }

    #endregion

}
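Because Import relies on String.Split, every delimiter character is treated as a field boundary, so a quoted value such as "Smith, John" would be broken into two fields. If the input may contain quoted fields, a quote-aware splitter along the following lines could be substituted. This is only a sketch: CsvLine is a hypothetical helper, and it does not handle quoted fields that span multiple lines.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Splits one line on the delimiter, treating text inside double quotes
    // as a single field and "" inside quotes as a literal quote character.
    public static List<string> Split(string line, char delimiter)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote
                }
                else
                {
                    current.Append(c);
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == delimiter)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());  // last field on the line
        return fields;
    }
}
```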

And finally, the ReflectionHelper class, which provides all the Reflection-based functionality:

using System;
using System.Reflection;
using System.Text.RegularExpressions;

public class ReflectionHelper
{
    /// <summary>
    /// Gets the import file attribute settings 
    /// that have been marked for a class
    /// </summary>
    /// <param name="entity">The object
    /// whose attribute will be returned</param>
    /// <returns></returns>

    public static ImportFileAttribute GetImportFileAttribute(
        object entity)
    {
        object[] attributes = 
            entity.GetType().GetCustomAttributes(false);
        foreach (object attribute in attributes)
        {
            if (attribute is ImportFileAttribute)
            {
                return (ImportFileAttribute)attribute;
            }
        }
        return null;
    }

    public static void SetPropertyValue(
        object entity, 
        object value, 
        int fieldIndex)
    {
        object[] attributes;

        // Search the properties for the correct position and fill the 
        // appropriate value

        foreach (PropertyInfo property in 
                                      entity.GetType().GetProperties())
        {
            attributes = property.GetCustomAttributes(
                typeof(ImportFieldAttribute), false);
            foreach (object attribute in attributes)
            {
                ImportFieldAttribute field = 
                    (ImportFieldAttribute)attribute;
                if (field.Position == fieldIndex)
                {                        
                    if (IsFieldValueValid(field, value))
                    {
                        value = PrepareFieldValue(
                            field, 
                            property, 
                            value);
                        property.SetValue(entity, value, null);
                    }
                    else
                    {
                        throw new FieldValidationException(
                            string.Format(
                            "Validation of field '{0}' failed, value "+
                            "'{1}' should match pattern '{2}'", 
                            property.Name, 
                            value, 
                            field.ValidationPattern)
                            );
                    }
                }
            }
        }
    }

    /// <summary>
    /// Determines if a field value to be populated to a 
    /// field is valid or not
    /// </summary>
    /// <param name="field"></param>
    /// <param name="value"></param>
    /// <returns></returns>

    public static bool IsFieldValueValid(
        ImportFieldAttribute field, 
        object value)
    {
        if (field.EnableValidation && 
            field.ValidationPattern != null && 
            field.ValidationPattern.Length > 0)
        {
            return Regex.IsMatch((string)value, field.ValidationPattern);
        }
        return true;
    }
        
    /// <summary> 
    /// Sets up the value object for setting to the property
    /// </summary> 
    /// <param name="field"></param> 
    /// <param name="value"></param> 
    /// <returns></returns> 

    public static object PrepareFieldValue(
        ImportFieldAttribute field, 
        PropertyInfo property, 
        object value) 
    { 
        if (field.EnableTrimming) 
        { 
            value = ((string)value).Trim(); 
        } 

        // Try to convert the input string value to the proper type 
        // of the data, only if data type is not string 

        if (value is IConvertible && field.DataType!=DataType.String) 
        { 
            value = Convert.ChangeType(value, property.PropertyType); 
        } 
        else 
        { 
            // Custom conversion types

        } 
        
        return value;
    }
}
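One detail worth keeping in mind about PrepareFieldValue: the two-argument Convert.ChangeType overload converts using the current thread's culture, so the same date string may parse differently (or fail) on machines with different regional settings. The sketch below shows one way to pin the culture and to report a failed conversion instead of letting the exception abort the import. FieldConversion and TryConvert are hypothetical names, not part of the sample code.

```csharp
using System;
using System.Globalization;

public static class FieldConversion
{
    // Attempts a culture-invariant conversion of the raw field text.
    // Returns false instead of throwing when the text cannot be converted.
    public static bool TryConvert(string raw, Type targetType, out object converted)
    {
        try
        {
            converted = Convert.ChangeType(
                raw, targetType, CultureInfo.InvariantCulture);
            return true;
        }
        catch (Exception ex) when (
            ex is FormatException ||
            ex is InvalidCastException ||
            ex is OverflowException)
        {
            converted = null;
            return false;
        }
    }
}
```

A caller could treat a false result the same way as a validation failure, for example by throwing FieldValidationException so that the record ends up in ErrorRecords.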

The actual import logic for CSV files is implemented in the CsvFileImporter class, which derives from the abstract FileImporter class. FileImporter simply declares an abstract method called Import, which every derived class must implement (as CsvFileImporter does). With this approach, the caller need not worry about which class is used or what logic performs the import. All of this is abstracted away, and the only knowledge required is that calling the Import method returns the imported data as model objects. Thus, by invoking the Import method of the ImportFileManager class, we end up with a list of Employee objects built from the records in the CSV file.

We have almost come to the end of this article. Along the way, we looked at a generic design for importing CSV files. Using features like Reflection, attribute-based programming, generic types, and abstract classes, we have seen how to develop a flexible solution. The idea presented in this article is in no way perfect, and there are elements of the design that could definitely be revisited. That said, I hope I have conveyed the bigger picture of designing classes that allow flexible usage and adapt to various scenarios. There may be a better way to implement the same thing, so do leave a comment on what you think and how you would improve upon this. Download the sample files and try them out, and come up with your own wacky ideas and design paradigms.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Benzi K. Ahamed
Web Developer
United Kingdom
I work as a Technology Lead for an IT services company based in India.
 
Passions include programming methodologies, compiler theory, cartooning and calligraphy.

Article Copyright 2007 by Benzi K. Ahamed