
Using Custom Attributes to Map Database Tables and Columns to Classes and Fields

18 Feb 2010
Building an O/R Mapper: Step 2.

Introduction

In my article on loading any object from any database, I made a rather large list of shortcomings of that code. In this article, I am going to fix a few of those by using custom attributes. By the end of this article, we will have fixed, or be able to fix, the following:

  • Being forced to use the poorly performing SELECT *
  • Inability to map table and column names that aren't valid variable or class names
  • Using slow performing properties
  • Inability to protect data

Background

In .NET, you have the ability to "decorate" your code, that is, to provide metadata about your code that can be retrieved through Reflection. This sounds a little strange, but if you have ever used XML serialization, you have already seen decorations like this (borrowed from the MSDN website):

public class Address
{
    // The XmlAttribute instructs the XmlSerializer to serialize the Name
    // field as an XML attribute instead of an XML element (the default
    // behavior).
    [XmlAttribute]
    public string Name;
    public string Line1;

    // Setting the IsNullable property to false instructs the 
    // XmlSerializer that the XML attribute will not appear if 
    // the City field is set to a null reference.
    [XmlElementAttribute(IsNullable = false)]
    public string City;
    public string State;
    public string Zip;
}

Of course, reusing those decorations wouldn't really be useful; we need to create our own. The XML serialization code does exactly what we did in the step 1 article: it uses the variable and class names for nodes, and only needs the attributes for exceptions. I want to use my custom attributes to limit which fields are used, since defaulting to all fields would leave me missing information later.

For classes, we need an attribute that will give us a minimum of two pieces of information:

  • The schema information, if there is any
  • The table name

For columns, we will need at least two pieces of information:

  • The name of the column in the table
  • The DbType of the column (for creating parameters, though I won't be touching on it in this article)

Since this article isn't about creating or reading custom attributes, I will let you research that on your own; there are plenty of tutorials available.
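As a quick refresher, though, reading an attribute off a type with Reflection looks something like this (a minimal sketch of my own, referring to the TableAttribute class defined below, not part of the article's code):

```csharp
// Minimal sketch: reading a TableAttribute off a decorated class.
Type type = typeof(Vehicle);
object[] attributes = type.GetCustomAttributes(typeof(TableAttribute), false);
if (attributes.Length > 0)
{
    // Only one instance is allowed, so the first hit is the only hit.
    var ta = (TableAttribute)attributes[0];
    Console.WriteLine("{0}.{1}", ta.Schema, ta.Name);
}
```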

I have chosen to enforce a couple of rules on my TableAttribute class, including only allowing the attribute on a class, not on a property or field, and only allowing a single instance of the attribute on the class. So, here is the code for this:

/// <summary>
/// Attribute for decorating classes, so they can be matched to DB tables.
/// </summary>
[AttributeUsage(AttributeTargets.Class, AllowMultiple = false)]
public sealed class TableAttribute : Attribute
{
    public string Schema { get; set; }
    public string Name { get; set; }
}

To avoid the inherent speed problem with using properties, and because I would like to be able to use properties to track changes, I chose to make the column attributes only valid for fields (instance variables). This gives me the ability to protect my data from accidental changes, because I can eliminate the 'set' portion of the property for columns that aren't allowed to be changed. Data protection is a good thing!

/// <summary>
/// Attribute for decorating fields,
/// so they can be matched to DB columns
/// </summary>
[AttributeUsage(AttributeTargets.Field, AllowMultiple = false)]
public sealed class ColumnAttribute : Attribute
{
    /// <summary>
    /// Gets or sets the System.Data.DbType
    /// that will be used for the object for Parameters
    /// </summary>
    public DbType DbType
    {
        get;
        set;
    }
    /// <summary>
    /// Gets or sets the Name of the object
    /// </summary>
    public String Name
    {
         get;
         set;
    }
}

Note: For those who don't know, there is something a little weird about creating your own attributes: you are always supposed to name them with the suffix Attribute, hence my two attributes are TableAttribute and ColumnAttribute. The weird part is that when you use them to decorate your code, the "Attribute" portion of the name is dropped, so they become Table and Column.

The tables from my article on passwords and my article on SQL Server indexed views couldn't be used with the code from my step 1 article. However, if we add the new attributes, the problematic Vehicle table can have its object look like this:

[Table(Schema = "dbo", Name = "Vehicle")]
public class Vehicle
{
    //private bool _isDirty;

    [Column(DbType = DbType.Guid, Name = "ID")] 
    private Guid _id;

    [Column(Name = "Vehicle Identification Number", 
     DbType = DbType.String)]
    private string _vehicleIdentificationNumber = string.Empty;

    public Guid ID
    {
        get { return _id; }
        set
        {
            if (value != _id)
            {
                //_isDirty = true;
                _id = value;
            }
        }
    }

    public string VIN
    {
        get { return _vehicleIdentificationNumber; }
        set
        {
            value = (value ?? string.Empty).Trim();
            if (string.Compare(value, _vehicleIdentificationNumber) != 0)
            {
                //_isDirty = true;
                _vehicleIdentificationNumber = value;
            }
        }
    }

    //public bool IsDirty
    //{
    //    get { return _isDirty; }
    //}
}

As you can see, I gave a hint as to how to implement tracking of whether the object is dirty. I also used a shorter name for a column property: the more familiar VIN, rather than Vehicle Identification Number.
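Uncommenting those hints, a minimal version of the dirty tracking might look like this (my own sketch based on the commented-out lines, not code from the article):

```csharp
// Minimal sketch of dirty tracking, built from the commented-out hints above.
private bool _isDirty;

public Guid ID
{
    get { return _id; }
    set
    {
        if (value != _id)
        {
            _isDirty = true;   // flag the object as modified
            _id = value;
        }
    }
}

// Read-only, so consumers can check the flag but not clear it by accident.
public bool IsDirty
{
    get { return _isDirty; }
}
```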

Now we need to pull only the columns we need from our table, instead of using SELECT *. To do this, I will do the following:

  • Query all the fields and check whether they have my ColumnAttribute; if they do, get the column name.
  • Wrap the column names using the CommandBuilder function QuoteIdentifier.

So, let's build the function to do just that.

private static BindingList<T> LoadObjectsFromDatabase<T>
       (string connectionString, string providerName) where T : new()
{
    //portion 1, get the table name and wrap it
    var factory = DbProviderFactories.GetFactory(providerName);
    var commandBuilder = factory.CreateCommandBuilder();
    Type type = typeof(T);
    string name = string.Empty;
    {
        var attributes = type.GetCustomAttributes(false);
        foreach (Attribute attribute in attributes)
        {
            TableAttribute ta = attribute as TableAttribute;
            if (ta != null)
            {

                if (!string.IsNullOrEmpty(ta.Schema))
                {
                    name = string.Format("{0}{1}",
                                  commandBuilder.QuoteIdentifier(ta.Schema),
                                  commandBuilder.SchemaSeparator
                        );
                }
                name += commandBuilder.QuoteIdentifier(ta.Name);
                break;
            }
        }
    }
    if (string.IsNullOrEmpty(name))
    {
        //could fall back to just the class name, but not for this example
        throw new
            Exception("Unable to build SQL because " + 
                      "custom attribute doesn't exist.");
    }

    //portion 2 - get the fields and their columns, append
    //the column names into the SQL statement as well as 
    //create the field map, so we don't have to look up the field 
    //every time.
    var fieldMap = new Dictionary<string, FieldInfo>();
    string comma = string.Empty;
    var sb = new StringBuilder("SELECT ");
    var fields = type.GetFields(BindingFlags.NonPublic | BindingFlags.Instance);
    foreach (var f in fields)
    {
        var attributes = f.GetCustomAttributes(false);
        foreach (Attribute attribute in attributes)
        {
            ColumnAttribute ca = attribute as ColumnAttribute;
            if (ca != null)
            {
                //create our fieldmap
                fieldMap.Add(ca.Name, f);

                //append a comma, if this is the first column, it will be empty.
                sb.Append(comma);
                //make sure additional columns will have comma separators
                comma = ", ";
                sb.Append(commandBuilder.QuoteIdentifier(ca.Name));
            }
        }
    }
    if (fieldMap.Count < 1)
    {
        throw new Exception("No columns in query");
    }
    sb.AppendLine();
    sb.AppendLine("FROM");
    sb.Append(name);

    //portion 3 - create the connection, execute the reader, and pass
    //the new reader and its related field map to the LoadObjectsFromDataReader
    //function
    using (var connection = factory.CreateConnection())
    {
        connection.ConnectionString = connectionString;
        connection.Open();
        using (var command = connection.CreateCommand())
        {
            command.CommandText = sb.ToString();
            command.CommandType = CommandType.Text;
            using (var dr = command.ExecuteReader(CommandBehavior.CloseConnection))
            {
                return LoadObjectsFromDataReader<T>(dr, fieldMap);
            }
        }
    }
}

This code is broken down into three basic sections which, if I were building a full-blown solution, would be better organized for code reuse.

Portion 1

This section is dedicated to getting the custom attribute off the class, so we know what the table name is.

Portion 2

This is dedicated to getting the list of fields from the object and getting the custom attributes off them. I use those pieces of information to build a SELECT statement that selects only the columns for which we actually have fields to store the information. This is a huge benefit over SELECT * since it will execute faster, and neither reordered columns nor new columns in the table will affect it. I also store the column name and the field info for later use. This section finishes up by appending the FROM clause to the SQL statement.

Note: Using SELECT * forces your database engine to expand the * into the list of column names. Thus, rebuilding this SQL statement every time you need it would cause you to lose the speed boost you get from not using SELECT *. You should cache your freshly built statement, but I am not going to touch on that yet.
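Although caching is out of scope here, a minimal sketch could simply keep the built statement keyed by type. The dictionary and the BuildSelectStatement helper below are my own illustration, not part of the article's code:

```csharp
// Sketch: cache the generated SELECT statement per type so the string
// building (and attribute reflection) only happens once per type.
private static readonly Dictionary<Type, string> _sqlCache =
    new Dictionary<Type, string>();

private static string GetSelectStatement<T>(DbCommandBuilder commandBuilder)
{
    Type type = typeof(T);
    string sql;
    if (!_sqlCache.TryGetValue(type, out sql))
    {
        // BuildSelectStatement is a hypothetical helper wrapping
        // portions 1 and 2 of the function above.
        sql = BuildSelectStatement(type, commandBuilder);
        _sqlCache[type] = sql;
    }
    return sql;
}
```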

Portion 3

As it stands, this section uses the provider name and connection string to build a new connection to the database, executes the SQL statement, and passes the resulting DbDataReader to the modified LoadObjectsFromDataReader function, making sure to set the CommandBehavior to CloseConnection; that way, the connection closes automatically when the data reader no longer has any records, or is disposed. Ideally, this should be reduced to just a few lines: a single call that executes the SQL statement, then a call passing the DbDataReader on.
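If the connection and command plumbing were pulled out into a helper, portion 3 might shrink to something like this (a sketch assuming a hypothetical ExecuteReader helper, not code from the article):

```csharp
// Hypothetical refactoring: a helper owns the connection/command plumbing,
// so portion 3 collapses to executing the statement and mapping the rows.
using (var dr = ExecuteReader(factory, connectionString, sb.ToString()))
{
    return LoadObjectsFromDataReader<T>(dr, fieldMap);
}
```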

Now, we need to modify the existing LoadObjectsFromDataReader function from the step 1 article.

private static BindingList<T> LoadObjectsFromDataReader<T>
       (IDataReader dr, Dictionary<string, 
        FieldInfo> fieldMap) where T : new()
{
    //get the type of the object, without having to create one
    BindingList<T> retval = new BindingList<T>();
 
    //data block
    object[] oo = new object[fieldMap.Count];
    while (dr.Read())
    {
        dr.GetValues(oo);
        //could be threaded block                
        T item = new T();
        int fieldIndex = -1;
        foreach (var kvp in fieldMap)
        {
            FieldInfo fi = kvp.Value;
            object o = oo[++fieldIndex];
            if (DBNull.Value.Equals(o))
            {
                o = null;
            }
            try
            {
                fi.SetValue(item, o);
            }
            catch
            {
                //eat data errors quietly
            }
        }
        retval.Add(item);
        //end of could be threaded block
    }
    return retval;
}
If you compare this to the original function, you see that I am no longer querying the object for a list of properties. Instead, I am using my map of column names to fields, and because I know that my DbDataReader and my fieldMap are in sync, I can simply iterate through them, filling the fields in as I go.

Using the Code

Using this code is easy. As I stated in the step 1 article, Linq2Objects is a great way to get data out of disparate data sources, so while this example is using only data from a single database, it doesn't matter; once they are loaded, we can query them all we like.

var vehicles = LoadObjectsFromDatabase<Vehicle>
             (connectionString, 
              providerName);

var vehicleLicensePlates =
    LoadObjectsFromDatabase<Vehicle_LicensePlate_Issuer>
             (connectionString,
             providerName);

var licenseIssuers = LoadObjectsFromDatabase<LicensePlate_Issuer>
             (connectionString, 
              providerName);

var licenses = LoadObjectsFromDatabase<LicensePlate>
             (connectionString,
              providerName);

//query the current license plate for the car with VIN "asdf";
//because "Current" is a boolean, a comparison isn't required
var query = from license in licenses
            join licensePlateIssuer in licenseIssuers
            on license.ID equals licensePlateIssuer.LicensePlateID
            join vehicleLicensePlate in vehicleLicensePlates
            on licensePlateIssuer.ID equals vehicleLicensePlate.LicensePlate_Issuer_ID
            join vehicle in vehicles
            on vehicleLicensePlate.Vehicle_ID equals vehicle.ID
            where vehicle.VIN == "asdf" && vehicleLicensePlate.Current
            select license;

This example simply loads a number of objects from my example database (the example database from that article contains data), and returns the one record that is meaningful for this query.

Points of Interest

This article addresses all of the shortcomings mentioned at the top, but I still haven't touched on the actual caching of queries, or using the custom attributes to create Insert or Update statements (though you could use the CommandBuilder to help with that).

The single largest glaring shortcoming of this code is that there is no way to limit your data to a subset of rows, in other words, I have no "WHERE" clause.

Before the band starts tuning up, let's look at some of the issues with this code. The DbDataReader and fieldMap could be out of sync; that should be checked, and it isn't. The LoadObjectsFromDatabase function should probably take just the connection name, rather than requiring multiple values. CommandBuilder.QuotePrefix and QuoteSuffix aren't always filled in, so you have to make sure they are right before you use QuoteIdentifier. Not every database supports schemas, or if they do, they may not support them properly. In some cases, it is easier to ignore that; however, sometimes it is not an option. For example, if you are using a database designed like the new AdventureWorks demo database from Microsoft, schemas are absolutely required.
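For the QuotePrefix/QuoteSuffix issue, a defensive check might look like this (my own sketch; the fallback bracket values are an assumption that happens to match SQL Server):

```csharp
// Sketch: make sure the command builder can actually quote identifiers
// before relying on QuoteIdentifier.
if (string.IsNullOrEmpty(commandBuilder.QuotePrefix) ||
    string.IsNullOrEmpty(commandBuilder.QuoteSuffix))
{
    commandBuilder.QuotePrefix = "[";
    commandBuilder.QuoteSuffix = "]";
}
```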

While there are still major shortcomings to this code, this code can now pull back any object from any database, and this includes the new types from SQL Server 2008. Obviously though, if you are using those database specific types, you won't be able to swap out database servers underneath your code. However, simply changing the connection string (and possibly the schema) will allow you to move your tables of data from one database to another.

Another great thing is that, assuming you get the case of your Name strings correct, this code allows you to map a column to any variable name; the column name doesn't even have to be a valid identifier. Characters like spaces and SQL keywords won't break your code.

History

  • 2010-02-18: Fixed bug in LoadObjectsFromDataReader (changed field-index from post increment to pre-increment).
  • 2010-02-17: Original article.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Alaric Dailey
Software Developer (Senior) Pengdows
United States
Currently looking for new contracts in Omaha NE or telecommute opportunities.
