Last year I published the article ADO+.NET. In it I presented a library that in my opinion can "replace" ADO.NET.
In fact such library uses ADO.NET but it doesn't expose it to the users. We can say that this is very similar to what WinForms do compared to the Windows API (the user32 library). Yet Nicolas Dorier told me that ADO+.NET shouldn't try to replace ADO.NET, that it should add to it.
Well, part of me still disagrees because my purpose was to hide those methods and properties that cause confusion but I do understand that there are actually too many projects that use ADO.NET directly, so making something that simply makes it easier to access data the right way without replacing the ADO.NET components is a good thing.
So this time I am presenting a very small library meant to make it easier to actually read values with a data reader and generate a filled record in a very fast and yet configurable manner.
What does this library really do?
This library starts with a very simple purpose: Filling a .NET object by using an
IDataReader row easily, without manual code to do the right GetSomething call, without manual conversions for datatypes mismatches and with a great speed.
And on top of this it gives syntactic sugar methods that allows users to easily enumerate a database command receiving instances of a given type or to create a list with those results.
And to make things even better, the entire code has expansion points that allow you to tell how to process a given database column (so you can decide to call some specific method to read that column) or if you don't want to go that far you have other extension points that allow you to tell how to find a field or property by the database column or how to read a database column considering its actual database type. Or, maybe simpler, to tell how a needed conversion should be done.
Using the code
Before talking about performance, explaining how it works internally or explaining how to completely reconfigure this solution, let's understand how to use it for the basic situations.
To start (probably during the application initialization) you need to configure the library with its default values (or maybe with different values, but we will see that later) so, to do that, you must call:
Then, at any moment when you have an
IDbCommand already filled and being at the point of calling
ExecuteReader() you can use one of those methods:
Enumerate<T>() - This is an extension method that will call the
ExecuteReader() and will create a fast delegate to do the reads of each line.
As an Enumerable, you are free to use constructs like
foreach and even LINQ methods (methods like
SingleOrDefault() are good methods as they don't require to read all the records). But if you require methods that load the entire contents in memory, it is preferrably to use the
ToList<T>() - Well, this is very similar to the
Enumerate() method, but instead of only reading records when requested (enumerated) it will load all the records at once. If you actually want to have all the records in memory this method is preferred over the previous one, but it is not indicated for large batches or if you plan to call methods like
Single() etc, as it is useless to read all the records when you only want the first one.
EnumerateColumn<T>(columnIndex) - It is not uncommon to create a query that will read a single column from the database. In this case, why should we enumerate records that have a single property? Why not enumerate the columns directly?
This is what the
EnumerateColumn() does. I was unsure if I should call it EnumerateField or EnumerateColumn, but even if the
IDataReader has a
GetFieldType, I decided it was better to use the "columns" name.
When enumerating a command to read a column we generally read the single column the query is actually selecting, so the default value for the
columnIndex is zero, yet we are free to read another column index if we really want to select more than one column and decide to ignore all the others.
ColumnToList<T>(columnIndex) - As happens with the
ToList() methods, sometimes we may prefer to put all data into memory instead of iterating it. So, the
ColumnToList() method will read all the lines resulting from a command and will put those results into a single list.
The library organisation
Personally I like to have one type per file. I am even extremist as I usually put a delegate declaration, which is a single line of code, into its own file.
But I am lately "playing" a little with different ways of presenting codes and libraries and one of the things I know is that many users prefer to have a single file they can add in their projects instead of having many files to add or, even worse, having to add a new project to their solutions and/or a DLL reference. So I decided to divide things into only two files:
- DbReaderGeneratorDefinition.cs: This file contains all the interfaces, delegates and the "Default" definitions without actually having any implementation. The purpose of this design is to allow you to replace the default implementation if you decide to write your own implementation without having to rewrite code that depends on this library. This is very similar to the design I presented in the article Architecture of a Remoting Framework.
- DbReaderGeneratorDefaultImplementation.cs: This file is the actual implementation of the DbReaderGenerator. Well, at least the default one for all the interfaces and extension points. I really believe that in your own projects you will try to replace at least one of the classes presented in this file.
Well... actually there's a third file. The DbReaderExtensions.cs is where the methods that I presented in the Using the code topic are found. I was really unsure where I should put the class with those methods. Considering it is a default implementation, I though about putting it in the default implementation file. But considering it is able to work with any configuration, I though I should put it in the definition file... in the end I put it into its own file, but I still hope there aren't too many files in the solution, and you can always merge the contents into a single file if you really want to.
I always hear (and read) people saying that performance is not important. Yet in almost every place I work the main issue is performance. And the common points where performance is a problem are:
- Database access.
And considering we are not dealing with communication in this article (and databases have their own optimized way of communication) there's only one thing to try to improve: The database access.
Actually database accesses using the ADO.NET directly suffer from these things (among others):
- If using
DataSets all the value-type columns (int, booleans, tiny ints, chars etc) are boxed, which occupies much more memory and also always requires a cast when reading the data;
- If using data readers, considering that some databases change the actual data-type, it is not an uncommon practice to read data using the
GetValue() method (which also does boxing) and then use a method from the
Convert class to actually receive the data with the right type;
- Before the existence of
yield return it was hard to write enumerators that read one record at a time and so people used to read all the data and put it into lists. Now, even with the existence of
yield return, it is a common practice to continue to read all data and put it into a list (maybe because people is simply following the old "standard") and this can be very problematic for large batches.
So my purpose was to help users to avoid all those problems. Actually the code generates a single delegate at run-time to read the entire row filling an already existing object, so we can say that reading an entire line only adds a single virtual method call compared to the code done by hand.
But, differently from what happens when users do the code by hand, it is actually capable of analysing the types of the database and using the right
GetSomething method. Note that for
SQL Server to read a
char you actually can't use the
GetChar() method, you must read it as a
string and then get the first character, but that's not the case for other databases.
So, with the code generated at run-time you can benefit from the versatility of using the right access method for different databases without a performance hit and without having to write a different code for every database. And as an extra benefit you can avoid writing lots of repetitive code even if you don't use different databases, avoiding bugs caused by copy/paste and guaranteeing that the good pattern is always used.
My only performance comparison was with Dapper reading thousands of records, many, many times. When Dapper was taking 1.9 seconds this solution was taking 1.2 seconds. Yet I did it only to be sure that it is fast, as Dapper is more complete as it helps you fill parameters while this solution is more complete in respect to user data-types and configuration (in fact, I plan to present an article on the solution to fill the query parameters as another independent solution that can be combined with this one).
I already presented a list of extensions methods that you can use on with your
IDbCommands. But if you really care about performance there's one extra method, called
This method always returns the same instance for all the database rows (and it can even receive such instance as parameter). Its purpose is to be used in
foreach blocks considering the record is not going to be used outside such blocks, so it avoids the cost of creating a new instance at every database row, effectively becoming faster and also alleviating the pressure over the garbage collector.
But, as it always return the same instance, you can't use methods that expect different instances to be returned so, for example, it is useless to use LINQ methods like
ToArray(), as the returned array will have the right length but will be filled with only one instance (which will contain the values of the last record read).
So, use this method if you want the maximum performance, but use it with caution.
In this library everything starts with the
IDbReaderGenerator. When we do an
Enumerate() call or a
ToList() call what actually happens is that those methods will call an
ExecuteReader(), will ask to generate a record filler using the
IDbReaderGenerator and, well, they will iterate through all lines, calling the generated delegate and either
yield returning the records or adding them to a list.
So we can say that everything starts from the
But the default implementation will, in fact, do 2 things:
- Will decorate an inner solution with a cache of the results, so new executions of the same query will not lose time generating the delegate again;
- Will compile a delegate from an expression generated by the
So, our user start point is not the architecture start point. The architecture start point is the
This is the real heart of this library. The
IDbReaderExpressionGenerator is responsible for generating a single expression that represents the appropriate call. In fact there are two kinds of expressions that can be generated:
- Filler: A "filler" expression has the purpose of filling an already existing instance, so it receives such instance as input instead of generating a new result at each call. This has the advantage that allows users to avoid creating new records if they only want to keep a single record at a time in memory;
- Column Reader: Expressions of this type are optimized to read a single column, so they return that column value directly instead of filling a record instance.
The Extension Points
Actually the first extension points to the
DbReaderGenerator and the
DbReaderExpressionGenerator are the interfaces. As they start as interfaces their implementation can be completely replaced or decorated.
As already explained the default
DbReaderGenerator is in fact a decorator that cache the results generated by an implementation that redirects to the
DbReaderExpressionGenerator. So, let's see the extension points that are available for the
The default implementation of the
DbReaderExpressionGenerator has 2 constructors. One of them receives a delegate to generate expression to:
- Access the database column;
- Access the .NET member (field or property);
- Make a conversion from the database type to the .NET member type (if needed).
Note that generating the member access expression is the easiest one and it is actually implemented with this code:
(command, reader, columnIndex, instanceVariable) =>
string memberName = reader.GetName(columnIndex);
var member = instanceVariable.Type.GetMember(memberName);
if (member == null || member.Length != 1)
throw new InvalidOperationException("Can't find a single member named: " + memberName);
return Expression.MakeMemberAccess(instanceVariable, member);
And this is probably the code that you may want to replace if you have a different naming rule.
If you see this code, the name of the database column is used to find a .NET member with the same name (so, a field or property with the same name).
But I know many situations where users want to put different names in their properties, be it by using attributes or another application specific rule. So, it is enough to replace this delegate with one that finds the member with a different rule and everything will be done.
Also note that if the query has a column name that's not found an exception is thrown. Actually, if a null expression is returned it will simply ignore such database column but it will still be able to fill the object with the other columns, so those are possible extensions that you may want to try.
The other two parameters (readColumnGenerator and conversionGenerator)
The default implementation of the
readColumnGenerator will try to discover if the column type, as seen in the database, has an equivalent GetColumnType method on the reader.
That is, an
Int32 will use the
GetInt32() method, a
String will use the
GetString() method etc. If there is, it will use such a method to do the read. It doesn't care if this is not the type of the destination field or property as that's the responsibility of the conversion generator.
And so, if those types don't match, it is the responsibility of the conversion generator to generate the appropriate conversion call. And well, the default implementation uses the
Convert class, trying to find an appropriate ToSomeType or, if one is not available, using the
ChangeType() method. I plan to present another article showing how you can use a really expandable solution for the data type conversions to support all kinds of conversions (similar to the Expandable IoC Container, but for the conversion expressions).
What if the database column is not expected to set a field or property?
In some situations it is possible that a column in the database is not directly reflected as a field or property set. If this is the case, you can use the alternative constructor for the DbReaderExpressionGenerator. Such alternative constructor still needs the
ReadColumnGenerator and a ConversionGenerator(which are used by the
GenerateColumnReader method) but instead of trying to build the path (reading the database column and setting the field/property), it calls a delegate to do that. This means that you can actually read a column and call a method, if that's appropriate for your case.
I can say that I can see this happening if the last column is a column like
IsReadOnly which is responsible for calling a
Actually the code generated to read the
IDataReaders is pretty fast. The problem is that generating such reader takes time (not that much, yet we can say that it may be a problem). So the best thing to do is to cache the generated readers. The problem is: How do we cache them?
It is possible that you always use an object (like
Person) when reading a table
Person, which is always using the same list of columns and in the same order, independently on the
ORDER BY used or the
WHERE used. If this is the case, a generator for the destination type could be reused in different select clauses, as different
ORDER BY or
WHERE clauses don't affect the returned columns.
But if you use the same target object with different tables or with different column orders on the
SELECT clause you can't reuse the
IDataReader reader, so a new one must be used. In fact, if you use different databases the same select may actually return columns of different types, so the generated reader can't be used in those cases either.
So, to try to solve those problems the default cache will only reuse a cached generator for an identical SQL clause, considering it comes from a connection that has exactly the same connection string and, of course, for the exactly same destination type. The class that does such cache actually allows you to say that you don't want to consider the SQL clause or the connection string, effectively reusing the cached generators more frequently, but that's the user responsibility to ask for that and to guarantee that he will not use the same object to read different tables, databases or simply selects with different column orders.
Yet, even if you don't care about reusing the cache so often there's another problem: Items cached will never be collected, so if you are building different SQL clauses (maybe because the where clauses are using string concatenations instead of using parameters, which is another problem on its own) the cache can become too big. It is not hard to make a better cache by using a weak dictionary, but creating a good weak dictionary is not that easy and surely it is not that small, so to avoid giving a big solution I simply wrote a simple cache that's not weak. But remember such trait when using it, so you may prefer to reuse the queries for different SQLs or you may prefer to write your own cache with different rules, as this is something important.
The sample application is a speed and conversion comparison using this solution and Dapper with fake commands and data readers. I made it use fake commands and data readers because I don't want to force users to create a real database to do the tests, yet I tried to use all the important methods so it is possible to see how the library can be used.
The fake reader actually treats the "int" columns as
decimal columns, so it is necessary to do some conversions to make the values work and, with enums, Dapper simply fails while this solution works.
It is important to note that this application will run really fast to process millions of records as it doesn't actually lose time querying a real database, so we can see that both mappers are extremely fast. In real situations most of the time is spent doing the real query and receiving data through TCP/IP, yet I put the speed comparison to prove this is not going to make things slower while it will be useful by using the right get methods and doing the conversions easily, if they are necessary.
I will not promise anything, but in the future I plan to present an implementation for the
ConversionGenerator that will allow users to easily register new conversions without having to provide an entire new implementation and I also plan to present a solution to fill the database parameters (which actually is something that Dapper has but this solution doesn't).
My purpose is to let each one of those solutions to live isolated so you can use anyone of them without using the others while still providing syntactic sugar methods more similar to how Dapper works, so you can easily execute a query, giving typed parameters and receiving typed results with ease, which of course will use this fast solution for reading and, if needed, will also use the fast solutions to fill parameters and to do configurable datatype conversions.