One of the most repetitive, boring and thus error prone tasks when accessing databases from C# (or other languages) is writing SQL commands. And when there is explicit SQL, no matter whether it's in stored procedures or inside your code, it has to be maintained. For instance, if a column is added to a database, that column must be added to all four commands (
DELETE - a.k.a. CRUD), and there are often more than one
SELECTs with different filters. There are certainly times when you must write some SQL, but it would be better if machine could create as much as possible.
This article is about writing some code that can do that job. It was made for SQL Server and ADO.NET SqlClient provider.
Update 15/1/2005: I've changed it to support OLEDB too. Wherever in text
SqlCommandBuilder is referenced, it also applies to
OleDbCommandBuilder. You shouldn't have too much problems to customize it to other databases and data providers.
What's already included in ADO.NET
ADO.NET comes with one solution for this problem,
SqlCommandBuilder class. You still have to provide
SELECT command, but
DELETE are generated for you. Behind the scenes,
CommandBuilder goes to database to fetch the required metadata.
CommandBuilder has lots of issues, discussed in William Vaughn's article "Weaning Developers from the CommandBuilder". In my opinion, the greatest problem is, if for some reason, the code it generates doesn't fit your needs, there's no way to customize or fix it - you have to abandon it completely. For example, if you want to find out the value of autoincrement column in SQL Server after
INSERT, you're out of luck. Another serious problem is that it requires roundtrip to database to fetch metadata, so even Microsoft doesn't recommend using it in production code.
Another possible solution is to use Data Adapter Configuration Wizard (DACW), as suggested in Vaughn's article. It works similar to
SqlCommandBuilder, but in design time. So you can see generated code and tweak it to your needs. Problem is, you have to make same tweaks whenever you generate SQL (e.g., to add retrieval of autogenerated column to
INSERT commands). Once you have tweaked it, you are in manual maintenance mode - the computer can't add new columns to these commands for you.
A word or two about Strongly Typed DataSets
When working with plain, untyped
DataSets, there are lots of casts and finding columns by name. Both of these activities can't be checked at compile time. If you like, when compiler finds as many bugs in your code as possible, then you must be using Strongly Typed DataSet (typed DS). It's one of my favorite features of .NET. If something changes in the DB, you just update the typed DS, compile, and let the compiler find where your code doesn't work with the new version of DB. Also, when you use typed DS, you have Intellisense (which is hard to live without). Although it affects performance, benefits usually compensate that difference. If your application can afford to use typed datasets, you'll see here how they can reduce the amount of SQL that should be written manually.
Typed DS is created from the database in a similar way to how
CommandBuilder works - database is queried for metadata. Main difference is it's happening on design time instead of run time. This essentially means, on run time, there already exists enough metadata inside typed DS to construct commands - you can find out names of tables, list of columns, their types, which columns are primary keys, etc. If
SqlCommandBuilder knew about them, it wouldn't have to make roundtrip to the database. Also,
SqlCommandBuilder has no way of knowing which columns you really need: if some typed DS represents only a subset of some database table, only columns represented in that DS should be included in commands.
Code presented here will work similar to
SqlCommandBuilder, but under our control. Therefore, we can make custom changes. For
SELECT command, we can add filtering or ordering, and still have auto generation of column list. For
INSERT, we can add support for auto increment fields. For
SqlCommandBuilder is using "optimistic locking" support, i.e., only rows that are not changed since we fetched them are written to the DB. We can choose other kinds of concurrency controls, e.g., using timestamps or no concurrency control at all.
All relevant classes are in
Cogin.Data.DataAccess namespaces. Class
CustomCommandBuilder is one that generates SQL. It's used by
SqlDataAccess class, which contains low level ADO.NET code. These two classes are not application specific and can be reused in different projects. There are two similar classes for OleDb:
OleDbDataAccess. DataAccess classes are not used directly, but through
IDataAccess interface, so that application specific classes don't have to be changed when we switch from SqlClient to OleDb data provider. They just have to use
DataAccessFactory to get the appropriate version, which checks config file to see which data provider will be used.
Example of data access classes are
OrdersViewDataAccess. These classes are handling all CRUD operations for the sample application, and yet there is no SQL in any of them! That was our goal. Methods you have to write are as simple as:
public void FillOrdersForCustomer( string customerId )
dataAccess.SelectWithFilter( dataSet.Orders, "CustomerID=@customerId",
new GenericSqlParameter( "@customerId", customerId ) );
public void updateCustomers()
dataAccess.fullUpdate( dataSet.Customers );
Sample Win Forms application is included. It uses Northwind database which you should already have on SQL Server. It displays customers and their orders (filtered and loaded only when needed) and allows updates, inserts and deletes. Unfortunately, Northwind doesn't have cascade updates/deletes, so you can successfully play only with rows you created. Since that wasn't the point of this article, I didn't bother to write code to cascade changes to all affected tables, e.g., to Order Details.
CustomCommandBuilder can be used even if you want to use stored procedures instead of plain SQL. It can create calls to stored procedures, and generic procedures as well. Creating these autogenerated stored procs in the DB can be done manually or automatically on application start.
When working with DB, usually the bottleneck is database itself. However, I was still interested to find out the performance hit
CustomCommandBuilder introduces. In order to minimize DB overhead, the test database was on the same machine as the test program (no network latency). Fastest way is to use stored procedures, so alternatives are compared to that approach. All test cases use typed DS. I didn't try to measure typed DS versus DataReader or some other approach. I tried to answer the following questions with this test:
- Is it slow if SQL commands are generated every time? It's what
CustomCommandBuilder is doing, and if command generation is too slow, some caching mechanism should be used.
- Is it slow to recreate data access and typed DS classes for each request?
- Does it matter if filters are made using
SqlParameters compared to plain text?
Last question was important because you can make filters by concatenating strings, e.g.:
SelectWithFilter( dataSet.Orders, "CustomerId='" + customerId + "'" );
This is slow not because of string concatenation, but because SQL Server receives always different commands and must compile them (parse and determine execution order) every time. Better way to make parameterized filters is:
SelectWithFilter( dataSet.Orders, "CustomerId=@customerId",
new GenericSqlParameter ( "@customerId", customerId ) );
SQL Server can cache this kind of commands, because parameters are transferred separately from command string. As you'll see from benchmark results, this is almost as fast as using stored procedures, but much more flexible. Using
SqlParameter is not only faster, but safer too because you don't have to worry if parameter contains some illegal character, like ' sign.
GenericSqlParameter here is just Facade that allows us to use it for both SqlClient and OleDb.
I created a small benchmark that also uses Northwind database, to fetch 1000 times random row from Orders table. Stored procedure
CustOrdersOrders (which already exists in Northwind) is compared to selecting using generated SQL command, generated again for each request. Variants with and without reusing data access and typed DS classes are measured. Test configuration: Athlon XP 1800+, 512MB RAM, 7200rpm HDD, Win XP SP1, SQL Server 2000. Database is running on the same machine. All tests are performed thrice. Results:
This example is very simplified and limited, but it gives some hints. You should perform tests on your setup with typical real life load, to get relevant data for your case.
As you can see, differences are rather small, and worst case is only 25% slower than best case. I consider 25% to be a small difference because in real world, database is on a separate server, queries are much more disk-intensive (during this benchmark, disk activity was practically nonexistent because very soon entire Orders table was cached in system memory), so differences should be much smaller. You should notice that "plain text filters" have largest deviation. That's because they are very dependent on "randomness" of required CustomerID, i.e., whenever same filter condition is repeated, previously compiled SQL command is used. This also means that this kind of filtering is trashing SQL Server's compiled commands cache.
If this performance difference is significant in your case, or you can not afford additional memory overhead of typed DS, you can go
DataReader/Stored proc way, but that requires much more work and is much harder to maintain. Best advice is to measure before optimizing anything.
If performance is an issue, I would rather focus on a way typed data sets are usually used - as a set of several data tables which directly map to database tables, i.e., without using
JOINs in queries. Each
DataTable object is filled/updated with separate SQL commands, and each one means roundtrip to the server. This can be improved by packaging several commands together - multiple resultsets will be received with single roundtrip.
CustomCommandBuilder frees you of writing generic SQL code, and yet allows you to choose how it will be generated. Performance hit is usually minimal in real life. Only requirement is that you're using typed data sets, which I would recommend anyway if you can trade some performance hit for increased productivity. If you're using SQL Server, everything is already provided, but contrary to ADO.NET's
SqlCommandBuilder, you can change it for other servers.