NOTE: This article was originally part of the VistaDB.Net blog and has been moved here.
ADO.NET is a set of libraries included in the .NET Framework to facilitate the communication of applications with various data storage mechanisms. These libraries form the basis for all third parties to provide data access services to users of .NET applications.
Visual Studio 2005 and 2008 did not change the data access model. In fact, ADO.NET 2 is the longest running Microsoft data access technology without a major revision. I don’t know if this is going to change in .NET 4, but the stability of ADO.NET is a major reason for its adoption in VistaDB.
In this article, I going to give a high level overview of the ADO.NET object model and how VistaDB supports that model. I will then follow up with more articles discussing specifics for this model.
The ADO.NET Object Model
ADO.NET at its most basic level was designed to help developers work efficiently with multi tier databases, across intranet or Internet scenarios. This is a tall order for a single framework, but I think they managed to pull it off quite well.
I often view the object model as being split into two parts – A connected and a disconnected side of the model. Even though there is no distinction between the two “sides”, it helps me to visualize how the data object was intended to be used.
Disconnected ADO.NET Classes
The objects located in the disconnected half of the model do not communicate directly with the connected objects. This is a big deal because it means each side is isolated through interfaces to communicate. I have found this is often the most confusing part of the ADO.NET spec because users think each side should be able to directly communicate.
The ADO.NET DataSet for example does not ever communicate directly with the database. To fetch data from the database into a
DataSet, you must pass the
DataSet into the
Fill method of a connected object (the
DataSets are a container class for a number of
DataTable objects (stored in the Tables collection). You can think of this as an in memory representation of multiple tables, their constraints, relationships, and the actual rows.
The data stored in the
DataSet is a disconnected copy of the actual database. Any changes you make to a
DataSet only exist in RAM. You can quickly get a list of just the changes by calling the
GetChanges method to return only modified
DataRows. This allows you to submit smaller change sets back to the database.
Merge method also allows you to combine multiple change sets into a single
DataSet. This is very useful in multi tier scenarios where you need to receive partial updates from lots of different systems and merge them into a single database change.
DataSet is also unique in that you can read and write this disconnected data cache to disk. A common use is to put the
DataSet into an XML document for transmission to multiple systems to merge into their local database.
DataTable allows you to examine the actual rows of a
DataSet through rows and columns collections. You can store the results of a query in a
DataTable by calling the Adapters
Fill method. Once the
DataTable is filled, the database connection is released and operates disconnected only. You can then continue to examine the data without any further communication between the
DataTable and the database.
This is important for scenarios where you want to work across a network share. Caching the data on the local machine is vital to performance of the application. Whenever possible, you should not be using connected objects against a network shared database, the traffic is very expensive.
DataTable has a collection of
DataColumns associated with it. This represents a single column in the database. But the object doesn’t actually contain any of the data stored in the
DataTable. This is just a metadata storage class about the column, its constraints, types, etc. It is quite useful for finding when a single column allows
nulls, has a unique constraint, etc. The autoincrement property is also implemented at the
DataColumn level since you want to ensure each row has a unique value.
One interesting usage for this class is the
Expression property. This allows you to define how the data for this column is calculated. This would enable you to compute the value of an items total price based upon units sold and price per unit at the
DataColumn level rather than through a SQL command. The added benefit to using this scenario is that updates to the
Quantity columns will recalculate the total price column in RAM without having to return to the database to rerun the SQL.
Each row in a
DataTable is represented in the Rows collection as a
DataRow. To examine the data in a specific column, use the
Item property of the appropriate
DataRow object. You can lookup the columns in a row through an integer index, or by the name of the column.
DataRow class is also where you perform updates to a
DataSet. When you are preparing to edit a row, the
BeginEdit method should be called on the
DataRow object. The
EndEdit method then can be used to save the changes back to the
CancelEdit provides an undo facility. This is key to how the
DataGrid and other grid controls allow users to edit, commit, and cancel their changes in memory.
A key concept often missed by developers is that changes to the
DataRow even when they are “
AcceptChanges” calls are only committing them to the RAM copy of the database (the
DataSet). You must still commit the changes back to the database in order to store them permanently.
DataColumn may have multiple constraints. Conditions such as unique are applied through this class. Constraint objects are maintained through the
DataTables constraints collection.
Most tables in a single
DataSet will be related in some way. The
DataSet can handle mapping things like
OrderDetails with a little help from the
DataRelation class. This is a way to enforce referential integrity from the database, without having to make a complete round trip to the database.
Although excluded in the diagram above, this is an important class. You can use multiple
DataView objects to example the same
DataTable in a
DataSet. This can result in massive memory savings since only one copy of the data needs to be maintained in the
DataSet. You can do a lot of view type operations with the in memory
Connected ADO.NET Classes
Connected classes in ADO.NET are designed to communicate directly with the data source. Most of these classes map closely to basic data access concepts such as the
Connection to the database, a
New in ADO.NET 2, this is an object factory that allows all .NET applications to generically load a provider without knowing much about it. Each Provider Factory includes a way to create
It is this
ProviderFactory object that allows VistaDB to be swapped with SQL Server at runtime through your app.config or web.config files. It is not a perfect match for communicating directly against the strongly typed classes, but it is very close.
Connection is a representation of the actual physical connection to the database. This may be through a server, or locally. You use this class to connect and disconnect from the actual database.
connection also acts as the handle to the database for other objects like the
Command objects. They do not communicate directly to the database, they must go through a
This is another new class in ADO.NET 2 that simplified the process of building connection strings and remembering the options for a specific provider, it is a great utility class to save you time. Once you have built up this object, you can assign the
ConnectionString property to this object.
DataAdapter is a new concept, there are no matching concepts in ADO or DAO. The
DataAdapter is the bridge between your database and the disconnected ADO.NET objects. The
Fill method provides an efficient mechanism to fetch the results of a query into
DataTable so you can work with the data offline (disconnected). You also use the
DataAdapter to submit pending changes from the disconnected objects back to the database.
This is a complex class with lots of internal jobs to track. We will cover it in more detail in another blog post.
This class represents a question or query against a database, a call to a stored procedure, or a direct request to return the contents of a specific table. This is probably the hardest class to port your usage between database vendors as most of the calls at this level are vendor specific syntax.
Command object with a database is pretty easy. You set the
Connection property to an existing opened
Connection object, and then specify the command you want to execute in the
CommandText. You can supply SQL commands, or just the name of a stored procedure or table (although you have to then also change the
CommandType property to what you wish to accomplish).
Commands have many ways to execute them, but all of them basically call
ExecuteQuery under the hood. If you don’t care about the results, or only want the first row and column, there are short cuts you can take (
Note that VistaDB does not support the
ExecuteXML syntax at this time.
To simplify the process of putting parameters into a
Command object, the
Parameter class allows you to quickly put parameters into a query without
string concatenation or worry about SQL Injection attacks. Most developers rely heavily on parameterized queries because of the added benefit of SQL Injection protection;
strings are automatically quoted to the specs of the underlying database.
There are many ways to create
Command objects, but they are mostly just different ways of doing the same thing. Some people prefer one syntax over another, but they all end up as
Parameter objects on a
Command object at execution in the database.
DataReader is built as a way to retrieve and examine the rows returned in response to your query as quickly as possible. Not all of the rows have to be ready before you can start working with the data. Only a single row is exposed at a time, so the database engine can get you the first record quickly and then continue to work in the background to find the rest of the answers over time.
The data returned by a
DataReader is always read only. This class was built to be a lightweight forward only, read only, way to run through data quickly (this was called a firehose cursor in ADO).
Transactions are used to ensure that multiple changes to database rows occur as a single unit of work. The
Connection class has a
BeginTransaction method that can be used to create a
A definite best practice is to ensure that
Transactions are placed in
Using statements for rapid cleanup if they are not committed. Otherwise the objects (and any internal locks that may be needed) will remain active until the GC gets around to cleaning it up.
What about Cursors?
You will notice that I never mentioned database cursors in the above discussion. That is because they do not exist within the ADO.NET Framework. ADO.NET does not support server side cursors by design. The
DataTable classes most closely resemble a cursor, and the
DataReader class is a close match for a client side forward only, read only, cursor, but it is not the same thing.