
Professional .NET Framework - Chapter 10 - Working with Data in .NET

13 Nov 2001
This chapter explains what is involved in working with data in the .NET framework.
Title Professional .NET Framework
Authors Jeff Gabriel, Denise Gosnell, Jeffrey Hasan, Kevin Hoffman, Christian Holm, Ed Musters, Jan D Narkiewicz, Jonothon Ortiz, John Schenken, Thiru Thangarathinam, Scott Wylie
Publisher Wrox
Published September 2001
ISBN 1861005563
Price US 59.99
Pages 1000

Working with Data in .NET

Data comes in a myriad of forms. Today, virtually anything can be described by data of some kind. In short, data is information and, as we all know, information is power. Any statistic, number, fact, figure, or seemingly unimportant little detail can also be considered data. Data isn't just the bits and bytes that you store in your mammoth data store high in your Ivory Tower, locked away behind an army of IT professionals.

Traditionally, many of us grew up thinking that data consisted of the DATA statements at the end of a BASIC program (I dropped the trailing "A" so as not to date myself too much), the holes in a set of punched cards, or the sequential storage of bits and bytes in a binary file on disk. In our modern world of RDBMSs, XML, and remote data stores, we've come to think that we have a more advanced view of data now.

As more and more people needed quick, easily programmed, reliable access to popular data stores, Microsoft provided DAO, or Data Access Objects, based on their JET Database Engine. From this evolved ADO, a faster, more efficient system of accessing data that allowed a wider community of programmers to easily access SQL servers, Oracle databases, and many more data sources, all with a uniform API. The world was good.

Unfortunately, the reliance on ADO for all things data gave many people tunnel vision. If you ask your average programmer the question, "What is data?", they are very likely to respond with a lengthy discourse on SQL Server, Oracle, MySQL, or a host of other database servers, or on database file formats like Btrieve or Paradox. ADO actually provided a high-level abstraction over a concrete data source, advancing Microsoft's concept of Universal Data Access. Unfortunately, many of the features of ADO that dealt with non-relational data went unnoticed.

With the release of the .NET framework, Microsoft is trying to widen people's perception and remove some of the tunnel vision and "data crutch" created by reliance on ADO. While ADO was (and still is) a remarkably good tool, it is not the solution to all problems. With the .NET framework, Microsoft is trying to reinforce the statement that data is data, and nothing more. They believe that programmers should be able to program data without the hang-ups of relying on the nuances of a given data provider to allow them to do so. Whether you obtain the circumference of your thumb from an Oracle server, a SQL Server, a Btrieve file, an XML file, or an XML stream pumped to your application over a network socket, the fact remains that you are still simply dealing with the circumference of your thumb. The source of that information should not be allowed to impact how you handle it.

Throughout this chapter you will see how the System.Data namespace is structured and how to utilize many of the classes within it. As well, we'll go over the pros and cons of using ADO.NET, and then compare and contrast current ADO functionality with that of ADO.NET. After that we'll go into using the System.Xml namespace. While reading this chapter, keep in mind that this is a high-level overview of working with data in .NET.

For truly in-depth coverage of all the material in this chapter and then some, consult Professional ADO.NET Programming from Wrox Press, ISBN 186100527X, due to be published shortly after this book.

What this chapter hopes to accomplish is to present the idea that we have yet again returned to the days where people think of data as data without regard for the physical means by which that data is obtained. By the time you complete this chapter, you should:

  • Be familiar with the basic architecture of ADO.NET
  • Know the benefits and drawbacks for using ADO.NET
  • Know when to use ADO versus ADO.NET and why
  • Be familiar with the core XML classes provided by the framework
  • Be aware of how the System.Xml and System.Data namespaces are intertwined

System.Data

There is a specific reason why this section of the chapter is not titled ADO.NET: the name ADO is actually a bit of a misnomer. It has been kept in marketing campaigns, documentation, and other references to keep programmers within a familiar frame of reference. However, the acronym ADO no longer appears anywhere within the System.Data namespace.

This fact is not an oversight. If there is one important thing you take away from this chapter before moving on to the next, it should be that System.Data is not ADO's direct successor. The System.Data namespace is so much more than that. It is a highly object-oriented, extensible library of classes for manipulating structured data. That data can come from SQL Server, Oracle, text files on your disk, XML streams, or virtually anywhere else.

System.Data Architecture

Illustrated in the following diagram are a couple of things that you may not be familiar with. On the right are what are called Managed Providers (or Data Providers, depending on which documentation you're reading during Beta 2). These managed providers supply a managed, object-oriented set of functionality for accessing a data store. The two that currently ship with the .NET framework's System.Data namespace are the OleDb managed provider and the SqlClient managed provider. In addition, Microsoft is offering the Odbc Data Provider as a separate download; all of its functionality can be found in the System.Data.Odbc namespace. On the left is the keystone of ADO.NET, the DataSet object. The diagram illustrates a clear separation (yet easy communication) between the DataSet and the actual store from which the data originates.

Data Providers

Data Providers within the context of the .NET framework are managed, object-oriented wrappers around database functionality. They essentially provide a layer of abstraction above their specific data source. A data provider provides methods for connecting to a database, retrieving data from the database, committing changes to a database, and executing database commands.

Shipping with the .NET framework are two data providers, the OleDb provider and the SqlClient provider.

OleDb Provider
The OleDb provider uses native OLE DB functionality through the COM Interoperability services of the .NET framework. Just as with previous uses of OLE DB, you still need to specify which OLE DB provider to use (not to be confused with the OLE DB managed provider itself; clear as mud?). You can use the SQLOLEDB provider, one for Oracle, or one for the Microsoft JET Database Engine. There is a limitation on the use of the OLE DB provider in the Beta 2 release of the framework SDK: the OleDb Managed Data Provider cannot be used to access OLE DB providers that require a version 2.5 interface, such as the OLE DB provider for Microsoft Exchange.

The OleDb managed provider is an excellent choice when using SQL Server 6.5 or earlier, Oracle, or Microsoft Access. This is the recommended provider for single-tier, Access-based solutions or for other data sources that have OLE DB providers such as Active Directory, LDAP, etc.

SqlClient Provider
The SqlClient managed provider is a lightweight, highly optimized Data Provider. It uses its own protocol (Tabular Data Stream) to directly access SQL Server 7.0 (or later) without requiring an OLEDB provider or the use of the ODBC layer. The bottom line is that if you're using SQL Server 7.0 and above, you should absolutely be using the SQL Server .NET Data Provider because of its improved efficiency, smaller memory footprint, and improved speed over the SQL OLE DB provider.

Odbc Provider
Available as a separate download from Microsoft, this provider exposes traditional ODBC data sources to managed code.

You might be thinking that once more and more data providers start to appear, we'll be back in the same mess we used to be before the advent of ODBC for standardizing data access across multiple vendors. This might be true, had Microsoft not enforced some rules on the .NET Data Providers.

At the core of each Data Provider are four classes (discussed shortly): the Connection, the Command, the DataAdapter, and the DataReader. Even though each provider implements its own version of these classes, each class implements a specific interface to which it must conform. Recalling our knowledge of interfaces, we can then say that a Connection class, regardless of which provider it came from, can be programmed with the same code because it conforms to the IDbConnection interface.

Therefore, the following snippet of code illustrates that the programmer doesn't really care which provider originally supplied the class. We can treat the classes as their basic interfaces and program to the common denominators.

IDbCommand GenericCommand = Connection.CreateCommand();
GenericCommand.CommandText = "SELECT Flavor FROM Coffees";
// . . . continue on with source-agnostic data access code

The connection in our sample could have been an OleDbConnection, a SqlConnection, or if we're thinking ahead, it could have been some third party extension like BDEConnection (for the Borland Database Engine).

We won't cover the how or the why in this book, but you should be aware of the fact that you can create your own custom .NET Data Provider if you choose. All you really need to do is fill a namespace with your own derivatives of the core classes such as Connection, Command, DataReader and DataAdapter and you can then easily begin writing data access code against your own proprietary, optimized provider.

The Connection

A Connection is a class that implements the IDbConnection interface. This should be one of the most familiar concepts in ADO.NET to classic ADO programmers. The Connection provides the essential link between your code and the data store housing your data. In traditional implementations, the Connection typically connects to an RDBMS across a network wire. However, it is best if you don't make that assumption about connections. With the ability for vendors to be producing their own managed .NET Data Providers, it is possible that this Connection could be a link to something else entirely (an Active Directory, Exchange Server, a proprietary security system, or anything else you can think of that might be considered a viable source of data).

The IDbConnection interface dictates that a Connection provides implementations for properties like the database connection string, the connection timeout, and the name of the database, and methods for opening and closing the connection. Second in importance only to providing the link to the data store is the Connection's ability to create a Command object based on that Connection. This way, the consumer of the Connection object is guaranteed to get a Command object that belongs to the same managed provider.

We'll put the Connection object and the other components of the managed providers together in some samples once we've had a chance to go over each component individually. The following is a quick example of how a SQL connection can be instantiated and used, and a couple of its useful properties:

using System;
using System.Data;
using System.Data.SqlClient;

namespace ConnectionExample
{
class ConnectionSample
{
static void Main(string[] args)
{
   // declare and instantiate a new SQL Connection.
   SqlConnection Connection = new SqlConnection();

   Connection.ConnectionString =
      "Data Source=localhost; Initial Catalog=Northwind; User id=sa; Password=;";

   Connection.Open();

   // Use a couple of SQL specific connection properties.
   Console.WriteLine("SqlClient Connected from {0} to SQL Server v{1}",  
                      Connection.WorkstationId, Connection.ServerVersion );

   Connection.Close();
}
}
}

The output generated by this program on my test machine produces the following line of text:

SqlClient Connected from MEPHISTOPHOLES to SQL Server v08.00.0100

The Command

The Command object represents a SQL statement or stored procedure that is executed through its Connection. The IDbCommand interface requires that any deriving class must implement properties to set and retrieve the command string and command type, as well as the ability to execute the command, storing the results in a Reader object or executing the command without any return results.

Programmers of ADO should find this concept familiar as well, as they had to create an instance of the ADO Command object to utilize a stored procedure, etc. The provider-specific versions of the Command class are SqlCommand and OleDbCommand. Of all the ADO.NET classes, this has the most resemblance to an ADO counterpart, the ADO Command object. Aside from syntactic differences, the two classes perform generally the same roles.

The following example is a quick glimpse at the operation of a Command object, enabling us to execute one of the stored procedures against the Northwind database:

using System;
using System.Data;
using System.Data.SqlClient;

namespace Wrox.ProDotNet.Chapter10.CommandExample
{
class CommandSample
{
static void Main(string[] args)
{
   // Create a connection to Northwind on SQL 2000
   SqlConnection Connection = new SqlConnection();
   Connection.ConnectionString =
      "Data Source=localhost; Initial Catalog=Northwind; User id=sa; Password=;";
   Connection.Open();

   SqlCommand MyCommand = new SqlCommand( "CustOrderHist", Connection );
   // Need to set the command type, otherwise it defaults to text.
   MyCommand.CommandType = CommandType.StoredProcedure;
  
   // The stored procedure we're calling asks for the CustomerID parameter.
   SqlParameter CustID = new SqlParameter( "@CustomerID",
                SqlDbType.NChar, 5);
   // Pick one of the customers in Northwind to grab history for.
   CustID.Value = "DRACD";
   MyCommand.Parameters.Add( CustID );

   SqlDataReader MyReader = MyCommand.ExecuteReader();

   // Do something with the data returned from the command.

   MyReader.Close();
   Connection.Close();
}
}
}

The DataAdapter

The DataAdapter can be considered the electrical "plug" of the ADO.NET architecture. The Adapter's sole purpose in life is to fill DataSets and propagate DataSet changes back to the data source. As we'll find out shortly, the DataSet is a completely source-agnostic data store. This means that it can never know the nature of the source of the data it contains. The job of the DataAdapter is to adapt the data from any of its Command objects (which, as we now know, have an associated Connection object) and pour that data into the DataSet using the DataSet's native methods for creating data, as well as to take data from the DataSet and place it into the data source using Command objects.

The Adapter actually is a container for four separate Command objects. It contains a SelectCommand, an InsertCommand, an UpdateCommand, and a DeleteCommand. Any class implementing the IDbDataAdapter interface must provide these four properties. By housing these commands, it can successfully adapt any changes made in an associated DataSet by transferring those changes through the appropriate command (and hence to the final destination on the other side of the appropriate Connection object).
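To make that four-command anatomy concrete, here is a minimal sketch of wiring up an adapter by hand. The server name, credentials, and table are carried over from the earlier samples and are assumptions only; note that merely constructing these objects opens no connection:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

// Construction alone opens no connection; the server and table are assumptions.
SqlConnection Connection = new SqlConnection(
   "Data Source=localhost; Initial Catalog=Northwind; User id=sa; Password=;");

SqlDataAdapter Adapter = new SqlDataAdapter();

// Two of the adapter's four commands, wired up explicitly.
Adapter.SelectCommand = new SqlCommand(
   "SELECT CustomerID, CompanyName FROM Customers", Connection);
Adapter.DeleteCommand = new SqlCommand(
   "DELETE FROM Customers WHERE CustomerID = @CustomerID", Connection);
Adapter.DeleteCommand.Parameters.Add(
   new SqlParameter("@CustomerID", SqlDbType.NChar, 5, "CustomerID"));

// With a reachable server, one call would pour the results into a DataSet:
// DataSet Data = new DataSet();
// Adapter.Fill(Data, "Customers");
```

The fourth argument to the SqlParameter constructor names the DataSet column ("CustomerID") whose value the adapter reads when it later pushes a deletion back through the DeleteCommand.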

Therefore, by looking at the following diagram, we can see how the flow of data travels up a food chain of sorts, housed by a managed provider:

The figure may look a little confusing initially, but once you get the hang of working with data in .NET, it will seem like second nature to you. Firstly, we can follow the food chain from the RDBMS (note that this could easily be an Access database or some other custom database from a third party vendor) all the way back to the DataSet.

Another interesting thing to point out about our diagram is that we only have a single Connection object in use. One of the really incredible things that we can do with the DataAdapter is to have its Command objects reference different connections. The implications of this are enormous: you can see how it might facilitate having a highly optimized read-only store on one connection and a modifiable store on the other.

Without knowing anything about the performance implications of ADO.NET, we can immediately form the guess that if we only need single-direction, read-only loading of DataSets or controls, we can bypass the Adapter entirely and utilize a DataReader.

The DataReader

The DataReader is an implementation of a way of navigating a forward-only stream of data. There are many cases in which you would opt to use the DataReader over the DataSet. The DataReader is an extremely fast, low-overhead way of retrieving information from a data stream. You might be traversing a result set to display it on a web page or you could be pulling information to dump into a text file, etc. Chances are that, if your application doesn't need anything beyond forward-only, read-only data access for a given purpose, it can achieve significant performance benefits by using a DataReader.

It can accomplish this performance boost because it doesn't need to maintain the overhead of caching previously traversed records. The DataReader only maintains a single row of data in memory at any given time.

The previous example we used for the Command object lends itself very well to being used with a DataReader. If we add a few lines of code to the previous sample, we can actually display the results of the stored procedure. The new code for our CommandExample project looks like this:

using System;
using System.Data;
using System.Data.SqlClient;

namespace Wrox.ProDotNet.Chapter10.CommandExample
{
class CommandSample
{
static void Main(string[] args)
{
   // Create a connection to Northwind on SQL 2000
   SqlConnection Connection = new SqlConnection();
   Connection.ConnectionString =
      "Data Source=localhost; Initial Catalog=Northwind; User id=sa; Password=;";
   Connection.Open();

   SqlCommand MyCommand = new SqlCommand( "CustOrderHist", Connection );
   // Need to set the command type, otherwise it defaults to text.
   MyCommand.CommandType = CommandType.StoredProcedure;
  
   // The stored procedure we're calling asks for the CustomerID parameter.
   SqlParameter CustID = new SqlParameter( "@CustomerID",
                SqlDbType.NChar, 5);
   // Pick one of the customers in Northwind to grab history for.
   CustID.Value = "DRACD";
   MyCommand.Parameters.Add( CustID );

   SqlDataReader MyReader = MyCommand.ExecuteReader();

   // Do something with the data returned from the command.
  
   Console.WriteLine("Order History for Customer ID: DRACD");
   Console.WriteLine("Product:Quantity");
   Console.WriteLine("----------------");
   while (MyReader.Read())
   {
      Console.WriteLine("{0}:{1}", MyReader.GetString(0),
              MyReader.GetInt32(1) );
   }
   Connection.Close();
}
}
}

The output of the above code returns the following results:

Order History for Customer ID: DRACD
Product:Quantity
----------------
Gorgonzola Telino:20
Gumbär Gummibärchen:12
Jack's New England Clam Chowder:9
Konbu:25
Lakkalikööri:12
Perth Pasties:20
Queso Cabrales:20
Raclette Courdavault:30
Rhönbräu Klosterbier:12

Using the SQL Server .NET Data Provider

Now that we've had a quick sample of some of the facilities available to us for accessing data in the .NET framework, let's put them to the test. Our sample is fairly simple: a console application that opens a connection to the SQL Server 2000 Northwind sample database using the SQL Server .NET Data Provider. Once we have the connection, we create a command based on that connection. Continuing to follow the food chain as we described above, we create a DataReader based on that new command and simply iterate through our newly obtained results.

Let's look at the output of our sample application and then we'll get into the code that produced it.

From this we can see that there are three fields (CustomerID, ContactName, and CompanyName) that are being displayed.

To build our sample application, all we did was create a new Console Application using the VS.NET New Project wizard. After that, we added a reference to the System.Data Assembly (Project | Add Reference …). From there, we entered the following code into our new class:

/*
 * SqlProviderTest
 * Console application to demonstrate simple DataReader access
 * of the Sql managed provider
 */

using System;

// lets us reference System.Data and Sql managed provider class names
// without long prefixes.
using System.Data;
using System.Data.SqlClient;

namespace Wrox.ProDotNet.Chapter10.SqlProviderTest
{
   /// <summary>
   /// Summary description for Class1.
   /// </summary>
   class MainApp
   {


      static void Main(string[] args)
      {
      string SqlQuery =
         "SELECT CustomerID, ContactName, CompanyName FROM Customers";

This will create a new instance of the Connection object implemented by the SQL data provider and then use that connection object to create a new Command object.

      SqlConnection Connection = new SqlConnection(
         "Data Source=localhost; Initial Catalog=Northwind; User id=sa; Password=;");
      SqlCommand Command = Connection.CreateCommand();
      Command.CommandText = SqlQuery;
                 
      Connection.Open();
      SqlDataReader Reader = Command.ExecuteReader();

Working with the forward-only model of the DataReader, we loop until Reader.Read() returns false. While in the loop, we use the GetXXX() methods to access the various columns of our result set from the reader. In our case, we're using GetString, with the 0 indicating the first column.

      while (Reader.Read())
      {
         Console.WriteLine( "{0}\t{1}\t\t{2}", Reader.GetString(0),
                 Reader.GetString(1), Reader.GetString(2) );
      }

      Reader.Close();
      Connection.Close();
   }
  }
}

The code is pretty straightforward. The first thing we do is create a new connection, providing a connection string indicating that we want to use the Northwind database (note that the server must be configured for SQL Server and Windows security). After that, we create a new command object by asking the Connection to create one for us, and set its command text to our SQL query. Opening the connection is a simple matter of calling Connection.Open().

Once we have an open connection, we create a new DataReader by calling the ExecuteReader() function. This function will execute the SQL statement in our Command object and store the result set in a new DataReader object.

From there, we call the Read() method in a while loop to fetch new rows continuously. The Read() function returns a boolean indicating a successful (or not) read operation. After it returns true, we can then use the GetXXX() functions to retrieve a given column, casting the value to a given data type. In our case all of our columns are strings, so we can just fetch them all using GetString(x), where x is the ordinal position of the column we want to retrieve.

Coding Provider-Agnostic

Earlier on in this chapter, we made the bold statement that you could, in theory, write code at the level of the common interfaces, which would allow your code to function the same whether you were using the OleDb Data Provider pointing at an OLE DB provider or the SQL Data Provider working against a SQL Server 2000 database.

Some of you are probably thinking that it's all well and good to say these things, but you'll believe it when you see it. Just to set you at ease, we've prepared an example that demonstrates exactly this point. Our example is an enhancement of the code above. We prompt the user for a provider type, and then create a connection, and from there all of the code is the same, regardless of the provider. Our example demonstrates running code and queries against the Northwind MS Access database (converted to Access 2000 format) and the SQL 2000 Northwind example.

Let's take a look at the output and then we'll go into the code. You'll notice that the output is identical to the previous sample, even though we're programming generically without prior knowledge of the Data Provider.

You can see that the application prompts the user for which type of provider they want to use. This in turn affects the connection string provided to the connection. However, once the connection has been established, all of the code from that point on is generic and is coded against the interfaces and not the provider-specific features.

using System;
using System.Data;

Here we can see that we're using both the OleDb and the SqlClient namespaces. This is only to make our code for creating a connection a little easier; we're still only referencing the System.Data Assembly.

using System.Data.OleDb;
using System.Data.SqlClient;

namespace Wrox.ProDotNet.Chapter10.DataProviders
{
   class MainExample
   {


      static void Main(string[] args)
      {
         string Input;
         string OleConnectionString =
                @"PROVIDER=Microsoft.Jet.OLEDB.4.0; DATA SOURCE=C:\NorthWind2k.Mdb;";
         string SqlConnectionString =
                "Data Source=localhost; Initial Catalog=Northwind; User id=sa; Password=;";
         string SqlQuery =
                "SELECT CustomerID, ContactName, CompanyName FROM Customers";

Rather than instantiate an OleDbCommand or a SqlCommand, or an OleDbConnection or a SqlConnection, we instead declare our variables at the interface level. Thinking back to our basic component and class designs, we know that we can treat a class as an interface that it implements without changing its internal structure or implementation. Being able to treat a class as one of its ancestors, or an interface that it implements, is a feature of object-oriented programming called polymorphism.

         IDbCommand GenericCommand;
         IDataReader GenericReader;
         IDbConnection GenericConnection;

         Console.Write("How do you take your data? " +
                       "(1 - Slap some OleDb on it!, 2 - Slathered in SqlClient):");
         Input = Console.ReadLine();

         // our only conditional code will be in building our connection,
         // we'll use common interface-based code for the rest of the sample
         if (Input == "1")
            GenericConnection = new OleDbConnection( OleConnectionString );
         else
            GenericConnection = new SqlConnection( SqlConnectionString );
        
         GenericCommand = GenericConnection.CreateCommand();
         GenericConnection.Open();

         GenericCommand.CommandText = SqlQuery;
         GenericReader = GenericCommand.ExecuteReader();

         while (GenericReader.Read())
         {
            Console.WriteLine( "{0}\t{1}\t\t{2}",
                    GenericReader.GetString(0), GenericReader.GetString(1),
                    GenericReader.GetString(2) );
         }

         GenericReader.Close();
         GenericConnection.Close();

      }
   }
}

Taking a close look at the code, you can see that the only thing that is different between the two providers is the connection string supplied to the connection, and the original connection type. The true beauty of this architecture is that even though we can refer to a SqlConnection as an IDbConnection, when we invoke the CreateCommand() function on it, we actually get back a class that internally deals with SqlConnections while conforming to the standard basic set of functions defined by the IDbCommand Interface.

The DataSet

The DataSet is a class devoted to providing a fully functional in-memory data store. It maintains a completely disconnected cache of data. The basic structure of a DataSet actually looks quite a bit like a miniature relational database of its own. It contains a set of tables, relationships between those tables, and constraints and keys on those tables, and each of those tables contains a set of rows and columns very much like those of SQL Server, Access, or any other relational database. It is important to keep in mind that the DataSet is nothing like the ADO Recordset, and you should avoid trying to draw comparisons between the two.

You can see from the illustration above that the DataSet is the parent to both tables and relations. The tables contained within the DataSet belong to the DataTable class. In turn, this class is home to collections of Rows, Columns, and Parent and Child Relations.

The internal data within a DataSet is maintained entirely in XML. In addition, the structure of the DataSet is actually defined by an XSD (XML Schema Definition language) schema. We'll talk more about how XML plays into the role of the DataSet later. For now, it's sufficient to say that a DataSet is built entirely upon XML and you cannot separate one from the other.
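You can see those XML underpinnings for yourself without touching a data source. This sketch builds a trivial, hypothetical one-table DataSet entirely in memory (the table and column names are ours, not from any sample database) and asks it for its XML data and its XSD schema:

```csharp
using System;
using System.Data;

// A hypothetical DataSet built entirely in memory -- no provider involved.
DataSet CoffeeData = new DataSet("CoffeeData");
DataTable Coffees = CoffeeData.Tables.Add("Coffees");
Coffees.Columns.Add("Flavor", typeof(string));
Coffees.Rows.Add(new object[] { "Mocha" });

// GetXml() returns the data as an XML fragment; GetXmlSchema() returns the
// XSD schema that defines the DataSet's structure.
Console.WriteLine(CoffeeData.GetXml());
Console.WriteLine(CoffeeData.GetXmlSchema());
```

The first call prints an XML document rooted at `<CoffeeData>` containing a `<Coffees>` element with our `<Flavor>` value inside it; the second prints the `xs:schema` definition the DataSet generated for itself.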

The truly important things to remember about the DataSet are that it is source-agnostic and that it operates in a completely disconnected model. By source-agnostic, we mean that the DataSet is designed so that it has no information as to where the data contained within it came from. It operates on its own set of data without concern for whom or what supplied the data to begin with, or where the data is eventually going to reside. The ability for the DataSet to operate without specific knowledge of the source or destination of its data makes it an incredibly versatile tool that can not only be marshaled easily through XML serialization, but it can in theory be decomposed and reconstructed on any platform that supports the DataSet without losing any of its functionality.

In addition to providing a completely disconnected, source-neutral data store with an internal XML format, the DataSet provides a facility for establishing complex hierarchies of data. Using Relations (discussed shortly), a programmer can establish master/detail relationships and parent or child relationships simply by adding new items to collections in the appropriate tables.
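Establishing such a hierarchy really is just a matter of adding items to collections. The following sketch, using hypothetical Customers and Orders tables of our own invention, adds a single DataRelation and then navigates from a master row to its detail rows:

```csharp
using System;
using System.Data;

DataSet Store = new DataSet();

// Hypothetical master and detail tables.
DataTable Customers = Store.Tables.Add("Customers");
Customers.Columns.Add("CustomerID", typeof(string));
DataTable Orders = Store.Tables.Add("Orders");
Orders.Columns.Add("OrderID", typeof(int));
Orders.Columns.Add("CustomerID", typeof(string));

// One new item in the Relations collection establishes the hierarchy.
Store.Relations.Add("CustomerOrders",
   Customers.Columns["CustomerID"], Orders.Columns["CustomerID"]);

Customers.Rows.Add(new object[] { "ALFKI" });
Orders.Rows.Add(new object[] { 1, "ALFKI" });
Orders.Rows.Add(new object[] { 2, "ALFKI" });

// Navigate from the parent row to its children through the relation.
DataRow Parent = Customers.Rows[0];
Console.WriteLine("{0} has {1} orders",
   Parent["CustomerID"], Parent.GetChildRows("CustomerOrders").Length);
```

As a side effect, adding the relation also creates a unique constraint on the parent column and a foreign-key constraint on the child column, so the DataSet starts policing the hierarchy for you immediately.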

You might be asking yourself at this point, "What's so good about a source-neutral data store?" After all, the more a program knows about the nuances of the environment in which it is operating, the better it can perform, right? Well, yes and no. Creating code that is too dependent on a specific condition causes maintenance problems: if you hard-code the directory where your data files live, you can't change that directory without crashing your app. In addition to the benefits of the source-neutral data store, you also gain all the benefits of a completely disconnected operating model. This allows your data to potentially come from multiple disconnected and disparate sources on multiple platforms. Having source-neutral data actually insulates DataSet programmers from many headaches that other programmers have to deal with.

Consider the following: millions of people own handheld devices or PDAs. One popular brand is the PalmPilot. Having done extensive programming for this little gem, I found out the hard way that integers stored on the Palm are actually stored in reverse byte order relative to their Windows counterparts. Therefore, an integer that comes from a PalmPilot is not the same as an integer that originated on Windows. Programmers have to know about this quirk, program extensively for it, and make sure they don't forget about it. If they wrote all of their code to work against a source-neutral, in-memory representation of their data, they simply wouldn't have to care whether the native integer storage differs between Windows and PalmOS – the DataAdapter responsible for populating the DataSet would have made that translation ahead of time. This is just the tip of the iceberg of the long list of problems that source-neutral data storage solves.

Once we've established a good groundwork of information on the individual pieces of a DataSet and how to use their basic functionality in the code, we'll cover in more detail the relationship between the DataSet class, XML, and XML Schemas.

DataTable

The DataTable is one of the core objects within the System.Data namespace. The table, in standard database design terms, is an organized collection of columns that can have any number of associated rows. We're all quite familiar with what a table constitutes in SQL or Access terms, so it should be fairly easy to understand the function of a table. The DataTable exposes the following properties that you should pay close attention to when learning the ADO.NET architecture:

  • Columns
    The Columns collection contains the list of all of the columns contained within a table. For example, in a Customers table, the Columns collection might contain column definitions for the Customer's ID, name and address. This property contains a collection of DataColumn objects.

  • Constraints
    Constraints are data rules that are applied to the table. A constraint enforces a certain condition on a table that cannot be broken. If changes to a DataTable will violate a constraint, then an exception will be thrown. This is also a collection object.

  • ChildRelations
    The DataSet provides the unique ability to establish a hierarchy of tables within its schema or structure. The ChildRelations collection is a collection of relations that define the relationships that establish the child tables of this DataTable.

  • ParentRelations
    This is the opposite of the ChildRelations collection. This collection contains a list of all of the relations necessary for establishing the parent tables of this DataTable.

  • PrimaryKey
    If you've done very much work with databases before, you know that most tables have primary keys. Tables within a DataSet are no different. In fact, there is complete support for composite keys, or primary keys that are composed of more than one column. For example, if you require not only an e-mail address, but also a credit card number to establish a unique user identity, you would create a composite key of the user's e-mail address and their credit card number.

    The PrimaryKey property is an array of DataColumn objects that represents all of the columns that comprise the primary key of a particular DataTable, allowing for incredibly robust and complex representations of data to be stored in the DataSet.
  • Rows
    Where the Columns collection provides the schema structure of the DataTable, the Rows collection provides the actual data contained within that table. It contains a collection of DataRow objects.

  • TableName
    The TableName is fairly self-explanatory. It represents the name of the DataTable. Even though it appears simple, it is actually a crucial property. The TableName of a given table can be used as the index in the parent DataSet's DataTableCollection (DataSet's Tables property).

    When this property is used to locate a table within a DataSet's Tables property, it is conditionally case-sensitive. That means that if two tables exist in the collection that differ in name only by case, then the search for just those two tables is case-sensitive. In all other cases, locating tables by name is case-insensitive.

This list of properties is by no means complete, and is here to give you somewhere to start when researching the DataTable in more detail and a quick reference as to some of the most often used properties.
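To give a feel for how several of these properties fit together, here is a minimal sketch; the Customers table and its data below are invented for illustration and are not part of the Northwind database used elsewhere in this chapter:

```csharp
using System;
using System.Data;

class DataTableSketch
{
   static void Main()
   {
      // TableName is the index into a parent DataSet's Tables collection.
      DataTable Customers = new DataTable( "Customers" );

      // The Columns collection defines the schema of the table.
      DataColumn IdColumn = Customers.Columns.Add( "CustID", typeof(int) );
      Customers.Columns.Add( "Name", typeof(string) );

      // PrimaryKey takes an array of DataColumn objects, so a composite
      // key is simply an array with more than one element.
      Customers.PrimaryKey = new DataColumn[] { IdColumn };

      // Setting the primary key implicitly added a UniqueConstraint to
      // the Constraints collection; violating it throws an exception.
      Customers.Rows.Add( new object[] { 1, "Alice" } );
      try
      {
         Customers.Rows.Add( new object[] { 1, "Bob" } );
      }
      catch ( ConstraintException )
      {
         Console.WriteLine( "Duplicate key rejected" );
      }

      // The Rows collection holds the actual data.
      Console.WriteLine( Customers.Rows.Count );
   }
}
```

Note that we never wrote any validation code of our own; the constraint created alongside the primary key did the work for us.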

DataColumn

The DataColumn is the core of building the schema (structure) for the DataTable and, indirectly, the DataSet in which the parent table belongs. When you define DataColumns, you provide data type, size, and name information that will be used to control how data is accessed within the table. The DataColumn exposes quite a few properties and methods, and we've listed a few of the most often used ones here for reference:

  • AllowDBNull
    A Boolean flag indicating whether nulls are allowed in the column for rows in the column's parent table.

  • AutoIncrement
    Another Boolean flag, indicating whether or not the column will automatically increment the column's value for new rows inserted into the column's parent table.

  • Caption
    The caption of a column is its human-readable description. This is the description of the column that will be displayed in databinding controls such as a DataGrid. Due to limitations of the architecture in the past, many programmers would select columns from SQL using a syntax like SELECT ColumnName AS 'ColumnName' FROM Table in order to obtain a "pretty-printable" name for the column. With the Caption property, you can appropriately separate the data-store-relevant column name from the UI-relevant column caption.

  • ColumnName
    This is the actual name of the column, rather than its displayable caption. This name is used to locate the column in the parent table's Columns property (of type DataColumnCollection). This property is also used by DataAdapters for transferring data to and from Connections.

  • DataType
    Rather than using an enumeration or some error-prone data type encoding system, the DataType property stores an actual .NET framework data type. This is possible through the use of Reflection and the System.Type class. Not only does this allow programmers to specify actual data types rather than trying to remember strange enumeration names and mappings, but because System.Type represents language-neutral CTS types, the columns of a DataSet can thereby take on language-neutral data types as well, making the DataSet fully portable across any .NET framework language.

    The following .NET framework data types are supported as valid column data types:

    • Boolean

    • Byte

    • Char

    • DateTime

    • Decimal

    • Double

    • Int16

    • Int32

    • Int64

    • SByte

    • Single

    • String

    • TimeSpan

    • UInt16

    • UInt32

    • UInt64

  • ReadOnly
    A Boolean flag indicating whether or not the value in the column of a row in the table can be modified. An exception will be thrown if code attempts to modify the value of a column that is marked as ReadOnly.

  • Unique
    A Boolean flag indicating that all values in the column for all rows in the table must remain unique. If a change to the table causes duplicate values in this column for the rows of the table then an exception will be thrown.
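The following hedged sketch exercises several of these column properties at once; the Files table and its columns are our own invention for the purpose of illustration:

```csharp
using System;
using System.Data;

class DataColumnSketch
{
   static void Main()
   {
      DataTable Files = new DataTable( "Files" );

      DataColumn Id = new DataColumn( "FileID", typeof(int) );
      Id.AutoIncrement = true;   // values generated for new rows
      Id.ReadOnly = true;        // user code cannot modify the value
      Id.Unique = true;          // duplicates would throw an exception
      Files.Columns.Add( Id );

      DataColumn Name = new DataColumn( "FileName", typeof(string) );
      Name.AllowDBNull = false;  // nulls rejected for this column
      Name.Caption = "File Name";// UI-friendly display name
      Files.Columns.Add( Name );

      // Passing null for the FileID column lets AutoIncrement supply
      // the value (seed 0, step 1 by default).
      Files.Rows.Add( new object[] { null, "readme.txt" } );
      Files.Rows.Add( new object[] { null, "notes.txt" } );

      Console.WriteLine( Files.Rows[0]["FileID"] );
      Console.WriteLine( Files.Rows[1]["FileID"] );
      Console.WriteLine( Name.ColumnName + " / " + Name.Caption );
   }
}
```

The AutoIncrement-plus-ReadOnly combination is the same pattern used for identity columns in strongly-typed DataSets.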

DataRow

The DataRow is the essential component of a DataTable object. The DataRow is a formatted, structured data container for rows of data within a table. The DataRow object is what houses the Create, Retrieve, Update, and Delete functionality (affectionately abbreviated as CRUD by many programmers) for rows of data within a table.

Before you start to think that you've seen it all before, and that you know what rows are, stop right there. This isn't your grandmother's old-fashioned row! One of the most impressive things about ADO.NET DataRows is that they are versioned. No, you are not seeing things, and no, you don't have to write any additional code to take advantage of row versioning support. It's all there for you, like a big cake on your birthday. Versioning allows various states of a given DataRow to be retained in memory and retrieved. This gives the programmer the ability, at any given time, to view or cancel the changes pending on a row, or to retrieve only a specific version of a row when rows are selected. Versioning is indicated by the DataRowVersion enumeration, which contains the following values: Current, Original, Default, and Proposed.

We've selected a few of the most frequently used and important methods and properties of the DataRow class to give you a feel for how it is used and where it sits in the ADO.NET scheme of things.

RowState – the RowState property is of the enumerated type DataRowState. It indicates to the programmer (and to the GetChanges() and HasChanges() methods) the current state of the DataRow. That state can be indicated by any one of the following enumerated values:

Added – the row has been added to the table and the AcceptChanges() method has not yet been called.

Deleted – the row was deleted from the table using the Delete() method of the DataRow.

Detached – the row is not actually part of a table. It has either just been created without being part of a collection or has just been removed from a collection.

Modified – the data within the row has been modified and AcceptChanges() has not yet been called.

Unchanged – the data within the row has not changed since the last call to AcceptChanges()

Item – the Item property is heavily overloaded to allow for many different ways of obtaining the data stored in a specific column of the row. You can supply a column's name, a DataColumn object, or the column's ordinal position; each form lets you both set and retrieve the value. If you also supply a DataRowVersion enumerated value along with one of those three indexing methods, you can retrieve a specific version (Current, Original, Default, or Proposed) of the data stored in the specified column. In C#, the Item property surfaces as an indexer, which affords you a syntax that resembles array indexing.

BeginEdit() – this function places the DataRow into edit mode, which temporarily suspends event triggering, allowing the code to modify data for more than one row at a time. A programmer might want to do this to suspend validation rules by placing several rows in edit mode, loading the data, and then committing the changes.

CancelEdit() – this function will take the DataRow out of edit mode and discard any changes made to the row since the BeginEdit() method was called.

Delete() – this function will delete the current row.

EndEdit() – this function will complete edit mode for the current row, saving changes made to the DataSet since the BeginEdit() method was called.

AcceptChanges() – this function will implicitly invoke the EndEdit() method. If the RowState of the row before this function was called was Added or Modified, the RowState then becomes Unchanged. If the RowState was Deleted before the function call, it will then be physically removed.
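The edit lifecycle and versioning behavior described above can be sketched in a few lines; the Widgets table here is our own invention for illustration:

```csharp
using System;
using System.Data;

class DataRowSketch
{
   static void Main()
   {
      DataTable Widgets = new DataTable( "Widgets" );
      Widgets.Columns.Add( "Description", typeof(string) );

      DataRow Row = Widgets.Rows.Add( new object[] { "Nice Widget" } );
      Widgets.AcceptChanges();
      Console.WriteLine( Row.RowState );       // fresh baseline

      Row.BeginEdit();
      Row["Description"] = "Angry Widget";
      // The Proposed version holds the pending change...
      Console.WriteLine( Row["Description", DataRowVersion.Proposed] );
      Row.CancelEdit();                        // ...until we discard it
      Console.WriteLine( Row["Description"] );

      Row["Description"] = "Grumpy Widget";
      Console.WriteLine( Row.RowState );
      // The Original version survives until AcceptChanges() is called.
      Console.WriteLine( Row["Description", DataRowVersion.Original] );
      Row.AcceptChanges();
      Console.WriteLine( Row.RowState );
   }
}
```

Notice that no extra bookkeeping code was needed: the Original and Proposed versions are maintained for us automatically.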

DataRelation

The DataRelation class is an incredibly powerful tool, yet it is equally simple to use. The DataRelation class is designed to contain the data required to describe a relationship between two DataTable objects through DataColumn objects. The DataRelation functions within a DataSet in much the same way as relationships between tables function in SQL Server.

When creating a DataRelation, you are creating a parent/child relationship between two tables where the matching columns used to establish the relationship must be of the same data type. Therefore, you cannot have a primary key in the parent table as an integer and the related column in the child table as a string. Using DataRelations in combination with Constraints, you can actually allow for changes to cascade downward from parent to child or upward from child to parent.
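As a hedged sketch of that cascading behavior, using small in-memory stand-ins for the Customers and Orders tables rather than the real Northwind database: passing true for the relation's createConstraints argument creates a ForeignKeyConstraint, whose DeleteRule defaults to Cascade, so deleting a parent row deletes its children too.

```csharp
using System;
using System.Data;

class CascadeSketch
{
   static void Main()
   {
      DataSet DS = new DataSet();

      DataTable Customers = new DataTable( "Customers" );
      DataColumn CustId = Customers.Columns.Add( "CustID", typeof(int) );
      Customers.PrimaryKey = new DataColumn[] { CustId };

      DataTable Orders = new DataTable( "Orders" );
      DataColumn OrderCust = Orders.Columns.Add( "CustID", typeof(int) );

      DS.Tables.Add( Customers );
      DS.Tables.Add( Orders );

      // The final argument asks for a ForeignKeyConstraint as well.
      DS.Relations.Add(
         new DataRelation( "CustomersOrders", CustId, OrderCust, true ) );

      Customers.Rows.Add( new object[] { 1 } );
      Orders.Rows.Add( new object[] { 1 } );
      Orders.Rows.Add( new object[] { 1 } );

      // Deleting the parent cascades the delete to both child orders.
      Customers.Rows[0].Delete();
      DS.AcceptChanges();
      Console.WriteLine( Orders.Rows.Count );
   }
}
```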

The following example snippet of code demonstrates how to create a new DataRelation and place it into the Relations collection of the associated DataSet. As usual, we'll be using tables from the Northwind database.

DataColumn Parent;
DataColumn Child;

Parent = CustomerDataSet.Tables["Customers"].Columns["CustID"];
Child = CustomerDataSet.Tables["Orders"].Columns["CustID"];

DataRelation CustomerOrders =
   new DataRelation( "CustomersOrders", Parent, Child );
CustomerDataSet.Relations.Add( CustomerOrders );

Once the relation is in place, you can navigate to a given row in the Customers table and obtain all of that customer's orders by using the GetChildRows() function. Simply specify the name of the relationship that you're using to obtain the child rows as follows and everything else is handled for you.

DataRow[] Orders;
Orders = CustomerDataSet.Tables["Customers"].Rows[0].GetChildRows( "CustomersOrders" );
foreach (DataRow OrderRow in Orders)
{
   // display or process an order
}

Another one of the great features of using a DataRelation is that when using the GetChildRows() function, you can actually pass a value from the DataRowVersion enumeration. This allows you to effectively obtain all child rows with the Current values, or Default values, or Original values, or even Proposed values (changes have not been accepted yet). So, you could change the line above that retrieves child rows to only retrieve the Original values of the rows:

Orders = CustomerDataSet.Tables["Customers"].Rows[0].GetChildRows(
   "CustomersOrders", DataRowVersion.Original );

Updating Data with the DataSet

As we've said before, the DataSet is a completely disconnected, source-neutral data store. So, if it is completely disconnected, how can we use it to update data? The answer is through the use of the utility class, DataAdapter. The DataAdapter can be compared to a communication wire in that its sole responsibility in life is to relay information across a wire between two normally disconnected entities.

Using the UpdateCommand, DeleteCommand, InsertCommand, and SelectCommand properties of the IDbDataAdapter implementation, you can then use that adapter to populate your DataSet as well as transfer all of the changes made to it back to the original data source. It is important to note that a connection to the database (or some other data store) will only be maintained for the duration of the specific command being executed. The DataSet will remain completely disconnected the entire time.

We'll show a small snippet of code for updating information in a database from a DataSet, and then later we'll show a fully functional sample of updating data in both a connected and disconnected fashion using the DataSet. This particular example uses a fictitious table so don't fret if you can't find it in the Northwind database.

// This constructor overload builds the SelectCommand from the
// query text and connection supplied.
SqlDataAdapter FileDA = new SqlDataAdapter(
   "SELECT FileName, Description FROM Files", Connection );
FileDA.UpdateCommand = new SqlCommand(
   "UPDATE Files SET Description = @Description WHERE FileName = @FileName",
   Connection );
// The final argument maps the parameter to its source column.
FileDA.UpdateCommand.Parameters.Add(
   "@Description", SqlDbType.VarChar, 30, "Description" );

SqlParameter KeyParam = FileDA.UpdateCommand.Parameters.Add( "@FileName", SqlDbType.VarChar, 8 );
KeyParam.SourceColumn = "FileName";
KeyParam.SourceVersion = DataRowVersion.Original;

Connection.Open();
FileDA.Fill( MyDS, "Files" );

The following line of code obtains a reference to a specific row (the twelfth, just because we can) within the "Files" table.

DataRow MyFile = MyDS.Tables["Files"].Rows[11];

Use the MyFile reference to modify the value of a single column within that row.

MyFile["Description"] = "Semi-Useful File.";
// Don't be fooled by the name: the Update method calls
// ALL of the appropriate UPDATE, DELETE, or INSERT commands
// for those records with the appropriate state. Passing the
// table name tells the adapter which table to reconcile.
FileDA.Update( MyDS, "Files" );

One thing that might not be immediately obvious is that the UpdateCommand (and all other command properties respectively) can take any form of database command. This includes stored procedures. What this means is that not only can you rig up your own SQL statements to carry your data changes across to the database, but you can also supply the prototypes for stored procedures, allowing for some incredibly advanced (and efficient) processing in response to changes made to your DataSet.
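A sketch along the following lines wires a stored procedure into the UpdateCommand. The procedure name usp_UpdateFileDescription is hypothetical, as is the Files table carried over from the earlier snippet; substitute objects that exist in your own database before running this against a server.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

class StoredProcSketch
{
   static void Main()
   {
      SqlConnection Connection = new SqlConnection(
         "Data Source=localhost; Initial Catalog=Northwind; " +
         "User id=sa; Password=;" );
      SqlDataAdapter FileDA = new SqlDataAdapter(
         "SELECT FileName, Description FROM Files", Connection );

      // usp_UpdateFileDescription is a hypothetical stored procedure.
      SqlCommand UpdateCmd = new SqlCommand(
         "usp_UpdateFileDescription", Connection );
      UpdateCmd.CommandType = CommandType.StoredProcedure;

      // The fourth argument maps each parameter to its source column.
      UpdateCmd.Parameters.Add(
         "@Description", SqlDbType.VarChar, 30, "Description" );
      SqlParameter Key = UpdateCmd.Parameters.Add(
         "@FileName", SqlDbType.VarChar, 8, "FileName" );
      Key.SourceVersion = DataRowVersion.Original;

      FileDA.UpdateCommand = UpdateCmd;
      // FileDA.Update( MyDS, "Files" ) would now invoke the stored
      // procedure once for each modified row.

      Console.WriteLine( UpdateCmd.Parameters.Count );
      Console.WriteLine( UpdateCmd.CommandType );
   }
}
```

All of the command objects above are configured entirely offline; no connection is opened until Update() is actually called.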

XML Schemas

We gave a brief indication earlier that the core structure and data within an ADO.NET DataSet is built entirely on XML. Any programmer working with a DataSet should know that its internal structure is represented by an XML Schema. Knowing how DataSets form their internal structure gives you more control over what you can do with them, including creating strongly-typed DataSets (another topic entirely).

For those not familiar with XSD (XML Schema Definition), it is an XML dialect for describing relational data structures. Not only is it the internal format in which the structure of a DataSet is maintained, but XSD is used extensively throughout the entire .NET framework, including Web Services, Remoting, and in many other places.

For a detailed reference and excellent examples of how to write your own XSD and learn all of the ins and outs of this technology, you should pick up a copy of Professional XML Schemas from Wrox Press.

There are two different types of XML Schemas utilized by the DataSet object:

Inferred Schema – when a DataSet loads data from a source from which it cannot also determine the schema (such as an XML file that does not include an in-line schema), the DataSet must infer this schema. It does this by examining the data that it has found and extrapolating relations and constraints. However, you should know that when a DataSet reads from an XML file without a schema, all of the column data types will be strings, as there isn't enough information to reliably infer data types. The following is a quick example that loads some data from a standard XML document and outputs a .XSD file that represents the schema that the DataSet inferred from the data. Note that we didn't have to do anything to generate the schema. As we've said before, the schema is an integral part of the DataSet, and you cannot ever have a DataSet with no schema.

using System;
using System.Data;

namespace Wrox.ProDotNet.Chapter10.Schemas2
{
   class ClassMain
   {
      static void Main(string[] args)
      {
         DataSet WidgetDS = new DataSet();
        
         WidgetDS.ReadXml( "WidgetSource.XML", XmlReadMode.InferSchema );
         WidgetDS.WriteXmlSchema( "Schema2.XSD" );
      }
   }
}

The above code is only a few lines, but there's a lot of stuff going on in the background. When the DataSet is loaded, a schema representing the structure and relations of the data is inferred and stored as the DataSet's internal structure. In order to compile and run the above code, place the WidgetSource.XML file in the obj\debug directory beneath the project directory and compile with a reference to System.Data.dll. The output of the Schema2.XSD file looks like this:

<?xml version="1.0" standalone="yes"?>
<xsd:schema id="WIDGETS" targetNamespace="" xmlns="" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xsd:element name="WIDGETS" msdata:IsDataSet="true">
    <xsd:complexType>
      <xsd:choice maxOccurs="unbounded">
        <xsd:element name="WIDGET">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="WIDGETID" type="xsd:string"
                           minOccurs="0" />
              <xsd:element name="DESCRIPTION" type="xsd:string"
                           minOccurs="0" />
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

The raw XML File, WidgetSource.XML, that generated the above schema looks like this:

<WIDGETS>
   <WIDGET>
      <WIDGETID>1</WIDGETID>
      <DESCRIPTION>Nice Widget</DESCRIPTION>
   </WIDGET>
   <WIDGET>
      <WIDGETID>2</WIDGETID>
      <DESCRIPTION>Angry Widget</DESCRIPTION>
   </WIDGET>
</WIDGETS>

Supplied Schema – instead of letting the DataSet infer a (possibly inaccurate) schema, you can supply your own. You can invoke the ReadXmlSchema() method on the DataSet itself or load a schema from a stream or from a TextReader. If a DataAdapter fills the DataSet, it is the responsibility of the DataAdapter to supply the schema. The SqlDataAdapter will supply the DataSet with a completely accurate schema, including data types. The other way to supply your own schema is to build the DataTable and DataColumn objects on your own. Using these objects, you will implicitly build the schema in your DataSet. The following is a quick sample that shows manually building a DataSet and then outputting its schema to a text file. You can build the following example by simply compiling with a reference to System.Data.dll.

using System;
using System.Data;

namespace Wrox.ProDotNet.Chapter10.Schemas1
{
   /// <summary>
   /// Summary description for ClassMain.
   /// </summary>
   class ClassMain
   {
      static void Main(string[] args)
      {
         DataSet SampleDS = new DataSet();
         DataTable Widgets = new DataTable( "Widgets" );
         Widgets.Columns.Add( "WidgetID", typeof(int) );
         Widgets.Columns.Add( "Description", typeof(string) );
        

         SampleDS.Tables.Add( Widgets );
         SampleDS.DataSetName = "WidgetDataSet";

         SampleDS.WriteXmlSchema( "schema1.xsd" );
      }
   }
}

The above code creates a DataSet and then calls the WriteXmlSchema() method, which creates the following schema:

<?xml version="1.0" standalone="yes"?>
<xsd:schema id="WidgetDataSet" targetNamespace="" xmlns="" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xsd:element name="WidgetDataSet" msdata:IsDataSet="true">
    <xsd:complexType>
      <xsd:choice maxOccurs="unbounded">
        <xsd:element name="Widgets">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="WidgetID" type="xsd:int" minOccurs="0" />
              <xsd:element name="Description" type="xsd:string"
                           minOccurs="0" />
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

You've already seen code that reads from a SQL Server, so we won't bore you with that (it is readily available in the download section for this chapter, anyway). What we would like to demonstrate is the result of a supplied schema through the use of a SqlDataAdapter. Remember that we said that the DataSet cannot infer data types; it can only infer columns and table relations. So, based on the following XSD file generated by our third schema sample, we know that the SqlDataAdapter is actually retrieving schema information from SQL Server and placing it directly into the DataSet. Quite a lot of work is being done on your behalf without you even needing to worry about it.

<?xml version="1.0" standalone="yes"?>
<xsd:schema id="NewDataSet" targetNamespace="" xmlns="" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xsd:element name="NewDataSet" msdata:IsDataSet="true">
    <xsd:complexType>
      <xsd:choice maxOccurs="unbounded">
        <xsd:element name="Customers">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="EmployeeID" type="xsd:int" minOccurs="0" />
              <xsd:element name="LastName" type="xsd:string"
                           minOccurs="0" />
              <xsd:element name="FirstName" type="xsd:string"
                           minOccurs="0" />
              <xsd:element name="Title" type="xsd:string" minOccurs="0" />
              <xsd:element name="TitleOfCourtesy" type="xsd:string"
                           minOccurs="0" />
              <xsd:element name="BirthDate" type="xsd:dateTime"
                           minOccurs="0" />
              <xsd:element name="HireDate" type="xsd:dateTime"
                           minOccurs="0" />
              <xsd:element name="Address" type="xsd:string" minOccurs="0" />
              <xsd:element name="City" type="xsd:string" minOccurs="0" />
              <xsd:element name="Region" type="xsd:string" minOccurs="0" />
              <xsd:element name="PostalCode" type="xsd:string"
                           minOccurs="0" />
              <xsd:element name="Country" type="xsd:string" minOccurs="0" />
              <xsd:element name="HomePhone" type="xsd:string"
                           minOccurs="0" />
              <xsd:element name="Extension" type="xsd:string"
                           minOccurs="0" />
              <xsd:element name="Photo" minOccurs="0">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:base64Binary" />
                </xsd:simpleType>
              </xsd:element>
              <xsd:element name="Notes" type="xsd:string" minOccurs="0" />
              <xsd:element name="ReportsTo" type="xsd:int" minOccurs="0" />
              <xsd:element name="PhotoPath" type="xsd:string"
                           minOccurs="0" />
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

If you've got good eyes, you'll notice that the parent element in this schema is actually called "Customers", and not "Employees", even though the code actually selects all columns from the Employees table. This was done deliberately to show you that we can control the name of the table into which the results from SQL are placed, but the rest of the schema is supplied entirely by the data structure of the SQL table itself.

One other tip on XML Schemas and DataSets before we move along: if you aren't all that comfortable with generating your own schemas by hand, or you don't like using Visual Studio.NET's Schema editor, then you have one other option. You saw how easy it was for us to use the DataSet to infer and store to disk a schema based on existing data. If you want to build a schema based on a pre-existing XML file or set of data that you have, you can simply load the data into the DataSet, save the schema, and then make minor tweaks and changes such as configuring the data types properly.

ADO.NET Pros and Cons

So far throughout this chapter, you've been given a fairly straightforward, objective overview of the many features and uses of ADO.NET. Believe it or not, there may be a few occasions where you might not actually want to use ADO.NET. I know how you feel; I was shocked nearly to the point of fainting when I first found out myself!

No programmer should ever jump ship on tried and true methods without first exploring both the positive and the negative side of any new technology. This next section of the chapter should give you a good idea of not only the good things that come with ADO.NET, but some of its drawbacks.

Pros

ADO.NET is rich with plenty of features that are bound to impress even the most skeptical of programmers. If this weren't the case, Microsoft wouldn't even be able to get anyone to use the Beta. What we've done here is come up with a short list of some of the more outstanding benefits to using the ADO.NET architecture and the System.Data namespace.

Performance – there is no doubt that ADO.NET is extremely fast. The actual figures vary depending on who performed the test and which benchmark was being used, but ADO.NET performs much, much faster at the same tasks than its predecessor, ADO. Some of the reasons why ADO.NET is faster than ADO are discussed in the ADO versus ADO.NET section later in this chapter.

Optimized SQL Provider – in addition to performing well under general circumstances, ADO.NET includes a SQL Server Data Provider that is highly optimized for interaction with SQL Server. It uses SQL Server's own TDS (Tabular Data Stream) format for exchanging information. Without question, your SQL Server 7 and above data access operations will run blazingly fast utilizing this optimized Data Provider.

XML Support (and Reliance) – everything you do in ADO.NET at some point will boil down to the use of XML. In fact, many of the classes in ADO.NET, such as the DataSet, are so intertwined with XML that they simply cannot exist or function without utilizing the technology. You'll see later when we compare and contrast the "old" and the "new" why the reliance on XML for internal storage provides many, many advantages, both to the framework and to the programmer utilizing the class library.

Disconnected Operation Model – the core ADO.NET class, the DataSet, operates in an entirely disconnected fashion. This may be new to some programmers, but it is a remarkably efficient and scalable architecture. Because the disconnected model allows for the DataSet class to be unaware of the origin of its data, an unlimited number of supported data sources can be plugged into code without any hassle in the future.

Rich Object Model – the entire ADO.NET architecture is built on a hierarchy of class inheritance and interface implementation. Once you start looking for things you need within this namespace, you'll find that the logical inheritance of features and base class support makes the entire system extremely easy to use, and very customizable to suit your own needs. It is just another example of how everything in the .NET framework is pushing toward a trend of strong application design and strong OOP implementations.

Cons

Hard as it may be to believe, there are a couple of drawbacks or disadvantages to using the ADO.NET architecture. I'm sure others can find many more faults than we list here, but we decided to stick with a short list of some of the more obvious and important shortcomings of the technology.

Managed-Only Access – for a few obvious reasons, and some far more technical, you cannot utilize the ADO.NET architecture from anything but managed code. This means that there is no COM interoperability allowed for ADO.NET. Therefore, in order to take advantage of the advanced SQL Server Data Provider and any other feature like DataSets, XML internal data storage, etc, your code must be running under the CLR.

Only Three Managed Data Providers (so far) – unfortunately, if you need to access any data that requires a driver that cannot be used through either an OLEDB provider or the SQL Server Data Provider, then you may be out of luck. However, the good news is that the OLEDB provider for ODBC is available for download from Microsoft. At that point the downside becomes one of performance: you are invoking multiple layers of abstraction as well as crossing the COM interop gap, which incurs additional overhead.

Learning Curve – despite the misleading name, ADO.NET is not simply a new version of ADO, nor should it even be considered a direct successor. ADO.NET should be thought of more as the data access class library for use with the .NET framework. The difficulty in learning to use ADO.NET to its fullest is that a lot of it does seem familiar. It is this that causes some common pitfalls. Programmers need to learn that even though some syntax may appear the same, there is actually a considerable amount of difference in the internal workings of many classes. For example (this will be discussed in far more detail later), an ADO.NET DataSet is nothing at all like a disconnected ADO RecordSet. Some may consider a learning curve a drawback, but I consider learning curves more like scheduling issues. There's a learning curve in learning anything new; it's just up to you to schedule that curve into your time so that you can learn the new technology at a pace that fits your schedule.

ADO.NET Samples

So far, we've covered the basic architecture of the System.Data namespace, and we've gone into a bit of detail on some of the major players in ADO.NET's data access scheme, like the Connection class, the Command class, the DataAdapter, and more. Now it's time to put all of this overview together into some coherent samples. Our first sample shows the creation, updating, deleting, and retrieval of data using a DataSet and an adapter that communicates with SQL Server 2000. The next example illustrates the same CRUD principle utilizing an offline data store (an XML file on disk).

A Server Connection Data Example

The first of our ADO.NET review examples will demonstrate using a DataSet and a DataAdapter to retrieve data from SQL Server, and post changes such as inserts, updates, and deletes through the adapter and their appropriate IDbCommand objects.

This is the listing for the ClassMain.cs file, the main file of our little sample Console Application. A few of the tabs have been removed to try to fit all the code into the space of the pages here.

   

using System;
using System.Data;
using System.Data.SqlClient;



namespace Wrox.ProDotNet.Chapter10.ADONET_Connected
{
class ClassMain
{
static void Main(string[] args)
{
   DataSet SampleDS = new DataSet( "SampleDS" );
   string ConnectionString = "Data Source=localhost; Initial Catalog=Northwind; User id=sa; Password=;";
   SqlConnection Connection = new SqlConnection( ConnectionString );
   SqlDataAdapter SampleDA = new SqlDataAdapter( "SELECT RegionID, RegionDescription FROM Region", Connection );
   DataRow TempRow;
   // now let's provide SQL statements that support the rest of the
   // UPDATE, DELETE, and INSERT functionality.

There is a way to get VS to build all these commands for us, but it will only work if VS is connected directly to the source to interrogate it. If we were working on an n-tier solution where VS only sees derived classes, we'd have to do this manually anyway.

It's worth noting here that when creating commands to commit changes for the DataAdapter, it is important to indicate the source column. You can do that by setting the SourceColumn property on the Parameter object, or by passing it as the argument immediately following the parameter's data type size.

For example, without that hint, when the DataAdapter attempted to invoke the Update command it could not decipher on its own that the value to be supplied for the @RegionDescription parameter should come from the RegionDescription column. By supplying the SourceColumn to the Parameters.Add() method ourselves, we guarantee that the adapter will supply the right column values for the right arguments.

   // -- UPDATE --
   SqlCommand UpdateCommand = new SqlCommand( "UPDATE Region SET RegionDescription = @RegionDescription WHERE RegionID = @RegionID", Connection );
   SqlParameter UpdateKeyParam = new SqlParameter( "@RegionID", SqlDbType.Int );
   UpdateKeyParam.SourceColumn = "RegionID";
   UpdateKeyParam.SourceVersion = DataRowVersion.Original;
   UpdateCommand.Parameters.Add( "@RegionDescription", SqlDbType.NChar, 50, "RegionDescription" );
   UpdateCommand.Parameters.Add( UpdateKeyParam );
   SampleDA.UpdateCommand = UpdateCommand;

The other thing to keep in mind in the code, both above and below, is that instead of supplying raw SQL statements, we could easily be supplying stored procedures. For example, if we wanted to use a stored procedure for the above UpdateCommand, we would change the first line of code to the following two lines:

SqlCommand UpdateCommand = new SqlCommand( "sp_UpdateRegion", Connection );
UpdateCommand.CommandType = CommandType.StoredProcedure;

Using this particular construct, we can create incredible amounts of functionality with very little work on our part and very few lines of code.

   // -- DELETE --
   SqlCommand DeleteCommand = new SqlCommand( "DELETE FROM Region WHERE RegionID = @RegionID", Connection );
   SqlParameter DeleteKeyParam = new SqlParameter( "@RegionID", SqlDbType.Int );
   DeleteKeyParam.SourceColumn = "RegionID";
   DeleteKeyParam.SourceVersion = DataRowVersion.Original;
   DeleteCommand.Parameters.Add( DeleteKeyParam );
   SampleDA.DeleteCommand = DeleteCommand;

   // -- INSERT --
   SqlCommand InsertCommand = new SqlCommand( "INSERT INTO Region(RegionDescription, RegionID) VALUES(@RegionDescription, @RegionID)", Connection );
   InsertCommand.Parameters.Add( "@RegionDescription", SqlDbType.NChar, 50, "RegionDescription" );
   InsertCommand.Parameters.Add( "@RegionID", SqlDbType.Int, 4, "RegionID" );
   SampleDA.InsertCommand = InsertCommand;

   // now load the dataset with the results of our SQL query.
   // notice that we're not explicitly opening our connection. Our
   // DataAdapter is doing all that work for us, closing it as soon as it
   // completes its task.
   SampleDA.Fill( SampleDS, "Region" );
  
   // create a new row
   // the Region table doesn't have an autonumbering identity,
   // so we have to supply our own region ID.
   DataRow NewRow;
   NewRow = SampleDS.Tables["Region"].NewRow();
   NewRow["RegionDescription"] = "Central";
   NewRow["RegionID"] = 5;
   Console.WriteLine("New Row Created using NewRow(), RowState is: {0}",
           NewRow.RowState );

   SampleDS.Tables["Region"].Rows.Add( NewRow );
   Console.WriteLine("New Row Added to Table RowState is: {0}",
           NewRow.RowState );

   // modify the first row
   TempRow = SampleDS.Tables["Region"].Rows[0];
   Console.WriteLine("Modifying First Row, Pre-Change State is: {0}",
           TempRow.RowState );
   TempRow["RegionDescription"] = "Reeeeeaalllly Far East";
   Console.WriteLine("Modifying First Row, Post-Change State is: {0}",
           TempRow.RowState );
   // call the update method to save the new row and update the first
   Console.WriteLine("Calling Update() to Commit New Row and First Row Change.");
   SampleDA.Update( SampleDS, "Region" );

   // delete the row we just inserted
   Console.WriteLine("Deleting Our New Row, Pre-Delete State is: {0}",
           NewRow.RowState );
   NewRow.Delete();
   Console.WriteLine("Deleting Our New Row, Post-Delete State is: {0}",
           NewRow.RowState );
   // now call the update method.
   Console.WriteLine("Calling Update() - this will post ALL of our changes, not just the update.");
   SampleDA.Update( SampleDS, "Region" );
   Console.WriteLine("Region Table after Update()\n------------------------------");
     
   foreach (DataRow tRow in SampleDS.Tables["Region"].Rows )
   {
      Console.WriteLine( tRow["RegionDescription"] );
   }

   SampleDS.Dispose();
   SampleDA.Dispose ();
   }
}
}

Having gone through the source code above, let's take a look at the output the example produces. It is especially interesting to see how the RowState property changes with each line of code, and that it changes implicitly, without the programmer having to track changes on their own.

The code creates a new region called "Central". Then it modifies the first region, changing its name to "Reeeeeaalllly Far East". Then the Update() method is called to commit both the new row and the change to the name of the first region. After that, the code deletes the "Central" region simply by calling the Delete() method on the DataRow object, and then Update() is called again to commit that change. If you don't believe what's going on, you can always add a breakpoint immediately before the second Update() call that posts the deletion, and verify that the new row really has been inserted into the database.

New Row Created using NewRow(), RowState is: Detached
New Row Added to Table RowState is: Added
Modifying First Row, Pre-Change State is: Unchanged
Modifying First Row, Post-Change State is: Modified
Calling Update() to Commit New Row and First Row Change.
Deleting Our New Row, Pre-Delete State is: Unchanged
Deleting Our New Row, Post-Delete State is: Deleted
Calling Update() - this will post ALL of our changes, not just the update.
Region Table after Update()
------------------------------
Reeeeeaalllly Far East
Western                                          
Northern                                         
Southern
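The RowState lifecycle shown in this output doesn't depend on SQL Server at all. Below is a minimal sketch using a standalone, in-memory DataTable (the table and values are invented for illustration, not taken from the chapter's sample); AcceptChanges() stands in for the bookkeeping that Update() performs after posting changes:

```csharp
using System;
using System.Data;

class RowStateDemo
{
    // Returns the sequence of RowState values a single row passes through.
    public static string[] Run()
    {
        DataTable table = new DataTable("Region");
        table.Columns.Add("RegionID", typeof(int));
        table.Columns.Add("RegionDescription", typeof(string));

        DataRow row = table.NewRow();          // created, but not yet in any table
        string s1 = row.RowState.ToString();   // Detached

        row["RegionID"] = 5;
        row["RegionDescription"] = "Central";
        table.Rows.Add(row);
        string s2 = row.RowState.ToString();   // Added

        table.AcceptChanges();                 // what Update() triggers after posting
        string s3 = row.RowState.ToString();   // Unchanged

        row["RegionDescription"] = "Middle";
        string s4 = row.RowState.ToString();   // Modified

        row.Delete();
        string s5 = row.RowState.ToString();   // Deleted

        return new string[] { s1, s2, s3, s4, s5 };
    }

    static void Main()
    {
        foreach (string state in Run())
            Console.WriteLine(state);
    }
}
```

Note that the same sequence appears in the sample's output above, even though there the transitions are driven through a live database connection.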

An Offline Data Example

What we've done for our offline example is take the regions and place them into an XML file. Also, we've created an XSD schema representing the Regions table (we did this by adding a line of code to the above connected example calling the WriteXmlSchema() method to create the .XSD file).

For practical purposes, you will want to use the DataSet's ability to save its own data as an XML file (or stream) when you are exporting data, possibly for some kind of business-to-business document exchange or for later import into another application. Also, some small desktop applications that don't connect to high-end RDBMSs for their data use XML as their sole data store in files on disk.
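As a sketch of that export path, here is a DataSet built entirely in memory (the table and values are invented for illustration, not the chapter's Region data) handing back its contents as XML via GetXml(), which returns the same document that WriteXml() would write to a file or stream:

```csharp
using System;
using System.Data;

class XmlExportDemo
{
    public static string Export()
    {
        DataSet ds = new DataSet("SampleDS");
        DataTable regions = ds.Tables.Add("Region");
        regions.Columns.Add("RegionID", typeof(int));
        regions.Columns.Add("RegionDescription", typeof(string));
        regions.Rows.Add(new object[] { 1, "Eastern" });
        regions.Rows.Add(new object[] { 2, "Western" });

        // GetXml() returns the XML representation of the DataSet's data
        // as a string, ready for a B2B exchange or a file on disk.
        return ds.GetXml();
    }

    static void Main()
    {
        Console.WriteLine(Export());
    }
}
```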

This example performs the exact same operations on an offline data store (an XML file) as the previous example. The only differences are that we don't need an adapter, we call the AcceptChanges() method instead of Update(), and we use the WriteXml() method to save our DataSet to disk rather than posting it to a database.

Let's look at the code we wrote for the offline, XML-based ADO.NET sample. For this code to work properly, the Region.XSD and Region.XML files must be in the obj\debug directory of the project so that the executable can find them, and the project must be compiled with a reference to the System.Data DLL.

using System;
using System.Data;

namespace Wrox.ProDotNet.Chapter10.ADONET_Offline
{
class ClassMain
{
static void Main(string[] args)
{
   DataRow TempRow;
   DataSet SampleDS = new DataSet( "SampleDS" );

ReadXmlSchema() supplies the DataSet with the XSD schema indicated by the stream or text file that we supply as an argument. An interesting thing to do is to put a watch on your DataSet variable and set breakpoints before and after this call. Immediately following the call, you'll see that all of the column definitions have been created, data types have been set, and any applicable DataRelations or Constraints have been placed into the DataSet.

   SampleDS.ReadXmlSchema( "Region.XSD" );
   // loading the data from the XML file after importing a supplied schema
   // accomplishes the same thing for the DataSet as having an Adapter
   // plug in data from a connected store like SQL.
   SampleDS.ReadXml( "Region.XML" );

Note that all of the code from this point on, with the exception of the AcceptChanges() calls, works in exactly the same fashion as in our previous example; another illustration of the benefits of a source-neutral data store. One difference you will see in the output: rows loaded by ReadXml() come in with a RowState of Added rather than Unchanged, because nothing has called AcceptChanges() on them yet.

   DataRow NewRow;
   NewRow = SampleDS.Tables[ "Region" ].NewRow();
   NewRow["RegionDescription"] = "A little to the Left";
   NewRow["RegionID"] = 5;
   Console.WriteLine("New Region Created using NewRow(), RowState is: {0}",
           NewRow.RowState);
   SampleDS.Tables[ "Region" ].Rows.Add( NewRow );
   Console.WriteLine("New Region RowState After adding to Table: {0}",
           NewRow.RowState);

   // modify the first row
   TempRow = SampleDS.Tables[ "Region" ].Rows[0];
   Console.WriteLine( "First row Pre-Modify State is : {0}",
           TempRow.RowState );
   TempRow["RegionDescription"] = "Reeeeeaalllly Far East";
   // commit changes to dataset.
   Console.WriteLine( "Calling AcceptChanges() in DataSet to commit values." );

   SampleDS.AcceptChanges();

   // now delete the row we created.
   Console.WriteLine( "New Region, Pre-Delete RowState is: {0}",
           NewRow.RowState );
   NewRow.Delete();
   Console.WriteLine( "New Region, Post-Delete RowState is: {0}",
           NewRow.RowState);

   Console.WriteLine( "Calling AcceptChanges() in DataSet to commit deletion of new region" );

   SampleDS.AcceptChanges();

   Console.WriteLine( "Writing new DataSet to XML." );

WriteXml() serializes the internal data storage format of the DataSet to either a stream or a text file. In our case, we're writing to a different file from the one we opened, to let you compare and contrast the two XML files. We could just as easily have written our modified XML back to the source document, truly committing the changes to disk.

   SampleDS.WriteXml( "Region_Changed.XML" );

   Console.WriteLine( "SampleDS Regions after changes:" );
   Console.WriteLine( "-------------------------------" );

   foreach (DataRow tRow in SampleDS.Tables[ "Region" ].Rows )
   {
   Console.WriteLine( tRow["RegionDescription"] );
   }

   SampleDS.Dispose();

   }
}
}

Now let's take a look at the output we generated by running our offline ADO.NET sample:

New Region Created using NewRow(), RowState is: Detached
New Region RowState After adding to Table: Added
First row Pre-Modify State is : Added
Calling AcceptChanges() in DataSet to commit values.
New Region, Pre-Delete RowState is: Unchanged
New Region, Post-Delete RowState is: Deleted
Calling AcceptChanges() in DataSet to commit deletion of new region
Writing new DataSet to XML.
SampleDS Regions after changes:
-------------------------------
Reeeeeaalllly Far East
Western                                          
Northern                                          
Southern                                         

In our sample output, you can see that the modification of the first region was successful, changing its description to "Reeeeeaalllly Far East". You can also see that the "A little to the Left" region is not in the DataSet. As with our previous sample, if you really aren't convinced of what's going on, you can breakpoint the sample immediately before the second AcceptChanges() method is called and examine the contents. You will see the fifth region in there, waiting to be removed.

As for the final XML, the resulting XML document looks like this:

<?xml version="1.0" standalone="yes"?>
<SampleDS>
  <Region>
    <RegionID>1</RegionID>
    <RegionDescription>Reeeeeaalllly Far East</RegionDescription>
  </Region>
  <Region>
    <RegionID>2</RegionID>
    <RegionDescription>Western                                          
    </RegionDescription>
  </Region>
  <Region>
    <RegionID>3</RegionID>
    <RegionDescription>Northern                                          
    </RegionDescription>
  </Region>
  <Region>
    <RegionID>4</RegionID>
    <RegionDescription>Southern                                         
    </RegionDescription>
  </Region>
</SampleDS>

You should notice that after the Western, Northern, and Southern regions there is a large amount of whitespace in the XML file. The reason is that in SQL Server those columns were originally defined as nchar(50), which pads the remaining room in the string with whitespace. Region 1, which we edited without any such restriction, has no padding. Situations like this, where the DataSet loads an XML file and saves one in a slightly different format, are known as a loss of fidelity in the underlying XML document. We'll discuss fidelity loss, and how to prevent occurrences like this (and other far more damaging issues), later when we discuss the XmlDataDocument.
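You can reproduce the padding behavior in miniature without SQL Server. The sketch below (with an invented value) pads a string to 50 characters the way an nchar(50) column would, and shows that the padding survives a DataSet WriteXml()/ReadXml() round trip:

```csharp
using System;
using System.Data;
using System.IO;

class PaddingDemo
{
    public static string RoundTrip()
    {
        DataSet ds = new DataSet("SampleDS");
        DataTable t = ds.Tables.Add("Region");
        t.Columns.Add("RegionDescription", typeof(string));
        // simulate what an nchar(50) column hands back: a value padded to 50 chars
        t.Rows.Add(new object[] { "Western".PadRight(50) });

        // write the DataSet out and read it back, as WriteXml()/ReadXml() would
        StringWriter w = new StringWriter();
        ds.WriteXml(w);

        DataSet ds2 = new DataSet();
        ds2.ReadXml(new StringReader(w.ToString()));
        return (string)ds2.Tables["Region"].Rows[0]["RegionDescription"];
    }

    static void Main()
    {
        // brackets make the preserved trailing whitespace visible
        Console.WriteLine("[" + RoundTrip() + "]");
    }
}
```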

ADO versus ADO.NET

You might be thinking that we took our sweet time getting to this particular section of the chapter. It comes this late because, to really appreciate the differences between ADO and ADO.NET and to judge when each is appropriate, you need to be familiar with the basic features and functionality of ADO.NET first. We're assuming that you're already familiar with ADO. If you aren't, and are just learning ADO.NET, you may want to skim this section, reading the parts that apply to ADO.NET and ignoring the ADO descriptions.

Choose your Weapon

When we were looking at the Pros and Cons of ADO.NET previously, we mentioned that there might actually be times when you don't want to use ADO.NET. Despite the startling nature of this statement, it is true.

When to use ADO.NET – if the code that you are writing that needs data access is managed (that is, it runs under the CLR), then you really don't want to be using anything but ADO.NET. ADO.NET is a set of managed classes, optimized to run in a managed environment. The only time this doesn't apply is if your data source simply cannot be reached by any of the ADO.NET Data Providers. Note that ADO.NET is not usable through COM Interop or in any other way from an unmanaged application.

When NOT to use ADO.NET – there may be times when you are writing native .NET code, but the data source you need cannot be reached by any of the three ADO.NET Data Providers (remembering that the ODBC Data Provider must be downloaded separately). In this case, you could use ADO through COM Interop (though this is not recommended), or you could simply write classic VB or C++ code that uses ADO.

The general rule is that if you are writing code in a managed language like C# or VB.NET, then you should try to use ADO.NET for as much as you possibly can. Only use ADO from .NET when there is no other alternative. If you are writing your code in unmanaged languages like VB6 or C++, then you should continue using classic ADO.

ADO.NET DataSet versus ADODB.RecordSet

You can't discuss ADO without discussing the RecordSet, nor ADO.NET without discussing the DataSet; each is a core component of its respective technology. What we're going to provide here is a feature-by-feature comparison of how each of these objects implements those features. The ultimate goal of this exercise is to show that the ADO.NET DataSet is NOT the same thing as an ADO RecordSet, and that they are in fact two different solutions to two different problems.

Table Multiplicity – the traditional ADO RecordSet provides the ability to navigate through a single table of information. That single table may be the result of multiple joins done at the server level, but it is still a single table. The DataSet, on the other hand, supports many tables within the same DataSet, which can be iterated separately or navigated through parent/child relationships in a robust hierarchy of tables, rows, constraints, and relations. With its ability to support multiple tables with keys, constraints, and interconnected relationships, the DataSet can be considered a small, in-memory relational database cache.

Navigation – Navigation in the ADO Recordset is based on the cursor model. Even though you can specify a client-side cursor, you must still move a navigation pointer from one location to the next. Primitive seeking and filtering routines allow you to navigate in bursts, but the model is still based on the cursor. The DataSet, on the other hand, is an entirely offline, in-memory, cache of data. As such, all of its data is available all the time. At any point, the programmer can retrieve any row or column, constraint or relation simply by accessing it either ordinally or by retrieving it from a name-based collection.

Incidentally, it is worth pointing out that the ADO RecordSet was not originally designed to operate in a disconnected fashion. Early users of ADO will remember that the RecordSet began life as a simple, server-side iterator and not much more. As demand for offline functionality and extra features rose, features were bolted onto the original design, bloating both the feature set and the internal code of the RecordSet, and leaving it trying to accomplish too many tasks at once.
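The two points above can be sketched together. The snippet below (with invented Customers/Orders tables, not drawn from the chapter's samples) builds a DataSet holding two related tables, then reaches into it with no cursor in sight, pulling a parent row by ordinal and its children through the relation:

```csharp
using System;
using System.Data;

class MultiTableDemo
{
    public static string ChildOfFirstCustomer()
    {
        DataSet ds = new DataSet();

        DataTable customers = ds.Tables.Add("Customers");
        customers.Columns.Add("CustomerID", typeof(int));
        customers.Columns.Add("Name", typeof(string));

        DataTable orders = ds.Tables.Add("Orders");
        orders.Columns.Add("OrderID", typeof(int));
        orders.Columns.Add("CustomerID", typeof(int));

        // the in-memory "relational database cache": a real parent/child relation
        ds.Relations.Add("CustOrders",
            customers.Columns["CustomerID"], orders.Columns["CustomerID"]);

        customers.Rows.Add(new object[] { 1, "Alice" });
        customers.Rows.Add(new object[] { 2, "Bob" });
        orders.Rows.Add(new object[] { 100, 1 });
        orders.Rows.Add(new object[] { 101, 2 });

        // no cursor to position: any row in any table is addressable directly
        DataRow firstCustomer = ds.Tables["Customers"].Rows[0];
        DataRow[] theirOrders = firstCustomer.GetChildRows("CustOrders");
        return firstCustomer["Name"] + ":" + theirOrders[0]["OrderID"];
    }

    static void Main()
    {
        Console.WriteLine(ChildOfFirstCustomer());
    }
}
```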

Connectivity Model – we mentioned briefly in the navigation section that the DataSet is a completely offline, in-memory cache of data. The DataSet is designed specifically to have no knowledge of the source of the data contained within it; it has one purpose and one purpose only: to hold data. The ADO RecordSet, by contrast, was originally designed without the ability to operate in a disconnected fashion. That functionality was added later, and many programmers complain that the offline features feel kludgy, stamped on like an afterthought. Even the ability to build a RecordSet manually, column by column, without a database connection was added on afterward. As we demonstrated earlier, the DataSet can accept and post information to and from connected servers with the help of a DataAdapter, and it can maintain a completely disconnected, offline XML representation of data defined by an XSD schema and an XML document. This purely disconnected model gives the DataSet far more scalability and versatility.

Marshaling and Serialization – in the world of COM programming, any time you pass data from one COM component to another that data needs to undergo a process called marshaling. Marshaling involves copying and processing data so that a complex type can appear to the receiving component the same as it appeared to the sending component. In fact, many dictionaries list "assembly" as a synonym for marshaling. A marshaler is, in a sense, decomposing and re-assembling your data in a new location. Any COM programmer can tell you that marshaling is an expensive operation. The DataSet and the DataTable components support Remoting (discussed in the next item) in the form of XML serialization. Rather than spending all the time and effort using COM marshaling to transfer the DataSet across a wire, it is simply converted into XML (an easy task, since we know it is internally stored as XML already) and sent across process boundaries.

Firewalls and DCOM and Remoting Oh My! – those who've worked with DCOM enough will know that it is nearly impossible to marshal a DCOM component across a router. Obviously people have come up with their own tricks and workarounds to get around this limitation, but the inherent limitation is still there. As such, it is very, very difficult to utilize DCOM components or transfer DCOM components in and out of a firewall, across the Internet itself, etc. Through the use of Remoting (.NET's answer to DCOM), a DataSet or DataTable component can be serialized into XML, sent across the wire to a new AppDomain, and then de-serialized back into a fully functional DataSet. Because the DataSet is completely disconnected, and has no external dependencies, you lose absolutely nothing by serializing it and transferring it through Remoting.
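No Remoting infrastructure is needed to see why this works. The serialization round trip that Remoting relies on can be sketched with any stream standing in for the wire (the names below are illustrative):

```csharp
using System;
using System.Data;
using System.IO;

class WireDemo
{
    public static int RowCountAfterRoundTrip()
    {
        DataSet original = new DataSet("Payload");
        DataTable t = original.Tables.Add("Region");
        t.Columns.Add("RegionID", typeof(int));
        t.Rows.Add(new object[] { 1 });
        t.Rows.Add(new object[] { 2 });

        // "send": serialize schema + data to a text stream (our stand-in wire)
        StringWriter wire = new StringWriter();
        original.WriteXml(wire, XmlWriteMode.WriteSchema);

        // "receive": an unrelated DataSet reconstitutes itself from nothing
        // but the XML text, fully functional on the other side
        DataSet received = new DataSet();
        received.ReadXml(new StringReader(wire.ToString()), XmlReadMode.ReadSchema);
        return received.Tables["Region"].Rows.Count;
    }

    static void Main()
    {
        Console.WriteLine(RowCountAfterRoundTrip());
    }
}
```

Because the payload is plain XML text, it passes through firewalls and routers as easily as a web page would, which is precisely the property DCOM marshaling lacked.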

Rich Object Model (or lack thereof) – COM is based on a binary standard with no support for implementation inheritance, so one COM object cannot actually inherit from another. This forces a very flat object model that can be cumbersome to use. As you saw earlier in this chapter, ADO.NET is based on a logical, well-designed hierarchy of classes and interface implementations. This kind of robust object model fosters code re-use and makes it far easier for programmers to use the classes and learn their functions. In short, ADO has a fairly flat object model, while ADO.NET has a large, robust object hierarchy.

System.Xml

This chapter is all about working with data in the .NET framework. It might not be immediately obvious, but the System.Xml namespace is an integral player in the data access scheme for the .NET framework. We've covered pretty well that the DataSet relies entirely on XML for its own internal schema and data representation.

We'll learn in this section what the difference is between System.Xml and System.Data, as well as their similarities and how their functionality is intertwined with somewhat parallel hierarchies. We will also learn that XML is worth just as much consideration in a chapter about data access as is SQL Server or Oracle or OLE DB.

Before we get into more detail on the System.Xml namespace and its components, we should make sure that you are reminded that System.Xml is NOT simply the managed version of MSXML. Its functionality may overlap that of the MSXML COM library, but it contains much more and sports a rich object model and hierarchy. The System.Xml namespace provides standards-based support for many kinds of XML processing, including:

  • XML 1.0

  • XML Namespaces

  • XML Schemas (XSD)

  • XPath

  • XSL/T Transformations

  • DOM Level 2 Core (XmlDocument)

  • SOAP 1.1

If our coverage of the System.Xml namespace seems a bit cursory, that's intentional. This isn't the right forum for an in-depth chapter on XML (an entire book could be written on .NET XML; in fact, look out for Professional .NET XML from Wrox Press in XXX), so we're just going to skim the surface as it relates to data and ADO.NET DataSets.

System.Xml and System.Data

The two different namespaces both provide different answers to various data issues. The key to getting a handle on where they diverge and where they meet is in determining the needs of your application. Some applications may use nothing but the System.Data namespace, working purely in a DataSet with its internal XML format and DataAdapters, Commands and Connections.

In the not-so-distant past, many people thought that XML was just another buzzword; a fad that was going to quickly fade from sight and never be used again. Now, XML is a part of millions of people's lives, whether they know it or not. Knowing the abilities of an XmlDocument, what can be done using XML, SOAP, XSL/T and related technologies, will go far toward helping you make the right decision as to which technology you need to use and when.

There isn't nearly enough room in this chapter to cover all of the classes and technologies in the System.Xml namespace, so we're going to cover two things that are very closely related to the System.Data namespace: the XmlDocument and the XmlDataDocument. One of the samples earlier in this chapter loaded a DataSet from a standard XML document. In the .NET framework, the System.Data namespace lets us manage structured data through a relational model by working with DataSets, while the System.Xml namespace lets us manage hierarchical XML documents and streams. In addition, there are classes that bridge relational data in DataSets and hierarchical data in XML documents: the XmlDocument class provides the ability to work with DOM documents, while the XmlDataDocument provides a synchronization bridge between relational and XML data.

The XmlDocument

The XmlDocument provides us with an in-memory representation of an XML document according to the W3C Core DOM Level 1 and Core DOM Level 2 standards. It is essentially a container for a list of XmlNodes. Of course, each of those nodes can then have its own list of XmlNodes, and so on and so on. This section of the chapter is going to assume that you have some familiarity with XML and have used the MSXML.DOMDocument object before, as it provides a good frame of reference when dealing with the .NET XmlDocument. For more information on the XML DOM, you can consult the W3C (http://www.w3.org) or the Wrox Press book Professional XML.

Let's get right into a sample and then we'll analyze it and pick it apart afterward. Our sample is actually going to take the Region.XML source file from the last DataSet example and load it into an XmlDocument. From there, it is going to use the document model to make some changes to the document. The end result of this experiment should look very much like the result of our last DataSet example. This, of course, is not a coincidence.

Here's our sample code in C#.

using System;
using System.Xml;

namespace Wrox.ProDotNet.Chapter10.XmlDocument1
{
  
class XmlSample
{
static void Main(string[] args)
{
   XmlDocument SampleDoc = new XmlDocument();
   XmlNode RootNode;
   XmlNode RegionNode;
   XmlNode RegionDescription;
   SampleDoc.Load( "Region.XML" );

   // This is going to be our "SampleDS" root node
   RootNode = SampleDoc.DocumentElement;

   // modify the first region, like we did in the previous sample.
   RegionNode = RootNode.ChildNodes[0];
   // select the RegionDescription child of the Region node.

   RegionDescription = RegionNode.SelectSingleNode("RegionDescription");
   RegionDescription.InnerText = "Reeeeeaalllly Far East";
   SampleDoc.Save( "Region_Changed.XML" );
}
}
}

The code is pretty straightforward. We first load the XmlDocument with the XML in the Region.XML file. Then, grabbing a reference (careful not to say pointer!) to the DocumentElement object, we then find the first Region by grabbing the first child of the RootNode. We use just about the simplest XPath statement possible (we'll go into a pinch of XPath later) to select the "RegionDescription" node. Once we have a reference to that node, we change its InnerText property and save the document to disk.

The code generates a Region_Changed.XML file. Note that it is identical to the one we created with our DataSet example earlier.

<?xml version="1.0" standalone="yes"?>
<SampleDS>
  <Region>
    <RegionID>1</RegionID>
    <RegionDescription>Reeeeeaalllly Far East</RegionDescription>
  </Region>
  <Region>
    <RegionID>2</RegionID>
    <RegionDescription>Western                                          
    </RegionDescription>
  </Region>
  <Region>
    <RegionID>3</RegionID>
    <RegionDescription>Northern                                         
    </RegionDescription>
  </Region>
  <Region>
    <RegionID>4</RegionID>
    <RegionDescription>Southern                                          
    </RegionDescription>
  </Region>
</SampleDS>
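XPath can also select by value rather than by position. Here is a sketch against an in-memory document shaped like Region.XML (inlined here rather than loaded from disk), using a predicate to find a region by its RegionID:

```csharp
using System;
using System.Xml;

class XPathDemo
{
    public static string DescriptionOfRegion2()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<SampleDS>" +
            "<Region><RegionID>1</RegionID><RegionDescription>Eastern</RegionDescription></Region>" +
            "<Region><RegionID>2</RegionID><RegionDescription>Western</RegionDescription></Region>" +
            "</SampleDS>");

        // the predicate selects the Region whose RegionID element equals 2
        XmlNode node = doc.SelectSingleNode(
            "/SampleDS/Region[RegionID='2']/RegionDescription");
        return node.InnerText;
    }

    static void Main()
    {
        Console.WriteLine(DescriptionOfRegion2());
    }
}
```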

The XmlDataDocument

The XmlDataDocument class is designed specifically to work in tandem with a DataSet. It allows a structured XML document to be viewed and manipulated through a DataSet object, retaining the original fidelity (we talked about that earlier) of the underlying XML document. It can be considered a relational view of a subset of an XML document.

The XmlDataDocument allows for a two-way link between the DataSet and an XML document. In terms of data access, the relationship between the DataSet and the XmlDataDocument is similar in nature to the relationship between the DataSet and the DataAdapter. The following diagram illustrates a sample use of the XmlDataDocument where, through the use of a DataSet and an XML document, the DataSet is limited in its scope to only the main <Customers> node of the document. All other data in the document, including the <Orders> node, is completely inaccessible to the DataSet. By providing full fidelity on the XML document, we can be guaranteed that any change made to the <Customers> node via the DataSet will have absolutely no impact on the <Orders> nodes. Assuming that we configure our mapping types properly, we can also be assured that attributes in the document will remain attributes and elements will remain elements, even though they may all appear as columns in the DataSet.

There are many reasons for implementing a solution using an XmlDataDocument rather than an XmlDocument or a DataSet alone. If you link an XML document to a DataSet through this class, the XML document becomes versioned in the same way as the DataSet, which allows you to track changes to a custom XML document. In addition, with the XmlDataDocument you can run XPath queries (covered shortly) against your data. You can also choose how you traverse your data, and at any point you can retrieve the DataRow for a given node, or the reverse.
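That last capability, hopping from an XML node to its DataRow, looks roughly like the sketch below. The schema is defined in code rather than read from Region_More.XSD, and the XML is inlined; both are stand-ins for the chapter's files:

```csharp
using System;
using System.Data;
using System.IO;
using System.Xml;

class DataDocDemo
{
    public static string RowForSecondRegion()
    {
        DataSet ds = new DataSet("SampleDS");
        DataTable t = ds.Tables.Add("Region");
        t.Columns.Add("RegionID", typeof(int));
        t.Columns.Add("RegionDescription", typeof(string));

        // link first, then load: the data document fills the DataSet as it parses
        XmlDataDocument doc = new XmlDataDocument(ds);
        doc.Load(new StringReader(
            "<SampleDS>" +
            "<Region><RegionID>1</RegionID><RegionDescription>Eastern</RegionDescription></Region>" +
            "<Region><RegionID>2</RegionID><RegionDescription>Western</RegionDescription></Region>" +
            "</SampleDS>"));

        // from an XML element to its relational row
        XmlElement second = (XmlElement)doc.DocumentElement.ChildNodes[1];
        DataRow row = doc.GetRowFromElement(second);
        return (string)row["RegionDescription"];
    }

    static void Main()
    {
        Console.WriteLine(RowForSecondRegion());
    }
}
```

GetElementFromRow() performs the same mapping in the opposite direction.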

The quick sample we're going to walk through shows you in detail the effects of using an XmlDataDocument against a schema-loaded DataSet. The point here is that if your DataSet represents, and has access to, everything in your XML document, you may not need the XmlDataDocument at all. Also, keep in mind that the XmlDataDocument is a complex class, and we could spend dozens of pages on it. It works bi-directionally: you can modify the DataSet and see the changes in the XML document, while if you modify the XML document outside the scope defined by the DataSet's schema, the DataSet is unaffected.

Let's walk through the code and then take a look at the output.

using System;
using System.Data;
using System.Xml;

namespace Wrox.ProDotNet.Chapter10XmlDataDocument
{
class ClassMain
{
static void Main(string[] args)
{

Here we create a new DataSet, which we've seen before. Then we have the DataSet load a pre-defined schema (which we have also seen in previous examples). This schema, included further below, describes only information about a region table that exists one level beneath the root document. It contains no information about anything else. Once we load the schema, we create an XmlDataDocument based on the RegionDSDataSet. This links the DataSet and the XmlDataDocument. What is happening under the hood is that the data document is subscribing to the DataSet's events in order to link the two.

   DataSet RegionDS = new DataSet();
   DataRow RegionRow;
   RegionDS.ReadXmlSchema( " Region_More.XSD" );
   XmlDataDocument DataDoc = new XmlDataDocument( RegionDS );
   DataDoc.Load("Region_More.XML" );

   Console.WriteLine("DataSet has data now, even though we never loaded the
           DataSet.");
   Console.WriteLine("Because of the Schema in our XSD file, RegionDS has no 
           access to the MoreData table.");
   Console.WriteLine("DataSet Tables Found: {0}", RegionDS.Tables.Count );

This is where some people's alarm bells start going off. We're iterating through the list of tables in the DataSet, and we're iterating through the list of rows in those tables. All of this is fine and good, but we never actually loaded any XML explicitly from the DataSet, nor did we use a DataAdapter to load it. Remember what was said a little earlier in this section about the similarity between the XmlDataDocument and the DataAdapter. The XmlDataDocument populated the DataSet with the data that was appropriate and performed a good deal of other work behind the scenes.

   foreach (DataTable tTable in RegionDS.Tables)
   {
      Console.WriteLine("Table: {0}", tTable.TableName );
      Console.WriteLine("-----------------------------------");
      foreach (DataRow tRow in tTable.Rows)
      {
         Console.WriteLine( tRow["RegionDescription"] );
      }
   }

   RegionRow = RegionDS.Tables[0].Rows[0];

   Console.WriteLine();
   Console.WriteLine("Without doing anything, first row pre-mod state: {0}",
           RegionRow.RowState);
   RegionDS.AcceptChanges();
   Console.WriteLine("About to Modify the first region, pre-mod state: {0}", 
           RegionRow.RowState);
   RegionRow["RegionDescription"] = "Reeeeeaalllly Far East!";
   Console.WriteLine("Just modified the row, post-mod state: {0}",
           RegionRow.RowState );
   RegionDS.AcceptChanges();
   Console.WriteLine("Commit changed to disk, using the XmlDataDocument,
           NOT the DS!");
   DataDoc.Save( "Region_More_Modified.XML" );
}
}
}

This is another area that may confuse some programmers. We're actually calling the Save method on the XmlDataDocument, which should look pretty familiar to anyone who has used the XmlDocument class. The reason we're calling Save() on the XmlDataDocument is that we need to preserve fidelity. If we had saved the DataSet itself, we would have only saved the subset of information to which the DataSet's schema allows us access. As an exercise for yourself, modify this sample code so that the DataSet saves its own copy of the data in a file called Region_More_DSModified.XML and compare the two. If you can answer the question Why are the two files different? then you have really understood the content of this section.

Now that we've looked at the code, let's take a look at the output:

The first thing that should strike you immediately is that the RowState of the row is set to Added. This differs from our previous DataSet examples where the RowState was Unchanged. The reason for this is that because the DataSet was loaded and linked by virtue of the XmlDataDocument, it has no original state, therefore all of the data in the DataSet at that time is considered to be Added data. We call AcceptChanges() once and then the rows look normal with the Unchanged state.

To illustrate the full impact of what this application does, let's take a look at the source XML document that this console application loads.

<?xml version="1.0" standalone="yes"?>
<Root>
  <Region>
    <RegionID>1</RegionID>
    <RegionDescription>Eastern                                          
    </RegionDescription>
  </Region>
  <Region>
    <RegionID>2</RegionID>
    <RegionDescription>Western                                          
    </RegionDescription>
  </Region>
  <Region>
    <RegionID>3</RegionID>
    <RegionDescription>Northern                                          
    </RegionDescription>
  </Region>
  <Region>
    <RegionID>4</RegionID>
    <RegionDescription>Southern                                         
    </RegionDescription>
  </Region>

Again, keep in mind here that because the schema we provided didn't contain a definition for MoreData, the dataset cannot see it when linked to the XmlDataDocument for this XML. Not only can it not see it, but changes made to the DataSet cannot modify this section of the XML.

<MoreData>
  <Column1>12</Column1>
  <Column2>Hi There</Column2>
</MoreData>

We've trimmed some redundant data here to make the file more readable.

<MoreData>
  <Column1>12</Column1>
  <Column2>Hi There</Column2>
</MoreData>
</Root>

What it looks like we have here is an XML file that contains the data for two tables. One table, Region, contains two columns and we've seen this before in our previous examples. The second table we made up and stuffed into the XML file for demonstration purposes. It too has two columns.

When we apply the following schema to the DataSet, we are in effect completely hiding the MoreData table from the XmlDataDocument because, as we know, DataSets will not allow visibility to, nor modify access to, any piece of data not described in its internal schema.

The Region_More.XSD schema that we "pre-filter" our DataSet with:

<?xml version="1.0" standalone="yes"?>
<xsd:schema id="Root" targetNamespace="" xmlns="" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xsd:element name="Root" msdata:IsDataSet="true">
    <xsd:complexType>
      <xsd:choice maxOccurs="unbounded">
        <xsd:element name="Region">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="RegionID" type="xsd:string"
              minOccurs="0" />
              <xsd:element name="RegionDescription" type="xsd:string"
              minOccurs="0" />
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Finally we come to the conclusion of our output, the Region_More_Modified.XML file. This file contains the output as saved by the XmlDataDocument, not by the DataSet itself. You'll notice that fidelity has been preserved in that the MoreData information still remains. This is a significant point to remember, because if we had done the same load, modify, save operation on only the DataSet, the MoreData table would have evaporated into thin air and we would have lost all of that data!

Here is the final output of our program in the form of the modified XML Data File.

<?xml version="1.0" standalone="yes"?>
<Root>
  <Region>
    <RegionID>1</RegionID>
    <RegionDescription>Reeeeeaalllly Far East!</RegionDescription>
  </Region>
  <Region>
    <RegionID>2</RegionID>
    <RegionDescription>Western                                          
    </RegionDescription>
  </Region>
  <Region>
    <RegionID>3</RegionID>
    <RegionDescription>Northern                                          
    </RegionDescription>
  </Region>
  <Region>
    <RegionID>4</RegionID>
    <RegionDescription>Southern                                         
    </RegionDescription>
  </Region>
  <MoreData>
    <Column1>12</Column1>
    <Column2>Hi There</Column2>
  </MoreData>

Again, we've skimmed out some data here to make the file more readable on the printed page.

  <MoreData>
    <Column1>12</Column1>
    <Column2>Hi There</Column2>
  </MoreData>
  <MoreData>
    <Column1>12</Column1>
    <Column2>Hi There</Column2>
  </MoreData>
</Root>

A Brief Introduction to XPath

Obviously we don't have the room in this chapter to give any form of detailed discussion on XPath. Wrox has other books that cover Xpath in much greater detail that we can here. XPath is a simple query language used for selecting lists of nodes from an XmlDocument. Many people have dubbed it "SQL for XML" or "Internet SQL".

We're not going to go into detail at all on the semantics or tokens involved in the XPath language. Instead, we'll show a brief example of how you can use an XPath query on an XmlDataDocument to retrieve a list of matching nodes. For syntax information on the XPath language itself, you can consult any number of Wrox books or the W3C (www.w3c.org) home page for more information.

Our XPath sample takes the previous sample and modifies the Region_More.XSD schema such that the RegionID column is no longer a child element but an attribute of the <Region> tag. This also demonstrates an incredibly powerful use for schemas in that we can have a mix of attribute columns and element columns defined by the schema. We then load the Region_More.XML file and select those region's whose RegionID attribute is greater than 2.

using System;
using System.Data;
using System.Xml;

namespace Wrox.ProDotNet.Chapter10.XPath_XmlDataDocument
{
class ClassMain
{
static void Main(string[] args)
{
   DataSet RegionDS = new DataSet();
   DataRow RegionRow;
   RegionDS.ReadXmlSchema( " Region_More.XSD" );
   XmlDataDocument DataDoc = new XmlDataDocument( RegionDS );
   DataDoc.Load( " Region_More.XML" );

The XPath statement in the below code translates roughly into English as "Select all Region nodes that are children of a Root node that have a RegionID attribute value greater than 2". The NodeMatches variable will then contain an array of XmlNode objects. Because the DataSet is linked with this XmlDataDocument, we can obtain any DataSet row for any given node so long as the node represents the uppermost level of that row (for example, you cannot select a child-element column node and obtain a row for it, that will result in a null).

   // Now we'll select all of the regions that have
   // a RegionID > 2.
   XmlNodeList NodeMatches =
          DataDoc.SelectNodes("//Root//Region[@RegionID>2]");
  
   Console.WriteLine( "------------------------" );

   foreach ( XmlNode tmpNode in NodeMatches )
   {
      // pull up corresponding row in the DS.
      RegionRow = DataDoc.GetRowFromElement( (XmlElement)tmpNode );
      Console.WriteLine(RegionRow["RegionDescription"]);
   }
  
}
}
}

The code above results in the following output:

-----------------------
Northern
Southern

Obviously the XPath queries can become quite complex and provide the programmer with an incredibly valuable resource for locating specific data and selecting batches of data that match a certain criteria. The ability to utilize XPath on an XmlDataDocument is not only useful when working against on-disk offline stores, but can be extremely valuable to query offline XML that was populated from a server to avoid making multiple round-trips to re-query and re-sort various pieces of data.

We won't go into it in this chapter, but you should at least be aware of the fact that just as easily as you can perform XPath queries against a given XmlDataDocument, you can also perform XSLT (eXtensible Stylesheet Language Transformations) transformations against that data.

The possibilities for uses for all of these XML technologies are limited only by your ability to think of them. There are thousands of different things that you can utilize the XmlDataDocument / DataSet combination for, and when you consider that you can then add XPath and XSLT support to it, the number of possibilities grows exponentially.

Summary

In the first section of this chapter, you were given an overview of the System.Data namespace (ADO.NET), which included overviews of the Connection, the Command, the DataAdapter, and the DataReader. That section then gave you an overview of the DataSet and its capabilities. Then you were given some information to help you compare and contrast ADO and ADO.NET, as well as the benefits and drawbacks of ADO.NET

Once we completed our brief tour of ADO.NET, we then went through a very quick introduction to the System.Xml namespace, introducing you to the XmlDocument and XmlDataDocument classes. Once we got into the XmlDataDocument class and discovered how it can be used to link directly with a DataSet, and provide XPath and XSLT services against an XmlDataDocument (thereby indirectly an ADO.NET DataSet), we realized and discovered the true depth of the link between the System.Xml and System.Data namespaces.

Data is simply a representation of information. ADO.NET (System.Data) allows you to represent facts in DataSets, tables, rows, columns and relational data. System.Xml allows you to represent facts in the form of XML documents. Linking the two namespaces, you can perform XML-based operations on relational data, and relational operations on XML data.

Hopefully you now have a good idea of what is involved in working with data in the .NET framework, and how you cannot simply exclude XML when considering the concept of "data" as a whole. At this point you should be ready to start working with and learning more about ADO.NET and the .NET framework XML support.

Copyright and Authorship Notice

This chapter is taken from "Professional .NET Framework" by Jeff Gabriel, Denise Gosnell, Jeffrey Hasan, Kevin Hoffman, Christian Holm, Ed Musters, Jan D Narkiewicz, Jonothon Ortiz, John Schenken, Thiru Thangarathinam, Scott Wylie, published by Wrox Press Limited in September 2001; ISBN 1861005563; copyright © Wrox Press Limited 2001; all rights reserved.

No part of this chapter may be reproduced, stored in a retrieval system or transmitted in any form or by any means -- electronic, electrostatic, mechanical, photocopying, recording or otherwise -- without the prior written permission of the publisher, except in the case of brief quotations embodied in critical articles or reviews.

using System;
using System.Xml;

class ClassMain
{
    static void Main(string[] args)
    {
        XmlDocument SampleDoc = new XmlDocument();
        XmlNode RootNode;
        XmlNode RegionNode;
        XmlNode RegionDescription;

        SampleDoc.Load( "Region.XML" );

        // This is going to be our "SampleDS" root node
        RootNode = SampleDoc.DocumentElement;

        // modify the first region, like we did in the previous sample.
        RegionNode = RootNode.ChildNodes[0];

        // select the RegionDescription child of the Region node.
        RegionDescription = RegionNode.SelectSingleNode("RegionDescription");
        RegionDescription.InnerText = "Reeeeeaaalllly Far East";

        SampleDoc.Save( "Region_Changed.XML" );
    }
}

The code is pretty straightforward. We first load the XmlDocument with the XML in the Region.XML file. Then, grabbing a reference (careful not to say pointer!) to the DocumentElement, we find the first Region by taking the first child of the RootNode. We use just about the simplest XPath statement possible (we'll go into a pinch of XPath later) to select the "RegionDescription" node. Once we have a reference to that node, we change its InnerText property and save the document to disk.

The code generates a Region_Changed.XML file. Note that it is identical to the one we created with our DataSet example earlier.

<?xml version="1.0" standalone="yes"?>
<SampleDS>
  <Region>
    <RegionID>1</RegionID>
    <RegionDescription>Reeeeeaaalllly Far East</RegionDescription>
  </Region>
  <Region>
    <RegionID>2</RegionID>
    <RegionDescription>Western                                          
    </RegionDescription>
  </Region>
  <Region>
    <RegionID>3</RegionID>
    <RegionDescription>Northern                                         
    </RegionDescription>
  </Region>
  <Region>
    <RegionID>4</RegionID>
    <RegionDescription>Southern                                          
    </RegionDescription>
  </Region>
</SampleDS>

The XmlDataDocument

The XmlDataDocument class is designed specifically to work in tandem with a DataSet. It allows a structured XML document to be viewed and manipulated through a DataSet object, retaining the original fidelity (we talked about that earlier) of the underlying XML document. It can be considered a relational view of a subset of an XML document.

The XmlDataDocument allows for a two-way link between the DataSet and an XML document. In terms of data access, the relationship between the DataSet and the XmlDataDocument is similar in nature to the relationship between the DataSet and the DataAdapter. The following diagram illustrates a sample use of the XmlDataDocument where, through the use of a DataSet and an XML document, the DataSet is limited in scope to only the main <Customers> node of the document. All other data in the document, including the <Orders> node, is completely inaccessible to the DataSet. Because full fidelity is maintained on the XML document, we are guaranteed that any change made to the <Customers> node via the DataSet will have absolutely no impact on the <Orders> nodes. Assuming that we configure our mapping types properly, we can also be assured that attributes in the document will remain attributes and elements will remain elements, even though they may all appear as columns in the DataSet.

There are many reasons for implementing a solution with an XmlDataDocument rather than using an XmlDocument or a DataSet alone. If you link an XML document to the DataSet through this class, your XML document can be versioned in the same way as the DataSet, which allows you to track changes to a custom XML document. In addition, with the XmlDataDocument you can run XPath queries (covered shortly) against your data. You can also choose how you traverse your data and, at any point, retrieve the DataRow for a given node, or the node for a given DataRow.

The quick sample we're going to walk through shows in detail the effects of using an XmlDataDocument with a schema-loaded DataSet. The point to bear in mind is that if your DataSet represents, and has access to, everything in your XmlDocument, you may not need the XmlDataDocument at all. Also keep in mind that the XmlDataDocument is a complex class, and we could spend dozens of pages on it. It works bi-directionally: you can modify the DataSet and see the changes in the XmlDocument, while modifications to the XmlDocument outside the scope defined by the DataSet's schema leave the DataSet untouched.

Let's walk through the code and then take a look at the output.

using System;
using System.Data;
using System.Xml;

namespace Wrox.ProDotNet.Chapter10XmlDataDocument
{
class ClassMain
{
static void Main(string[] args)
{

Here we create a new DataSet, as we've done before. Then we have the DataSet load a pre-defined schema (which we have also seen in previous examples). This schema, listed further below, describes only a Region table that sits one level beneath the document root; it contains no information about anything else. Once we load the schema, we create an XmlDataDocument based on the RegionDS DataSet, which links the DataSet and the XmlDataDocument. What is happening under the hood is that the data document subscribes to the DataSet's events in order to keep the two in sync.

   DataSet RegionDS = new DataSet();
   DataRow RegionRow;
   RegionDS.ReadXmlSchema( "Region_More.XSD" );
   XmlDataDocument DataDoc = new XmlDataDocument( RegionDS );
   DataDoc.Load( "Region_More.XML" );

   Console.WriteLine("DataSet has data now, even though we never loaded the DataSet.");
   Console.WriteLine("Because of the Schema in our XSD file, RegionDS has no access to the MoreData table.");
   Console.WriteLine("DataSet Tables Found: {0}", RegionDS.Tables.Count );

This is where some people's alarm bells start going off. We're iterating through the list of tables in the DataSet, and through the rows in those tables. All well and good, but we never explicitly loaded any XML into the DataSet, nor did we use a DataAdapter to load it. Remember what was said a little earlier in this section about the similarity between the XmlDataDocument and the DataAdapter: the XmlDataDocument populated the DataSet with the appropriate data and performed a good deal of other work behind the scenes.

   foreach (DataTable tTable in RegionDS.Tables)
   {
      Console.WriteLine("Table: {0}", tTable.TableName );
      Console.WriteLine("-----------------------------------");
      foreach (DataRow tRow in tTable.Rows)
      {
         Console.WriteLine( tRow["RegionDescription"] );
      }
   }

   RegionRow = RegionDS.Tables[0].Rows[0];

   Console.WriteLine();
   Console.WriteLine("Without doing anything, first row pre-mod state: {0}", RegionRow.RowState);
   RegionDS.AcceptChanges();
   Console.WriteLine("About to Modify the first region, pre-mod state: {0}", RegionRow.RowState);
   RegionRow["RegionDescription"] = "Reeeeeaalllly Far East!";
   Console.WriteLine("Just modified the row, post-mod state: {0}", RegionRow.RowState );
   RegionDS.AcceptChanges();
   Console.WriteLine("Committing changes to disk, using the XmlDataDocument, NOT the DS!");
   DataDoc.Save( "Region_More_Modified.XML" );
}
}
}

This is another area that may confuse some programmers. We're actually calling the Save method on the XmlDataDocument, which should look familiar to anyone who has used the XmlDocument class. The reason we call Save() on the XmlDataDocument is that we need to preserve fidelity: had we saved the DataSet itself, we would have saved only the subset of information to which the DataSet's schema allows us access. As an exercise, modify this sample code so that the DataSet saves its own copy of the data in a file called Region_More_DSModified.XML and compare the two files. If you can answer the question, "Why are the two files different?", then you have really understood the content of this section.

Now that we've looked at the code, let's take a look at the output:

The first thing that should strike you is that the RowState of the row is set to Added. This differs from our previous DataSet examples, where the RowState was Unchanged. Because the DataSet was loaded and linked by the XmlDataDocument, it has no original state, so all of the data in the DataSet at that point is considered Added data. Once we call AcceptChanges(), the rows look normal again, with the Unchanged state.
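That lifecycle can be reproduced on its own with nothing more than a plain DataTable; the following sketch (ours, not the chapter's) shows the same RowState transitions you see in the console output:

```csharp
using System;
using System.Data;

class RowStateDemo
{
    static void Main()
    {
        // A plain DataTable is enough to show the RowState lifecycle
        // the sample above relies on.
        DataTable regions = new DataTable("Region");
        regions.Columns.Add("RegionDescription", typeof(string));

        DataRow row = regions.Rows.Add("Eastern");
        Console.WriteLine(row.RowState);      // Added - freshly inserted, no original state

        regions.AcceptChanges();
        Console.WriteLine(row.RowState);      // Unchanged - changes committed

        row["RegionDescription"] = "Far East";
        Console.WriteLine(row.RowState);      // Modified - edited since the last commit
    }
}
```

A DataSet freshly populated through an XmlDataDocument is in the same position as the row right after Rows.Add: there is no "original" version to diff against, hence Added.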

To illustrate the full impact of what this application does, let's take a look at the source XML document that this console application loads.

<?xml version="1.0" standalone="yes"?>
<Root>
  <Region>
    <RegionID>1</RegionID>
    <RegionDescription>Eastern                                          
    </RegionDescription>
  </Region>
  <Region>
    <RegionID>2</RegionID>
    <RegionDescription>Western                                          
    </RegionDescription>
  </Region>
  <Region>
    <RegionID>3</RegionID>
    <RegionDescription>Northern                                          
    </RegionDescription>
  </Region>
  <Region>
    <RegionID>4</RegionID>
    <RegionDescription>Southern                                         
    </RegionDescription>
  </Region>

Again, keep in mind that because the schema we provided contains no definition for MoreData, the DataSet cannot see that table when linked to the XmlDataDocument for this XML. Not only can it not see it, but changes made through the DataSet cannot modify this section of the XML either.

<MoreData>
  <Column1>12</Column1>
  <Column2>Hi There</Column2>
</MoreData>

We've trimmed some redundant data here to make the file more readable.

<MoreData>
  <Column1>12</Column1>
  <Column2>Hi There</Column2>
</MoreData>
</Root>

What we have here is an XML file that contains the data for two tables. One table, Region, contains two columns and appeared in our previous examples. The second table, MoreData, we made up and added to the XML file for demonstration purposes; it too has two columns.

When we apply the following schema to the DataSet, we are in effect completely hiding the MoreData table from the DataSet because, as we know, a DataSet allows neither visibility of, nor modification access to, any piece of data not described in its internal schema.

The Region_More.XSD schema that we "pre-filter" our DataSet with:

<?xml version="1.0" standalone="yes"?>
<xsd:schema id="Root" targetNamespace="" xmlns="" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xsd:element name="Root" msdata:IsDataSet="true">
    <xsd:complexType>
      <xsd:choice maxOccurs="unbounded">
        <xsd:element name="Region">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="RegionID" type="xsd:string"
              minOccurs="0" />
              <xsd:element name="RegionDescription" type="xsd:string"
              minOccurs="0" />
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Finally we come to the conclusion of our output, the Region_More_Modified.XML file. This file contains the output as saved by the XmlDataDocument, not by the DataSet itself. You'll notice that fidelity has been preserved: the MoreData information still remains. This is a significant point to remember, because if we had done the same load, modify, save operation on only the DataSet, the MoreData table would have evaporated into thin air and we would have lost all of that data!
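You can watch that data evaporate with a few lines of code. The sketch below is our own, using in-memory strings in place of the chapter's files and a trimmed-down schema: it loads XML containing a MoreData element into a schema-constrained DataSet alone (no XmlDataDocument) and shows that WriteXml never emits it.

```csharp
using System;
using System.Data;
using System.IO;

class FidelityDemo
{
    static void Main()
    {
        // A cut-down Region schema: note there is no MoreData declaration.
        string schema = @"<xsd:schema id='Root'
            xmlns:xsd='http://www.w3.org/2001/XMLSchema'
            xmlns:msdata='urn:schemas-microsoft-com:xml-msdata'>
          <xsd:element name='Root' msdata:IsDataSet='true'>
            <xsd:complexType><xsd:choice maxOccurs='unbounded'>
              <xsd:element name='Region'><xsd:complexType><xsd:sequence>
                <xsd:element name='RegionID' type='xsd:string' minOccurs='0'/>
              </xsd:sequence></xsd:complexType></xsd:element>
            </xsd:choice></xsd:complexType>
          </xsd:element>
        </xsd:schema>";

        string xml = @"<Root>
          <Region><RegionID>1</RegionID></Region>
          <MoreData><Column1>12</Column1></MoreData>
        </Root>";

        DataSet ds = new DataSet();
        ds.ReadXmlSchema(new StringReader(schema));
        ds.ReadXml(new StringReader(xml));   // MoreData matches no table: silently dropped

        StringWriter sw = new StringWriter();
        ds.WriteXml(sw);
        Console.WriteLine(sw.ToString().Contains("MoreData")); // False - the data is gone
    }
}
```

The XmlDataDocument avoids this loss by keeping the full document alongside the DataSet and saving from the document side.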

Here is the final output of our program in the form of the modified XML Data File.

<?xml version="1.0" standalone="yes"?>
<Root>
  <Region>
    <RegionID>1</RegionID>
    <RegionDescription>Reeeeeaalllly Far East!</RegionDescription>
  </Region>
  <Region>
    <RegionID>2</RegionID>
    <RegionDescription>Western                                          
    </RegionDescription>
  </Region>
  <Region>
    <RegionID>3</RegionID>
    <RegionDescription>Northern                                          
    </RegionDescription>
  </Region>
  <Region>
    <RegionID>4</RegionID>
    <RegionDescription>Southern                                         
    </RegionDescription>
  </Region>
  <MoreData>
    <Column1>12</Column1>
    <Column2>Hi There</Column2>
  </MoreData>

Again, we've skimmed out some data here to make the file more readable on the printed page.

  <MoreData>
    <Column1>12</Column1>
    <Column2>Hi There</Column2>
  </MoreData>
  <MoreData>
    <Column1>12</Column1>
    <Column2>Hi There</Column2>
  </MoreData>
</Root>

A Brief Introduction to XPath

Obviously we don't have the room in this chapter for a detailed discussion of XPath; Wrox has other books that cover XPath in much greater detail than we can here. XPath is a simple query language used for selecting lists of nodes from an XmlDocument. Many people have dubbed it "SQL for XML" or "Internet SQL".

We're not going to go into detail on the semantics or tokens of the XPath language. Instead, we'll show a brief example of how you can use an XPath query on an XmlDataDocument to retrieve a list of matching nodes. For syntax information on the XPath language itself, you can consult any number of Wrox books or the W3C home page (www.w3.org).

Our XPath sample takes the previous sample and modifies the Region_More.XSD schema so that the RegionID column is no longer a child element but an attribute of the <Region> tag. This also demonstrates an incredibly powerful use for schemas: we can mix attribute columns and element columns in the same schema. We then load the Region_More.XML file and select those regions whose RegionID attribute is greater than 2.
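The chapter doesn't reprint the modified schema, but assuming the only change is that the RegionID declaration moves out of the <xsd:sequence> and into an <xsd:attribute>, it would look something like this (our reconstruction, not the book's file):

```xml
<?xml version="1.0" standalone="yes"?>
<xsd:schema id="Root" targetNamespace="" xmlns=""
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xsd:element name="Root" msdata:IsDataSet="true">
    <xsd:complexType>
      <xsd:choice maxOccurs="unbounded">
        <xsd:element name="Region">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="RegionDescription" type="xsd:string"
                  minOccurs="0" />
            </xsd:sequence>
            <!-- RegionID is now an attribute column -->
            <xsd:attribute name="RegionID" type="xsd:string" />
          </xsd:complexType>
        </xsd:element>
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```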

using System;
using System.Data;
using System.Xml;

namespace Wrox.ProDotNet.Chapter10.XPath_XmlDataDocument
{
class ClassMain
{
static void Main(string[] args)
{
   DataSet RegionDS = new DataSet();
   DataRow RegionRow;
   RegionDS.ReadXmlSchema( "Region_More.XSD" );
   XmlDataDocument DataDoc = new XmlDataDocument( RegionDS );
   DataDoc.Load( "Region_More.XML" );

The XPath statement in the code below translates roughly into English as "select all Region nodes beneath the Root node whose RegionID attribute value is greater than 2". The NodeMatches variable will then contain an XmlNodeList of matching XmlNode objects. Because the DataSet is linked with this XmlDataDocument, we can obtain the DataSet row for any given node, so long as that node represents the uppermost level of the row (you cannot, for example, select a child-element column node and obtain a row for it; that returns null).

   // Now we'll select all of the regions that have
   // a RegionID > 2.
   XmlNodeList NodeMatches =
          DataDoc.SelectNodes("//Root//Region[@RegionID>2]");
  
   Console.WriteLine( "------------------------" );

   foreach ( XmlNode tmpNode in NodeMatches )
   {
      // pull up corresponding row in the DS.
      RegionRow = DataDoc.GetRowFromElement( (XmlElement)tmpNode );
      Console.WriteLine(RegionRow["RegionDescription"]);
   }
  
}
}
}

The code above results in the following output:

-----------------------
Northern
Southern

XPath queries can obviously become quite complex, and they give the programmer an incredibly valuable tool for locating specific data and selecting batches of data that match certain criteria. The ability to use XPath on an XmlDataDocument is useful not only when working against on-disk offline stores; it can also be extremely valuable for querying offline XML that was populated from a server, avoiding multiple round-trips to re-query and re-sort various pieces of data.
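To give a flavor of that complexity, here are a few slightly richer queries, run in this sketch of ours against an in-memory XmlDocument rather than the chapter's XmlDataDocument; the SelectNodes/SelectSingleNode syntax is identical in both cases.

```csharp
using System;
using System.Xml;

class XPathDemo
{
    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(@"<Root>
            <Region RegionID='1'><RegionDescription>Eastern</RegionDescription></Region>
            <Region RegionID='2'><RegionDescription>Western</RegionDescription></Region>
            <Region RegionID='3'><RegionDescription>Northern</RegionDescription></Region>
        </Root>");

        // Attribute predicate, as in the sample above.
        Console.WriteLine(doc.SelectNodes("/Root/Region[@RegionID>1]").Count); // 2

        // Predicate on a child element's text content.
        XmlNode western = doc.SelectSingleNode(
            "/Root/Region[RegionDescription='Western']");
        Console.WriteLine(western.Attributes["RegionID"].Value); // 2

        // last() picks the final Region regardless of its content.
        Console.WriteLine(doc.SelectSingleNode(
            "/Root/Region[last()]/RegionDescription").InnerText); // Northern
    }
}
```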

We won't go into it in this chapter, but you should at least be aware that just as easily as you can perform XPath queries against a given XmlDataDocument, you can also perform XSLT (eXtensible Stylesheet Language Transformations) transformations against that data.
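As a taste, here is a minimal sketch of such a transformation. Note that this is our own example, using XslCompiledTransform (the class that later superseded the XslTransform available when this chapter was written) and an in-memory stylesheet of our invention; the same Transform call accepts any IXPathNavigable, including an XmlDataDocument.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

class XsltDemo
{
    static void Main()
    {
        string xml =
            "<Root><Region><RegionDescription>Eastern</RegionDescription></Region></Root>";

        // A trivial stylesheet that extracts each region's description as text.
        string xslt = @"<xsl:stylesheet version='1.0'
            xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
          <xsl:output method='text'/>
          <xsl:template match='/'>
            <xsl:for-each select='/Root/Region'>
              <xsl:value-of select='RegionDescription'/>
            </xsl:for-each>
          </xsl:template>
        </xsl:stylesheet>";

        XslCompiledTransform transform = new XslCompiledTransform();
        transform.Load(XmlReader.Create(new StringReader(xslt)));

        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        StringWriter output = new StringWriter();
        transform.Transform(doc, null, output);
        Console.WriteLine(output.ToString()); // Eastern
    }
}
```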

The possible uses for all of these XML technologies are limited only by your ability to think of them. There are thousands of different things you can use the XmlDataDocument / DataSet combination for, and when you consider that you can then add XPath and XSLT support on top, the number of possibilities grows exponentially.

Summary

In the first section of this chapter, you were given an overview of the System.Data namespace (ADO.NET), including the Connection, the Command, the DataAdapter, and the DataReader. That section then gave you an overview of the DataSet and its capabilities. You were also given some information to help you compare and contrast ADO and ADO.NET, as well as the benefits and drawbacks of ADO.NET.

Once we completed our brief tour of ADO.NET, we went through a very quick introduction to the System.Xml namespace, introducing the XmlDocument and XmlDataDocument classes. In discovering how the XmlDataDocument can link directly with a DataSet, and provide XPath and XSLT services against an XmlDataDocument (and thereby, indirectly, against an ADO.NET DataSet), we saw the true depth of the link between the System.Xml and System.Data namespaces.

Data is simply a representation of information. ADO.NET (System.Data) allows you to represent facts in DataSets, tables, rows, columns and relational data. System.Xml allows you to represent facts in the form of XML documents. Linking the two namespaces, you can perform XML-based operations on relational data, and relational operations on XML data.

Hopefully you now have a good idea of what is involved in working with data in the .NET Framework, and why you cannot simply exclude XML when considering the concept of "data" as a whole. At this point you should be ready to start working with, and learning more about, ADO.NET and the .NET Framework's XML support.

Copyright and Authorship Notice

This chapter is taken from "Professional .NET Framework" by Jeff Gabriel, Denise Gosnell, Jeffrey Hasan, Kevin Hoffman, Christian Holm, Ed Musters, Jan D Narkiewicz, Jonothon Ortiz, John Schenken, Thiru Thangarathinam, Scott Wylie, published by Wrox Press Limited in September 2001; ISBN 1861005563; copyright © Wrox Press Limited 2001; all rights reserved.

No part of this chapter may be reproduced, stored in a retrieval system or transmitted in any form or by any means -- electronic, electrostatic, mechanical, photocopying, recording or otherwise -- without the prior written permission of the publisher, except in the case of brief quotations embodied in critical articles or reviews.
