Click here to Skip to main content
Click here to Skip to main content

XML data type tips in SQL Server 2005

By , 9 Nov 2006
 

Introduction

This article outlines some techniques for using the new XML data type provided in SQL 2005, when creating VS2005 applications. The examples here are written in VB.NET but there is a C# version of the source.

Background

If you develop using object-oriented (OO) languages such as VB.NET or C# and store the data on a relational database system such as SQL Server, one issue that often crops up is how to deal with the differences between the flexible nature of OO and the more rigid hierarchical structure of databases.

For example, let's say your application is an online computer hardware shop and you need to store the details for a set of Products. The common fields, such as ProductID, StockCode, ProductName etc., are relatively easy to map onto a database structure.

What happens however, when the field requirements vary a lot depending on the product nature? The fields for Digital Cameras (number of pixels, maximum zoom, type of storage) would be very different from say, Hard Disks (capacity, interface type, speed, form-factor). You can't really add these as columns to the table, as soon the table would be very large with lots of redundant fields.

One traditional approach is to create a child table, such as ProductAttributes and store the attributes in this as distinct rows. Then the problem becomes one of storing different data types e.g., strings, dates, monetary values, integers etc. in a common way. It also requires a SQL JOIN every time we need to examine a single Product, and a lot more code process, sort and display this.

Using XML to store such data in a more 'flexible' way is one way to keep the values closer to the object without having to resort to sub-tables. Instead of the child table, we could just add an XML column ProductAttributes and store some XML with the attributes.

Origins

I developed this technique when I had to create a Job processing system for a database application. The job processing 'engine' would load pending jobs from the database table, and execute them. A 'job' was essentially a base class in VS2005, with common properties such as JobID, JobName, RunAfter, etc. The problem was that actual instance classes of jobs such as ExportAccountsDataJob or SendCustomerEmailAlertsJob all needed to store and retrieve very different types of data.

One of the new features in SQL 2005 was support for XML as a data type. I realised that if I stored the data for each instance in a class that could be serialized to and deserialized from XML, I could use an XML field to store this data within the Job table. This kept the Job table structure very simple and the complexities of each implementation would deal with what data went into the XML field.

If you don't have SQL 2005 you can use a standard database Text field, although you won't have the SQL 2005 capability to query the XML data. This article isn't intended to show all the power and flexibility of this data type (XQuery alone needs its own book), but to show one way in which it can be used.

How It Works

  • First, we create an XML field in our table to store our data
  • Next, we define a class in .NET that can be serialized to/from XML so we can convert the XML into an object instance and back again
  • Then, to make this more accessible we create a property in our database object this will be explained in more detail below
  • You can then add whatever else you need like re-usable user controls for that data type

Address Example

The code included with this article demonstrates using XML for storing the address of a customer in a SQL 2005 database field. Traditionally you might have Line1, Line2, Town etc. as database columns in the Customers table. This works for one instance of an address, but isn't very re-usable.

For example, imagine that you might also need to store a DeliveryAddress when placing an order, or have to add an optional InvoiceAddress for a customer. This leads to columns called InvoiceAddressLine1, InvoiceAddressLine2 or DeliveryAddressLine 1 etc. - you know you've been there. Then along come addresses for suppliers, business partners, shareholders, delivery companies etc.

Each time you'd have to define all the same fields in each new database table (with different names) and then build and bind different controls to display and validate them. Not much re-usability there!

Database Table Design

So, instead of defining our customer table with individual address fields (Line1, Line2 etc.) we simply create a single field using the XML data type. I'll call it AddressXML. Here is the SQL code to create our Customers table:

/* SQL to create simple customers table (with XML datatype field) */
CREATE TABLE [dbo].[Customers](
    [CustomerID] [int] IDENTITY(1,1) NOT NULL,
    [CustomerName] [varchar](64) COLLATE Latin1_General_CI_AS NOT NULL,
    [AddressXML] [xml] NULL,
 CONSTRAINT [PK_Customers] PRIMARY KEY CLUSTERED 
(
    [CustomerID] ASC
)WITH (PAD_INDEX  = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

In the example source, you'll find CreateDatabase.SQL in the App_LocalResources directory which sets up the example database for you and creates the tables and logins.

Note that the SQL 2005 XML data type can support the linking of XML Schemas (XSDs) to a specific XML field to provide data-integrity checking and indexing support, but to keep it simple for this example we're just using an untyped XML column.

Code

First, we create an Address class in code which contains the public properties that we wish to store: Line1, Line2, Line3 etc.. Note that as we're using XML serialization we can store any type of property including other classes, provided they support serialization.

The App_Code directory contains a Customer class which maps the main table columns to properties in this manner, and I have written a few simple ObjectDataSource CRUD functions for database access.

Note: the ObjectDataSource functions don't use the SqlClient.SqlDataTypes.XML type when creating parameters, they use Varchar or Text. This is because SQL 2005 XML data types store in UTF16 encoding, and the strings that our classes use are UTF8. If you try to use an XML data type you'll get an error when SQL tries to convert it. If you pass varchar/text with UTF8, SQL 2005 automatically converts for you.

To make the address values more usable from the customer object, we have a CustomerAddress property which will load the address by deserializing the contents of the AddressXML field.

    Public ReadOnly Property CustomerAddress() As Address
        Get
            GetAddress()
            Return _address
        End Get
    End Property
    Private _address As Address = Nothing

    'ensure _address has a valid value
    Private Sub GetAddress()
        If _address Is Nothing Then
            'check for blank
            If Me.AddressXML Is Nothing Then
                'create new instance
                _address = New Address()
            Else
                'deserialize existing XML field
                _address = Address.Deserialize(Me.AddressXML)
            End If
        End If
    End Sub

    Public Sub SaveAddress()
        If _address IsNot Nothing Then
            'write to field
            Me.AddressXML = Address.Serialize(_address)
        End If
    End Sub

This means instead of writing

myCustomer.Line1 = "123 High Street"

we have the more elegant and logical approach of

myCustomer.CustomerAddress.Line1 = "123 High Street"

I did it this way so that repeated references to CustomerAddress will deseralize the AddressXML only once (on demand) and then retain this value. The disadvantage to this technique is that you have to explicitly tell the Customer class to save the data back to the AddressXML if you update the address in this way (see the SaveAddress method).

The Address class also has an example property AddressAsHTML which isn't serialized because it's read-only. This is used in the FindCustomers page to display the address as HTML in a label.

As I use the serialization technique with several different classes I created a generic XMLserializer(Of T) class (XMLSerializer<T> in C#) which can be re-used for serializing/deserializing any object type. This means that the code for serializing and deserializing the Address class is now fairly simple:

    Private Shared _serializer As New XMLserializer(Of Address)

    Public Shared Function Serialize(ByVal myobject As Address) As String
        Return _serializer.Serialize(myobject)
    End Function

    Public Shared Function Deserialize(ByVal xml As String) As Address
        Return _serializer.Deserialize(xml)
    End Function

Creating an XML Serializer in .NET requires a .DLL to be created and compiled at runtime, so we really want to minimise the number of times this is done. Using a Shared/static variable is one way to do this, as it will only be created once for each application instance.

Example Web Pages

To demonstrate this the application has two pages, AllCustomers.aspx and FindCustomers.aspx. The first shows all customers in a GridView, and allows you to select, insert, update and delete them.

The DetailsView control that displays the customer is quite simple, as we use the AddressControl.ascx for displaying and editing the AddressXML value.

AddressControl.ascx

AddressControl.ascx does not know anything about the database, or the field names, it just interacts with the Address class. It has a single bindable property AddressXML that we bind to the database field. In my example it binds to Customer.AddressXML but can easily bind to say Order.DeliveryAddressXML in the same way.

When the control is displayed, the XML is passed in (AddressXML is set). This deserializes the XML and copies the values to the controls. When the AddressXML value is requested (e.g. for an insert or update) the control values are read, the Address object created, serialized and returned as a string.

SQL queries on XML data

One argument against using XML in databases prior to SQL 2005 was that XML data in a text field was not easily queried. The SQL query

SELECT * FROM Customers WHERE Town = 'London'
is very simple and efficient in the traditional approach. With XML data in a text field, finding customers whose Town was 'London' would require reading all customers, deserializing the address and checking the value - this would obviously be much less efficient.

SQL 2005 supports the querying of XML data (either typed or untyped), so this would translate to

SELECT * FROM Customers 
    WHERE AddressXML.value('(/Address/Town)[1]', 'nvarchar(1000)') = 'London'

The FindCustomers page shows an example of searching in this way. You can enter a Town and search for customers who have that value. The data source Customers.vb code uses an XQuery search on the XML field.

Note that I asked the Address class to provide the actual XPath Address/Town in case we later change "Town" to something else. This avoids coding address logic into the Customer class.

The Benefits of Using XML

Re-Use

Having most of the address structure and functionality in the Address class, we can now make re-use pay. Need an invoice address in your customer details? Simply add a column InvoiceAddressXML to the database table, and the supporting code to the Customer class to access it as Customer.InvoiceAddress.

Control binding etc.

A cool trick is that we can treat an AddressXML as a single property and pass it to the AddressControl web user control. If we had gone down the separate-columns approach we would need to create several properties in our address control and pass the values of each one when using a control in a databound form.

This is a major benefit as I've previously had to create some very complicated user controls in this way, and it is really time-consuming binding Line1, Line2, etc..

Another point is that web controls always pass values as strings. Incorrect formatting of strings (especially dates!) can be a real issue when regional settings can cause problems. XML allows the string data to pass through and be re-assembled into safe .NET data types.

Flexibility

In this example we have only used AddressXML to store a single type of data. However, we can in theory store almost anything in an XML field. To go back to our Products example earlier, we needed to store different attributes for different types of products. Let's say our Products table might have ProductID, ProductName, ProductCategory and ProductAttributesXML.

We would then use ProductCategory to decide what type of class we store in ProductAttributesXML. So for ProductCategory='Digital Cameras" we could define a class DigitalCameraAttributes with fields such as Pixels, MaximumZoom, StorageType, and a special user control to show these details. Likewise, a class called HardDiskAttributes for the Hard Disks category.

Conclusion

I hope you find this approach a useful one. It provides a new way to handle the complexity and flexibility of object-oriented code when used with a relational database such as SQL 2005.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Howard Richards
Architect
United Kingdom United Kingdom
Member
Developer, business person, etc.

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
QuestionWhy do we need serilize the address field?membersyedsikki9 Jun '10 - 7:34 
OMG | :OMG:
Generaldynamics deserializationmemberEbube27 Nov '09 - 7:12 
you say "Let's say our Products table might have ProductID, ProductName, ProductCategory and ProductAttributesXML.
 
We would then use ProductCategory to decide what type of class we store in ProductAttributesXML. So for ProductCategory='Digital Cameras" we could define a class DigitalCameraAttributes with fields such as Pixels, MaximumZoom, StorageType, and a special user control to show these details. Likewise, a class called HardDiskAttributes for the Hard Disks category."
 
i want to allow user to specifies this fields from form. so how can i define the class base on their input
thanks
GeneralRe: dynamics deserializationmemberHoward Richards30 Nov '09 - 23:42 
Depends on what type of form and how you databind.
 
In ASP.NET you can use ObjectDataSource for example. You'd define a class that serialises to/from XML and add the ODS functionality.
 
With a windows form you could do something similar.
 
'Howard

GeneralRe: dynamics deserializationmemberEbube1 Dec '09 - 2:13 
i dont know the fields at design time .
GeneralRe: dynamics deserializationmemberHoward Richards4 Dec '09 - 0:10 
Hmm.. if you don't know the fields, how would you determine the values/validation to use? There must be some sort of meta-data available that tells you this?
 
As XML is heirarchical and text there is no way to use say, reflection. As LINQ returns an XElement for an XML column in LINQ to SQL, you could use this to see the element names, values and attributes. But without knowing what type each field is, or if they are nested, you're in the same position.
 
'Howard

GeneralRe: dynamics deserializationmemberEbube8 Dec '09 - 4:40 
Productid (int)
productName (varchar)
productAttribute (xml)
 
I don’t know at design time the content of the xml column but when inserted in to the database may look like

sea
5

 

pattribute>
sd
5

 

 

How can i produce the sample report in crystal report
productid productName maker size
 
5 hardisk Seagate 20
 
6 hardisk Seagate 40
 

productid productName type size
 
7 ram dd 512
 
7 ram sd 256
GeneralNeed query/value helpmembermrissmann15 Aug '08 - 12:49 
Say my xml is:
 
<properties>
...
<property name="FirstName" value="John"/>
<property name="LastName" value="Doe"/>
...
</properties>
 
how would I get the lastname value?
 
I would prefer to use the .value(...)
 
-Mark
GeneralRe: Need query/value helpmemberHoward Richards17 Aug '08 - 22:17 
Check out http://www.w3schools.com/XPath/xpath_syntax.asp[^]
 
The XPath syntax for attributes is to prefix with the @ symbol.
 
So in your example, if the field is XMLdata
 
SELECT * FROM Table WHERE XMLdata.value('(/properties/property/@LastName)[1]', 'nvarchar(1000)').value = 'Doe'
 
If you don't care about the specific path (properties-property-lastname) then you can find ANY attribute in the XML value using '//@LastName'
 
'Howard

GeneralUnicode danish charactersmembersitedev.dk26 May '08 - 9:44 
Hello Howard, Thank you for at good example of numerous things, including a type of custom master/detail view. pretty neat.
 
anyway. i have encountered that when i use danish characters æøå or ÆØÅ. the code breaks.
 
i think it has something to do with UTF16 and UTF8 conversion between client/server and the use of htmlencode. however i figure out where to change the input paramters for html encoding.
 
please help., PS thanks for at good article.
 
Keep on progging in the free world...

AnswerRe: Unicode danish charactersmemberzuraw9 Apr '09 - 1:23 
I had similar problem. In XmlSerializer generic class I've changed System.Text.Encoding.UTF8 to System.Text.Encoding.Unicode (UTF16) and it helped.
GeneralTyped XmlmemberFresh Mexican Food Fan27 Jul '07 - 5:51 
Would love to see an article by this author on using Typed Xml Columns with schema(xsd). Any plans to write such an article?
AnswerRe: Typed XmlmemberHoward Richards30 Jul '07 - 5:33 
None at present, sorry.
 
I tend to use XML datatypes for storing data I get in XML format where the structures vary a lot and therefore using schemas would require me to amend the database structure when I need to support new uses.
 
'Howard

GeneralThis is what I consider as summary to the issue.memberGil.Y7 Apr '07 - 5:05 
1. while oop objects design can in-fact be translated to xml with 100% success,
the database structure is based on tables and cannot aquire the same flexability.
 
2. The flexability question is a balance between two infulencers:
a. from the code-programmer point of view: using xml is da thing! Big Grin | :-D
b. from DBA point of view: xml destorys the hierarchy of the tables. Mad | :mad:
 
3. for a class which look like that:
class myClass
{
string name;
int balance;
List<object> actions; Confused | :confused:
}
 
the dba would require to know what are all the possible actions so he or she can create the table or tables structures however the programmer need nothing more then an xml field.
This actually is the core of debate isn't it? and it shows that the real question behind all of that is: "should the database be dependant on the program which using it?" Unsure | :~
 
4. DAL - as we know can be placed in the database using stored procedures or can be placed outside in a form of code language. DAL is also the layer that actually is the only layer that should "talk" with the database. so now we can refine the question from section 4 to: "should the database be depenedant on the DAL which is communicating with it?" :->
 
5. The answer:
 
a. If the DAL is in the layer of the code and not the database, it shouldn't! Dead | X|
(why?) because tommorow we might have another program in a diffrent language and perhaps in a diffrent opeating system which would not be able to communicate with our code-language DAL. And that will require us to write another DAL.
 
b. If the DAL is in the database (using stored procuders), it shouldn't as well! Cry | :((
(why?) because creating the class objects from sql statments would be impossible or would still require a translating DAL in our code-language.

6. The conclusion is that the database should not be dependant on the DAL which is using it. Sigh | :sigh:
 
7. so why! oh why did they allow the XML field in Sql2005 ??? D'Oh! | :doh:
The asnwer is in automating! in the future you will create only your code and classes (BL) and with the click of a button the computer will create your DAL (as possible today) and your Database (not possible today). The database creation will become an automated process and in this case the usage of xml as field (which is the small price the computer pays - not you!) would worth it. This also means that a program not written in the MS enviorment would have problems as mentioned. WTF | :WTF:
 
I encourage that until the automatic process of database creation is not in our reality use the traditional aproach unless you plannning to use your xml field to hold data such as xhtml and in that case the database give you a free check of xml structure. in other words if you plan to hold data which is meaningless in realtion to your objects but is an xml use the field xml field.
 
Gil.
 

GeneralRe: This is what I consider as summary to the issue.memberHoward Richards7 Apr '07 - 23:56 
Thanks for the comments and ideas Gil.
 
I do not think it is a clear-cut black-or-white issue but "shades of grey".
 
In some situations the traditional database approach is best - usually where the data structure is known and does not vary much according to context, and especially if large volumes of data are involved and performance is a consideration.
 
In others my approach of using XML columns is better where the data is less structured or variable (hence the examples).
 
I think of XML datatypes as 'another tool in my toolkit' rather than a one-size-fits-all approach.
 
'Howard

GeneralRe: This is what I consider as summary to the issue.memberGil.Y9 Apr '07 - 10:09 
Well first of all! thanks for replying to my reply to your article Cool | :cool:
 
After writing all the response I written it got me thinking again that it is very not fun that xml is there and just because the automating process is not there I have to define a whole database, so I took a serious step toward the future and created the following table in Sql2005:
 
ID {uniqueidentifier} (null not allowed) (Primary Key)
HashCode {nchar(64)} (null not allowed)
ObjectType {nchar(64)} (null not allowed)
Object {xml} (null not allowed)
 
I named that table tblBO in hope it will be the only database table in the solution and it will hold all the business objects. objects that are too nested can be divided in the business logic intead of creating another table.
 
Pros:
1. one table in the database.
2. DBA can be fired. (he didn't do much anyway...:->)
3. all data managment can be made from the BL.
 
Cons:
1. DBA, his mother, his sister and her boyfriend want to kill me.
2. Speed - probably will be less effective, checks are being made.
(if speed will become an issue there will be no choice but to buy another machine...)
in other words speed is now an issue of money and not an issue of development (which is a con!)
3. the tradionalist will stop loving me but that's ok GOD still loves me. (I hope).
 
I would love to get your input on the idea,Smile | :)
thanks,
Gil.

GeneralRe: This is what I consider as summary to the issue.memberHoward Richards9 Apr '07 - 21:38 
Not an approach I would personally recommend or use! I think there will be a LOT more cons than I think you have realised. My main concerns would be:
 
1) Speed
 
This is the most obvious one, because SQL has to do a lot more work to extract any data from this table than a traditional one. So for example, when you want customers starting with the letter A you would have to (a) find all the customer objects, and then (b) find those with an XPath query that starts with A inside the XML. This means SQL has to deserialize (in a fashion) and process the XML object which might be quite large, - and you might only want a single field (as in this example) but SQL has to deconstruct the whole object to get it. This would have a big performance impact on this application.
 
2) Querying
 
A properly designed SQL database with tables and relationships allows for queries to be executed very efficiently (a lot of very good coding has gone into query optimisation). Let's take an example from my own telecomms billing app: my main calldata table has about 11 million rows. I want to summarise the calls by rate, for a single customer's bills. In SQL it might be along these lines:

SELECT Bills.CustomerID, Bills.BillID, Rates.RateName, SUM(Calldata.Cost)
FROM Bills JOIN CallData ON Bills.BillID = CallData.BillID
JOIN Rates ON Rates.RateID = CallData.RateID
WHERE Bills.Customer =@id GROUP BY Bills.CustomerID, Bills.BillID
 
That query took about 3 seconds on my live DB. The the XML-only approach just writing the query would be difficult and performance would be seriously slow. You would probably have to work quite hard on indexing the XML data to get it to work even tolerably.
 
I'd suggest reading the SQL article from MS I mentioned earlier - Microsoft makes specific recommendations on when/why to use XML.
 
http://msdn2.microsoft.com/en-us/library/ms345115.aspx[^]
 
'Howard

GeneralRe: This is what I consider as summary to the issue.memberGil.Y9 Apr '07 - 23:22 
yeah I agree.
 
speed/query speed is the problem with that approach.
 
Well, You can always add another table that links to the main xml BO table.
 
The reason to have the XML BO table is to have a place where you can save an entire Business Object serialized so the infrastructure don't have to work hard to be able to save itself.
 
for query purposes one will still need tradtional approach it seems.
 
Thanks,
Gil.
GeneralWell done description!memberlukner15 Nov '06 - 5:21 
I'm a scientific programmer who has started using databases to store data (instead of flat files) and I really enjoyed your example. I frequently encounter exactly the situations you describe where XML could be used in SQL Server (EAV models). I would like to learn more about SQL & XML and your article helped me with that. I couldn't believe how well this works, but I'll need to be a bit more comfortable with a few more examples before I start using this myself. You helped me make a big step forward.
--Ralf B. Lukner, Ph.D.
GeneralRe: Well done description!memberHoward Richards15 Nov '06 - 5:58 
Glad it was of use. The Microsoft article I referenced in the previous message is a good starting point if you need to know more.
 
'Howard

GeneralRe: Well done description!memberlukner15 Nov '06 - 6:20 
Yes, I briefly looked at that article -- very helpful as well. People have been warning me about using XML in SQL Server 2005 due to the XQuery challenge for the DB programmer newbie. I'm taking those warnings seriously and proceeding with caution, but proceeding nevertheless.
--Ralf
GeneralSuggested readingmemberHoward Richards15 Nov '06 - 0:26 
I found this MS article which is really useful in describing when you might want to use XML in SQL server
 
http://msdn.microsoft.com/xml/default.aspx?pull=/library/en-us/dnsql90/html/sql25xmlbp.asp[^]
 
'Howard

GeneralRe: Suggested readingmemberFresh Mexican Food Fan27 Jul '07 - 5:43 
This link is broken as of today. Can you please help me with the current link? Thanks for the article.
 
FMFF
AnswerRe: Suggested readingmemberHoward Richards30 Jul '07 - 5:37 
The article is titled (in case it moves again!) "XML Best Practices for Microsoft SQL Server 2005" and is currently found at
http://msdn2.microsoft.com/en-us/library/ms345115.aspx[^]
 
'Howard

GeneralNeat techniquememberSteven Berkovitz13 Nov '06 - 16:27 
Hey Howard,
 
I like your technique. I've often serialized a binary object graph (usually a hashtable) to a text column to store random bits and pieces of type-aware data. Your method improves on this with the ability to use XPath to query the data (something I can only do after deserializing and object).
 
I'm going to test this out in a projcet!
 

 
-Steven

GeneralRe: Neat techniquememberHoward Richards13 Nov '06 - 21:43 
Thanks for the comments. It certainly for me helps turn a few unwieldy tables into managable ones and prevent child table proliferation.
 
'Howard

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web03 | 2.6.130523.1 | Last Updated 9 Nov 2006
Article Copyright 2006 by Howard Richards
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid