XML Data Files, XML Serialization, and .NET

KennS

Aug 28, 2003

Describes a means to build XML data files using XML Schema and xsd.exe to facilitate easy XML Serialization

CardfileSerializationDemo application image

Screenshot for PocketPC

Introduction

Note: I recently added the PocketPC version of this application.

I was reading Manster's article, A C# Personal Organizer, which describes a personal organizer written in C# whose data files are stored as XML. I had actually been writing something similar for myself, loosely based upon the old Windows Cardfile application. As I'd just received a new PocketPC and wanted to write some code for it, I figured I'd model an application on my desktop, save some data to an XML file, and transfer the data to and from the PocketPC. I was curious to see how Manster implemented his application, especially where the XML data files were concerned. Perhaps he included something I should also have designed into my own application.

Actually, our applications are somewhat different. I wanted to store not only contact information but also images and general notes to myself. To me, this meant I had three different types of cards that could be stored in my card deck. But I also noticed that he took a different approach to actually converting his data into XML and reading it back again. He uses XmlTextWriter to create his XML by hand, which in fact is exactly what you have to do with the PocketPC. But on the desktop, there is an alternative approach I've used effectively in several other applications - XML Serialization.

XML Serialization is a process .NET implements that easily converts an object's public instance data into XML and reads the data back again. At the end of the day, you might think it no easier to use than writing the XML yourself, but XML Serialization does have some advantages, which I'll describe shortly. XML serialization has its origins in the Web Service infrastructure .NET uses to convey data over the network. It's important to keep this in mind, as this serialization implementation is different and separate from the remoting serializers found in System.Runtime.Serialization, which I'll refer to as "remoting serialization." You'll find the Web Service XML serialization classes in System.Xml.Serialization instead, and these I'll refer to simply as "XML serialization." (I can only speculate why there are two different serialization implementations present in the Framework, but in fact there are!) In this article, I'll describe only Web Service XML serialization. Remoting serialization would require yet another article, as the mechanisms for using it are moderately different.

How XML Serialization Works

XML serialization is very easy to use, but I must admit it's at times hard to debug. I've found the exception message text isn't especially useful, leaving me to guess and try again. Even so, essentially all public fields and public read/write properties of any .NET object are automatically serialized into XML. The XML tag name is manufactured from the field or property name unless you specify differently. Arrays of objects are also handled automatically, even if the array is of complex types. Array element data is serialized as any other .NET object would be (through its public fields and properties).

Of course, .NET "knows" about the public properties and whether they're array-based or not because of the metadata associated with the type. If you create a .NET class that has a public string property called "Name," there are classes available that will enumerate all of the public class properties and others that will provide information about the property, such as its name. The serializer then roughly follows this simple model:

(writing)

Stream XML document element opening tag in the form <type_name>
For each public property implemented by the class
  Read the property name
  Read the property value
  Stream XML in the form <name>value</name>
End for
Stream XML document element closing tag in the form </type_name>

(reading)

Create an instance of the type encoded within the XML
For each XML node within the document element
  Read the node name and value
  Assign the named property the value previously read
End for

Of course, there may be errors during this process. The XML may not match the object type you specified, or there may be general XML errors. There may also be instances of public object properties that cannot be serialized, such as properties based upon IDictionary (like a hash table). (Note this is true of the 1.0 and 1.1 versions of the Framework... future versions may serialize IDictionary-based properties.)

You'll also see in my code that I don't use the [Serializable] attribute. This attribute is for remoting serialization and is not necessary for pure XML serialization. This is also true for the ISerializable interface.
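To make the read/write model above concrete, here is a minimal round-trip sketch. The Person type and the RoundTripDemo helper are hypothetical names invented for this illustration (they are not part of the cardfile sample), and the code uses newer C# syntax than the 1.x Framework offered. Note the class carries no [Serializable] attribute.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type for illustration only; no [Serializable] attribute is needed.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class RoundTripDemo
{
    // Writing: the serializer emits one element per public member.
    public static string ToXml(Person p)
    {
        var ser = new XmlSerializer(typeof(Person));
        using (var writer = new StringWriter())
        {
            ser.Serialize(writer, p);
            return writer.ToString();
        }
    }

    // Reading: the serializer creates a Person and assigns each member
    // from the element with the matching name.
    public static Person FromXml(string xml)
    {
        var ser = new XmlSerializer(typeof(Person));
        using (var reader = new StringReader(xml))
        {
            return (Person)ser.Deserialize(reader);
        }
    }

    public static void Main()
    {
        string xml = ToXml(new Person { Name = "Ann", Age = 30 });
        Console.WriteLine(xml);

        Person copy = FromXml(xml);
        Console.WriteLine(copy.Name + ", " + copy.Age);
    }
}
```

The type requires a public parameterless constructor (the compiler supplies one here), because the serializer has to create the instance itself when reading.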

Designing Data Files

When it comes to XML data files, we could simply write some C# or VB code, add some public properties, and let the serializer deal with details. In many cases, that's fine. However, I prefer to design the XML that will represent my data and then generate the C# source code from that. As it happens, this is also possible, and it is this design and implementation process I'll focus on for the rest of the article. And while I'll be referring to a specific example in this section, the basic pattern works for any XML data file.

When I design XML data files, I often create a sample data file in XML and then create a schema from that. I then iterate between XML and schema until I have an XML data file format I like and a representative XML schema I can use for validation. This process works well with XML serialization because there is a utility I use that ships with the .NET Framework called xsd.exe. xsd.exe takes as input an XML schema and will generate C# or VB source files that, when serialized, will produce the exact XML as outlined in the schema. If I later change the schema, I simply run xsd.exe again and matching source files are regenerated.

To illustrate, let's use my cardfile application as an example. Using some application, we can create "cards" that model what would physically be index cards. We have three types of cards - simple text for notes, a simple image, and a specialized card for contact information. Cards are collected into a collection called a "deck." So a single XML file would represent a deck, with the deck containing all cards associated with that deck.

Some minor complications are that I wanted to associate properties with the deck, as Microsoft Word associates properties with documents. I also wanted to encode any image data directly into the XML stream. The reason for this is simply to refrain from inserting a reference to the image (like a filename or URL) and having to remember to copy the image along with the XML onto my PocketPC device. I want the XML data file to be self-contained, even if possibly large.

I also want to specify an application version with the card decks so that future versions of the application may require updated data files. Or to be more specific, older versions of the application cannot read data files destined for newer versions of the application if substantial data file formatting changes were applied. Simply put, the card deck will have a version number associated with it that I'll check when loading the data file. If the version isn't one I can handle at the time, I'll terminate the load operation.

On the "housekeeping" side, I knew I'd need some way to identify an individual card, so I chose a simple integer as the card identifier. But since you should be able to arbitrarily add and delete cards, I would need to somehow keep track of the last card ID used. This information must be serialized as well so that when the deck is loaded into the application, new card additions will have proper and unique ID values.
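The two housekeeping rules just described (refuse a deck whose version isn't understood, and hand out card IDs from a persisted counter) might be sketched as follows. The Deck class, the "1.0" version string, and both method names are assumptions made for this illustration. The real sample's class is generated from the schema and may differ.

```csharp
using System;

// Hypothetical stand-in for the deck class; member names mirror the XML
// elements, but the version string and methods are illustrative assumptions.
public class Deck
{
    public const string SupportedVersion = "1.0";

    public int NextID;                        // persisted with the deck
    public string Version = SupportedVersion; // persisted with the deck

    // Hands out the next unused ID. IDs of deleted cards are never reused,
    // so every card added after a reload still gets a unique value.
    public int AllocateCardId()
    {
        int id = NextID;
        NextID++;
        return id;
    }

    // Terminates the load if the file was written in an unknown format.
    public void EnsureLoadable()
    {
        if (Version != SupportedVersion)
        {
            throw new InvalidOperationException(
                "Unsupported card file version '" + Version + "'");
        }
    }
}

public static class DeckDemo
{
    public static void Main()
    {
        var deck = new Deck { NextID = 7 };       // as if just deserialized
        deck.EnsureLoadable();
        Console.WriteLine(deck.AllocateCardId()); // prints 7
        Console.WriteLine(deck.AllocateCardId()); // prints 8
    }
}
```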

The basic XML I came up with looks like this:

<Cards>
  <Props>
    <Name/>
    <Author/>
    <Comments/>
  </Props>
  <NextID/>
  <Version/>
  <Card>
    <Header>
      <Name/>
      <ID/>
      <Type/>
      <Created/>
      <Updated/>
    </Header>
    <Body>
      {item}
    </Body>
  </Card>
  <Card/>
  <Card/>
</Cards>

Elements <NextID/> and <Version/> are simple types. <Props/> is a complex type, but there can be only one property element per card deck. <Card/> is also a complex type, but there can be from zero to many of them.

Each card has both a header and a body. The header contains the name of the individual card, along with the card's ID, type, and creation and update date/time stamps. The body contains the card information. That is, the "item" can be a string, an image, or a contact. We'll know what type it is by either examining the header or the first child's element tag name. The data type information appears redundant, but it's stored in the header to facilitate header-only processing, such as when sorting or searching. That way, I can sort all contacts alphabetically without opening the card body to see if the card is in fact a contact.

The card data itself will simply be a single text node (note):

<Note/>

A Base64-encoded node (image):

<Image/>

Or a contact element:

<Contact>
  <FName/>
  <MName/>
  <LName/>
  <Addr1/>
  <Addr2/>
  <Addr3/>
  <City/>
  <State/>
  <PCode/>
  <Country/>
  <Company/>
  <HomePh/>
  <MobilePh/>
  <WorkPh/>
  <FaxPh/>
  <EMail/>
  <Notes/>
</Contact>

The corresponding schema is shown here:

<?xml version="1.0" encoding="utf-8" ?>
<xs:schema id="Cardfile" targetNamespace="http://tempuri.org/Cardfile.xsd"
elementFormDefault="qualified" xmlns="http://tempuri.org/Cardfile.xsd"
xmlns:mstns="http://tempuri.org/Cardfile.xsd"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="PropType">
    <xs:sequence>
      <xs:element name="Name" type="xs:string" />
      <xs:element name="Author" type="xs:string" />
      <xs:element name="Comments" type="xs:string" />
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="CardType">
    <xs:sequence>
      <xs:element name="Header">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Name" type="xs:string" />
            <xs:element name="ID" type="xs:nonNegativeInteger"/>
            <xs:element name="Type">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:enumeration value="Note" />
                  <xs:enumeration value="Contact" />
                  <xs:enumeration value="Image" />
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
            <xs:element name="Created" type="xs:dateTime" />
            <xs:element name="Updated" type="xs:dateTime" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="Body">
        <xs:complexType>
          <xs:choice>
            <xs:element name="Image" type="xs:base64Binary" />
            <xs:element name="Note" type="xs:string" />
            <xs:element name="Contact">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="FName" type="xs:string" />
                  <xs:element name="MName" type="xs:string" />
                  <xs:element name="LName" type="xs:string" />
                  <xs:element name="Addr1" type="xs:string" />
                  <xs:element name="Addr2" type="xs:string" />
                  <xs:element name="Addr3" type="xs:string" />
                  <xs:element name="City" type="xs:string" />
                  <xs:element name="State" type="xs:string" />
                  <xs:element name="PCode" type="xs:string" />
                  <xs:element name="Country" type="xs:string" />
                  <xs:element name="Company" type="xs:string" />
                  <xs:element name="HomePh" type="xs:string" />
                  <xs:element name="MobilePh" type="xs:string" />
                  <xs:element name="WorkPh" type="xs:string" />
                  <xs:element name="FaxPh" type="xs:string" />
                  <xs:element name="EMail" type="xs:string" />
                  <xs:element name="Notes" type="xs:string" />
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:choice>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="Cards">
    <xs:complexType>
      <xs:sequence>
        <xs:element type="PropType" name="Props" minOccurs="1" 
            maxOccurs="1" />
        <xs:element name="NextID" type="xs:nonNegativeInteger" />
        <xs:element name="Version" type="xs:string" />
        <xs:element type="CardType" name="Card" minOccurs="0" 
            maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Creating the Source Files

I could create the basic XML file and then use xsd.exe to generate the schema for me, but I'm personally not fond of the schema format xsd.exe produces so I create the schema by hand. Something to remember is that the schema will guide xsd.exe when it creates our source files, so it's important to understand XML Schema to some degree. For example, consider this schema fragment:

<xs:complexType name="PropType">
  <xs:sequence>
    <xs:element name="Name" type="xs:string" />
    <xs:element name="Author" type="xs:string" />
    <xs:element name="Comments" type="xs:string" />
  </xs:sequence>
</xs:complexType>
<xs:element type="PropType" name="Props" minOccurs="1" maxOccurs="1" />

This will quite literally translate into this C# code:

public class PropType
{
  public string Name;
  public string Author;
  public string Comments;
}
public PropType Props;

The card elements are a bit more complex because I'm telling xsd.exe to implement a polymorphic choice:

<xs:element name="Body">
  <xs:complexType>
    <xs:choice>
      <xs:element name="Image" type="xs:base64Binary" />
      <xs:element name="Note" type="xs:string" />
      <xs:element name="Contact">...</xs:element>
    </xs:choice>
  </xs:complexType>
</xs:element>

What will this element translate into? Well, in code what we're saying is the "body" can consist of a string, something associated with Base64, or some complex element representing contact information. The only way we can polymorphically represent this is to create a public property associated with the body that is of type object. Since all .NET types have object as their base class, we can associate any piece of data with the body object we want. We can use the header's type enumeration to pull it back out. This is known as "weak typing," and in general good designs avoid it. In this case, I could avoid it if I could use the <xs:union/> element within my schema, but unfortunately xsd.exe doesn't handle unions. In this case, I believe the weak typing is justified since we're merely flagging the serialized data's type.

Therefore, the C# source code for this would be:

public class Body
{
  public object Item;
}

But notice how we lost information here. xsd.exe will generate this source code for us, but how will the XML serializer know what datatype "Item" truly represents? The answer is through attributes xsd.exe also injects into the source code:

[System.Xml.Serialization.XmlElementAttribute("Image", typeof(System.Byte[]),
  DataType="base64Binary")]
[System.Xml.Serialization.XmlElementAttribute("Note", typeof(string))]
[System.Xml.Serialization.XmlElementAttribute("Contact", typeof(Contact))]
public object Item;

The XmlSerializer, which is the object that performs the actual serialization, interprets the attribute metadata when it attempts to serialize the public object Item. If the true datatype of the item is a byte array, the serializer will serialize it automatically as a Base64-encoded string. If the object type is a string, the string contents will be streamed out as text. And if the item object type is a contact, the serializer will serialize the contact object just as it would any other .NET object. Any Base64 conversion is handled for you, as is any textual entitization ('<' turns into "&lt;", '&' turns into "&amp;", and so forth). If you serialized the name of the law firm Jones & Jones, the XML would contain "Jones &amp; Jones" to avoid parsing the XML special characters inappropriately.
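The sketch below exercises those three attributes directly. The Body class follows the fragment shown above, Contact is trimmed to two fields for brevity, and ChoiceDemo is a hypothetical helper added only for this illustration.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Trimmed to two fields for brevity; the real contact type has many more.
public class Contact
{
    public string FName;
    public string Company;
}

public class Body
{
    // One attribute per permitted runtime type of Item.
    [XmlElement("Image", typeof(byte[]), DataType = "base64Binary")]
    [XmlElement("Note", typeof(string))]
    [XmlElement("Contact", typeof(Contact))]
    public object Item;
}

// Hypothetical helper, not part of the article's sample code.
public static class ChoiceDemo
{
    public static string ToXml(Body body)
    {
        var ser = new XmlSerializer(typeof(Body));
        using (var writer = new StringWriter())
        {
            ser.Serialize(writer, body);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        // A string is written as a <Note> element, with '&' escaped.
        Console.WriteLine(ToXml(new Body { Item = "Jones & Jones" }));

        // A byte array is written as an <Image> element in Base64.
        Console.WriteLine(ToXml(new Body { Item = new byte[] { 1, 2, 3 } }));

        // A Contact is written as a <Contact> element with child elements.
        Console.WriteLine(ToXml(new Body
        {
            Item = new Contact { FName = "Pat", Company = "Jones & Jones" }
        }));
    }
}
```

Assigning Item an object of any type not listed in the attributes should make Serialize() fail at run time, which is the price of the weak typing discussed above.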

If you take my XML card schema and run it through xsd.exe, the source code you'll get is slightly different than what I've shown here, but only because the type names it generates are slightly different. In fact, what you'll get is something much like the UML I've shown here:

Card object UML static class diagram

xsd.exe saw the occurrence relationships I specified for the properties and the individual cards. It created a single property instance (minOccurs = maxOccurs = 1) but an array of cards (minOccurs = 0, maxOccurs = unbounded).

I then further modified the source files to suit my tastes. For example, I far prefer to work with .NET collection classes over simple arrays when possible, so you'll see in the source code I added a public property called "Items" to the deck's class. I then wanted to tell the serializer to ignore this public property, since the Card array property would serve my serialization needs. To do this, I used another XML serialization attribute, XmlIgnore:

[System.Xml.Serialization.XmlIgnore()]
public CardCollection Items = new CardCollection();

I then modified the public card property, the one XmlSerializer will actually serialize, to use my collection:

[System.Xml.Serialization.XmlElementAttribute("Card")]
public CardType[] Card
{
  get { return Items.ToArray(); }
  set {
    Items.Clear();
    Items.AddRange(value);
  }
}

Here you see another serialization attribute, XmlElement. XmlElement is used to change the XML element name associated with the object's property. In this case, we're dealing with an array, so each array element is named <Card/>. The CardCollection class is one I implemented. Just remember that if you regenerate your source files, you'll need to re-implement any custom modifications.
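For readers without the download at hand, here is a self-contained sketch of the same pattern. This CardCollection is not the one from the sample. It is a hypothetical stand-in that simply derives from List<T> (a type that arrived after the 1.x Framework) because List<T> already supplies Clear(), AddRange(), and ToArray(). CardType is reduced to a single field.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class CardType
{
    public string Name;   // reduced to one field for this sketch
}

// Hypothetical stand-in for the article's collection class.
public class CardCollection : List<CardType> { }

public class Cards
{
    // The collection the application works with; hidden from the serializer.
    [XmlIgnore]
    public CardCollection Items = new CardCollection();

    // The array the serializer sees; each element is written as <Card>.
    [XmlElement("Card")]
    public CardType[] Card
    {
        get { return Items.ToArray(); }
        set
        {
            Items.Clear();
            if (value != null) Items.AddRange(value);
        }
    }
}

public static class CollectionDemo
{
    public static Cards RoundTrip(Cards deck)
    {
        var ser = new XmlSerializer(typeof(Cards));
        using (var writer = new StringWriter())
        {
            ser.Serialize(writer, deck);
            using (var reader = new StringReader(writer.ToString()))
            {
                return (Cards)ser.Deserialize(reader);
            }
        }
    }

    public static void Main()
    {
        var deck = new Cards();
        deck.Items.Add(new CardType { Name = "Groceries" });
        deck.Items.Add(new CardType { Name = "Plumber" });

        Cards copy = RoundTrip(deck);
        foreach (CardType card in copy.Items)
        {
            Console.WriteLine(card.Name);
        }
    }
}
```

The null check in the setter is a defensive addition of mine, not something the original listing includes.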

xsd.exe also added two other serialization attributes that are of interest: XmlType and XmlRoot:

[System.Xml.Serialization.XmlTypeAttribute(
  Namespace="http://tempuri.org/Cardfile.xsd")]
[System.Xml.Serialization.XmlRootAttribute(
  Namespace="http://tempuri.org/Cardfile.xsd",
  IsNullable=false)]
public class Cards
{
  ...
}

XmlType is there because in the schema I specified a target namespace, which dictates that the associated XML file must have a namespace applied that matches the schema. The XmlSerializer needs this information, and this attribute is there to provide it. XmlRoot is there to identify the root XML node (the class that represents the document element). With no other input, the XmlSerializer would need to implement more complex algorithms to ferret out what the root might be, if it could be determined at all. This attribute helps the serializer by specifically indicating what the root of the XML serialization is to be.

For the most part, the serialization attributes I've shown here are all you'll need, and if you use xsd.exe to generate your source files, it'll insert the appropriate attribute for you. If when you execute xsd.exe you get an error, you'll need to correct or update the schema to accommodate xsd.exe or create the source file(s) yourself. Most of the errors I encounter are from schemas I've been given that include schema elements xsd.exe cannot handle (like <xs:union/>) or are errors in the flow of schema elements (ahem, that would be errors I made when creating the schema).

There is another serialization attribute that is sometimes helpful, and though xsd.exe might not inject it for you, you may need it from time to time when serializing complex elements. The attribute is XmlInclude, and it's used only to specify a type of object for serialization and deserialization. It's especially useful with Web Services when you're shipping complex datatypes over the wire (i.e., classes you create that represent method parameters or return types).
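A minimal sketch of XmlInclude follows. None of these types appear in the cardfile sample. CardBase, NoteCard, ImageCard, and Shelf are hypothetical names chosen only to show a base-class array holding derived objects.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// XmlInclude tells the serializer which derived types may appear
// wherever a CardBase is declared. All names here are hypothetical.
[XmlInclude(typeof(NoteCard))]
[XmlInclude(typeof(ImageCard))]
public abstract class CardBase
{
    public string Name;
}

public class NoteCard : CardBase
{
    public string Text;
}

public class ImageCard : CardBase
{
    public byte[] Data;
}

public class Shelf
{
    public CardBase[] Cards;
}

public static class IncludeDemo
{
    public static string ToXml(Shelf shelf)
    {
        var ser = new XmlSerializer(typeof(Shelf));
        using (var writer = new StringWriter())
        {
            ser.Serialize(writer, shelf);
            return writer.ToString();
        }
    }

    public static Shelf FromXml(string xml)
    {
        var ser = new XmlSerializer(typeof(Shelf));
        using (var reader = new StringReader(xml))
        {
            return (Shelf)ser.Deserialize(reader);
        }
    }

    public static void Main()
    {
        var shelf = new Shelf
        {
            Cards = new CardBase[] { new NoteCard { Name = "Todo", Text = "Buy milk" } }
        };

        string xml = ToXml(shelf);   // derived type is recorded via xsi:type
        Console.WriteLine(xml);

        Shelf copy = FromXml(xml);
        Console.WriteLine(copy.Cards[0].GetType().Name);
    }
}
```

Without the two XmlInclude attributes, serializing the NoteCard through a CardBase reference should fail, since the serializer would not expect the derived type.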

Serializing Your Data File

Thus far we've merely created C# files that when serialized represent our desired XML data file layout. Once you've created the source files, it's time to use them. You create the data file objects in the same way you create and use other Framework components. In this case, the sample allows you to create and fill cards, save them to disk, read them back, and display their contents. The demo app isn't very fancy...I, well, haven't had time to finish my "nice" cardfile application. But this code is probably better to demonstrate the serialization concept as there is less code to sort through when figuring out how I did things.

Saving cards to a file is a very simple matter. We just create an instance of the XmlSerializer and an associated stream writer, serialize the card deck object using the serializer's Serialize() method, and close the stream. The following "save" method is from the demonstration application:

private void SaveCards(string fileName)
{
  // Serialize the cards to a file
  StreamWriter writer = null;
  try
  {
    XmlSerializer ser = new XmlSerializer(typeof(Cards));
    writer =  new StreamWriter(fileName);
    ser.Serialize(writer, this._cards);
  } // try
  catch (Exception ex)
  {
    string strErr = String.Format("Unable to save cards, error '{0}'",
      ex.Message);
    MessageBox.Show(strErr,"Card File Save Error",MessageBoxButtons.OK,
      MessageBoxIcon.Error);
  } // catch
  finally
  {
    if (writer != null) writer.Close();
    writer = null;
  } // finally
}

Deserializing a saved deck is just as easy via the serializer's Deserialize() method:

private void LoadCards(string fileName)
{
  StreamReader reader = null;
  try
  {
    // Deserialize
    XmlSerializer ser = new XmlSerializer(typeof(Cards));
    reader =  new StreamReader(fileName);
    this._cards = (Cards)ser.Deserialize(reader);
    if ( this._cards == null ) throw new NullReferenceException(
         "Invalid card file");
  } // try
  catch (Exception ex)
  {
    string strErr = String.Format("Unable to load cards, error '{0}'",
      ex.Message);
    MessageBox.Show(strErr,"Card File Open Error",MessageBoxButtons.OK,
      MessageBoxIcon.Error);
  } // catch
  finally
  {
    if (reader != null) reader.Close();
    reader = null;
  } // finally
}

The XML serialization infrastructure does all of the XML conversion work for us, simplifying our code. We still had to design our XML and create an associated XML Schema, but changing our XML data file format will be much simpler in the long run using XML serialization than with direct reads and writes using XmlTextReader/XmlTextWriter or some other similar means. Note that since I have the schema, I can also add a step when loading a card file. I could load the file as XML into a validating reader and validate it against the schema. If it validates, only then would I deserialize it to a set of card objects. I haven't shown that here since the focus was serialization, but it's an obvious extension I added to the sample application. Look for the LoadCards() method to see how I grabbed the schema from the application resource pool and used it with a validating reader during deserialization.
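As a rough idea of what schema-checked loading can look like, here is a sketch that is not the sample's LoadCards() code. It uses XmlReaderSettings, an API newer than the 1.x Framework, and a tiny hypothetical Memo type with an inline schema string in place of the Cards class and its embedded schema resource. One difference from the description above: this version validates while the serializer reads, rather than in a separate pass beforehand.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical one-element document type standing in for the Cards class.
[XmlType(Namespace = "urn:example:memo")]
[XmlRoot("Memo", Namespace = "urn:example:memo")]
public class Memo
{
    public string Text;
}

public static class ValidatingLoader
{
    // Inline schema for the sketch; the article's sample keeps its schema
    // as an application resource instead.
    public const string MemoSchema =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'" +
        " targetNamespace='urn:example:memo' elementFormDefault='qualified'>" +
        "<xs:element name='Memo'><xs:complexType><xs:sequence>" +
        "<xs:element name='Text' type='xs:string'/>" +
        "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    public static Memo Load(string xml)
    {
        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        using (XmlReader schemaReader = XmlReader.Create(new StringReader(MemoSchema)))
        {
            settings.Schemas.Add("urn:example:memo", schemaReader);
        }

        // No ValidationEventHandler is attached, so the first validation
        // error is raised as an exception while the serializer is reading.
        var ser = new XmlSerializer(typeof(Memo));
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            return (Memo)ser.Deserialize(reader);
        }
    }

    public static void Main()
    {
        Memo ok = Load("<Memo xmlns='urn:example:memo'><Text>hello</Text></Memo>");
        Console.WriteLine(ok.Text);

        try
        {
            Load("<Memo xmlns='urn:example:memo'><Bogus>hello</Bogus></Memo>");
        }
        catch (Exception ex)
        {
            // The serializer usually wraps the schema error in another exception.
            Console.WriteLine("Rejected: " + ex.Message);
        }
    }
}
```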

The PocketPC Version

For those interested, early in the article I mentioned I was interested in writing both a desktop and a PocketPC version of the application. My goal was to share data files. I've since finished an initial PocketPC version and have provided the source and CAB files for you to check out.

The Major Differences?

Much of the application ported fine. There are minor differences in the ListView control, and much of the Framework that is supported on the desktop is not supported in the Compact Framework (most notable to me was the lack of a ThreadException event and cursor support). I also had to learn to deal with the input device component, at least to allow it to be displayed (in child forms, drop in a blank main menu).

The major changes were to serialization and deserialization. In the case of the PocketPC, we have no XPath or XmlSerializer support, so serialization and deserialization are more of a chore. I elected to serialize via XmlDocument and create elements as I went (rather than use XmlTextWriter directly). I start at the document element and build the file top-down. For deserialization, I try to limit expectations and simply ask for nodes that I should find at a given depth. If I find a node, great. If not, then I either move on or throw an exception (if the node was particularly important). I chose to encapsulate all of the serialization and deserialization logic in a single worker class, CardSerializer.
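The approach can be sketched as below. This is not the CardSerializer class from the download. ManualCardXml and its helpers are hypothetical, the deck it builds is deliberately incomplete (no header, ID, or version), and I have only assumed, not verified, that these XmlDocument calls behave the same on the Compact Framework.

```csharp
using System;
using System.Xml;

// Hypothetical sketch of hand-built XML without XmlSerializer or XPath.
public static class ManualCardXml
{
    const string Ns = "http://tempuri.org/Cardfile.xsd";

    // Creates a namespaced element, optionally with text, under parent.
    static XmlElement AddChild(XmlDocument doc, XmlNode parent,
                               string name, string text)
    {
        XmlElement element = doc.CreateElement(name, Ns);
        if (text != null) element.InnerText = text;
        parent.AppendChild(element);
        return element;
    }

    // Writing: start at the document element and build top-down.
    public static XmlDocument BuildDeck(string deckName, string noteText)
    {
        var doc = new XmlDocument();
        XmlElement cards = AddChild(doc, doc, "Cards", null);
        XmlElement props = AddChild(doc, cards, "Props", null);
        AddChild(doc, props, "Name", deckName);
        XmlElement card = AddChild(doc, cards, "Card", null);
        XmlElement body = AddChild(doc, card, "Body", null);
        AddChild(doc, body, "Note", noteText);
        return doc;
    }

    // Reading: ask for a child by name at the current depth only.
    static XmlElement FindChild(XmlNode parent, string localName)
    {
        foreach (XmlNode child in parent.ChildNodes)
        {
            if (child.NodeType == XmlNodeType.Element &&
                child.LocalName == localName)
            {
                return (XmlElement)child;
            }
        }
        return null;
    }

    public static string ReadDeckName(XmlDocument doc)
    {
        // <Props> is treated as important: its absence is an error.
        XmlElement props = FindChild(doc.DocumentElement, "Props");
        if (props == null) throw new XmlException("Missing Props element");

        // <Name> is treated as optional: move on with an empty value.
        XmlElement name = FindChild(props, "Name");
        return name == null ? "" : name.InnerText;
    }

    public static void Main()
    {
        XmlDocument doc = BuildDeck("My Deck", "Jones & Jones");
        Console.WriteLine(doc.OuterXml);
        Console.WriteLine(ReadDeckName(doc));
    }
}
```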

Of course, another major change is to the user interface. But that's to be expected given the limited screen real estate.

The Most Surprising Thing I Learned?

Bitmaps cannot be saved. You can create a card file on the desktop and include within it an embedded image card, and when you deserialize that on the PocketPC device, you'll be able to see your image (without GIF animation). But the Compact Framework doesn't support the ability to stream the bitmap bytes out, so I couldn't allow you to create image cards from scratch that actually embedded new images (you can create the cards, but you can only persist the card name). I'm not sure why Microsoft elected to keep bitmaps opaque in this way. I found some solutions on the Web, but none worked as I liked. In that case, I simply don't allow you to insert an image into an image card.

History

26.08.2003 Initial posting
29.08.2003 Updated article width
01.09.2003 Added PocketPC source and demo

Kenn Scribner - bio

Kenn is the author and co-author of several Windows development books, including:

MFC Programming Visual C++ 6.0 Unleashed
Teach Yourself ATL Programming in 21 Days
Understanding SOAP
Applied SOAP: Implementing .NET XML Web Services

He has contributed to several other Windows development books and has written articles for "PC Magazine" and "Visual C++ Developer's Journal." He is currently a principal consultant with Wintellect, where he also teaches XML and .NET Web Services.

License

This article has no explicit license attached to it, but may contain usage terms in the article text or the download files themselves. If in doubt, please contact the author.