<!-- IGNORE THIS SECTION --><html><head>
<title>XML Data Files, XML Serialization, and .NET</title>
<STYLE>
BODY, P, TD { font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 10pt }
H2,H3,H4,H5 { color: #ff9900; font-weight: bold; }
H2 { font-size: 13pt; }
H3 { font-size: 12pt; }
H4 { font-size: 10pt; color: black; }
PRE { BACKGROUND-COLOR: #FBEDBB; FONT-FAMILY: "Courier New", Courier, mono; WHITE-SPACE: pre; }
CODE { COLOR: #990000; FONT-FAMILY: "Courier New", Courier, mono; }
</STYLE>
<link href="http://www.codeproject.com/styles/global.css" type="text/css" rel="stylesheet"></head>
<body bgColor="#ffffff" color="#000000">
<pre>Title: XML Data Files, XML Serialization, and .NET
Author: Kenn Scribner
Email: kenn@endurasoft.com
Environment: C#, .NET 1.0, .NET 1.1
Keywords: XML Serialization, C# Programming, CardFile, Personal Organizer, XML Schema, xsd.exe
Level: Intermediate
Description: Describes means to build XML data files using XML Schema and xsd.exe to facilitate easy XML Serialization
Section General C#
SubSection C# Programming
</pre>
<ul class="download">
<li>
<A href="CardfileSerializationDemo_demo.zip">Download demo project (.NET 1.0) -
17.1 Kb </A>
<li>
<A href="CardfileSerializationDemo_demo_1_1.zip">Download demo project (.NET 1.1) -
17.1 Kb </A>
<li>
<A href="CardfileSerializationDemo_src.zip">Download source (.NET 1.0) - 51.0 Kb</A>
<li>
<A href="CardfileSerializationDemo_src_1_1.zip">Download source (.NET 1.1) - 51.1
Kb</A>
</li>
</ul>
<p><IMG height="247" alt="CardfileSerializationDemo application image" src="CardfileSerializationDemo.jpg" width="601"></p>
<P>I was reading <a href="http://www.codeproject.com/script/profile/whos_who.asp?id=59120">
Manster's</a> article <a href="http://www.codeproject.com/csharp/cspersonalorganizer1.asp?target=personal%7Ccontact">
A C# Personal Organizer</a> regarding a personal organizer in C# whose data
files were stored as XML. I had actually been writing something similar for
myself, loosely based upon the old Windows Cardfile application. As I'd just
received a new PocketPC and wanted to write some code for it, I figured I'd
model an application on my desktop, save some data to an XML file, and transfer
the data to and from the PocketPC. I was curious to see how Manster implemented
his application, especially where the XML data files were concerned--perhaps he
included something I should also have designed into my own application.</P>
<P>Actually our applications are somewhat different. I wanted to store not only
contact information but also images and general notes to myself. To me, this
meant I had three different types of cards that could be stored in my card
deck. But I also noticed that he took a different approach to actually
converting his data into XML and reading it back again. He uses <code>XmlTextWriter</code>
to create his XML by hand, which in fact is exactly what you have to do with
the PocketPC. But on the desktop, there is an alternative approach I've used
effectively in several other applications--XML Serialization.</P>
<P>XML Serialization is a process .NET implements that easily converts C# objects
into XML and reads them back again. At the end of the day, you might think it
no easier to use than writing the XML yourself, but XML Serialization does have
some advantages, which I'll describe shortly. The origin of XML serialization
is the Web Service infrastructure .NET uses to convey data over the network.
It's important to consider this, as this serialization implementation is
different from, and separate from, the remoting serializers found in <code>System.Runtime.Serialization</code>.
I'll refer to those classes as "remoting serialization." The Web Service XML
serialization classes live instead in <code>System.Xml.Serialization</code>,
and these I'll refer to simply as "XML serialization." (I can only
speculate why there are two different implementations of XML Serialization
present in the Framework, but in fact there are!) In this article I'll
describe only Web Service XML serialization. Remoting XML Serialization would
require yet another article, as the mechanisms for using it are moderately different.</P>
<h2>How XML Serialization Works</h2>
<P>XML serialization is very easy to use, but I must admit it's at times hard to
debug. I've found the exception message text isn't especially useful, leaving
me to guess and try again. Even so, essentially all public fields and public
read/write properties of any .NET object are automatically serialized into XML.
The XML tag name is taken from the field or property name unless you specify
differently. Arrays of objects are also handled automatically, even if the array
is of complex types. Each array element is serialized like any other .NET object
(through its public fields and properties).</P>
<P>Of course, .NET "knows" about the public properties and whether they're
array-based or not because of the metadata associated with the type. If you
create a .NET class that has a public string property called "Name," there are
classes available that will enumerate all of the public class properties and
others that will provide information about the property, such as its name. The
serializer then roughly follows this simple model:</P>
<P><i>(writing)</i></P>
<pre>Stream XML document element opening tag in the form <type_name>
For each public property implemented by the class
    Read the property name
    Read the property value
    Stream XML in the form <name>value</name>
End for
Stream XML document element closing tag in the form </type_name></pre>
<P><i>(reading)</i></P>
<pre>Create an instance of the type encoded within the XML
For each XML node within the document element
    Read the node name and value
    Assign the named property the value previously read
End for</pre>
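<P>The writing model above can be sketched with reflection. This is only a toy
illustration under stated assumptions, not how <code>XmlSerializer</code> is
actually implemented: the <code>Person</code> and <code>ToyXmlWriter</code> names
are hypothetical, only simple read/write properties are handled, and nested
objects and arrays are ignored.</P>
<pre>using System;
using System.IO;
using System.Reflection;
using System.Xml;

// Hypothetical sample type; not part of the demo project.
public class Person
{
    private string _name = "";
    private int _age;
    public string Name { get { return _name; } set { _name = value; } }
    public int Age { get { return _age; } set { _age = value; } }
}

public class ToyXmlWriter
{
    // Streams <TypeName><Prop>value</Prop>...</TypeName> for simple public properties.
    public static string Write(object obj)
    {
        StringWriter text = new StringWriter();
        XmlTextWriter xml = new XmlTextWriter(text);
        Type type = obj.GetType();

        xml.WriteStartElement(type.Name);              // document element opening tag
        foreach (PropertyInfo prop in
                 type.GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (!prop.CanRead || !prop.CanWrite) continue;       // read/write only
            if (prop.GetIndexParameters().Length > 0) continue;  // skip indexers
            object value = prop.GetValue(obj, null);
            // WriteElementString escapes XML special characters for us.
            xml.WriteElementString(prop.Name, Convert.ToString(value));
        }
        xml.WriteEndElement();                         // document element closing tag
        xml.Close();
        return text.ToString();
    }

    public static void Main()
    {
        Person p = new Person();
        p.Name = "Jones & Jones";
        p.Age = 42;
        Console.WriteLine(Write(p));
    }
}</pre>
<P>The real serializer does far more (attributes, namespaces, arrays, nested
types), but the loop over type metadata is the essential idea.</P>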
<P>Of course, there may be errors during this process. The XML may not match the
object type you specified, or there may be general XML errors. There may also
be instances of public object properties that cannot be serialized, such as
properties based upon <code>IDictionary</code> (like a hash table). (Note this
is true of the 1.0 and 1.1 versions of the Framework...future versions may
serialize IDictionary-based properties.)</P>
<P>You'll also see in my code that I don't use the <code>[Serializable]</code> attribute.
This attribute is for remoting serialization and is not necessary for pure XML
serialization. This is also true for the <code>ISerializable</code> interface.</P>
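<P>As a minimal sketch of that point, the following round trip serializes a plain
class that carries no <code>[Serializable]</code> attribute and implements no
special interface. The <code>Memo</code> and <code>RoundTripDemo</code> names
are hypothetical and exist only for this illustration.</P>
<pre>using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type for illustration; note there is no [Serializable] attribute.
public class Memo
{
    public string Title;
    public int Priority;
}

public class RoundTripDemo
{
    // Serialize a Memo to an XML string.
    public static string ToXml(Memo memo)
    {
        XmlSerializer ser = new XmlSerializer(typeof(Memo));
        StringWriter writer = new StringWriter();
        ser.Serialize(writer, memo);
        return writer.ToString();
    }

    // Rebuild a Memo from an XML string.
    public static Memo FromXml(string xml)
    {
        XmlSerializer ser = new XmlSerializer(typeof(Memo));
        StringReader reader = new StringReader(xml);
        return (Memo)ser.Deserialize(reader);
    }

    public static void Main()
    {
        Memo memo = new Memo();
        memo.Title = "Buy milk";
        memo.Priority = 2;

        string xml = ToXml(memo);
        Console.WriteLine(xml);

        Memo copy = FromXml(xml);
        Console.WriteLine(copy.Title + " / " + copy.Priority);
    }
}</pre>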
<h2>Designing Data Files</h2>
<P>When it comes to XML data files, we could simply write some C# or VB code, add
some public properties, and let the serializer deal with details. In many cases
that's fine. However, I prefer to design the XML that will represent my data
and then generate the C# source code from that. As it happens, this is also
possible, and it is this design and implementation process I'll focus on for
the rest of the article. And while I'll be referring to a specific example in
this section, the basic pattern works for any XML data file.</P>
<P>When I design XML data files, I often create a sample data file in XML and then
create a schema from that. I then iterate between XML and schema until I have
an XML data file format I like and a representative XML schema I can use for
validation. This process works well with XML serialization because there is a
utility I use that ships with the .NET Framework called <code>xsd.exe</code>. <code>
xsd.exe</code> takes as input an XML schema and will generate C# or VB
source files that, when serialized, will produce the exact XML as outlined in
the schema. If I later change the schema, I simply run <code>xsd.exe</code> again
and matching source files are regenerated.</P>
<P>To illustrate, let's use my cardfile application as an example. Using the
application, we can create "cards" that model what would physically be index
cards. We have three types of card--simple text for notes, a simple image, and
a specialized card for contact information. Cards are collected into a
collection called a "deck." So a single XML file would represent a deck, with
the deck containing all cards associated with that deck.
</P>
<P>Some minor complications are that I wanted to associate properties with the
deck, as Microsoft Word associates properties with documents. I also wanted to
encode any image data directly into the XML stream. The reason for this is
simply to refrain from inserting a reference to the image (like a filename or
URL) and having to remember to copy the image along with the XML onto my
PocketPC device. I want the XML data file to be self-contained, even if
possibly large.</P>
<P>I also want to specify an application version with the card decks so that future
versions of the application may require updated data files. Or to be more
specific, older versions of the application cannot read data files destined for
newer versions of the application if substantial data file formatting changes
were applied. Simply put, the card deck will have a version number associated
with it that I'll check when loading the data file. If the version isn't one I
can handle at the time, I'll terminate the load operation.</P>
<P>On the "housekeeping" side I knew I'd need some way to identify an individual
card, so I chose a simple integer as the card identifier. But since you should
be able to arbitrarily add and delete cards, I would need to somehow keep track
of the last card ID used. This information must be serialized as well so that
when the deck is loaded into the application, new card additions will have
proper and unique ID values.</P>
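<P>The version check and ID bookkeeping described above might look something like
the following sketch. Everything here is an assumption made for illustration:
the <code>DeckInfo</code> and <code>DeckRules</code> names, the "major.minor"
version string format, and the use of an <code>int</code> for the next ID (the
type generated from the schema may differ).</P>
<pre>using System;

// Hypothetical stand-in for the deck class. NextID is an int here for
// simplicity; the type generated from the schema may differ.
public class DeckInfo
{
    public string Version = "1.0";
    public int NextID = 0;
}

public class DeckRules
{
    // Assumed: this build of the application understands major version 1 only.
    public const int SupportedMajor = 1;

    // True if the deck's version is one this application can handle.
    // Assumes a "major.minor" version string.
    public static bool CanLoad(DeckInfo deck)
    {
        Version v = new Version(deck.Version);
        return v.Major <= SupportedMajor;
    }

    // Hands out the next unused card ID and advances the counter, so the
    // value saved with the deck is always one past the last ID used.
    public static int AllocateID(DeckInfo deck)
    {
        int id = deck.NextID;
        deck.NextID = id + 1;
        return id;
    }

    public static void Main()
    {
        DeckInfo deck = new DeckInfo();
        if (!CanLoad(deck))
        {
            Console.WriteLine("Deck version not supported; load terminated.");
            return;
        }
        int first = AllocateID(deck);
        int second = AllocateID(deck);
        Console.WriteLine("Allocated IDs " + first + " and " + second);
    }
}</pre>
<P>Because deleted cards never return their IDs to the pool, a simple
ever-increasing counter is enough to keep IDs unique within a deck.</P>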
<P>The basic XML I came up with looks like this:</P>
<pre><Cards>
  <NextID/>
  <Version/>
  <Props>
    <Name/>
    <Author/>
    <Comments/>
  </Props>
  <Card>
    <Header>
      <Name/>
      <ID/>
      <Type/>
      <Created/>
      <Updated/>
    </Header>
    <Body>
      {item}
    </Body>
  </Card>
  <Card/>
  <Card/>
</Cards></pre>
<P>Elements <code><NextID/></code> and <code><Version/></code> are
simple types. <code><Props/></code> is a complex type, but there can be
only one property element per card deck. <code><Card/></code> is also a
complex type, but there can be from zero to many of them.</P>
<P>Each card has both a header and a body. The header contains the name of the
individual card, along with the card's ID, type, and creation and update
date/time stamps. The body contains the card information. That is, the "item"
can be a string, an image, or a contact. We'll know what type it is by either
examining the header or the first child's element tag name. The data type
information appears redundant, but it's stored in the header to facilitate
header-only processing, such as when sorting or searching. That way I can sort
all contacts alphabetically without opening the card body to see if the card is
in fact a contact.</P>
<P>The card data itself will simply be a single text node (note):</P>
<pre><Note/></pre>
<P>A Base64-encoded node (image):</P>
<pre><Image/></pre>
<P>Or a contact element:</P>
<pre><Contact>
  <FName/>
  <MName/>
  <LName/>
  <Addr1/>
  <Addr2/>
  <Addr3/>
  <City/>
  <State/>
  <PCode/>
  <Country/>
  <Company/>
  <HomePh/>
  <MobilePh/>
  <WorkPh/>
  <FaxPh/>
  <EMail/>
  <Notes/>
</Contact></pre>
<P>The corresponding schema is shown here:</P>
<pre><?xml version="1.0" encoding="utf-8" ?>
<xs:schema id="Cardfile" targetNamespace="http://tempuri.org/Cardfile.xsd" elementFormDefault="qualified" xmlns="http://tempuri.org/Cardfile.xsd" xmlns:mstns="http://tempuri.org/Cardfile.xsd" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType name="PropType">
<xs:sequence>
<xs:element name="Name" type="xs:string" />
<xs:element name="Author" type="xs:string" />
<xs:element name="Comments" type="xs:string" />
</xs:sequence>
</xs:complexType>
<xs:complexType name="CardType">
<xs:sequence>
<xs:element name="Header">
<xs:complexType>
<xs:sequence>
<xs:element name="Name" type="xs:string" />
<xs:element name="ID" type="xs:nonNegativeInteger"/>
<xs:element name="Type">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="Note" />
<xs:enumeration value="Contact" />
<xs:enumeration value="Image" />
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="Created" type="xs:dateTime" />
<xs:element name="Updated" type="xs:dateTime" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="Body">
<xs:complexType>
<xs:choice>
<xs:element name="Image" type="xs:base64Binary" />
<xs:element name="Note" type="xs:string" />
<xs:element name="Contact">
<xs:complexType>
<xs:sequence>
<xs:element name="FName" type="xs:string" />
<xs:element name="MName" type="xs:string" />
<xs:element name="LName" type="xs:string" />
<xs:element name="Addr1" type="xs:string" />
<xs:element name="Addr2" type="xs:string" />
<xs:element name="Addr3" type="xs:string" />
<xs:element name="City" type="xs:string" />
<xs:element name="State" type="xs:string" />
<xs:element name="PCode" type="xs:string" />
<xs:element name="Country" type="xs:string" />
<xs:element name="Company" type="xs:string" />
<xs:element name="HomePh" type="xs:string" />
<xs:element name="MobilePh" type="xs:string" />
<xs:element name="WorkPh" type="xs:string" />
<xs:element name="FaxPh" type="xs:string" />
<xs:element name="EMail" type="xs:string" />
<xs:element name="Notes" type="xs:string" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:choice>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
<xs:element name="Cards">
<xs:complexType>
<xs:sequence>
<xs:element type="PropType" name="Props" minOccurs="1" maxOccurs="1" />
<xs:element name="NextID" type="xs:nonNegativeInteger" />
<xs:element name="Version" type="xs:string" />
<xs:element type="CardType" name="Card" minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema></pre>
<h2>Creating the Source Files</h2>
<P>I could create the basic XML file and then use <code>xsd.exe</code> to generate
the schema for me, but I'm personally not fond of the schema format <code>xsd.exe</code>
produces so I create the schema by hand. Something to remember is that the
schema will guide <code>xsd.exe</code> when it creates our source files, so
it's important to understand XML Schema to some degree. For example, consider
this schema fragment:</P>
<pre><xs:complexType name="PropType">
  <xs:sequence>
    <xs:element name="Name" type="xs:string" />
    <xs:element name="Author" type="xs:string" />
    <xs:element name="Comments" type="xs:string" />
  </xs:sequence>
</xs:complexType>
<xs:element type="PropType" name="Props" minOccurs="1" maxOccurs="1" /></pre>
<P>This will quite literally translate into this C# code:</P>
<pre>public class PropType
{
    public string Name;
    public string Author;
    public string Comments;
}
public PropType Props;</pre>
<P>The card elements are a bit more complex because I'm telling <code>xsd.exe</code>
to implement a polymorphic choice:</P>
<pre><xs:element name="Body">
  <xs:complexType>
    <xs:choice>
      <xs:element name="Image" type="xs:base64Binary" />
      <xs:element name="Note" type="xs:string" />
      <xs:element name="Contact">...</xs:element>
    </xs:choice>
  </xs:complexType>
</xs:element></pre>
<P>What will this element translate into? Well, in code what we're saying is the
"body" can consist of a string, a Base64-encoded byte array, or a
complex element representing contact information. The only way we can
polymorphically represent this is to create a public property associated with
the body that is of type <code>object</code>. Since all .NET types have <code>object</code>
as their base class, we can associate any piece of data with the body object we
want. We can use the header's type enumeration to pull it back out. This is
known as "weak typing," and in general good designs avoid it. In this case I
could avoid it if I could use the <code><xs:union/></code> element within
my schema, but unfortunately <code>xsd.exe</code> doesn't handle unions. In
this case I believe the weak typing is justified since we're merely flagging
the serialized data's type.</P>
<P>Therefore, the C# source code for this would be:</P>
<pre>public class Body
{
public object Item;
}</pre>
<P>But notice how we lost information here. <code>xsd.exe</code> will generate this
source code for us, but how will the XML serializer know what datatype "Item"
truly represents? The answer is through attributes <code>xsd.exe</code> also
injects into the source code:</P>
<pre>[System.Xml.Serialization.XmlElementAttribute("Image", typeof(System.Byte[]), DataType="base64Binary")]
[System.Xml.Serialization.XmlElementAttribute("Note", typeof(string))]
[System.Xml.Serialization.XmlElementAttribute("Contact", typeof(Contact))]
public object Item;</pre>
<P>The <code>XmlSerializer</code>, which is the object that performs the actual
serialization, interprets the attribute metadata when it attempts to serialize
the public object Item. If the true datatype of the item is a byte array, the
serializer will serialize it automatically as a Base64-encoded string. If the
object type is a string, the string contents will be streamed out as text. And
if the item object type is a contact, the serializer will serialize the contact
object just as it would any other .NET object. Any Base64 conversion is handled
for you, as is any textual entitization ('<' turns into "&lt;", '&'
turns into "&amp;", and so forth). If you serialized the name of the law
firm <I>Jones & Jones</I>, the XML would contain "Jones &amp; Jones" to
avoid parsing the XML special characters inappropriately.</P>
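<P>To see the attribute-driven choice in action, here is a small self-contained
sketch. The <code>Body</code> class mirrors the attributes shown above, but the
two-field <code>Contact</code> class and the <code>ChoiceDemo</code> helper are
trimmed, hypothetical stand-ins rather than the classes from the demo project.</P>
<pre>using System;
using System.IO;
using System.Xml.Serialization;

// Trimmed, hypothetical version of the contact type.
public class Contact
{
    public string FName;
    public string LName;
}

public class Body
{
    // The runtime type of Item selects which element name is written.
    [XmlElement("Image", typeof(byte[]), DataType = "base64Binary")]
    [XmlElement("Note", typeof(string))]
    [XmlElement("Contact", typeof(Contact))]
    public object Item;
}

public class ChoiceDemo
{
    public static string ToXml(Body body)
    {
        XmlSerializer ser = new XmlSerializer(typeof(Body));
        StringWriter writer = new StringWriter();
        ser.Serialize(writer, body);
        return writer.ToString();
    }

    public static Body FromXml(string xml)
    {
        XmlSerializer ser = new XmlSerializer(typeof(Body));
        return (Body)ser.Deserialize(new StringReader(xml));
    }

    public static void Main()
    {
        Body note = new Body();
        note.Item = "Jones & Jones";          // written as a Note element, '&' escaped
        Console.WriteLine(ToXml(note));

        Body image = new Body();
        image.Item = new byte[] { 1, 2, 3 };  // written as a Base64 Image element
        Console.WriteLine(ToXml(image));

        // Reading back restores the original runtime type of Item.
        Body copy = FromXml(ToXml(image));
        Console.WriteLine(copy.Item.GetType().Name);
    }
}</pre>
<P>On the way back in, the element name tells the serializer which type to
create, so the caller can test <code>Item</code> with <code>is</code> or
consult the card header's type value.</P>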
<P>If you take my XML card schema and run it through <code>xsd.exe</code>, the
source code you'll get is slightly different from what I've shown here, but
only because the type names it generates are slightly different. In fact, what
you'll get is something much like the UML I've shown here:</P>
<P><IMG height="478" alt="Card object UML static class diagram" src="Cards_UML.jpg" width="579"></P>
<P><code>xsd.exe</code> saw the occurrence relationships I specified for the
properties and the individual cards and created a single property instance (<code>minOccurs
= maxOccurs = 1</code>) yet created an array of cards (<code>minOccurs
= 0</code>, <code>maxOccurs = unbounded</code>).</P>
<P>I then further modified the source files to suit my tastes. For example, I far
prefer to work with .NET collection classes over simple arrays when possible,
so you'll see in the source code I added a public property called "Items" to
the deck's class. I then wanted to tell the serializer to ignore this public
property, since the <code>Card</code> array property would serve my
serialization needs. To do this, I used another XML serialization attribute, <code>XmlIgnore</code>:</P>
<pre>[System.Xml.Serialization.XmlIgnore()]
public CardCollection Items = new CardCollection();</pre>
<P>I then modified the public card property, the one <code>XmlSerializer</code> will
actually serialize, to use my collection:</P>
<pre>[System.Xml.Serialization.XmlElementAttribute("Card")]
public CardType[] Card
{
get { return Items.ToArray(); }
set {
Items.Clear();
Items.AddRange(value);
}
}</pre>
<P>Here you see another serialization attribute, <code>XmlElement</code>. <code>XmlElement</code>
is used to change the XML element name associated with the object's property.
In this case we're dealing with an array, so each array element is named <code><Card/></code>.
The <code>CardCollection</code> class is one I implemented. Just remember that
if you regenerate your source files, you'll need to re-implement any custom
modifications.</P>
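<P>For readers wondering what such a collection needs to provide, here is one
possible shape. It is a sketch only, not the <code>CardCollection</code> from
the download: it assumes a <code>CollectionBase</code>-derived class offering the
<code>ToArray()</code> and <code>AddRange()</code> members the property above
relies on, and it uses a trimmed, hypothetical <code>CardType</code>.</P>
<pre>using System;
using System.Collections;

// Trimmed, hypothetical stand-in for the generated card class.
public class CardType
{
    public string Name;
}

// A strongly typed collection built on CollectionBase, which already
// supplies Count, Clear(), RemoveAt(), and enumeration.
public class CardCollection : CollectionBase
{
    public CardType this[int index]
    {
        get { return (CardType)List[index]; }
        set { List[index] = value; }
    }

    public int Add(CardType card)
    {
        return List.Add(card);
    }

    // Used by the property setter during deserialization.
    public void AddRange(CardType[] cards)
    {
        if (cards == null) return;   // an absent array leaves the deck empty
        foreach (CardType card in cards) List.Add(card);
    }

    // Used by the property getter during serialization.
    public CardType[] ToArray()
    {
        return (CardType[])InnerList.ToArray(typeof(CardType));
    }
}

public class CollectionDemo
{
    public static void Main()
    {
        CardType a = new CardType();
        a.Name = "Groceries";
        CardType b = new CardType();
        b.Name = "Plumber";

        CardCollection items = new CardCollection();
        items.AddRange(new CardType[] { a, b });
        Console.WriteLine(items.ToArray().Length + " cards, first is " + items[0].Name);
    }
}</pre>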
<P><code>xsd.exe</code> also added two other serialization attributes that are of
interest: <code>XmlType</code> and <code>XmlRoot</code>:</P>
<pre>[System.Xml.Serialization.XmlTypeAttribute(Namespace="http://tempuri.org/Cardfile.xsd")]
[System.Xml.Serialization.XmlRootAttribute(Namespace="http://tempuri.org/Cardfile.xsd",IsNullable=false)]
public class Cards
{
...
}</pre>
<P><code>XmlType</code> is there because in the schema I specified a target
namespace, which dictates that the associated XML file must have a namespace
applied that matches the schema. The <code>XmlSerializer</code> needs this
information, and this attribute is there to provide it. <code>XmlRoot</code> is
there to identify the root XML node (the class that represents the document
element). With no other input, the <code>XmlSerializer</code> would need to
implement more complex algorithms to ferret out what the root might be, if it
could be determined at all. This attribute helps the serializer by
specifically indicating what the root of the XML serialization is to be.</P>
<P>For the most part, the serialization attributes I've shown here are all you'll
need, and if you use <code>xsd.exe</code> to generate your source files, it'll
insert the appropriate attribute for you. If you get an error when you execute <code>xsd.exe</code>,
you'll need to correct or update the schema to accommodate it
or create the source file(s) yourself. Most of the errors I encounter are from
schemas I've been given that include schema elements <code>xsd.exe</code> cannot
handle (like <code><xs:union/></code>) or are errors in the flow of
schema elements (ahem, that would be errors I made when creating the schema).</P>
<P>There is another serialization attribute that is sometimes helpful, and though <code>xsd.exe</code>
might not inject it for you, you may need it from time to time when serializing
complex elements. The attribute is <code>XmlInclude</code>, and it's used only
to tell the serializer about an additional type of object, such as a derived
class, that it should recognize during serialization and deserialization. It's
especially useful with Web Services when you're shipping complex datatypes over
the wire (that is, classes you create that represent method parameters or return
types).</P>
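<P>A short sketch shows where <code>XmlInclude</code> earns its keep. The
<code>Shape</code> and <code>Circle</code> classes are hypothetical and unrelated
to the cardfile sample. The serializer is created for the base type, and the
attribute tells it that a derived type may show up at run time.</P>
<pre>using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical base type. XmlInclude lets a serializer built for Shape
// recognize Circle instances as well.
[XmlInclude(typeof(Circle))]
public class Shape
{
    public string Name;
}

public class Circle : Shape
{
    public double Radius;
}

public class IncludeDemo
{
    public static string ToXml(Shape shape)
    {
        XmlSerializer ser = new XmlSerializer(typeof(Shape));
        StringWriter writer = new StringWriter();
        ser.Serialize(writer, shape);
        return writer.ToString();
    }

    public static Shape FromXml(string xml)
    {
        XmlSerializer ser = new XmlSerializer(typeof(Shape));
        return (Shape)ser.Deserialize(new StringReader(xml));
    }

    public static void Main()
    {
        Circle circle = new Circle();
        circle.Name = "Unit";
        circle.Radius = 1.5;

        string xml = ToXml(circle);   // the document element carries an xsi:type
        Console.WriteLine(xml);

        Shape copy = FromXml(xml);    // comes back as a Circle, not a plain Shape
        Console.WriteLine(copy.GetType().Name);
    }
}</pre>
<P>Without the attribute, serializing the <code>Circle</code> through a
<code>Shape</code> serializer fails with an exception, because the serializer
was never told to expect that type.</P>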
<h2>Serializing Your Data File</h2>
<P>Thus far we've merely created C# files that when serialized represent our
desired XML data file layout. Once you've created the source files, it's time
to use them. You create the data file objects in the same way you create and
use other Framework components. In this case, the sample allows you to create
and fill cards, save them to disk, read them back, and display their contents.
The demo app isn't very fancy...I, well, haven't had time to finish my "nice"
cardfile application. But this code is probably better to demonstrate the
serialization concept as there is less code to sort through when figuring out
how I did things.</P>
<P>Saving cards to a file is a very simple matter. We just create an instance of
the <code>XmlSerializer</code> and an associated stream writer, serialize the
card deck object using the serializer's <code>Serialize()</code> method, and
close the stream. The following "save" method is from the demonstration
application:</P>
<pre>private void SaveCards(string fileName)
{
    // Serialize the cards to a file
    StreamWriter writer = null;
    try
    {
        XmlSerializer ser = new XmlSerializer(typeof(Cards));
        writer = new StreamWriter(fileName);
        <B>ser.Serialize(writer, this._cards);</B>
    } // try
    catch (Exception ex)
    {
        string strErr = String.Format("Unable to save cards, error '{0}'",ex.Message);
        MessageBox.Show(strErr,"Card File Save Error",MessageBoxButtons.OK,MessageBoxIcon.Error);
    } // catch
    finally
    {
        if (writer != null) writer.Close();
        writer = null;
    } // finally
}</pre>
<P>Deserializing a saved deck is just as easy via the serializer's <code>Deserialize()</code>
method:</P>
<pre>private void LoadCards(string fileName)
{
    StreamReader reader = null;
    try
    {
        // Deserialize
        XmlSerializer ser = new XmlSerializer(typeof(Cards));
        reader = new StreamReader(fileName);
        <B>this._cards = (Cards)ser.Deserialize(reader);</B>
        if ( this._cards == null ) throw new NullReferenceException("Invalid card file");
    } // try
    catch (Exception ex)
    {
        string strErr = String.Format("Unable to load cards, error '{0}'",ex.Message);
        MessageBox.Show(strErr,"Card File Open Error",MessageBoxButtons.OK,MessageBoxIcon.Error);
    } // catch
    finally
    {
        if (reader != null) reader.Close();
        reader = null;
    } // finally
}</pre>
<P>The XML serialization infrastructure does all of the XML conversion work for us,
simplifying our code. We still had to design our XML and create an associated
XML Schema, but changing our XML data file format will be much simpler in the
long run using XML serialization over direct reads/writes using <code>XmlTextReader</code>/<code>XmlTextWriter</code>
or some other similar means. Note that since I have the schema,
I can also add a step when loading a card file. I could load the file as XML
into a validating reader and validate it against the schema. If it validates,
only then would I deserialize it to a set of card objects. I haven't shown that
here since the focus was serialization, but it's an obvious extension I added
to the sample application. Look for the <code>LoadCards()</code> method to see
how I grabbed the schema from the application resource pool and used it with a
validating reader during deserialization.</P>
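<P>The validate-then-deserialize idea can be sketched as follows. This is not the
code from the sample's <code>LoadCards()</code> method. It assumes a newer
Framework than the article targets, using <code>XmlReaderSettings</code> (.NET
2.0 and later) rather than the 1.x validating reader, and it uses a tiny
hypothetical one-element schema in place of the full card schema.</P>
<pre>using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class ValidateDemo
{
    // Tiny hypothetical schema standing in for the real card schema.
    const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='NextID' type='xs:nonNegativeInteger'/>" +
        "</xs:schema>";

    // Returns true only if the XML is well formed and passes schema validation.
    public static bool IsValid(string xml)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));

        bool valid = true;
        settings.ValidationEventHandler +=
            delegate(object sender, ValidationEventArgs e) { valid = false; };

        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
            {
                while (reader.Read()) { }   // reading the document drives validation
            }
        }
        catch (XmlException)
        {
            return false;                   // not even well-formed XML
        }
        return valid;
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("<NextID>7</NextID>"));
        Console.WriteLine(IsValid("<NextID>-1</NextID>"));
    }
}</pre>
<P>Only when a check like <code>IsValid</code> succeeds would the file be handed
to <code>XmlSerializer.Deserialize()</code>, which keeps malformed decks from
producing half-populated objects.</P>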
<h2>History</h2>
<UL>
<LI>
26.08.2003 Initial posting</LI></UL>
<h2>About Kenn Scribner</h2>
<p>Kenn is the author and co-author of several Windows development books,
including:</p>
<UL>
<LI>
<i>MFC Programming Visual C++ 6.0 Unleashed</i>
<LI>
<i>Teach Yourself ATL Programming in 21 Days</i>
<LI>
<i>Understanding SOAP</i>
<LI>
<i>Applied SOAP: Implementing .NET XML Web Services</i></LI>
</UL>
<p>He has contributed to several other Windows development books and has written
articles for "PC Magazine" and "Visual C++ Developer's Journal."</p>
<p>He currently is a principal consultant with and instructs XML and .NET Web
Services for <a href="http://www.wintellect.com">Wintellect</a>.</p>
<p>Click <a href="/script/profile/whos_who.asp?vt=arts&id=286932">here</a> to
view Kenn's online profile.</p>
</body>
</html>