A Custom .NET XML Serialization Library

Wilfred Verkley

25 Jan 2006

Describes a custom XML serialization library with functionality to compare documents for differences and to combine those differences.

Introduction

When I originally sat down to write this code, I wanted to write a client/server application. I looked at using XML serialization to send the server state to the client. XML serialization in .NET is extremely flexible: it can serialize and deserialize almost any .NET type, with a lot of control over how the XML is formatted. However, sending the entire server state over a network to the client is inefficient; I wanted a way to send only the differences since the last update.

I needed the following features:

The XML output had to conform to a fixed format with specific naming and ordering conventions, so any two XML sources could be compared for differences quickly and efficiently.
In an XML difference document, the XML had to store the difference state (additions, deletions, updates, and a "no change" placeholder), as well as the previous value that was replaced or deleted.
It had to be possible to recurse through an existing XML document or data structure, comparing or combining any differences.

I therefore wrote a simple XML serialization library called "Wml", to support these features, built on and trying to follow the same conventions as .NET's existing XML and serialization classes as much as possible. In the process of writing it, I found that being able to recurse over a Wml structure, and being able to calculate and combine the differences had a few other uses as well.

In this article, I cover some of the basics of Wml, how to make .NET types serializable to Wml, and some uses of the Wml serialization library.

Wml

The restricted form of XML which the Wml library serializes to is called "Wml".

An example:

<Wml>
  <DirectReports I="2" V="Wxv.WmlDemo.JobPosition">
    <FirstName V="Daniel" />
    <Id V="2" />
    <LastName V="Taylor" />
  </DirectReports>
  <FirstName V="Christopher" />
  <Id V="0" />
  <LastName V="Wilson" />
  <zz. I="1" V="Wxv.WmlDemo.JobPosition">
    <FirstName V="Isabella" />
    <Id V="1" />
    <LastName V="Jones" />
  </zz.>
</Wml>

An example including difference state:

<Wml>
  <!-- updating the job title from Chairperson to Director -->
  <JobTitle V="Director" D="Chairperson" S="U" />  
  <!-- addition of a new employee -->
  <zz. I="1" V="Wxv.WmlDemo.JobPosition" S="A">    
    <FirstName V="Isabella" S="A" />
    <LastName V="Jones" S="A" />
  </zz.>
</Wml>

Its basic structure is:

The root element name is always "Wml".
Element names correspond to field or property names, or "zz." for collection items.
The two main attributes are:
- "Identity" (I) - An integer representing the unique object identity. Default is -1.
- "Value" (V) - Either the string representation of a value, or full type name for a .NET type. Default is null.
Child nodes must be ordered by, and unique on, their name and identity (I).
Difference documents use two other attributes:
- "State" (S) - Either "Added" (A), "Deleted" (D), "Updated" (U), or "No Change" (N). Default is "No Change".
- "DeletedValue" (D) - The previous value of the member, used when rolling back any differences. Default is null.

Wml has this fixed structure and constraints so that any two Wml sources can be compared for differences efficiently.
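To illustrate why the ordering constraint matters, here is a minimal sketch of a single-pass comparison over two sorted sibling lists. It is not the Wml library's code: SketchNode and SketchDiff are hypothetical names, ordinal string ordering is an assumption, it uses generics (which .NET 1.1 does not have), and it only compares one level of children rather than recursing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a Wml child node: a name, an identity (I) and a value (V).
public sealed class SketchNode
{
    public readonly string Name;
    public readonly int Identity;
    public readonly string Value;

    public SketchNode(string name, int identity, string value)
    {
        Name = name; Identity = identity; Value = value;
    }

    // Orders nodes by name, then by identity (ordinal name ordering is an assumption).
    public static int CompareKeys(SketchNode a, SketchNode b)
    {
        int byName = string.CompareOrdinal(a.Name, b.Name);
        return byName != 0 ? byName : a.Identity.CompareTo(b.Identity);
    }
}

public static class SketchDiff
{
    // Walks two sorted sibling lists in step and reports "A", "D" or "U" entries.
    public static List<string> Compare(IList<SketchNode> before, IList<SketchNode> after)
    {
        List<string> changes = new List<string>();
        int i = 0, j = 0;
        while (i < before.Count || j < after.Count)
        {
            int order;
            if (i == before.Count) order = 1;        // only "after" nodes remain
            else if (j == after.Count) order = -1;   // only "before" nodes remain
            else order = SketchNode.CompareKeys(before[i], after[j]);

            if (order < 0)
            {
                changes.Add("D " + before[i].Name);  // missing from "after": deleted
                i++;
            }
            else if (order > 0)
            {
                changes.Add("A " + after[j].Name);   // missing from "before": added
                j++;
            }
            else
            {
                if (before[i].Value != after[j].Value)
                    changes.Add("U " + after[j].Name); // same key, new value: updated
                i++;
                j++;
            }
        }
        return changes;
    }
}
```

Because both lists are sorted on the same key, each node is visited once and no lookups are needed; an unordered format would need a search or an index for every child.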

Like XML:

It can be held in a DOM, using the class WmlDocument, which holds WmlDomNodes.
It can be read or written to a text source or target using WmlTextReader and WmlTextWriter.
A Wml DOM itself can be read or written to using a WmlDomNodeReader or WmlDomNodeWriter.
The nodes implement an IWmlNode interface, similar in concept to IXPathNavigable, which allows a WmlNodeReader or WmlNodeWriter (which WmlDomNodeReader or WmlDomNodeWriter inherit from) to access the nodes without having to know the underlying structure or types in which the Wml node information is held.

Serializing Types

For a type (class or struct) to be serializable, it must implement the IWmlSerializable interface:

public interface IWmlSerializable
{
    int GetHashCode();
}

The interface declares only one method, IWmlSerializable.GetHashCode(), which is there to prompt implementers to override object.GetHashCode(). (The C# compiler will accept the inherited object.GetHashCode() as the implementation, so the override has to be written deliberately.) The method should return an integer that uniquely identifies the instance, within the place where it is held, from any other instance or from a null value (which is treated as having the default hash code or identity value of -1). XML serialization in .NET does not need this concept, as it always deserializes a new data structure from scratch, but it is important in Wml when merging differences into an existing structure. The hash code should not change over the lifetime of the instance, and it should be based on a member value that is also serialized, so it can be reliably recalculated after deserialization.

The simplest way to have a unique identity for an object is to simply have an auto-incrementing integer value assigned to an "ID" field or property when it's instantiated. It could also be calculated from a unique and constant data value that it holds, or based on the index of the collection or array it's held in (as long as it's the only reference ever held at that index, including nulls).

As with XML serialization, any IWmlSerializable type must also define a parameterless constructor, so the type can be instantiated automatically during deserialization.

By default, Wml serialization serializes any non-static public member (fields or properties) defined on the IWmlSerializable type whose value can be read and written to. This includes IWmlSerializable references, and any other "primitive" type whose value can be converted to and from a string by the .NET class TypeConverter.
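The string conversion mentioned above can be sketched with the standard System.ComponentModel API. This shows only the general TypeConverter round trip, not the Wml library's own code; PrimitiveText is a hypothetical helper name, and the use of the invariant culture is an assumption.

```csharp
using System;
using System.ComponentModel;

public static class PrimitiveText
{
    // Converts a value to text using the TypeConverter registered for its type.
    public static string ToText(object value)
    {
        TypeConverter converter = TypeDescriptor.GetConverter(value.GetType());
        return converter.ConvertToInvariantString(value);
    }

    // Parses text back into an instance of the requested type.
    public static object FromText(Type type, string text)
    {
        TypeConverter converter = TypeDescriptor.GetConverter(type);
        return converter.ConvertFromInvariantString(text);
    }
}
```

For example, PrimitiveText.FromText(typeof(DayOfWeek), PrimitiveText.ToText(DayOfWeek.Friday)) yields DayOfWeek.Friday again, and the same two calls cover integers and other types that have a registered converter.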

The members you don't want to be serialized (e.g., they hold temporary or derived information) can be marked with the WmlIgnore attribute.

For example:

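using System;

public class JobPosition : IWmlSerializable
{
    // Next identity to hand out; shared by all instances.
    private static int MaxId = 0;
    private int id = -1;

    // Serialized, so the identity survives a round trip.
    public int Id
    {
        // Assign an identity lazily, the first time it is read.
        get { if (id == -1) id = MaxId++; return id; }
        set { id = value; }
    }

    // The Wml identity of this instance.
    public override int GetHashCode()
    {
        return Id;
    }

    // Public fields are serialized by default.
    public string FirstName;
    public string LastName;
    public string JobTitle;
    public DateTime DateStarted;

    public enum GenderEnum { Male = 0, Female = 1 }
    public GenderEnum Gender;

    // A reference to another IWmlSerializable instance.
    public JobPosition DirectReports;

    // Excluded from serialization.
    [WmlIgnore()]
    public int Tag;
}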

If a collection type needs to serialize any child data objects that it holds, it can implement the IWmlSerializableCollection interface:

public interface IWmlSerializableCollection : IWmlSerializable, 
                                                     IEnumerable
{
    IWmlSerializable Get (int id);
    void Remove (int id);
    void Add (IWmlSerializable item);
}

The id parameter corresponds to the GetHashCode() value that the IWmlSerializable instance returns. IWmlSerializableCollection instances may not contain null items, and their enumerators must return the IWmlSerializable collection items in "id" order.

For example:

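using System.Collections;

public class JobPosition : IWmlSerializable, IWmlSerializableCollection {

    /* other code here */

    // Keyed by identity, so the values enumerate in "id" order.
    private SortedList manages = new SortedList();

    public IWmlSerializable Get(int id)
    {
        // The indexer returns null if no item has this id.
        return (IWmlSerializable) manages[id];
    }

    public void Remove (int id)
    {
        manages.Remove (id);
    }

    public void Add (IWmlSerializable item)
    {
        // Delegates to a type-safe Add (JobPosition) overload,
        // assumed to be part of the other code.
        Add ((JobPosition) item);
    }

    public IEnumerator GetEnumerator()
    {
        return manages.Values.GetEnumerator();
    }
}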

(Though generally you provide type safe versions of these methods and hide the IWmlSerializableCollection implementation.)

When a type implements IWmlSerializable and optionally IWmlSerializableCollection, the Wml code can build a view over it in a class called WmlSerializableNode. Like Wml DOM nodes, this class implements IWmlNode, which allows WmlNodeReader and WmlNodeWriter to treat it the same way as a Wml DOM. The derived WmlNodeReader and WmlNodeWriter classes for IWmlSerializable instances are WmlSerializableNodeReader and WmlSerializableNodeWriter.

Every instance in an IWmlSerializable structure should be referenced only once by a serialized member or collection, i.e., there must be no duplicate references to the same instance and no circular references. The WmlSerializableNodeReader validates this and raises an exception if you try to serialize an invalid data structure.
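A single-reference check of this kind can be sketched as a graph walk that remembers every instance it has seen. This is an assumption about how such validation might work, not the WmlSerializableNodeReader's actual code; SketchItem, ReferenceComparer and SingleReferenceCheck are hypothetical names, and the sketch uses generics, which .NET 1.1 does not have. Instances are compared by reference rather than by GetHashCode(), because two distinct instances held in different places may legitimately share an identity value.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Compares objects by reference, ignoring any Equals/GetHashCode overrides.
public sealed class ReferenceComparer : IEqualityComparer<object>
{
    bool IEqualityComparer<object>.Equals(object x, object y)
    {
        return ReferenceEquals(x, y);
    }

    int IEqualityComparer<object>.GetHashCode(object obj)
    {
        return RuntimeHelpers.GetHashCode(obj);
    }
}

// Hypothetical data object that simply holds child items.
public sealed class SketchItem
{
    public readonly List<SketchItem> Children = new List<SketchItem>();
}

public static class SingleReferenceCheck
{
    // Throws if any instance is reachable more than once from the root.
    public static void Validate(SketchItem root)
    {
        Visit(root, new HashSet<object>(new ReferenceComparer()));
    }

    private static void Visit(SketchItem item, HashSet<object> seen)
    {
        // Add returns false when the instance was already visited,
        // which covers both duplicate and circular references.
        if (!seen.Add(item))
            throw new InvalidOperationException("Instance is referenced more than once.");
        foreach (SketchItem child in item.Children)
            Visit(child, seen);
    }
}
```

The walk stops at the first repeated instance, so a circular reference raises the exception instead of recursing forever.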

WmlSerializer

This is an abstract class containing static methods that perform various Wml-related utility functions, mostly using Wml readers and writers. There are eight kinds of methods (overloaded to support WmlDocument or IWmlSerializable instances, or Wml readers and writers). In this list, "Wml structure" refers to both WmlDocument and IWmlSerializable instances:

Equals - Compare two Wml structures for equality
Compare - Compare two Wml structures and produce a Wml difference document
Combine - Combine a Wml difference into an existing Wml structure
Copy - Copies the input from a WmlReader source to a target WmlWriter
Clone - Creates a deep copy of a Wml structure
Serialize - Saves a Wml structure to a writer or document
Deserialize - Loads a Wml structure from a reader or document
ToString - Saves a Wml structure to a string

Functions such as "Equals" and "Clone" are a side effect of having a WmlNodeReader and WmlNodeWriter that can recurse through both a DOM and an IWmlSerializable structure, node by node. This is not possible in .NET XML serialization unless you generate a complete intermediate XML output first. Combining a difference document into an existing IWmlSerializable instance or a Wml DOM only adds, removes, or updates the nodes that differ; no other part of the data structure is changed.

Transactions

One of the main benefits of being able to calculate the difference between a before and after state is that you can roll back (or forward) any operation you make on a data object. The only proviso is that the changes must keep the data structure in a valid serializable state, which mostly means that you have to ensure an object has a valid identity or GetHashCode() result before you add it to an existing data structure.

Rolling an object's state back and forth can be done manually using different methods on the WmlSerializer class (e.g., Serialize, Compare, and Combine), but the Wml library makes this simpler for IWmlSerializable instances with the WmlTransaction class.

For example:

WmlTransaction transaction = new WmlTransaction (myDataObject, 
                                          "my transaction name");
try
{
    // throws a validation exception if any changes are bad
    myDataObject.MakeChanges(); 
    // changes are committed and differences recorded
    transaction.Commit();  
}
catch (Exception)
{
    // changes are rolled back and differences discarded
    transaction.RollBack(); 
}

When created, the transaction object serializes the IWmlSerializable to a WmlDocument to hold the "previous state". The differences to the data object are calculated when the transaction is finished. If "RollBack" is called, they are used to roll the data object back and are then immediately discarded; if "Commit" is called, they are stored to provide a difference history.
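The idea of recording differences together with their previous values, and then rolling them back or forward, can be sketched on a flat dictionary. This is only an illustration of the technique under simplifying assumptions (string values that are never null, no nesting); SketchChange and SketchTransaction are hypothetical names and none of this is the WmlTransaction implementation.

```csharp
using System;
using System.Collections.Generic;

// One recorded difference. A null OldValue means "added"; a null NewValue means "deleted".
public sealed class SketchChange
{
    public readonly string Name;
    public readonly string OldValue;
    public readonly string NewValue;

    public SketchChange(string name, string oldValue, string newValue)
    {
        Name = name; OldValue = oldValue; NewValue = newValue;
    }
}

public static class SketchTransaction
{
    // Records how "after" differs from "before", keeping the old values for rollback.
    public static List<SketchChange> Diff(
        IDictionary<string, string> before, IDictionary<string, string> after)
    {
        List<SketchChange> changes = new List<SketchChange>();
        foreach (KeyValuePair<string, string> pair in before)
        {
            string current;
            if (!after.TryGetValue(pair.Key, out current))
                changes.Add(new SketchChange(pair.Key, pair.Value, null));    // deleted
            else if (current != pair.Value)
                changes.Add(new SketchChange(pair.Key, pair.Value, current)); // updated
        }
        foreach (KeyValuePair<string, string> pair in after)
        {
            if (!before.ContainsKey(pair.Key))
                changes.Add(new SketchChange(pair.Key, null, pair.Value));    // added
        }
        return changes;
    }

    // Undoes the recorded changes by restoring each old value.
    public static void RollBack(IDictionary<string, string> state, List<SketchChange> changes)
    {
        foreach (SketchChange change in changes)
        {
            if (change.OldValue == null) state.Remove(change.Name);
            else state[change.Name] = change.OldValue;
        }
    }

    // Reapplies the recorded changes by setting each new value.
    public static void RollForward(IDictionary<string, string> state, List<SketchChange> changes)
    {
        foreach (SketchChange change in changes)
        {
            if (change.NewValue == null) state.Remove(change.Name);
            else state[change.Name] = change.NewValue;
        }
    }
}
```

Keeping the old value next to the new one is what lets the same list of changes be applied in either direction, which is the role the "DeletedValue" attribute plays in a Wml difference document.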

Using individual transactions is somewhat inefficient, since the WmlTransaction has to serialize a copy of the previous state of the data object when it is instantiated. If you are performing multiple transactions on your data, a better technique is to use the WmlTransactionLog class, which caches the previous state and automatically updates it on any change, and which keeps a copy of every committed transaction so you can roll back and forth for Undo/Redo.

For example:

WmlTransactionLog transactionLog = new WmlTransactionLog();
transactionLog.CurrentState = myDataObject;

WmlTransaction transaction1 = 
    transactionLog.BeginTransaction ("Change 1");
myDataObject.MakeChanges();
WmlTransaction transaction1a = 
    transactionLog.BeginTransaction ("Change 1 a");
myDataObject.MakeChanges();
transaction1a.RollBack();
transaction1.Commit();

WmlTransaction transaction2 = 
    transactionLog.BeginTransaction ("Change 2");
myDataObject.MakeChanges();
transaction2.Commit();

// Roll back our committed transactions to the 
// starting state
transactionLog.RollBack();
transactionLog.RollBack();

// Roll forward our committed transactions to the 
// finish state
transactionLog.RollForward();
transactionLog.RollForward();

As this example shows, transactions can be nested, provided they are rolled back or committed in reverse order. Nested transactions are more expensive though, since a new "previous state" has to be calculated; the cached version the transaction log keeps cannot be used. The transaction log also has a "Modified" event, which is raised every time its current state changes (e.g., when a transaction is committed). It is useful for knowing when to refresh a user interface.

Demo Application

The sample application illustrates how a simple class, "JobPosition", which was used in the previous examples, can be made Wml serializable. The demo user interface only lets you perform one operation on it, "Modify", which randomly shuffles the data, adding and removing JobPositions to and from the parent's collection and "DirectReports" reference, and modifying the descriptive information.

It demonstrates the following Wml serialization features:

Loading (Deserialize) and saving (Serialize) an IWmlSerializable data structure to a file using WmlSerializer.
Keeping a WmlTransactionLog instance to provide undo/redo functionality, and to notify the user interface when the data object was modified.
Using "WmlSerializer.Equals()" to compare the data object with its previous state (held as a Wml document in the transaction log), repeating the randomize operation until the data has actually been modified (because the randomize operation doesn't always change the data structure).

All this with very little code...

Conclusion

The Wml library was not written to replace .NET XML serialization. It's not as fast, flexible or robust (sorry). Even for its intended purpose of keeping track of changes to data structures, it might not be as good as a custom solution, since every "Compare" for differences has to recurse over the entire data structure. It is probably more efficient to build a change log manually as you make your data structure modifications. Also, keeping the difference state for any data that changes a large percentage of its state in every operation is counterproductive.

However, the Wml library does its job with very little support from the programmer. Any type that's currently XML serializable can be made serializable to Wml without much effort.

Some other uses that I have found for it:

Deep copies and equality tests.
Keeping a smart client synchronized by sending only the differences since the last update from the server.
Allowing validation code to be placed in the data structure itself, rather than having changes pre-validated before they are applied, since if there are any validation errors, it can be rolled back.
Undo/Redo functionality in applications.
Building multi-level modal dialogs that work safely on the main data object (because the main data object changes can be rolled back, or changes on a cloned structure merged in.)
Change logs.

I developed the Wml library on version 1.1 of the .NET Framework, and I have not yet tested it on generic types in version 2.0 of the .NET Framework. I don't foresee any problems, as long as reflection on generic types behaves consistently with reflection on non-generic types, though I cannot confirm this.

Acknowledgements

Chris Beckett, for his article and code on the use of his nifty custom extender class for menu images for the demo.
Marc Clifton, for his simple serializer/deserializer article, for pointing me to use TypeConverter (I had been using reflection to clumsily look for a ToString() method and a static Parse() method on a primitive type before.)

History

26th January, 2006 - Version 1.0

License

This article has no explicit license attached to it, but may contain usage terms in the article text or the download files themselves. If in doubt, please contact the author via the discussion board below.


Written By

Wilfred Verkley

Web Developer

New Zealand

I'm a Software Developer working in Auckland, New Zealand. When I was a lot shorter, I started programming in Atari Basic, though these days it's mostly C#, and a bit of Java (mostly the caffeinated kind).
