How to Implement IXmlSerializable Correctly

26 Oct 2009 · BSD · 9 min read
Describes the guidelines and pitfalls for implementing IXmlSerializable (.NET)

Introduction

Yeah, I know, this is yet another article about XML serialization... After seeing several problems in code that uses or demonstrates XML serialization on CodeProject (and having struggled with them myself!), I thought sharing my findings with the community would be a good deed. Since readers showed interest, I added some more examples in the form of source code.

There are many confusing things about implementing the IXmlSerializable interface. Even MSDN (at the time of this writing: 21.10.2009) adds to the confusion by publishing sample code for cases that are too simple, and even those are implemented wrongly (see ReadXml and WriteXml from here as a starter: they work but are really wrong, as you will hopefully believe after reading the full article). Many questions came up whose answers took me a while to find. That is the reason for this article.

Background

IXmlSerializable is composed of three methods:

  • GetSchema
  • ReadXml
  • WriteXml

The serializer created from the XML serialization attributes first checks whether the type to be serialized implements this interface. If it does not, the public fields and properties are analyzed and considered for serialization (or excluded, thanks to XmlIgnoreAttribute).

This is a good starter. The article is clear and nicely written and introduces the main differences between attribute based serialization and implementing IXmlSerializable. IXmlSerializable.aspx is also worth reading.

After reading this article and going back to the other articles mentioned above, I hope you will be able to spot the implementation mistakes made in them. That code works well as long as the classes are not extended and as long as you do not mix serialization procedures. I got it all wrong at first too, until I dug into the problems...

This article is more or less written like a FAQ to serve as a quick reference. It should answer the most important questions one might have (or should have, hehe) asked oneself about implementing IXmlSerializable. If you have more questions, please don't hesitate to contact me. I use C# as the programming language, but I did my best to avoid language specifics: this information applies to all languages targeting .NET.

Sample

To support the explanations, I introduce an example containing many of the pitfalls one may encounter during XML serialization. We want to serialize and deserialize animals stored as a collection in a farm. More interesting than foos and bars, isn't it?

The following aspects are present:

  1. Empty element in XML
  2. Collection interface to be serialized
  3. The collection contains elements of different types derived from a base class

Classes

C#
public abstract class Animal
{
	public Animal() { }
	public String Name { get; set; }
	public DateTime Birthday { get; set; }
}
public class Dog : Animal
{
	public Dog() { }
}
public class Cat : Animal
{
	public Cat() { }
}
public class Mouse : Animal
{
	public Mouse() { }
}
public class Farm
{
	public Farm() { Animals = new List<Animal>(); }
	public IList<Animal> Animals { get; private set; }
}

XML Snippet

XML
<Farm>
  <Dog Name="Rex">
    <Birthday>2009-10-22</Birthday>
  </Dog>
  <Cat Name="Tom">
    <Birthday>1940-06-15</Birthday>
  </Cat>
  <Mouse Name="Jerry" />
</Farm>

Shall GetSchema() Really Always Return Null?

YES! GetSchema() shall ALWAYS return null. This is sufficient in most cases. If you really need to provide a Schema, then use XmlSchemaProviderAttribute. GetSchema() might still be used by some legacy code or internally by .NET types, but you should not use it. It is safe and good to return null. People telling you it could be important to implement it are liars! :-)

How to Implement WriteXml?

That's the easy part, rather straightforward:

  1. Write out all attributes
  2. Write out all elements and sub objects

BUT don't write the wrapper element! That's the job of the calling code.

For our example, it means that the Dog class shall write the attribute "Name", then its element "Birthday". The Dog class shall however NOT write the "Dog" start element or its end element.

This code shows how to correctly handle all animals during WriteXml:

C#
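public void WriteXml(System.Xml.XmlWriter writer)
{
	// The caller has already written the wrapper start element.
	writer.WriteAttributeString("Name", Name);
	if (Birthday != DateTime.MinValue)
		// Fixed format and invariant culture: the output does not
		// depend on the locale of the machine.
		writer.WriteElementString("Birthday",
			Birthday.ToString("yyyy-MM-dd",
				System.Globalization.CultureInfo.InvariantCulture));
}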
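
To make the division of labour concrete, here is a minimal sketch of a caller that owns the wrapper element. The WrapperDemo class and its WriteWrapped method are hypothetical names introduced only for this illustration; they play the role that the serializer normally plays.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Xml;

public class Animal
{
    public string Name { get; set; }
    public DateTime Birthday { get; set; }

    // Writes only attributes and child elements, never the wrapper.
    public void WriteXml(XmlWriter writer)
    {
        writer.WriteAttributeString("Name", Name);
        if (Birthday != DateTime.MinValue)
            writer.WriteElementString("Birthday",
                Birthday.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
    }
}

public static class WrapperDemo
{
    // Hypothetical stand-in for the calling code: it chooses the wrapper tag.
    public static string WriteWrapped(string elementName, Animal animal)
    {
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        var text = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(text, settings))
        {
            writer.WriteStartElement(elementName); // wrapper start tag
            animal.WriteXml(writer);               // attributes and content only
            writer.WriteEndElement();              // wrapper end tag
        }
        return text.ToString();
    }
}
```

Because the object never writes its own tag, the same instance can be stored as "Dog" in one document and as "Pet" in another just by passing a different elementName.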

How to Implement ReadXml?

ReadXml shall read the attributes first and then consume the wrapper element by calling ReadStartElement(). Consuming the end tag of the wrapper shall also be done inside ReadXml, by calling ReadEndElement(). This sounds rather counter-intuitive, because WriteXml shall not write the wrapper element! It becomes clearer when you consider attributes: they can only be read before the start element they belong to is consumed, and the calling code needs to see the element name to create an object of the correct type. NOTE: Take care of empty elements! (See below.)

For our example, it means that the Dog class shall move to the content and read the attribute "Name". Then it shall read the start element (the "Dog" element is consumed, but without specifying its name). Next, it reads the elements inside the class, like "Birthday", and finally consumes the end element. For simplicity, this description leaves out the case of an empty element (no birthday specified, as for Jerry); the code below handles it in the lines marked "(1)".

This code shows how to correctly handle all animals during ReadXml:

C#
public void ReadXml(System.Xml.XmlReader reader)
{
	reader.MoveToContent();
	// Attributes must be read before the start element is consumed.
	Name = reader.GetAttribute("Name");
	Boolean isEmptyElement = reader.IsEmptyElement; // (1)
	reader.ReadStartElement();
	if (!isEmptyElement) // (1)
	{
		// Same fixed format and invariant culture as in WriteXml.
		Birthday = DateTime.ParseExact(
			reader.ReadElementString("Birthday"), "yyyy-MM-dd",
			System.Globalization.CultureInfo.InvariantCulture);
		reader.ReadEndElement();
	}
}
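
The other half of the contract, the calling code, can be sketched as follows. This is only an illustration under assumptions: FarmReader, ReadFarm and Create are hypothetical names, and the loop does by hand what the serializer normally does, namely look at the element name, create an object of the matching type, and let that object consume its own wrapper.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Xml;

public abstract class Animal
{
    public string Name { get; set; }
    public DateTime Birthday { get; set; }

    public void ReadXml(XmlReader reader)
    {
        reader.MoveToContent();
        Name = reader.GetAttribute("Name");
        bool isEmptyElement = reader.IsEmptyElement;
        reader.ReadStartElement();               // consumes the wrapper start tag
        if (!isEmptyElement)
        {
            Birthday = DateTime.ParseExact(
                reader.ReadElementString("Birthday"), "yyyy-MM-dd",
                CultureInfo.InvariantCulture);
            reader.ReadEndElement();             // consumes the wrapper end tag
        }
    }
}
public class Dog : Animal { }
public class Cat : Animal { }
public class Mouse : Animal { }

public static class FarmReader
{
    // Hypothetical factory: maps an element name to a concrete type.
    private static Animal Create(string elementName)
    {
        switch (elementName)
        {
            case "Dog": return new Dog();
            case "Cat": return new Cat();
            case "Mouse": return new Mouse();
            default: throw new XmlException("Unknown animal: " + elementName);
        }
    }

    public static List<Animal> ReadFarm(string xml)
    {
        var animals = new List<Animal>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            bool farmIsEmpty = reader.IsEmptyElement;
            reader.ReadStartElement("Farm");
            if (!farmIsEmpty)
            {
                // The caller peeks at the name; the object reads its own wrapper.
                while (reader.MoveToContent() == XmlNodeType.Element)
                {
                    Animal animal = Create(reader.LocalName);
                    animal.ReadXml(reader);
                    animals.Add(animal);
                }
                reader.ReadEndElement();
            }
        }
        return animals;
    }
}
```

Fed with the XML snippet from the Sample section, ReadFarm returns a Dog, a Cat and a Mouse, where the Mouse keeps the default Birthday because its element is empty.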

Are There Any Gotchas for the Implementations?

Quite a few actually:

  1. Take care of the current Culture while using ToString() inside WriteXml & reading back in ReadXml.
  2. Do not write the wrapper element in WriteXml but read it inside ReadXml!
  3. Handle empty elements correctly during deserialization.

Gotcha number 1 triggers for dates, floating point values and other data that are written differently depending on the culture. In English-speaking countries, ToString() would probably produce something like 10/22/2009 for Rex's birthday. If you save the file like that and open it on another machine with a different locale, you'll get into trouble. I prefer to always specify a fixed format, for example with the Date Time format specification. (A short C# Format specification Cheat Sheet I use is located here.)
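
One way to stay culture-independent is sketched below. The CultureSafeFormatting class and its method names are hypothetical helpers made up for this example; the calls they wrap (CultureInfo.InvariantCulture and the System.Xml.XmlConvert class) are part of .NET.

```csharp
using System;
using System.Globalization;
using System.Xml;

public static class CultureSafeFormatting
{
    // Fixed format plus invariant culture: identical output on every machine.
    public static string WriteDate(DateTime value)
    {
        return value.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
    }

    public static DateTime ReadDate(string text)
    {
        return DateTime.ParseExact(text, "yyyy-MM-dd", CultureInfo.InvariantCulture);
    }

    // XmlConvert formats and parses numbers independently of the current culture.
    public static string WriteNumber(double value)
    {
        return XmlConvert.ToString(value);
    }

    public static double ReadNumber(string text)
    {
        return XmlConvert.ToDouble(text);
    }
}
```

The important point is symmetry: whatever format and culture WriteXml uses, ReadXml must use exactly the same ones.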

Gotcha number 2 triggers if you mix both attribute driven serialization with IXmlSerializable implementation for some classes.

Gotcha number 3 triggers if elements are empty or omitted (such a surprise!).

Why are ReadXml and WriteXml Behaving Asymmetrically?

The implementation choice is good and justified because:

  1. The calling code must inspect the element name in order to create a new object of the appropriate type, which is then filled in during deserialization.
  2. You must be able to handle attributes in ReadXml, so the wrapping tag must not be consumed yet.
  3. You must be able to define the name of the wrapping tag from outside to allow the same type to be serialized into different container tags.

The third point is similar to saying that the rubbish does not need to know which bin it goes into. It only needs to know how to describe itself and how to sort itself once it sees the bin. The only counter-intuitive item is that in the case of ReadXml the rubbish opens the bin itself! But it doesn't need to know what the bin is called: no argument is used for the name in ReadStartElement().

How to Handle Empty Elements?

I must say I did not find any elegant way of handling the deserialization of empty elements. No matter what I tried, I always had to perform an additional test; I found no method in the API that could help me. A suggestion to Microsoft would be to have ReadStartElement() return a Boolean that is false if the element is empty. As it stands, you can detect an empty element before reading it by checking IsEmptyElement. If the element is empty, DO NOT call ReadEndElement().

For our example, it means that the Dog class shall move to the content and read the attribute "Name". BUT now comes the little difference. Store the result of IsEmptyElement in a boolean variable. Then read the start element (the "Dog" element is consumed, but without specifying its name). Only if the boolean is not true, read the elements inside the class, like "Birthday", and consume the end element. I really mean it, do NOT read the end element if the boolean is true. You could erroneously consume the next closing tag, as in the case of the "Mouse", where you would also consume "</Farm>".
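
A small probe makes the reader position visible. EmptyElementProbe and NodeAfterFirstChild are hypothetical names used only for this sketch; it reports which node the reader lands on right after ReadStartElement() has consumed the first child of the root element.

```csharp
using System.IO;
using System.Xml;

public static class EmptyElementProbe
{
    // Returns "<NodeType>:<Name>" for the node that follows the start tag
    // of the first child of the root element.
    public static string NodeAfterFirstChild(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            reader.ReadStartElement();   // consume the root start tag
            reader.MoveToContent();      // now on the first child element
            reader.ReadStartElement();   // consume the child's start tag
            reader.MoveToContent();
            return reader.NodeType + ":" + reader.Name;
        }
    }
}
```

For `<Farm><Mouse Name="Jerry" /></Farm>` the probe returns "EndElement:Farm": the empty element is gone in one step, and the next end tag already belongs to the parent. For `<Farm><Dog Name="Rex"><Birthday>2009-10-22</Birthday></Dog></Farm>` it returns "Element:Birthday". A ReadEndElement() call in the first case would therefore eat the parent's closing tag.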

What are the Limitations of the XML Serialization Attributes?

  • Mixed mode is not supported: all text attributes get merged into a single part and you lose positional information during deserialization.
  • Serialization of interfaces is not possible: there is no attribute that lets you choose a concrete implementation for the interface.
  • Requirements on the objects have to be met (public fields and properties, default constructor, ... see this link).
  • Many .NET data structures cannot be serialized (only ICollection and IEnumerable implementations, not Dictionary for example).
  • Dynamic behaviour is not possible, it is type oriented and you cannot change the serialization depending on dynamic constraints. What if, for example, you are not interested in deserializing everything in some cases? Then IXmlSerializable saves you.
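
Two of these limitations can be checked quickly with a probe like the one below. LimitationProbe, CanCreateSerializer and FarmWithInterface are hypothetical names for this sketch; the assumption is that XmlSerializer rejects an unsupported type with an exception when the serializer is constructed.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Serialization;

// Hypothetical class with an interface-typed collection property.
public class FarmWithInterface
{
    public FarmWithInterface() { Animals = new List<string>(); }
    public IList<string> Animals { get; set; }
}

public static class LimitationProbe
{
    // Returns false when XmlSerializer refuses to handle the given type.
    public static bool CanCreateSerializer(Type type)
    {
        try
        {
            new XmlSerializer(type);
            return true;
        }
        catch (Exception ex) when (ex is InvalidOperationException
                                   || ex is NotSupportedException)
        {
            return false;
        }
    }
}
```

CanCreateSerializer(typeof(List<string>)) succeeds, while typeof(Dictionary<string, int>) is rejected; typeof(FarmWithInterface) is expected to be rejected as well because of its IList property.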

How to Realize the Same with XML Attributes?

C#
using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;
using System.ComponentModel;
using System.Text;
using System.Xml.Serialization;

namespace XmlWithAttributes
{
    public class Animal
    {
        public Animal() { }

        [XmlAttribute]
        public String Name { get; set; }

        [DefaultValue(typeof(DateTime), "0001-01-01T00:00:00")]
        public DateTime Birthday { get; set; }
    }
    public class Dog : Animal
    {
        public Dog() { }
    }
    public class Cat : Animal
    {
        public Cat() { }
    }
    public class Mouse : Animal
    {
        public Mouse() { }
    }
    public class Farm
    {
        public Farm() { Animals = new List<Animal>(); }

        [XmlElement("Dog", typeof(Dog))]
        [XmlElement("Cat", typeof(Cat))]
        [XmlElement("Mouse", typeof(Mouse))]
        public List<Animal> Animals { get; set; }
    }
}

The generated XML looks like this:

XML
<?xml version="1.0"?>
<Farm xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Dog Name="Rex">
    <Birthday>2009-10-22T00:00:00</Birthday>
  </Dog>
  <Cat Name="Tom">
    <Birthday>1940-06-15T00:00:00</Birthday>
  </Cat>
  <Mouse Name="Jerry" />
</Farm>

There are some limitations though. The animals need to be stored in a List: IList won't work, because interfaces cannot be serialized. All types must be public. The date format cannot be changed through attributes. To overcome these limitations, implementing IXmlSerializable is an easy way to go.
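
For completeness, here is a sketch of how such attribute-decorated classes can be driven with XmlSerializer. The FarmXml helper with its Save and Load methods is a hypothetical name, and the classes are a trimmed copy of the ones above, without the DefaultValue attribute, so every Birthday is written, including the default one.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class Animal
{
    [XmlAttribute]
    public string Name { get; set; }
    public DateTime Birthday { get; set; }
}
public class Dog : Animal { }
public class Cat : Animal { }
public class Mouse : Animal { }

public class Farm
{
    public Farm() { Animals = new List<Animal>(); }

    // One element name per concrete type; no extra wrapper around the list.
    [XmlElement("Dog", typeof(Dog))]
    [XmlElement("Cat", typeof(Cat))]
    [XmlElement("Mouse", typeof(Mouse))]
    public List<Animal> Animals { get; set; }
}

public static class FarmXml
{
    public static string Save(Farm farm)
    {
        var serializer = new XmlSerializer(typeof(Farm));
        using (var text = new StringWriter())
        {
            serializer.Serialize(text, farm);
            return text.ToString();
        }
    }

    public static Farm Load(string xml)
    {
        var serializer = new XmlSerializer(typeof(Farm));
        using (var text = new StringReader(xml))
        {
            return (Farm)serializer.Deserialize(text);
        }
    }
}
```

A round trip through Save and Load should give back the same animals with their concrete types, since the element name selects the type on the way in.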

A mixed attribute and IXmlSerializable implementation looks like this:

C#
public class Animal : IXmlSerializable
{
	public Animal() { }
	public String Name { get; set; }
	public DateTime Birthday { get; set; }

	public System.Xml.Schema.XmlSchema GetSchema() { return null; }

	public void ReadXml(System.Xml.XmlReader reader)
	{
		reader.MoveToContent();
		Name = reader.GetAttribute("Name");
		Boolean isEmptyElement = reader.IsEmptyElement; // (1)
		reader.ReadStartElement();
		if (!isEmptyElement) // (1)
		{
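			// Invariant culture keeps parsing independent of the locale.
			Birthday = DateTime.ParseExact(
				reader.ReadElementString("Birthday"), "yyyy-MM-dd",
				System.Globalization.CultureInfo.InvariantCulture);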
			reader.ReadEndElement();
		}
	}

	public void WriteXml(System.Xml.XmlWriter writer)
	{
		writer.WriteAttributeString("Name", Name);
		if (Birthday != DateTime.MinValue)
			writer.WriteElementString("Birthday",
				Birthday.ToString("yyyy-MM-dd",
					System.Globalization.CultureInfo.InvariantCulture));
	}
}
public class Dog : Animal
{
	public Dog() { }
}
public class Cat : Animal
{
	public Cat() { }
}
public class Mouse : Animal
{
	public Mouse() { }
}
public class Farm
{
	public Farm() { Animals = new List<Animal>(); }

	[XmlElement("Dog", typeof(Dog))]
	[XmlElement("Cat", typeof(Cat))]
	[XmlElement("Mouse", typeof(Mouse))]
	public List<Animal> Animals { get; set; }
}

ReadXml() is the tricky method to implement here. If you followed the guidelines correctly, your code should look similar to what is written above. If you omit the handling of empty elements (the lines commented with "(1)"), deserializing the sample XML breaks when parsing "Jerry". WriteXml() is simple; in this example it is hard to write it any other way. In a simpler case, however, one could be tempted to write the surrounding element there, and here you see why that would not work in general.

This implementation overcomes the date/time issue, but the list is still a concrete class and all members of Farm must still be public. Note that the setters of Name and Birthday in the Animal class could already be made private.

How to Deserialize XML Fragments?

I had to solve this to read so-called streamed XML. I am not sure whether it is really standard, but some projects required it: put generically, a source streams objects as XML continuously, without a surrounding root tag. Strictly speaking, such a document is invalid. There is a way to handle the fragments easily without embedding them in an artificial tag. I have to dig out the code I once wrote, or try to get it right again...
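
Until then, here is one known approach as a minimal sketch. It is not necessarily the code referred to above: it relies on the ConformanceLevel.Fragment setting of XmlReaderSettings, which lets an XmlReader accept several top-level elements. FragmentReader and ListTopLevelElements are hypothetical names for this illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class FragmentReader
{
    // Lists "ElementName:NameAttribute" for every top-level element
    // in a stream that has no surrounding root tag.
    public static List<string> ListTopLevelElements(string fragments)
    {
        var found = new List<string>();
        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment
        };
        using (XmlReader reader = XmlReader.Create(new StringReader(fragments), settings))
        {
            while (reader.MoveToContent() == XmlNodeType.Element)
            {
                found.Add(reader.LocalName + ":" + reader.GetAttribute("Name"));
                // A real implementation would create an object here and call
                // its ReadXml(reader); Skip() just moves past the element.
                reader.Skip();
            }
        }
        return found;
    }
}
```

Called with `<Dog Name="Rex" /><Mouse Name="Jerry" />`, the method returns the two entries "Dog:Rex" and "Mouse:Jerry" without any artificial root element.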

This article also gives some ideas about solving this.

Feel free to ask questions and add comments, your feedback is precious to me. :-).

History

  • 2009-10-24 Added code samples and more details
  • 2009-10-22 First version released

License

This article, along with any associated source code and files, is licensed under The BSD License


Written By
Technical Lead Alpine Electronics
Germany
As a perfectionist I enjoy working for a company aiming at excellence and where precision matters. In my opinion, writing software that works is good but writing extensible and maintainable software is better. Having a strong Architecture and Design is the first step towards a solid re-usable software platform.

Website: https://www.color-of-code.de

Comments and Discussions

 
Re: great article - but still some questions (FPDave, 24-Mar-11 0:45)
If I follow what you say then it doesn't move off the first PropertyValue element and is stuck in an endless loop, but if I add a ReadStartElement after your "// read the property value" comment then it's fine, as it moves on to the next PropertyValue element. I suspect you assumed I might be reading PropertyValue using a specific class, but the property values are actually stored in a List of name value pairs.

My working code now looks like:
public void ReadXml(XmlReader reader)
{
    reader.MoveToContent();

    string v = reader.GetAttribute("id");
    this.ID = Int32.Parse(v);
    v = reader.GetAttribute("TypeOfEntity");

    bool isEmptyElement = reader.IsEmptyElement;
    reader.ReadStartElement();

    if (!isEmptyElement)
    {
        // now for each of the property value elements
        while (reader.IsStartElement(constXmlPropertyValueElementName))
        {
            string propname = reader["Name"];
            string propval = reader["Value"];
            // store the property values (omitted)
            reader.ReadStartElement();
        }
        reader.ReadEndElement();
    }
}


Anyway, it's now working fine thanks to your input, much appreciated.