Custom Serialization using the SOAP Formatter: The Basics

Rudolf Jan

Aug 19, 2007

CPOL

A tutorial on custom serialization using the SOAP formatter

Download source - 24.71 KB

Introduction

Serialization has always been a tricky topic for programmers. Using C++ and MFC, I managed to develop a way of working that minimized the effort needed. When I switched to C# and .NET programming, I noticed things got much worse instead of better. Microsoft promotes automatic serialization, but it gives you little control once you want to do fancier things such as versioning. Custom serialization is a reasonable alternative. It took me quite some time to understand custom serialization in enough detail to be useful for my purpose. In the first part of this article, I will cover the basics. In the second part, I will cover advanced topics, like serializing arrays, derived objects and versioning.

Background

Serialization is used to store the state of an object tree in a file and to retrieve this object tree later. Microsoft designed a complex set of tools to accomplish serialization, and I will not cover all of them. The way objects are stored is defined by a formatter. The .NET Framework provides two formatters. The SOAP Formatter stores the object tree as XML, which makes it especially suitable for web applications. The Binary Formatter uses a compact binary format. Microsoft decided to stop development of the SOAP Formatter. One of the consequences is that the SOAP Formatter does not support generic types.

In this article, I will use the SOAP Formatter. The main reason is that you can review the formatted XML code, which helps a lot in debugging. For production, it is very easy to switch to the Binary Formatter, which produces much smaller files.

The easiest way to use serialization is automatic serialization. Using attributes, you can customize it a little and even add primitive versioning. In my second sample, I will use versioning, serialization of an array with objects of different types and serialization of derived classes. For that I use custom serialization, which allows much more control over the serialization process. I also think custom serialization produces more readable code than attributes do. When you use attributes, control over serialization is spread all over your code. I do not like that very much. Unfortunately, custom serialization requires a lot more code than you would need for the same task in MFC.

Using the Code

I provide a single solution with two sample projects. Serialization1 demonstrates the basics of custom serialization. The project Serialization2 is a much more extensive example. For Serialization1, I will explain the code in detail in this article. Serialization2 will be covered in part 2 of this article. Both projects use a single file named program.cs for all classes. I used a simple console application to keep things as simple as possible.

Both articles are included in the demo project.

Basic Custom Serialization

The sample Serialization1 uses two classes. The class ObjectList is the root class for serialization. This class contains two objects of the type MyObject. First you must add using directives for the IO and serialization namespaces, including the one for the SOAP formatter:

using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Soap;

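The first line is required for the IO classes such as FileStream. The others bring in the serialization types. Do not forget to add a reference to the assembly System.Runtime.Serialization.Formatters.Soap to your project, because it is not referenced by default.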

The next step is to include serialization in the Main function:

    ObjectList MyObjectList=new ObjectList();
    FileStream WriteStream=new FileStream("test2.xml",FileMode.Create);
    SoapFormatter serformatter=new SoapFormatter();
    serformatter.Serialize(WriteStream,MyObjectList);
    WriteStream.Close();

The first line creates the ObjectList, which is the root object for serialization. Next, we must create an output stream, which is used by the Soap formatter. As you see, creating the Soap formatter is trivial. Using it is also trivial. Use the Serialize method of the SoapFormatter class, with the stream and root object as parameters.

The next step is to tell all objects how to serialize themselves. We must set the [Serializable] attribute on the ObjectList class. We must also implement the ISerializable interface, which starts with naming it in the class declaration.

    [Serializable]
    public class ObjectList:ISerializable

In order to implement the ISerializable interface, we must implement a method and a special serialization constructor. The method is used for serialization, the constructor is used for deserialization. Below we show the serialization method, which is called GetObjectData(info,context):

    public virtual void GetObjectData(SerializationInfo info,StreamingContext context)
        {
        info.AddValue("Test",TestString);
        info.AddValue("Object1",object1);
        info.AddValue("Object2",object2);
        }

The info object builds an internal data structure that holds all data to be serialized. After all data has been written to this structure, the SOAP formatter writes the contents of the info object to the stream.

As you see in this simple example, you use AddValue to add a value to the info object. AddValue is overloaded for the primitive value types and for the Object type, which also covers strings and your own classes. The example above uses the Object overload; the MyObject class below uses the Int32 overload. The first parameter of the AddValue method specifies the name under which the value is stored. You cannot use the same name twice: AddValue throws a SerializationException when a name is added a second time. This is a very annoying "feature". As a consequence, storing an array of objects element by element takes extra work, because every element needs its own unique name.
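The name restriction can be tried out without a formatter. The following is a minimal sketch, not code from the sample projects: it builds a SerializationInfo by hand (normally the formatter does that for you), and the class DuplicateNameDemo, the method names and the "Item" + index naming scheme are assumptions made for illustration only.

```csharp
using System;
using System.Runtime.Serialization;

public static class DuplicateNameDemo
{
    // Returns true if adding the same name twice throws a SerializationException.
    public static bool SecondAddThrows()
    {
        // Hand-made info object, for experiments only.
        SerializationInfo info =
            new SerializationInfo(typeof(object), new FormatterConverter());
        info.AddValue("Item", 1);
        try
        {
            info.AddValue("Item", 2);   // same name again
            return false;
        }
        catch (SerializationException)
        {
            return true;
        }
    }

    // Hypothetical workaround: store the element count plus one
    // uniquely named entry per element ("Item0", "Item1", ...).
    // Returns the number of entries now held by the info object.
    public static int AddIndexed(SerializationInfo info, int[] values)
    {
        info.AddValue("Count", values.Length);
        for (int i = 0; i < values.Length; i++)
        {
            info.AddValue("Item" + i, values[i]);
        }
        return info.MemberCount;
    }
}
```

When reading the data back, the same scheme would first read "Count" and then loop over the generated names. Whether this is preferable to other ways of storing a collection depends on your data.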

object1 and object2 are of the type MyObject. In the same way we implement serialization for the ObjectList class, we have to specify the GetObjectData method for the MyObject class:

    public virtual void GetObjectData(SerializationInfo info,StreamingContext context)
        {
        info.AddValue("Idx",idx);
        }

This is not very exciting. As you see, MyObject only has one member that must be serialized, named idx. Let's have a look now at the resulting XML code. Skipping the header stuff, this is the result:

<SOAP-ENV:Body>
<a1:ObjectList id="ref-1" xmlns:a1=....>
    <Test id="ref-3">Test string</Test>
    <Object1 href="#ref-4"/>
    <Object2 href="#ref-5"/>
</a1:ObjectList>
<a1:MyObject id="ref-4" xmlns:a1=....>
    <Idx>1</Idx>
</a1:MyObject>
<a1:MyObject id="ref-5" xmlns:a1=....>
    <Idx>2</Idx>
</a1:MyObject>
</SOAP-ENV:Body>

You can easily recognize the identifiers we have used. You will also notice that for objects, a reference is constructed. Type information for value types is lacking in this XML code. You may like to use XML Notepad as a viewer for XML code. You can download it free at the Microsoft site.

Basic Custom Deserialization

Now we proceed to the deserialization process. Again, working top down, we start creating a stream and formatter:

    ObjectList MyNewObjectList=null;
    SoapFormatter deserformatter=new SoapFormatter();
    FileStream ReadStream=new FileStream("test2.xml",FileMode.Open);   // same file as written above
    MyNewObjectList=(ObjectList)deserformatter.Deserialize(ReadStream);
    ReadStream.Close();

As you see, this is straightforward. The second-to-last line is the interesting one. Deserialize returns the reconstructed root object as an Object, which we cast to ObjectList and assign to MyNewObjectList, which was null until then. While rebuilding the object, the formatter invokes a special constructor of the ObjectList class. This constructor initializes the object from the serialization information read from the stream.

This is what this special constructor looks like:

protected ObjectList(SerializationInfo info,StreamingContext context)
    {
    TestString=info.GetString("Test");
    Type t=Type.GetType("Serialization.MyObject");
    object1=(MyObject)info.GetValue("Object1",t);
    object2=(MyObject)info.GetValue("Object2",t);
    }

Notice:

The constructor is protected
It has two parameters, a SerializationInfo object, which holds the serialization tree and a StreamingContext

In the first line, the string is extracted. In the GetString method, you must use exactly the same name as you used in the GetObjectData method. A mismatch is not detected at compile time, so typing errors can be a source of trouble: GetString throws a SerializationException when the name is not found. A better solution may be to define a constant for each name. Then the compiler catches a misspelled identifier.
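The idea of using constants for the names could look like the sketch below. It is a variation on the MyObject class, not the code from the download: the constant IdxName, the public constructor and the Idx property are assumptions added for illustration.

```csharp
using System;
using System.Runtime.Serialization;

[Serializable]
public class MyObject : ISerializable
{
    // One constant per stored member (hypothetical name).
    // Both GetObjectData and the special constructor refer to it,
    // so the name is typed as a string literal only once.
    private const string IdxName = "Idx";

    private int idx;

    public MyObject(int idx)
    {
        this.idx = idx;
    }

    // Special constructor used for deserialization.
    protected MyObject(SerializationInfo info, StreamingContext context)
    {
        idx = info.GetInt32(IdxName);
    }

    // Serialization method required by ISerializable.
    public virtual void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue(IdxName, idx);
    }

    public int Idx
    {
        get { return idx; }
    }
}
```

A misspelled IdxName no longer compiles, whereas a misspelled "Idx" literal would only fail at run time.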

To deserialize objects, you must use the GetValue method. This method requires a Type object as its second parameter, so you must first obtain that Type object. Here it is created from the type name with Type.GetType. The compiler does not check whether the string names an existing type, so typing errors only show up as runtime exceptions. The typeof operator, as in typeof(MyObject), yields a Type object as well and is checked at compile time, which makes it the safer choice.
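The two ways of obtaining a Type object can be compared in a few lines. This is a small sketch with an empty stand-in class; TypeLookupDemo, its method names and the deliberately misspelled type name are made up for the illustration.

```csharp
using System;

// Empty stand-in for the article's MyObject class.
public class MyObject
{
}

public static class TypeLookupDemo
{
    // Checked by the compiler: a misspelled class name does not build.
    public static Type CompileTimeLookup()
    {
        return typeof(MyObject);
    }

    // Resolved at run time from a string. Type.GetType(string) returns
    // null when no type with that name is found.
    public static Type RunTimeLookup(string typeName)
    {
        return Type.GetType(typeName);
    }

    public static void Main()
    {
        Console.WriteLine(CompileTimeLookup().Name);                     // prints "MyObject"
        Console.WriteLine(RunTimeLookup("Serialization.MyObjekt") == null); // prints "True"
    }
}
```

Passing the null result on to GetValue is what turns the typing error into an exception during deserialization.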

For the MyObject class, we use the same procedure to create the special constructor:

protected MyObject(SerializationInfo info,StreamingContext context)
    {
    idx=info.GetInt32("Idx");
    }

Converting to the Binary Formatter

Converting your code to the binary formatter requires only a few minor code changes:

Replace...

    using System.Runtime.Serialization.Formatters.Soap;

... with:

    using System.Runtime.Serialization.Formatters.Binary;

Replace the class name SoapFormatter with the class name BinaryFormatter.

Of course, it is recommended to do some testing afterwards. You can never be sure the binary formatter is really compatible with the SOAP formatter.
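One way to do such a test is a round trip through memory. The helper below is a sketch under assumptions: FormatterCheck and RoundTrip are hypothetical names, and it relies only on the fact that both SoapFormatter and BinaryFormatter implement the IFormatter interface. It targets the .NET Framework, where both formatters are available.

```csharp
using System.IO;
using System.Runtime.Serialization;

public static class FormatterCheck
{
    // Serializes value into memory with the given formatter and
    // reads it back, so the copy can be compared with the original.
    public static T RoundTrip<T>(IFormatter formatter, T value)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, value);
            stream.Position = 0;   // rewind before reading
            return (T)formatter.Deserialize(stream);
        }
    }
}
```

Calling FormatterCheck.RoundTrip(new SoapFormatter(), MyObjectList) and FormatterCheck.RoundTrip(new BinaryFormatter(), MyObjectList) and comparing the members of both results with the original gives a quick indication of whether the two formatters treat your classes the same way.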

History

18th August, 2007: First release