Introduction
If you want to know how to get your application to save information to disk or the registry, then a quick skim through MSDN magazine or a quick search on newsgroups will give you the answer: serialization.
Mark your classes with the [Serializable] attribute and there you go. It’s a simple matter of creating a Formatter and a Stream, and a couple of lines later it’s done. Alternatively, you could mark up your class with the necessary attributes and use XML serialization.
All very simple, but unfortunately all very wrong. There are a number of reasons why you should not opt for the simple approach. Here are nine important ones.
1. It forces you to design your classes a certain way
XML serialization only works on public properties and fields, and on classes with a public parameterless constructor. That means your classes need to be accessible to the outside world. You cannot have private or internal classes, or serialize private data. In addition, it imposes restrictions on how you implement collections.
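To make the restriction concrete, here is a minimal sketch using XmlSerializer. The Account class and the XmlDemo helper are hypothetical names invented for this illustration, not part of any real application.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type, used only for illustration.
public class Account
{
    public string Owner = "Alice";       // public field: written to the XML
    private decimal balance = 42.5m;     // private field: silently skipped

    public Account() { }                 // XmlSerializer needs a parameterless constructor

    // Read-only property: also skipped, because it cannot be set on the way back in.
    public decimal Balance { get { return balance; } }
}

public static class XmlDemo
{
    // Serializes an Account to an XML string.
    public static string ToXml(Account account)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Account));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, account);
            return writer.ToString();
        }
    }
}
```

Calling XmlDemo.ToXml(new Account()) should produce XML that contains the Owner element but no trace of the balance. To persist the balance this way you would have to expose it through a public, settable member.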
2. It is not future-proof for small changes
If you mark your classes as [Serializable], then all the private data not marked as [NonSerialized] will get dumped. You have no control over the format of this data. If you change the name of a private variable, then your code will break.
You can get around this by implementing the ISerializable interface. This gives you much better control of how data is serialized and deserialized. Unfortunately …
3. It is not future-proof for large changes
Type information is stored as part of the serialization information. If you change your class names or strong-name your assemblies, you’re going to hit all sorts of problems. Even if you manage to code the necessary contortions to get round this, you’re going to find that …
4. It is not future-proof for massive changes
.NET isn’t going to be around in five years or so. If you start implementing the ISerializable interface in your code now, then its tendrils are going to be everywhere in five years’ time. Your code is going to be full of little hacks to cope with version changes, class renaming, refactoring, etc. Some time in the future, .NET will be superseded by something even more wonderful. Nobody knows what this something wonderful will be, but you can bet that writing code to read data serialized by version 1.1 of the .NET Framework is going to be a pig. I wrote some VB6 code five years ago and used the Class_ReadProperties and Class_WriteProperties events to access PropertyBag objects. A neat, easy way of storing information to disk, I thought. And it was, until .NET came along and then I was stuck.
5. It is not secure
Using XML serialization is inherently insecure. Your classes need to be public, and they need to have public properties or fields. In addition, XML serialization works by creating temporary files. If you think you’re only creating temporary representations of your data (for example, to create a string that you’re going to post to a web service), then those files on disk pose a potential security risk. If, instead, you implement the ISerializable interface and are persisting sensitive internal data, then, even if you’re not exposing private data through your classes, anyone can serialize your data to any file and read it that way, since GetObjectData is a public method.
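The GetObjectData point can be sketched as follows. Credentials and Snoop are hypothetical names made up for this example, and the code is written against the classic System.Runtime.Serialization types the article discusses; newer runtimes may flag some of these APIs as obsolete.

```csharp
using System;
using System.Runtime.Serialization;

// Hypothetical class holding sensitive state; the names are illustrative only.
[Serializable]
public class Credentials : ISerializable
{
    private string m_Password;

    public Credentials(string password)
    {
        m_Password = password;
    }

    // Deserialization constructor used by formatters.
    protected Credentials(SerializationInfo info, StreamingContext context)
    {
        m_Password = info.GetString("pw");
    }

    // Required by ISerializable, and therefore callable by any code
    // that holds a reference to the object.
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("pw", m_Password);
    }
}

public static class Snoop
{
    // Reads the private password without a formatter and without reflection.
    public static string ReadPassword(Credentials credentials)
    {
        SerializationInfo info =
            new SerializationInfo(typeof(Credentials), new FormatterConverter());
        credentials.GetObjectData(info,
            new StreamingContext(StreamingContextStates.All));
        return info.GetString("pw");
    }
}
```

The class never exposes m_Password through a property, yet Snoop.ReadPassword recovers it simply by asking the object to serialize itself.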
6. It is inefficient
XML is verbose. And, if you are using the ISerializable interface, type information gets stored along with data. This makes serialization very expensive in terms of disk space.
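A rough way to see the overhead is to measure it. The sketch below compares the XML produced by XmlSerializer with the same two values written as raw bytes; Pair and SizeDemo are hypothetical names, and the comparison covers only the XML case, not ISerializable.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type with two values, used only for the size comparison.
public class Pair
{
    public double A;
    public double B;
}

public static class SizeDemo
{
    // Number of bytes XmlSerializer produces for one Pair.
    public static long XmlSize(Pair pair)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Pair));
        using (MemoryStream stream = new MemoryStream())
        {
            serializer.Serialize(stream, pair);
            return stream.Length;
        }
    }

    // Number of bytes needed when the two doubles are written directly.
    public static long RawSize(Pair pair)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            BinaryWriter writer = new BinaryWriter(stream);
            writer.Write(pair.A);
            writer.Write(pair.B);
            writer.Flush();
            return stream.Length;
        }
    }
}
```

RawSize always returns 16, since each double occupies 8 bytes. XmlSize returns considerably more, because the XML declaration and element names are stored alongside the values.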
7. It is a black box
The odds are you don’t really know how serialization works. I certainly don’t. This means that there are going to be all sorts of quirks and gotchas that you can’t even conceive of when you start using it. Did you know that XML serialization actually uses the CodeDom? When you think you’re creating a bunch of XML, .NET is actually doing some sort of compilation. What are the implications of that? The only thing I know is that I will not know about them until it’s too late.
8. It is slow
When I did some research for a previous article (http://www.devx.com/dotnet/Article/16099/0), I noticed a few interesting things. I wrote a class that contained two double values. I created 100,000 instances of this class, stored them to disk, and then read them back again. I did this two ways. First of all, I did it the “proper” way, by implementing ISerializable, creating a BinaryFormatter, and using the Serialize and Deserialize methods. Secondly, I did it the “dirty” way, by blasting the data straight out into a Stream. Which way was faster? Perhaps not surprisingly, the dirty way. About 50 times faster. Surprised? I was.
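The article does not show the “dirty” code, so the following is only a guess at what it might look like: a minimal sketch that writes a count followed by the raw doubles using BinaryWriter and reads them back with BinaryReader. Measurement and RawStorage are hypothetical names, and the layout (a 32-bit count, then two doubles per item) is an assumption made for this example.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for the test class holding two double values.
public class Measurement
{
    public readonly double X;
    public readonly double Y;

    public Measurement(double x, double y)
    {
        X = x;
        Y = y;
    }
}

public static class RawStorage
{
    // Writes the item count, then each pair of doubles, straight to the stream.
    public static void Write(Stream stream, Measurement[] items)
    {
        BinaryWriter writer = new BinaryWriter(stream);
        writer.Write(items.Length);
        foreach (Measurement m in items)
        {
            writer.Write(m.X);
            writer.Write(m.Y);
        }
        writer.Flush(); // the caller owns the stream, so it is left open
    }

    // Reads back exactly what Write produced, in the same order.
    public static Measurement[] Read(Stream stream)
    {
        BinaryReader reader = new BinaryReader(stream);
        int count = reader.ReadInt32();
        Measurement[] items = new Measurement[count];
        for (int i = 0; i < count; i++)
        {
            double x = reader.ReadDouble();
            double y = reader.ReadDouble();
            items[i] = new Measurement(x, y);
        }
        return items;
    }
}
```

Nothing in this format refers to class or field names, so renaming Measurement or its members does not affect files already on disk. The price is that you have to keep Write and Read in step yourself.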
9. It is weird
ISerializable does a lot of cunning work. This means that it doesn’t necessarily behave the way you might expect. When you deserialize a collection of objects, for example, the constructors won’t get called in the order that you might think. Take the following code sample:
using System;
using System.Runtime.Serialization;
using System.Collections;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

class Class1
{
    static void Main(string[] args)
    {
        // Round-trip a parent object through an in-memory stream.
        ParentClass c1 = new ParentClass();
        BinaryFormatter f = new BinaryFormatter();
        MemoryStream m = new MemoryStream();
        f.Serialize(m, c1);
        m.Seek(0, SeekOrigin.Begin);
        ParentClass newClass = (ParentClass)f.Deserialize(m);
        Console.WriteLine("Deserialized\r\n{0}", newClass.ToString());
        Console.WriteLine("Press [Enter]");
        Console.ReadLine();
    }
}

[Serializable]
class ParentClass : ISerializable
{
    private ArrayList m_Collection;

    public ParentClass()
    {
        m_Collection = new ArrayList();
        m_Collection.Add(new ChildClass("Hello World!"));
        m_Collection.Add(new ChildClass("Hello again!"));
    }

    public override string ToString()
    {
        string s = "";
        foreach (ChildClass c in m_Collection)
        {
            s = s + c.ToString() + "\r\n";
        }
        return s;
    }

    // Deserialization constructor, called by the formatter.
    public ParentClass(SerializationInfo info, StreamingContext context)
    {
        m_Collection = (ArrayList)info.GetValue("Collection",
            typeof(ArrayList));
        Console.WriteLine("Just deserialized items:");
        // The child objects are not yet available at this point.
        foreach (ChildClass c in m_Collection)
        {
            Console.WriteLine("{0}", c.ToString());
        }
    }

    public void GetObjectData(SerializationInfo info,
        StreamingContext context)
    {
        info.AddValue("Collection", m_Collection);
    }
}

[Serializable]
class ChildClass : ISerializable
{
    private string m_TestString;

    public ChildClass(string testString)
    {
        m_TestString = testString;
    }

    public string TestString
    {
        get
        {
            return m_TestString;
        }
    }

    public override string ToString()
    {
        return m_TestString;
    }

    // Deserialization constructor, called by the formatter.
    public ChildClass(SerializationInfo info, StreamingContext context)
    {
        Console.WriteLine("Deserializing a child class");
        m_TestString = info.GetString("v");
    }

    public void GetObjectData(SerializationInfo info,
        StreamingContext context)
    {
        info.AddValue("v", m_TestString);
    }
}
This code essentially serializes and deserializes a parent object that contains a collection of child objects. You cannot, however, access the child objects from within the deserialization constructor of the parent object. The m_Collection object has been created, a value has been assigned to it, and info.GetValue("Collection", typeof(ArrayList)) has been called, but the m_Collection object does not contain any child objects. This is necessary given the way that serialization works, but it is not obvious behaviour. This, and other things, means that using serialization can be non-intuitive, and very hard to debug.
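For completeness, the framework does offer a hook for this situation: the IDeserializationCallback interface, whose OnDeserialization method is called once the whole object graph has been rebuilt. The sketch below is one possible way to apply it, not the article’s code. Child and ParentWithCallback are hypothetical names, and the Count property exists only to make the example easy to check.

```csharp
using System;
using System.Collections;
using System.Runtime.Serialization;

// Hypothetical child type, reduced to a single string.
[Serializable]
public class Child : ISerializable
{
    private string m_Text;

    public Child(string text) { m_Text = text; }

    protected Child(SerializationInfo info, StreamingContext context)
    {
        m_Text = info.GetString("v");
    }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("v", m_Text);
    }

    public override string ToString() { return m_Text; }
}

[Serializable]
public class ParentWithCallback : ISerializable, IDeserializationCallback
{
    private ArrayList m_Collection;

    public ParentWithCallback()
    {
        m_Collection = new ArrayList();
        m_Collection.Add(new Child("Hello World!"));
        m_Collection.Add(new Child("Hello again!"));
    }

    protected ParentWithCallback(SerializationInfo info, StreamingContext context)
    {
        // Only keep the reference here; its contents may not be populated yet.
        m_Collection = (ArrayList)info.GetValue("Collection", typeof(ArrayList));
    }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("Collection", m_Collection);
    }

    // Invoked by the formatter after the entire graph has been deserialized,
    // so this is the place to touch the child objects.
    public void OnDeserialization(object sender)
    {
        foreach (Child c in m_Collection)
        {
            Console.WriteLine("Child available: {0}", c);
        }
    }

    public int Count { get { return m_Collection.Count; } }
}
```

This moves the work out of the constructor, but it is one more piece of serialization plumbing in your class, which rather supports the point that the behaviour is not obvious.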
Have no regrets
Although .NET provides a number of quick and easy ways to serialize and deserialize data, do not use them. A week, a month, a year, or five years down the line you will regret it.