A Fast Serialization Technique

19 May 2006
Transparently boosting serialization performance and shrinking the serialized object's size.

Introduction

Serialization is everywhere in .NET. Every parameter you pass to or from a remoted object, web service, or WCF service gets serialized at one end and deserialized at the other. So why write about fast serialization? Surely, the standard BinaryFormatter and SoapFormatter are pretty quick, aren't they?

Well, no. When passing a reasonably substantial object from one process to another using Remoting, we found that performance was topping out at 300 calls per second. Investigation showed that each serialization/deserialization cycle was taking 360 microseconds. That would be fine, except that at 300 calls per second it adds up to 108 ms out of every second, so about 11% of the CPU was being consumed by serialization alone!

Background

Some form of custom serialization would be an option. An object knows exactly which fields it wants to serialize and what their types are. It doesn't need the general-purpose overhead of Reflection to work this out and extract the data; it can do it all by itself, much more efficiently. The result is also generally much more compact. There is an example in .Shoaib's article, which demonstrates these benefits.

The problem with custom serialization is that its interface is different, so the calling code has to change. It also does nothing for the automatic serialization performed by .NET's remoting mechanisms, unless you manually serialize to a byte array and then pass this as a parameter. This isn't very type-safe!
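To make that trade-off concrete, here is a minimal sketch of the manual route. It is only an illustration: the `Customer` type and its `ToBytes`/`FromBytes` methods are hypothetical names, not part of the download, and the sketch assumes that writing a null name as an empty string is acceptable.

```csharp
using System;
using System.IO;

// Hypothetical type, used only for illustration.
public class Customer {
  public long   Id;
  public string Name;

  // Pure custom serialization: the object writes its own fields in a fixed order.
  public byte[] ToBytes () {
    using (MemoryStream ms = new MemoryStream ())
    using (BinaryWriter bw = new BinaryWriter (ms)) {
      bw.Write (Id);
      bw.Write (Name ?? "");   // BinaryWriter.Write(string) rejects null, so null becomes ""
      bw.Flush ();
      return ms.ToArray ();
    }
  }

  // The reader must consume the fields in exactly the same order.
  public static Customer FromBytes (byte[] data) {
    using (MemoryStream ms = new MemoryStream (data))
    using (BinaryReader br = new BinaryReader (ms)) {
      Customer c = new Customer ();
      c.Id   = br.ReadInt64 ();
      c.Name = br.ReadString ();
      return c;
    }
  }
}
```

A remoted method using this approach would have to accept a plain `byte[]`, so the compiler cannot tell a serialized `Customer` from any other buffer. That is the type-safety problem mentioned above.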

What I cover below is a simple way to keep the benefits of custom serialization while retaining the standard serialization interface and all the benefits that confers.

Using the code

As is often the case in matters of complex serialization, the solution lies in implementing the ISerializable interface (see here for a primer). Here's a much simplified version of the object we are using:

C#
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

[Serializable]
public class TestObject : ISerializable {
  public long     id1;
  public long     id2;
  public long     id3;
  public string   s1;
  public string   s2;
  public string   s3;
  public string   s4;
  public DateTime dt1;
  public DateTime dt2;
  public bool     b1;
  public bool     b2;
  public bool     b3;
  public byte     e1;
  public IDictionary<string,object> d1;
}

To serialize an object, ISerializable requires us to implement GetObjectData to define the set of data to be serialized. The trick here is to use custom serialization to merge all the fields into a single buffer, then to add this buffer to the SerializationInfo parameter to be serialized by the standard formatters. This is how it's done:

C#
// Serialize the object. Write each field to the SerializationWriter
// then add this to the SerializationInfo parameter

public void GetObjectData (SerializationInfo info, StreamingContext ctxt) {
  SerializationWriter sw = SerializationWriter.GetWriter ();
  sw.Write (id1);
  sw.Write (id2);
  sw.Write (id3);
  sw.Write (s1);
  sw.Write (s2);
  sw.Write (s3);
  sw.Write (s4);
  sw.Write (dt1);
  sw.Write (dt2);
  sw.Write (b1);
  sw.Write (b2);
  sw.Write (b3);
  sw.Write (e1);
  sw.Write<string,object> (d1);
  sw.AddToInfo (info);
}

The SerializationWriter class extends BinaryWriter to add support for additional data types (DateTime and Dictionary) and to simplify the interface to SerializationInfo. It also overrides BinaryWriter's Write(string) method to allow for null strings, which the base method rejects. I won't go into the implementation detail here; there is plenty of explanation in the code for those who are interested.
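For readers who want a feel for the idea without opening the download, here is a rough sketch of what such a writer could look like. It is not the author's implementation: the class name `SketchWriter`, the presence flag used to encode null strings, the use of `DateTime.ToBinary()`, and the key name "X" are all assumptions made for this illustration, and dictionary support is left out.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical stand-in for the article's SerializationWriter.
public class SketchWriter : BinaryWriter {
  // All fields are written into one in-memory buffer.
  public SketchWriter () : base (new MemoryStream ()) { }

  // Null-tolerant string: a presence flag, then the string only if it exists.
  public override void Write (string value) {
    Write (value != null);
    if (value != null) base.Write (value);
  }

  // DateTime stored as a single 64-bit value that also preserves its Kind.
  public void Write (DateTime value) {
    Write (value.ToBinary ());
  }

  // Returns everything written so far.
  public byte[] ToArray () {
    Flush ();
    return ((MemoryStream) OutStream).ToArray ();
  }

  // Hands the whole buffer to the formatter as a single byte[] value.
  public void AddToInfo (SerializationInfo info) {
    info.AddValue ("X", ToArray (), typeof (byte[]));
  }
}
```

The point of the design is that the formatter only ever sees one `byte[]` entry, however many fields the object has.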

ISerializable also requires us to define a constructor to deserialize a stream to a new object. The process here is just as simple as that above:

C#
// Deserialization constructor. Create a SerializationReader from
// the SerializationInfo then extract each field from it in turn.

public TestObject (SerializationInfo info, StreamingContext ctxt) {
  SerializationReader sr = SerializationReader.GetReader (info);
  id1 = sr.ReadInt64 ();
  id2 = sr.ReadInt64 ();
  id3 = sr.ReadInt64 ();
  s1  = sr.ReadString ();
  s2  = sr.ReadString ();
  s3  = sr.ReadString ();
  s4  = sr.ReadString ();
  dt1 = sr.ReadDateTime ();
  dt2 = sr.ReadDateTime ();
  b1  = sr.ReadBoolean ();
  b2  = sr.ReadBoolean ();
  b3  = sr.ReadBoolean ();
  e1  = sr.ReadByte ();
  d1  = sr.ReadDictionary<string,object> ();
}

Similarly, SerializationReader extends BinaryReader for the same reasons as above.
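A matching reader can be sketched in the same spirit. Again, this is an assumption-laden illustration rather than the code from the download: `SketchReader` is a hypothetical name, and it assumes the buffer was stored under the key "X", that each string is preceded by a one-byte presence flag, and that each DateTime was written with `DateTime.ToBinary()`.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical stand-in for the article's SerializationReader.
public class SketchReader : BinaryReader {
  public SketchReader (byte[] data) : base (new MemoryStream (data)) { }

  // Pulls the single byte[] entry back out of the SerializationInfo.
  public static SketchReader GetReader (SerializationInfo info) {
    return new SketchReader ((byte[]) info.GetValue ("X", typeof (byte[])));
  }

  // Mirrors the assumed layout: a presence flag, then the string if present.
  public override string ReadString () {
    return ReadBoolean () ? base.ReadString () : null;
  }

  // Rebuilds a DateTime (including its Kind) from the stored 64-bit value.
  public DateTime ReadDateTime () {
    return DateTime.FromBinary (ReadInt64 ());
  }
}
```

Because nothing in the buffer names the fields, the reader has to consume values in exactly the order the writer produced them, which is why the constructor above mirrors GetObjectData line for line.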

Over time, I'll probably be extending the set of types which the writer and reader can handle efficiently. There are already WriteObject() and ReadObject() methods which will write and read any arbitrary type, but these just fall back to standard binary serialization unless the type is one of the supported fast types.
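One common way to build such a pair of methods is to write a one-byte type tag in front of each value. The sketch below shows that idea only; it should not be read as the code in the download. The `ObjTag` enum, the `TaggedObjectIO` class and the delegate-based fallback are hypothetical, and the fallback is passed in as a delegate so that the sketch does not depend on any particular general-purpose serializer.

```csharp
using System;
using System.IO;

// Hypothetical type tags; the names are illustrative only.
public enum ObjTag : byte { Null, Int64, String, Boolean, Other }

public static class TaggedObjectIO {
  // Writes a one-byte tag, then the value in its compact form.
  public static void WriteObject (BinaryWriter w, object value,
                                  Action<BinaryWriter, object> fallback) {
    if (value == null) {
      w.Write ((byte) ObjTag.Null);
    } else if (value is long) {
      w.Write ((byte) ObjTag.Int64);
      w.Write ((long) value);
    } else if (value is string) {
      w.Write ((byte) ObjTag.String);
      w.Write ((string) value);
    } else if (value is bool) {
      w.Write ((byte) ObjTag.Boolean);
      w.Write ((bool) value);
    } else {
      // Not a fast type: hand over to a general-purpose serializer.
      w.Write ((byte) ObjTag.Other);
      fallback (w, value);
    }
  }

  // Reads the tag first, then decodes the value that follows it.
  public static object ReadObject (BinaryReader r,
                                   Func<BinaryReader, object> fallback) {
    switch ((ObjTag) r.ReadByte ()) {
      case ObjTag.Null:    return null;
      case ObjTag.Int64:   return r.ReadInt64 ();
      case ObjTag.String:  return r.ReadString ();
      case ObjTag.Boolean: return r.ReadBoolean ();
      default:             return fallback (r);
    }
  }
}
```

Each fast type costs one tag byte on top of its raw encoding, while anything else pays whatever the fallback serializer costs.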

Results

The test program included in the download simply creates and populates the TestObject, and times its serialization and deserialization, in microseconds per cycle, averaged over 250,000 cycles. All timings were taken on a 1.5GHz Pentium M laptop. The results are:

                        Formatter  Size (bytes)  Time (µs)
Standard serialization  Binary             2080        364
Fast serialization      Binary              421         74
Fast serialization      SOAP               1086        308

So, the fast serialization technique above can cut both the size and the serialization/deserialization time to about a fifth of the out-of-the-box figures. Even when it goes through the SoapFormatter (normally 2 to 3 times slower than binary), it is faster than standard binary serialization.
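For anyone wanting to repeat this kind of measurement on their own objects, the timing approach (microseconds per cycle, averaged over many cycles) can be sketched as follows. This is not the test program from the download; `CycleTimer` and `MicrosecondsPerCycle` are hypothetical names, and the single warm-up call is an assumption about how to keep JIT compilation out of the figures.

```csharp
using System;
using System.Diagnostics;

public static class CycleTimer {
  // Average cost of one call to 'cycle', in microseconds.
  public static double MicrosecondsPerCycle (Action cycle, int cycles) {
    cycle ();                                   // warm-up call, not timed
    Stopwatch timer = Stopwatch.StartNew ();
    for (int i = 0; i < cycles; i++) cycle ();
    timer.Stop ();
    return timer.Elapsed.TotalMilliseconds * 1000.0 / cycles;
  }
}
```

The delegate passed in would serialize the object to a stream and deserialize it again, so that one call corresponds to one full cycle as measured in the table.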

Summary

Combining custom serialization with ISerializable in this way delivers major performance gains without any change to the handling of the objects in question. It allows fast serialization to be transparently added to specific objects where a performance issue has been identified.

In our own case, throughput increased from 300 Remoting calls per second to over 700, just by changing this for one key object. No other changes were necessary.

There is also one other unexpected benefit from this. You'll notice that there are no comparative figures above for standard serialization with the SoapFormatter. That is because MS has not equipped the SoapFormatter to handle generic types. With the technique above, the SoapFormatter never sees the generic type, which has already been custom serialized into a byte array, so this restriction is removed.

Combining custom serialization with ISerializable is never going to be as fast as pure custom serialization alone. However, the added benefit of remaining within the standard serialization framework makes this a useful technique for boosting performance without impacting other code.

History

  • First version - 19 May 2005.

This is my first post on CodeProject - so please be gentle!

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


Written By
Web Developer
United Kingdom

Comments and Discussions

 
Further optimizations...
SimmoTech, 8-Aug-06 3:41
Hi Tim

Excellent work - I was going to do something similar myself but this gave me a great start for a proof of concept. Have you added any further optimizations since the release code?

Here are some optimizations I have found useful:

1) In ReadObject() - put a specific test for the null case at the top of the list rather than let it drop to the default case. In fact, maybe these should be put in order of likely usage.

2) BinaryReader has a protected method for reading ints stored in a compact form. I have made it available with the following code. Very useful for things like counts where the number is usually small but could be large.
public new int Read7BitEncodedInt()
{
    return base.Read7BitEncodedInt();
}

3) This one was the killer for me. I am serializing a lot of object[] objects which contained all values and was using WriteObject() but adding this specific code produced amazing results! Note the use of Read7BitEncodedInt as mentioned above - only takes a single byte for all of my usage! Do the same for ReadByteArray() and ReadChar() too.
public object[] ReadObjectArray()
{
    int count = base.Read7BitEncodedInt();
    object[] result = new object[count];
    for (int i = 0; i < count; i++)
    {
        result[i] = ReadObject();
    }
    return result;
}

Future optimizations - I've not tried these yet but will soon.
1) The ObjType enum is only using a fraction of the 255 available entries. I am going to try using some for special cases, i.e. a ZeroInt32Type/ZeroInt16Type etc. (one for each numeric type). Possibly the same for One/MinValue/MaxValue. String could have EmptyStringType; DateTime could have MinValue/MaxValue and EmptyTrueBool/FalseBool would also save some space. Anything where a type and a 'common' value could be defined really.

2) In the ReadObjectArray() method above, I mentioned I am storing values read from a database and there may be cases where there are 'runs' of null values. By having a specific "NullListType" ObjType and storing a 7BitEncodedInt for runs of 3 or more null values there is potential for a lot of saved space depending on the data.

Cheers
Simon


