
A Fast Serialization Technique

19 May 2006 (4 min read)
Transparently boosting serialization performance and shrinking the serialized object's size.

Introduction

Serialization is everywhere in .NET. Every parameter you pass to or from a remoted object, web service, or WCF service gets serialized at one end and deserialized at the other. So why write about fast serialization? Surely, the standard BinaryFormatter and SoapFormatter are pretty quick, aren't they?

Well, no. When passing a reasonably substantial object from one process to another using Remoting, we found that performance was topping out at 300 calls per second. Investigation showed that each serialization/deserialization cycle was taking 360 microseconds. That would be fine, except that at 300 calls per second it adds up to 300 x 360 microseconds = 108 milliseconds of CPU time every second, so about 11% of the CPU is being consumed by the serialization alone!

Background

Some form of custom serialization would be an option. An object knows exactly which fields it wants to serialize and what their types are. It doesn't need the general-purpose overheads and Reflection to work this out and extract the data; it can do it all by itself, much more efficiently. The result is generally much more compact as well. There is an example in .Shoaib's article, which demonstrates these benefits.

The problem with custom serialization is that the interface is different, requiring the calling code to be changed. It also doesn't help the automated serialization in .NET's remote access mechanisms, unless you manually serialize to a byte array and then pass this as a parameter. This isn't very type-safe!

What I cover below is a simple way to get the speed and compactness of custom serialization while retaining the standard serialization interface and all the benefits that confers.

Using the code

As is often the case in matters of complex serialization, the solution lies in implementing the ISerializable interface. Here's a much simplified version of the object we are using:

C#
[Serializable]
public class TestObject : ISerializable {
  public long     id1;
  public long     id2;
  public long     id3;
  public string   s1;
  public string   s2;
  public string   s3;
  public string   s4;
  public DateTime dt1;
  public DateTime dt2;
  public bool     b1;
  public bool     b2;
  public bool     b3;
  public byte     e1;
  public IDictionary<string,object> d1;
}

To serialize an object, ISerializable requires us to implement GetObjectData to define the set of data to be serialized. The trick here is to use custom serialization to merge all the fields into a single buffer, then to add this buffer to the SerializationInfo parameter to be serialized by the standard formatters. This is how it's done:

C#
// Serialize the object. Write each field to the SerializationWriter
// then add this to the SerializationInfo parameter

public void GetObjectData (SerializationInfo info, StreamingContext ctxt) {
  SerializationWriter sw = SerializationWriter.GetWriter ();
  sw.Write (id1);
  sw.Write (id2);
  sw.Write (id3);
  sw.Write (s1);
  sw.Write (s2);
  sw.Write (s3);
  sw.Write (s4);
  sw.Write (dt1);
  sw.Write (dt2);
  sw.Write (b1);
  sw.Write (b2);
  sw.Write (b3);
  sw.Write (e1);
  sw.Write<string,object> (d1);
  sw.AddToInfo (info);
}

The SerializationWriter class extends BinaryWriter to add support for additional data types (DateTime and Dictionary) and to simplify the interface to SerializationInfo. It also overrides BinaryWriter's Write(string) method to allow for null strings. I won't go into the implementation detail here. There is lots of explanation in the code for those who are interested.

ISerializable also requires us to define a constructor to deserialize a stream to a new object. The process here is just as simple as that above:

C#
// Deserialization constructor. Create a SerializationReader from
// the SerializationInfo then extract each field from it in turn.

public TestObject (SerializationInfo info, StreamingContext ctxt) {
  SerializationReader sr = SerializationReader.GetReader (info);
  id1 = sr.ReadInt64 ();
  id2 = sr.ReadInt64 ();
  id3 = sr.ReadInt64 ();
  s1  = sr.ReadString ();
  s2  = sr.ReadString ();
  s3  = sr.ReadString ();
  s4  = sr.ReadString ();
  dt1 = sr.ReadDateTime ();
  dt2 = sr.ReadDateTime ();
  b1  = sr.ReadBoolean ();
  b2  = sr.ReadBoolean ();
  b3  = sr.ReadBoolean ();
  e1  = sr.ReadByte ();
  d1  = sr.ReadDictionary<string,object> ();
}

Similarly, SerializationReader extends BinaryReader for the same reasons as above.
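For readers who don't want to open the download, here is a rough sketch of how a writer/reader pair of this kind could be built. It is not the SerializationWriter and SerializationReader from the article's source: the class names SketchWriter and SketchReader, the one-byte null marker, and the use of DateTime.ToBinary() are all assumptions made for illustration, and dictionary support is left out to keep it short.

```csharp
using System;
using System.IO;

// Hypothetical sketch, NOT the SerializationWriter from the download.
public class SketchWriter : BinaryWriter
{
    // Everything is written into an in-memory buffer.
    public SketchWriter() : base(new MemoryStream()) { }

    // BinaryWriter.Write(string) throws on null, so a marker byte is
    // written first: 0 means "null", 1 means "a string follows".
    public override void Write(string value)
    {
        if (value == null)
        {
            Write((byte)0);
            return;
        }
        Write((byte)1);
        base.Write(value);
    }

    // A DateTime is stored as the 64-bit value from DateTime.ToBinary().
    public void Write(DateTime value)
    {
        Write(value.ToBinary());
    }

    // Returns the bytes written so far.
    public byte[] ToArray()
    {
        Flush();
        return ((MemoryStream)BaseStream).ToArray();
    }
}

// Hypothetical sketch, NOT the SerializationReader from the download.
public class SketchReader : BinaryReader
{
    public SketchReader(byte[] buffer) : base(new MemoryStream(buffer)) { }

    // Mirrors SketchWriter.Write(string): read the marker byte first.
    public override string ReadString()
    {
        return ReadByte() == 0 ? null : base.ReadString();
    }

    // Mirrors SketchWriter.Write(DateTime).
    public DateTime ReadDateTime()
    {
        return DateTime.FromBinary(ReadInt64());
    }
}
```

The important point is that the reader must consume fields in exactly the order the writer produced them, since the buffer carries no field names. That is where much of the size saving comes from, and also why the two methods in TestObject have to be kept in step.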

Over time, I'll probably be extending the set of types which the writer and reader can handle efficiently. There are already the WriteObject() and ReadObject() methods which will write any arbitrary type, but this just falls back to standard binary serialization (unless it's one of the supported fast types).

Results

The test program included in the download simply creates and populates the TestObject, and times its serialization and deserialization, in microseconds per cycle, averaged over 250K cycles. All timings are done on a 1.5GHz Pentium M laptop. The results are:

                         Formatter   Size (bytes)   Time (uS)
Standard serialization   Binary      2080           364
Fast serialization       Binary      421            74
Fast serialization       SOAP        1086           308

So, the fast serialization technique above can cut both the size and the serialization/deserialization time to about a fifth of the out-of-the-box figures (2080 to 421 bytes, 364 to 74 uS). Even SOAP serialization (normally 2 to 3 times slower than binary) is faster than the standard binary serialization when it uses this technique.
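If you want to reproduce per-cycle figures like these on your own objects, a harness along the following lines should do. This is only a sketch and not the test program from the download: CycleTimer and MicrosecondsPerCycle are hypothetical names, and the single warm-up call before timing is my own assumption about what makes the average fairer.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper, NOT the test program from the download.
public static class CycleTimer
{
    // Runs 'cycle' count times and returns the average time per call
    // in microseconds.
    public static double MicrosecondsPerCycle(Action cycle, int count)
    {
        cycle(); // warm-up call so JIT compilation is not included in the timing

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            cycle();
        }
        watch.Stop();

        return watch.Elapsed.TotalMilliseconds * 1000.0 / count;
    }
}
```

The delegate passed in would perform one full serialize-then-deserialize round trip of the object under test. Stopwatch is used rather than subtracting DateTime.Now values because it is designed for measuring elapsed time.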

Summary

Combining custom serialization with ISerializable in this way delivers major performance gains without any change to the handling of the objects in question. It allows fast serialization to be transparently added to specific objects where a performance issue has been identified.

In our own case, throughput increased from 300 Remoting calls per second to over 700, just by changing this for one key object. No other changes were necessary.

There is also one other unexpected benefit from this. You'll notice that there are no comparative figures above for standard serialization with the SoapFormatter. This is because MS has not equipped the SoapFormatter to handle generic types, and TestObject contains a generic dictionary. Using the technique above means that the SoapFormatter never sees the generic type, which has already been custom serialized into a byte array, so this restriction is removed.

Combining custom serialization with ISerializable is never going to be as fast as pure custom serialization alone. However, the added benefit of remaining within the standard serialization framework makes this a useful technique for boosting performance without impacting other code.

History

  • First version - 19 May 2006.

This is my first post on CodeProject - so please be gentle!

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


Written By
Web Developer
United Kingdom

Comments and Discussions

 
Answer: Re: Interoperability between .NET and Java binary serialization (Code Monkey, 27-Nov-06 3:40)
General: Re: Interoperability between .NET and Java binary serialization (Fuego06, 27-Nov-06 8:38)
General: Re: Interoperability between .NET and Java binary serialization (Jon Rista, 27-Nov-06 11:17)
General: Re: Interoperability between .NET and Java binary serialization (Code Monkey, 28-Nov-06 0:33)
Answer: Re: Interoperability between .NET and Java binary serialization (Ennis Ray Lynch, Jr., 5-Jan-07 7:47)
Answer: Re: Interoperability between .NET and Java binary serialization (Mehdi Mousavi, 20-May-07 3:50)
Question: Support for DataSet? (Michael B. Hansen, 22-Nov-06 1:17)
General: Even Better... (lorekd, 2-Nov-06 10:25)
I modified some of the classes a little to demonstrate how to improve performance even more. These alterations resulted in a reduction in the size of the serialized object by about 21% for the SOAP version and about 28% for the binary version. Additionally, the time required to perform the serialization and deserialization test was reduced by nearly 47%. Here is the test application's output for both versions:

Binary serialized length: 433
Soap serialized length: 1098
Altered binary serialized length: 311
Altered soap serialized length: 867

Running serialization test for 250000 iterations
Serialization done in 135.625 uS per cycle

Running altered serialization test for 250000 iterations
Altered serialization done in 72.125 uS per cycle

I'll try to list all of the alterations here...


  1. SerializationWriter class: Moved some code from the AddToInfo method into a method named ToBinary. ToBinary is used by the TestObject class' ToBinary method. The updated code is as follows:

        public void AddToInfo(SerializationInfo info)
        {
            byte[] b = ToBinary();
            info.AddValue("X", b, typeof(byte[]));
        }

        public byte[] ToBinary()
        {
            return ((MemoryStream)BaseStream).ToArray();
        }

  2. SerializationReader class: Added an overload for the GetReader static method and cleaned up duplicate code. Here is the new source:

        public static SerializationReader GetReader(SerializationInfo info)
        {
            byte[] byteArray = (byte[])info.GetValue("X", typeof(byte[]));
            return GetReader(byteArray);
        }

        public static SerializationReader GetReader(byte[] buffer)
        {
            return new SerializationReader(new MemoryStream(buffer));
        }

  3. TestObject class: Added methods to make use of the modified SerializationReader and SerializationWriter classes.

        private TestObject(byte[] buffer)
        {
            SerializationReader sr = SerializationReader.GetReader(buffer);
            id1 = sr.ReadInt64();
            id2 = sr.ReadInt64();
            id3 = sr.ReadInt64();
            s1 = sr.ReadString();
            s2 = sr.ReadString();
            s3 = sr.ReadString();
            s4 = sr.ReadString();
            dt1 = sr.ReadDateTime();
            dt2 = sr.ReadDateTime();
            b1 = sr.ReadBoolean();
            b2 = sr.ReadBoolean();
            b3 = sr.ReadBoolean();
            e1 = sr.ReadByte();
            d1 = sr.ReadDictionary<string, object>();
        }

        public byte[] ToBinary()
        {
            SerializationWriter sw = SerializationWriter.GetWriter();
            sw.Write(id1);
            sw.Write(id2);
            sw.Write(id3);
            sw.Write(s1);
            sw.Write(s2);
            sw.Write(s3);
            sw.Write(s4);
            sw.Write(dt1);
            sw.Write(dt2);
            sw.Write(b1);
            sw.Write(b2);
            sw.Write(b3);
            sw.Write(e1);
            sw.Write<string, object>(d1);
            return sw.ToBinary();
        }

        public static TestObject FromBinary(byte[] buffer)
        {
            return new TestObject(buffer);
        }


  4. MainClass class: Added code to the PrintSize method to show the altered serialization sizes. Also, added an AlteredPerfTest method that mimics the PerfTest method, using the new serialization method instead. Finally, called the AlteredPerfTest method in the Main of the application. Here are the modified methods:

        static void PrintSize()
        {
            TestObject testObj = new TestObject();

            MemoryStream ms = new MemoryStream();

            new BinaryFormatter().Serialize(ms, testObj);
            Console.WriteLine("Binary serialized length: {0}", ms.Length);

            ms.Position = 0;

            new SoapFormatter().Serialize(ms, testObj);
            Console.WriteLine("Soap serialized length: {0}", ms.Length);

            ms.Position = 0;
            ms.SetLength(0);

            new BinaryFormatter().Serialize(ms, testObj.ToBinary());
            Console.WriteLine("Altered binary serialized length: {0}", ms.Length);

            ms.Position = 0;

            new SoapFormatter().Serialize(ms, testObj.ToBinary());
            Console.WriteLine("Altered soap serialized length: {0}", ms.Length);
        }

        static void AlteredPerfTest(int count)
        {
            Console.WriteLine("\nRunning altered serialization test for {0} iterations", count);

            TestObject obj1 = new TestObject();

            DateTime t = DateTime.Now;
            for (int i = 0; i < count; i++)
            {
                MemoryStream ms = new MemoryStream();
                BinaryFormatter bf = new BinaryFormatter();
                bf.Serialize(ms, obj1.ToBinary());

                ms.Position = 0;
                TestObject obj2 = TestObject.FromBinary((byte[])bf.Deserialize(ms));  // deserialize again
            }
            TimeSpan ts = DateTime.Now - t;
            Console.WriteLine("Altered serialization done in {0} uS per cycle", ts.TotalMilliseconds * 1000.0 / count);
        } // AlteredPerfTest

        static void Main(string[] args)
        {
            PrintSize();
            PerfTest(250000);
            AlteredPerfTest(250000);
        }

Okay. I'm pretty sure I included all of my changes. I tried to leave as much of the original code intact as possible so that the comparison would be as accurate as it could be. The main difference in the new code is that instead of serializing and deserializing the TestObject (a user-defined type), these operations are performed on a byte array (a built-in type).

I found this article because I was considering posting my first article on the same topic. My method is very similar, utilizing the BinaryFormatter, but with the addition that I included in this response. Hopefully, it will help those of you that need to squeeze out some extra performance.

Happy coding,

David

Question: Inherited classes? (iwasiunknown, 12-Oct-06 16:22)
Answer: Re: Inherited classes? (Tim Haynes, 15-Oct-06 6:03)
General: Improve performance (nuri h, 21-Aug-06 20:03)
General: Further optimizations... (SimmoTech, 8-Aug-06 3:41)
General: Write( byte[] b ): Length vs LongLength (Schlups, 12-Jul-06 22:40)
General: A little "error handling" (RuneFS, 6-Jul-06 23:02)
General: Code to support Guid (kgbroce, 16-Jun-06 9:24)
General: Re: Code to support Guid (Ram Cronus, 3-Jun-08 6:16)
General: byte[] array gets skipped (kgbroce, 16-Jun-06 9:14)
General: Re: byte[] array gets skipped (Tim Haynes, 18-Jun-06 10:12)
General: Use TypeCode for objType (zuken21, 5-Jun-06 18:08)
General: Re: Use TypeCode for objType (Tim Haynes, 8-Jun-06 22:07)
General: Nullable Types (jmueller, 30-May-06 5:48)
General: Re: Nullable Types (Tim Haynes, 8-Jun-06 22:23)
General: Great solution (ScottEllisNovatex, 23-May-06 16:19)
General: Re: Great solution (Frank Stegerwald, 23-May-06 20:51)
General: Re: Great solution (ScottEllisNovatex, 24-May-06 21:03)
