12 Dec 2011, CPOL, 14 min read
<html>

<head>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<title>Raw Serializer</title>
</head>

<body>

<h2>Introduction</h2>
<p>While .NET provides a <code>BinaryReader</code> and <code>BinaryWriter</code>, these classes 
are insufficient to handle structures and nullable value types.&nbsp; 
Conversely, the <code>BinaryFormatter</code> is an unwieldy and bloated solution.&nbsp; 
What is needed is something that produces a compact serialized data stream while 
also supporting nullable data values, both in the classic C# 1.0 sense (boxed 
value types) and in the nullable value type C# 2.0 sense.&nbsp; Also, the whole 
issue of null vs. <code>DBNull.Value</code>, which nullable types in C# 2.0 still don't 
address:</p>
<pre>DateTime? dt = null;
dt = DBNull.Value; // compiler error!</pre>
<p>needs to be dealt with (meaning, the serializer needs to preserve whether the 
boxed value type is null or <code>DBNull.Value</code>).&nbsp; </p>
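<p>The distinction is easy to see in a few lines.&nbsp; A boxed value (an <code>object</code> reference) can carry either <code>null</code> or <code>DBNull.Value</code>, while a nullable value type can only be <code>null</code> (the class name <code>NullDemo</code> below is just for illustration):</p>

```csharp
using System;

class NullDemo
{
    static void Main()
    {
        object boxed = 42;           // a boxed int
        boxed = null;                // legal: any reference can be null
        boxed = DBNull.Value;        // legal: DBNull.Value is just another object

        int? nullable = 42;
        nullable = null;             // legal: nullable value types support null
        // nullable = DBNull.Value;  // compiler error: cannot convert DBNull to int?

        Console.WriteLine(boxed == DBNull.Value); // True (reference equality)
        Console.WriteLine(nullable == null);      // True
    }
}
```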
<p>So, this is what the 
raw serializer/deserializer does.&nbsp; It is a replacement for the 
<code>BinaryFormatter</code> when you are serializing nullable value types into a known 
format, and deserializing those values with the same format.</p>
<h3>The Problem: The Binary Formatter</h3>
<p>The <code>BinaryFormatter</code> is a horribly inefficient beast for 
transmitting data.&nbsp; It creates a large &quot;binary&quot; file and it sucks up huge 
amounts of memory because it doesn't actually stream its output, and it can crash your application. 
&nbsp;For example, a typical use is to package up the contents of a <code>DataTable</code>:</p>
<pre>DataTable dt=LoadDataTable();
BinaryFormatter bf=new BinaryFormatter();
FileStream fs=new FileStream(filename, FileMode.Create);
bf.Serialize(fs, dt);
fs.Close();</pre>
<ul>
	<li>I tried this with a table consisting of some 200,000 records and the 
	<code>BinaryFormatter</code> crashed with an &quot;out of memory&quot; exception.</li>
	<li>I tried this with a smaller table and discovered that the resulting 
	binary file was 10 times larger than the estimated data size.</li>
	<li>During formatting, it sucks up a lot of memory, making the usability of 
	this class problematic in the real world when you don't know what sort of 
	physical memory the system might have.</li>
	<li>Even though the <code>BinaryFormatter</code> takes an output stream, it clearly does 
	not stream the data until the stream is closed.</li>
</ul>
<p>These problems were cause for concern, so I decided to look at a more lean 
implementation, and one that was not susceptible to crashing and consuming huge 
amounts of memory. </p>
<h3>An Overview Of The Raw Serializer</h3>
<p>Some articles are harder to figure out how to start than others.&nbsp; This 
is one in which I've waffled a lot.&nbsp; In the initial version, I spent about 
half the article talking about why I wrote a raw serialization class.&nbsp; 
Ultimately, I decided that the discussion was too much.&nbsp; Then, in writing 
the article, I realized I wasn't handling the reader and writer portions of the 
code symmetrically--the writer was using a dictionary to look up the writer 
method while the reader implemented a switch statement.&nbsp; Hmmm.&nbsp; I also 
realized that the implementation boxed the value types for writing and required unboxing on the part of the caller for reading.&nbsp; This functionality is 
necessary for serializing a <code>DataTable</code>, but it's a performance hit if you're 
serializing known types.&nbsp; Given that the primary purpose of these classes 
(for me, at least) is to serialize/deserialize a <code>DataTable</code> efficiently, I considered leaving 
this implementation decision alone, but then decided it wouldn't necessarily be 
what other people needed, so I decided to add the methods necessary to avoid 
boxing/unboxing.&nbsp; Finally, I realized I needed to explore and understand C# 
2.0's nullable types and how they should be supported.</p>
<p>When all is said and done, I figured a diagram of the various code paths 
(starting with the green boxes) would be a good start to helping the reader 
understand what is going on:</p>
<p><img border="0" src="RawSerializer/codepaths.png" width="597" height="702"></p>
<p>You may ask yourself, why not just expose the <code>BinaryReader/Writer</code> so that the 
caller can use the appropriate <code>Read/Write</code> methods directly when nullable support 
isn't required?&nbsp; This question has some merit as the current implementation 
introduces what might be considered to be an unnecessary method call.&nbsp; 
However, the point of encapsulation is to allow the interface, in this case the 
<code>RawSerializer</code> class, to vary without affecting the caller.&nbsp; If, in the 
future, I want to use some stream other than the <code>BinaryReader/Writer</code>, 
or add additional functionality to the read/write methods, I can do so safely, 
knowing that the encapsulation of the <code>BinaryReader/Writer</code> isn't broken.</p>
<p>So, the end result is hopefully a better, more complete implementation.&nbsp; I must say, 
writing an article describing one's code is a really powerful code review 
technique!</p>
<h2>Raw Serializer</h2>
<p>The following describes what the raw serializer can generally handle and 
the caveats you should be aware of when using it.</p>
<h3>Value Types, Structures And Nullable Values</h3>
<p>These classes serialize and deserialize native value types, 
including 
compliant structures (structures consisting of native types and whose size the marshaller can determine) 
directly to binary values.&nbsp; The <code>RawSerializer</code> class, and its 
complement, the <code>RawDeserializer</code>, are not themselves streams; however, they 
encapsulate the <code>BinaryWriter</code> and <code>BinaryReader</code> classes, which operate on streams, and 
thus allow the raw serializer to work in stream contexts.</p>
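<p>To illustrate what &quot;compliant&quot; means, here is a structure the marshaller can size (the <code>Point3D</code> struct is my own example, not part of the download):</p>

```csharp
using System;
using System.Runtime.InteropServices;

// A "compliant" structure: only blittable native fields, so the
// marshaller can compute its unmanaged size.
[StructLayout(LayoutKind.Sequential)]
struct Point3D
{
    public double X, Y, Z;
}

class SizeDemo
{
    static void Main()
    {
        // Three 8-byte doubles, so the marshalled size is 24 bytes.
        Console.WriteLine(Marshal.SizeOf(typeof(Point3D))); // 24
    }
}
```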
<p>Because the serializer supports only value types, it is not a general purpose 
serialization engine like the BinaryFormatter.&nbsp; However, in cases where all 
you <i>are</i> serializing is value types, then this set of classes will 
generate a much more efficient output because it writes only the raw binary 
data.</p>
<p>The class supports serialization of <code>DateTime?</code> and <code>Guid?</code> nullable value types directly.&nbsp; 
Other nullable structures will go through the boxing mechanism and require you 
to explicitly use the <code>SerializeNullable(...)</code> method.&nbsp; For 
deserialization, you have to explicitly state the appropriate deserialization 
method, such as <code>int? DeserializeNInt()</code> as opposed to <code>int DeserializeInt()</code>.&nbsp; 
Other nullable structures need to use the <code>object DeserializeNullable(Type t)</code> 
method and explicitly unbox the return value.</p>
<h3>Version Information</h3>
<p>Unlike the <code>BinaryFormatter</code>, there is no version management to 
ensure that the deserializer matches the format of the serialized data.&nbsp; 
You can certainly add version information, but be aware that this is a very 
dumb, low-level set of functions--it expects the deserializer to know the 
correct value types and whether they are nullable or not, and in the right 
order.&nbsp; If you get the order wrong or mismatch the types, the deserializer 
will generate erroneous data and will most likely blow up trying to decode the 
input stream.&nbsp; Again, type information could have been added--in fact, the 
standard value types could be encoded along with the null flag byte, but I chose 
not to do that specifically to reduce the resulting data set size.&nbsp; If you 
feel you want that added layer of protection, feel free to add it in.</p>
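<p>If you do want a version check, a minimal sketch looks like the following.&nbsp; To keep the example self-contained, it uses <code>BinaryWriter</code>/<code>BinaryReader</code> directly rather than the <code>RawSerializer</code>; the idea carries over unchanged:</p>

```csharp
using System;
using System.IO;

class VersionHeaderDemo
{
    const int FormatVersion = 2;

    static void Main()
    {
        var ms = new MemoryStream();

        // Writer: emit the format version before any payload.
        var writer = new BinaryWriter(ms);
        writer.Write(FormatVersion);
        writer.Write(12345);          // the payload
        writer.Flush();

        // Reader: check the version before trusting the rest of the stream.
        ms.Position = 0;
        var reader = new BinaryReader(ms);
        int version = reader.ReadInt32();
        if (version != FormatVersion)
            throw new InvalidDataException("Expected format " + FormatVersion + ", got " + version);

        Console.WriteLine(reader.ReadInt32()); // 12345
    }
}
```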
<h3>Null Support</h3>
<p>The serializer supports both <code>null</code> and <code>DBNull.Value</code> values, 
however, these are optional.&nbsp; If you need to support null values, an extra 
byte is added to the value, which is used to indicate whether the value is <code>null</code> 
or <code>DBNull.Value</code>.&nbsp; Clearly, the deserializer also needs to 
specify whether it is expecting null values--the serialization and 
deserialization must always be synchronized with regards to both value types and 
nullability (there's a new word!).</p>
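<p>The flag-byte scheme can be sketched as follows.&nbsp; Note that the specific flag values here are illustrative--the actual bytes <code>RawSerializer</code> emits may differ--but the arithmetic (five bytes for a non-null int, one byte for a null) is the same:</p>

```csharp
using System;
using System.IO;

class FlagByteDemo
{
    // Illustrative flag values; the actual encoding in RawSerializer may differ.
    const byte IsNull = 0, IsDBNull = 1, HasValue = 2;

    static void WriteNullableInt(BinaryWriter w, object value)
    {
        if (value == null)              { w.Write(IsNull); }
        else if (value == DBNull.Value) { w.Write(IsDBNull); }
        else                            { w.Write(HasValue); w.Write((int)value); }
    }

    static object ReadNullableInt(BinaryReader r)
    {
        switch (r.ReadByte())
        {
            case IsNull:   return null;
            case IsDBNull: return DBNull.Value;
            default:       return r.ReadInt32();
        }
    }

    static void Main()
    {
        var ms = new MemoryStream();
        var w = new BinaryWriter(ms);
        WriteNullableInt(w, 7);            // 1 flag byte + 4 value bytes
        WriteNullableInt(w, null);         // 1 flag byte only
        WriteNullableInt(w, DBNull.Value); // 1 flag byte only
        w.Flush();
        Console.WriteLine(ms.Length);      // 7

        ms.Position = 0;
        var r = new BinaryReader(ms);
        Console.WriteLine(ReadNullableInt(r));                 // 7
        Console.WriteLine(ReadNullableInt(r) == null);         // True
        Console.WriteLine(ReadNullableInt(r) == DBNull.Value); // True
    }
}
```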
<p>One optimization I was considering, but chose not to implement, was a bit 
field header for each row in the DataTable, in which the bit fields would 
indicate whether their associated fields were null or not.&nbsp; This would 
save, for example, six bytes for every eight fields (2 bits being required to 
manage &quot;not null&quot;, &quot;null&quot;, or &quot;DBNull.Value&quot;).&nbsp; Not that great of a 
savings, especially if you opt to use the remaining 6 bits per field to describe 
an encoded field type as discussed above.&nbsp; So I opted for the simple 
solution rather than programming myself (and you) into a corner.</p>
<h3>Understanding The Difference Between A Boxed Value Type And A Nullable Value 
Type</h3>
<p>The salient difference between a boxed value type (<code>object foo</code>) and a nullable 
value type (<code>int? foo</code>) is that the boxed value type supports both null and 
<code>DBNull.Value</code> &quot;values&quot;, whereas the nullable value type only supports null.&nbsp; 
The boxed value type serialization is useful when serializing data that has 
originated from a database and therefore may have <code>DBNull.Value</code> values.</p>
<h2>Example Usage</h2>
<p>Before getting into the code, I'm going to illustrate using these 
classes via some unit tests (the code has comprehensive unit tests; for the 
examples here, I'm picking specific ones).&nbsp; These unit tests are 
written using my
<a href="http://www.marcclifton.com/Projects/AdvancedUnitTesting/tabid/102/Default.aspx">
AUT engine</a>.&nbsp; This will give you a sense of what you can accomplish with the serializer.&nbsp; 
Each test has a setup routine that initializes the serializer, deserializer, and 
a memory stream:</p>
<pre>[TestFixture]
public class ValueTypeTests
{
  MemoryStream ms;
  RawSerializer rs;
  RawDeserializer rd;

  [SetUp]
  public void Setup()
  {
    ms = new MemoryStream();
    rs = new RawSerializer(ms);
    rd = new RawDeserializer(ms); 
  }
  ...
}</pre>
<h3>Simple Value Type Serialization</h3>
<pre>[Test]
public void Int()
{
  int val=int.MaxValue;
  rs.Serialize(val);
  rs.Flush();
  ms.Position=0;
  val=rd.DeserializeInt();
  Assertion.Assert(val==int.MaxValue, &quot;int failed&quot;);
}</pre>
<p>This first test illustrates a straightforward serialization of an integer.&nbsp; 
As expected, the memory stream length is four bytes.</p>
<h3>Boxed Serialization</h3>
<pre>[Test]
public void Int()
{
  int val = int.MaxValue;
  rs.Serialize((object)val);
  rs.Flush();
  ms.Position = 0;
  val = rd.DeserializeInt();
  Assertion.Assert(val == int.MaxValue, &quot;int failed&quot;);
}</pre>
<p>In this test, we're serializing a boxed value and deserializing it knowing 
the desired type.&nbsp; This test exercises a different pathway through the 
serializer.&nbsp; It also is a segue to the next test.&nbsp; Again, the memory 
stream length is 4 bytes.</p>
<h3>Boxed Nullable Value Types</h3>
<pre>[Test]
public void BoxedNullable()
{
  object anInt = 5;
  object aNullInt = null;
  rs.SerializeNullable(anInt);
  rs.SerializeNullable(aNullInt);
  rs.Flush();
  ms.Position = 0;
  anInt = rd.DeserializeNullable(typeof(int));
  aNullInt = rd.DeserializeNullable(typeof(int));
  Assertion.Assert((int)anInt == 5, &quot;non-null nullable failed.&quot;);
  Assertion.Assert(aNullInt == null, &quot;null nullable failed.&quot;);
}</pre>
<p>In this test, two boxed ints are serialized, the first with a value and the 
second assigned to null.&nbsp; The <code>SerializeNullable</code> method is used to tell the 
serializer that the value type is potentially null.&nbsp; After serialization, 
the memory stream length is 6 bytes.&nbsp; Why?&nbsp; The first value is 
serialized with a flag byte, thus taking 5 bytes.&nbsp; The second value, being 
null, simply gets the flag byte.</p>
<h3>Nullable Value Types</h3>
<pre>[Test]
public void Int()
{
  int? val1=int.MaxValue;
  int? val2=null;
  rs.Serialize(val1);
  rs.Serialize(val2);
  rs.Flush();
  ms.Position=0;
  val1=rd.DeserializeNInt();
  val2=rd.DeserializeNInt();
  Assertion.Assert(val1==int.MaxValue, &quot;non-null nullable int failed&quot;);
  Assertion.Assert(val2==null, &quot;null nullable int failed&quot;);
}</pre>
<p>Here we're using the new nullable value type support in C# 2.0.&nbsp; The 
resulting memory stream length is also 6 bytes.&nbsp; Notice the different 
deserialization method being used to return the appropriate nullable value type.&nbsp; 
You could also deserialize this into an object of type int:</p>
<pre>[Test]
public void IntObject()
{
  int? val1 = int.MaxValue;
  int? val2 = null;
  rs.Serialize(val1);
  rs.Serialize(val2);
  rs.Flush();
  ms.Position = 0;
  object obj1 = rd.DeserializeNullable(typeof(int));
  object obj2 = rd.DeserializeNullable(typeof(int));
  Assertion.Assert((int)obj1 == int.MaxValue, &quot;non-null nullable int failed&quot;);
  Assertion.Assert(obj2 == null, &quot;null nullable int failed&quot;);
}</pre>
<h3>Data Tables</h3>
<p>The following unit test demonstrates how a <code>DataTable</code> might be serialized.&nbsp; 
I have intentionally not included this code in the raw serializer class, as the 
method for serializing a <code>DataTable</code> will probably be application specific.</p>
<h4>The Test Data</h4>
<p>The test fixture's <code>DataTable</code> is initialized with the following data:</p>
<pre>[TestFixtureSetUp]
public void FixtureSetup()
{
  dt = new DataTable();
  dt.Columns.Add(new DataColumn(&quot;pk&quot;, typeof(Guid)));
  dt.Columns.Add(new DataColumn(&quot;LastName&quot;, typeof(string)));
  dt.Columns.Add(new DataColumn(&quot;FirstName&quot;, typeof(string)));
  dt.Columns.Add(new DataColumn(&quot;MiddleInitial&quot;, typeof(char)));
  dt.Columns[&quot;pk&quot;].AllowDBNull = false;
  dt.Columns[&quot;LastName&quot;].AllowDBNull = false;
  dt.Columns[&quot;FirstName&quot;].AllowDBNull = false;
  dt.Columns[&quot;MiddleInitial&quot;].AllowDBNull = true;

  DataRow dr=dt.NewRow();
  dr[&quot;pk&quot;]=Guid.NewGuid();
  dr[&quot;LastName&quot;]=&quot;Clifton&quot;;
  dr[&quot;FirstName&quot;]=&quot;Marc&quot;;
  dr[&quot;MiddleInitial&quot;] = DBNull.Value;
  dt.Rows.Add(dr);

  dr=dt.NewRow();
  dr[&quot;pk&quot;]=Guid.NewGuid();
  dr[&quot;LastName&quot;]=&quot;Clifton&quot;;
  dr[&quot;FirstName&quot;]=&quot;Ian&quot;;
  dr[&quot;MiddleInitial&quot;] = DBNull.Value;
  dt.Rows.Add(dr);

  dr=dt.NewRow();
  dr[&quot;pk&quot;]=Guid.NewGuid();
  dr[&quot;LastName&quot;]=&quot;Linder&quot;;
  dr[&quot;FirstName&quot;]=&quot;Karen&quot;;
  dr[&quot;MiddleInitial&quot;] = 'J';
  dt.Rows.Add(dr);

  dt.AcceptChanges();
}</pre>
<h4>Serializing And Deserializing The DataTable</h4>
<p>The following is the unit test that validates the serialization of the data 
table.&nbsp; Note how the <code>AllowDBNull</code> property is being used to determine 
whether the object being serialized should allow for nulls.&nbsp; You'll also 
see that I'm serializing the table name, the number of columns and rows, and 
also the column name and type.&nbsp; This information all comprises the header 
for the actual table data.&nbsp; Also note that the assembly qualified name is 
being used.&nbsp; In this example, it means that you would need the same version 
of .NET on the receiving end as was used to serialize the data.&nbsp; By using 
just the name, one could have one version of .NET serializing the data and 
another deserializing it.&nbsp; It's up to you and what you're trying to 
achieve, which is again why this code isn't part of the raw serialization 
classes in the download.</p>
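<p>The trade-off between the two kinds of type name can be seen directly (the exact <code>AssemblyQualifiedName</code> string varies by runtime version, which is precisely the point):</p>

```csharp
using System;

class TypeNameDemo
{
    static void Main()
    {
        // Version-bound: includes the assembly name, version, culture, and key token.
        Console.WriteLine(typeof(int).AssemblyQualifiedName);

        // Version-neutral: just the namespace-qualified type name.
        Console.WriteLine(typeof(int).FullName); // System.Int32

        // Type.GetType resolves version-neutral names for core library types.
        Console.WriteLine(Type.GetType("System.Int32") == typeof(int)); // True
    }
}
```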
<pre>[Test]
public void DataTable()
{
  rs.Serialize(dt.TableName);
  rs.Serialize(dt.Columns.Count);
  rs.Serialize(dt.Rows.Count);

  foreach (DataColumn dc in dt.Columns)
  {
    rs.Serialize(dc.ColumnName);
    rs.Serialize(dc.AllowDBNull);
    rs.Serialize(dc.DataType.AssemblyQualifiedName);
  }

  foreach (DataRow dr in dt.Rows)
  {
    foreach (DataColumn dc in dt.Columns)
    {
      if (dc.AllowDBNull)
      {
        rs.SerializeNullable(dr[dc]);
      }
      else
      {
        rs.Serialize(dr[dc]);
      }
    }
  }

  rs.Flush();
  ms.Position = 0;

  // Deserialize

  string tableName = rd.DeserializeString();
  int columns = rd.DeserializeInt(); 
  int rows = rd.DeserializeInt();

  Assertion.Assert(columns == 4, &quot;Column count is wrong.&quot;);
  Assertion.Assert(rows == 3, &quot;Row count is wrong.&quot;);

  DataTable dtIn = new DataTable();

  for (int x = 0; x &lt; columns; x++)
  {
    string columnName = rd.DeserializeString();
    bool allowNulls = rd.DeserializeBool();
    string type = rd.DeserializeString();

    DataColumn dc = new DataColumn(columnName, Type.GetType(type));
    dc.AllowDBNull = allowNulls;
    dtIn.Columns.Add(dc);
  }

  for (int y = 0; y &lt; rows; y++)
  {
    DataRow dr = dtIn.NewRow();

    for (int x = 0; x &lt; columns; x++)
    {
      DataColumn dc=dtIn.Columns[x];
      object obj;

      if (dc.AllowDBNull)
      {
        obj = rd.DeserializeNullable(dc.DataType);
      }
      else
      {
        obj = rd.Deserialize(dc.DataType);
      }

      dr[dc] = obj;
    }

    dtIn.Rows.Add(dr);
  }

  for (int y = 0; y &lt; rows; y++)
  {
    for (int x = 0; x &lt; columns; x++)
    {
      Assertion.Assert(dt.Rows[y][x].Equals(dtIn.Rows[y][x]), &quot;Deserialized data does not match serialized data&quot;);
    }
  }
}</pre>
<h3>Encryption Streaming</h3>
<p>If you want to tack encryption onto the serialization stream, here's an 
example of how that works:</p>
<pre>[Test]
public void EncryptionStreaming()
{
  // string -&gt; RawStreamEncoder -&gt; Encryptor -&gt; MemoryStream
  MemoryStream ms = new MemoryStream(); // final destination stream
  EncryptTransformer et = new EncryptTransformer(EncryptionAlgorithm.Des);
  ICryptoTransform ict = et.GetCryptoServiceProvider(null);
  CryptoStream encStream = new CryptoStream(ms, ict, CryptoStreamMode.Write);
  RawSerializer rs=new RawSerializer(encStream); // serializer outputs to encoder
  rs.Serialize(&quot;Hello World&quot;); // serialize
  ((CryptoStream)encStream).FlushFinalBlock(); // MUST BE APPLIED! Flush() does not output the last block.

  ms.Position=0;

  // MemoryStream -&gt; Decryptor -&gt; RawStreamDecoder -&gt; string
  DecryptTransformer dt = new DecryptTransformer(EncryptionAlgorithm.Des);
  dt.IV = et.IV;
  ict = dt.GetCryptoServiceProvider(et.Key);
  CryptoStream decStream = new CryptoStream(ms, ict, CryptoStreamMode.Read);
  RawDeserializer rd=new RawDeserializer(decStream); // Deserializes from decryptor stream
  string str=(string)rd.Deserialize(typeof(string)); // Gets the data.
  Assertion.Assert(str==&quot;Hello World&quot;, &quot;Unexpected return.&quot;);
}</pre>
<h3>Compression Streaming</h3>
<p>Or, let's say you want the stream compressed.&nbsp; This example utilizes the 
compression stream in the .NET 2.0 framework.</p>
<pre>[Test]
public void CompressionStreaming()
{
  // string -&gt; RawStreamEncoder -&gt; Compressor -&gt; MemoryStream
  MemoryStream ms=new MemoryStream(); // final destination stream
  GZipStream comp = new GZipStream(ms, CompressionMode.Compress, true); // important to be set to true!

  RawSerializer rs=new RawSerializer(comp); // serializer outputs to compression
  rs.Serialize(&quot;Hello World&quot;); // serialize
  comp.Close(); // outputs last part of the data

  ms.Position=0;

  // MemoryStream -&gt; Decompressor -&gt; RawStreamDecoder -&gt; string
  GZipStream decomp = new GZipStream(ms, CompressionMode.Decompress);

  RawDeserializer rd=new RawDeserializer(decomp); // Deserializes from decompressor stream
  string str=(string)rd.Deserialize(typeof(string)); // Gets the data.
  Assertion.Assert(str==&quot;Hello World&quot;, &quot;Unexpected return.&quot;);
}</pre>
<h3>Compression-Encryption Streaming</h3>
<p>And of course, you might want to compress and encrypt your data stream.</p>
<pre>[Test]
public void CompressionEncryptionStreaming()
{
  // string -&gt; RawStreamEncoder -&gt; Compressor -&gt; Encryptor -&gt; MemoryStream
  MemoryStream ms=new MemoryStream(); // final destination stream

  EncryptTransformer et = new EncryptTransformer(EncryptionAlgorithm.Des);
  ICryptoTransform ict = et.GetCryptoServiceProvider(null);
  CryptoStream encStream = new CryptoStream(ms, ict, CryptoStreamMode.Write);

  GZipStream comp = new GZipStream(encStream, CompressionMode.Compress, true); // important to be set to true!

  RawSerializer rs = new RawSerializer(comp); // serializer outputs to compression
  rs.Serialize(&quot;Hello World&quot;); // serialize
  comp.Close(); // must close to get final bytes
  ((CryptoStream)encStream).FlushFinalBlock(); // MUST BE APPLIED! Flush() does not output the last block.

  // Reset the position and read the stream back in.
  ms.Position=0;

  // MemoryStream -&gt; Decryptor -&gt; Decompressor -&gt; RawStreamDecoder -&gt; string
  DecryptTransformer dt = new DecryptTransformer(EncryptionAlgorithm.Des);
  dt.IV = et.IV;
  ict = dt.GetCryptoServiceProvider(et.Key);
  CryptoStream decStream = new CryptoStream(ms, ict, CryptoStreamMode.Read);

  GZipStream decomp = new GZipStream(decStream, CompressionMode.Decompress);

  RawDeserializer rd = new RawDeserializer(decomp); // Deserializes from decompressor stream

  string str=(string)rd.Deserialize(typeof(string)); // Gets the data.
  Assertion.Assert(str==&quot;Hello World&quot;, &quot;Unexpected return.&quot;);
}</pre>
<h2>Appendix</h2>
<p>Rather than cluttering the beginning of the article with the nuts and bolts 
and other issues, I decided to put some of that here, in the Appendix:</p>
<h3>The Wrong Tool</h3>
<p>The BinaryFormatter is simply the wrong tool to use for my client's needs.&nbsp; 
Here's what MSDN says about it:</p>
<p><i>The SoapFormatter and BinaryFormatter classes implement the 
IRemotingFormatter interface to support remote procedure calls (RPCs), and the 
IFormatter interface (inherited by the IRemotingFormatter) to support 
serialization of a graph of objects. The SoapFormatter class also supports RPCs 
with ISoapMessage objects, without using the IRemotingFormatter functionality.<br>
<br>
</i>First, we don't need to support remote procedure calls.&nbsp; Second, we don't need 
the full support of object graph serialization.&nbsp; For example, in our 
application, the DataTable representing the cached data can be 
transmitted to the client without any header information at all because there is 
a separate data dictionary that defines the table columns and flags.&nbsp; (In 
the code presented here for serializing a DataTable, I do have a header block).</p>
<p>Before looking at the complexities of serializing a <code>DataTable</code>, let's look at a very simple example--serializing a bool:</p>
<pre>using System;
using System.Runtime.Serialization.Formatters.Binary;
using System.IO;
using System.Text;

namespace BinaryFormatterTests
{
  class Program
  {
    static void Main(string[] args)
    {
      MemoryStream ms = new MemoryStream();
      BinaryFormatter bf = new BinaryFormatter();
      bool flag = false;
      bf.Serialize(ms, flag);
      byte[] data = ms.ToArray();
      Console.WriteLine(&quot;Done.&quot;);
    }
  }
}</pre>
<p>This generates 53 bytes:</p>
<p>
<img border="0" src="RawSerializer/rawSerializer1.PNG" width="550" height="60"></p>
<p>Where in this is the bool?&nbsp; It turns out it's in the second to last 
byte.&nbsp; If set to true, the last three bytes read: 01 01 0b</p>
<p>What if we add a second bool?&nbsp; How much of this is initial header vs. 
actual data?&nbsp; Well, it turns out, nothing is initial header.&nbsp; If we 
serialize a second bool (just call <code>bf.Serialize(ms, flag);</code> twice), 
the resulting memory stream is now twice as large: 106 bytes!&nbsp; </p>
<p>It gets worse.&nbsp; Let's look at a <code>DataTable</code> now.&nbsp; An empty <code>DataTable</code> takes an initial 1051 bytes to 
serialize.&nbsp; Adding three column definitions (a <code>Guid</code> and two strings) takes 
an additional 572 bytes.&nbsp; Each additional row (no data in the strings) 
takes an additional 85 bytes (for an empty <code>Guid</code> and two empty strings!).&nbsp; 
And heaven help you if you actually have data in these rows:</p>
<pre>static void Main(string[] args)
{
  StringBuilder sb=new StringBuilder();
  StringWriter sw=new StringWriter(sb);
  XmlSerializer xs = new XmlSerializer(typeof(DataTable));

  DataTable dt = new DataTable(&quot;Foobar&quot;);
  DataColumn dc1 = new DataColumn(&quot;ID&quot;, typeof(Guid));
  DataColumn dc2 = new DataColumn(&quot;FirstName&quot;, typeof(string));
  DataColumn dc3 = new DataColumn(&quot;LastName&quot;, typeof(string));
  dt.Columns.Add(dc1);
  dt.Columns.Add(dc2); 
  dt.Columns.Add(dc3);

  DataRow dr = dt.NewRow();
  dr[&quot;ID&quot;] = Guid.NewGuid();
  dr[&quot;FirstName&quot;] = &quot;Marc&quot;;
  dr[&quot;LastName&quot;] = &quot;Clifton&quot;;
  dt.Rows.Add(dr);

  dr = dt.NewRow();
  dr[&quot;ID&quot;] = Guid.NewGuid();
  dr[&quot;FirstName&quot;] = &quot;Karen&quot;;
  dr[&quot;LastName&quot;] = &quot;Linder&quot;;  
  dt.Rows.Add(dr);

  xs.Serialize(sw, dt);
  string str = sb.ToString();
  Console.WindowWidth = 100;
  Console.WriteLine(str.Length);
  Console.WriteLine(str);
}</pre>
<p>Adding <code>Guid</code>s 
and &quot;Marc&quot;, &quot;Clifton&quot; and &quot;Karen&quot;, &quot;Linder&quot; to the two rows, the serialized 
output grows to a whopping 1982 bytes, adding 274 bytes to represent two rows of 
data that, if ideally stored, shouldn't take up more than 40 bytes or so (60, if 
you want to use Unicode to represent the strings).</p>
<h3>Why Not Use Xml Serialization?</h3>
<p>The resulting xml data for my two record example is actually smaller, 1671 bytes, and of course 
compresses well because of its high tokenization rate.</p>
<p>
<img border="0" src="RawSerializer/rawSerializer2.PNG" width="600" height="424"></p>
<p>But let's look at a problem with how xml handles null values.&nbsp; Say some of these rows have null values.&nbsp; We're going to look 
at both DBNull.Value and simply setting the field to null:</p>
<pre>  DataRow dr = dt.NewRow();
  dr[&quot;ID&quot;] = Guid.NewGuid();
  dr[&quot;FirstName&quot;] = &quot;Marc&quot;;
  dr[&quot;LastName&quot;] = DBNull.Value;
  dt.Rows.Add(dr);

  dr = dt.NewRow();
  dr[&quot;ID&quot;] = Guid.NewGuid();
  dr[&quot;FirstName&quot;] = &quot;Karen&quot;;
  dr[&quot;LastName&quot;] = null;  
  dt.Rows.Add(dr);</pre>
<p>The resulting output is:</p>
<p>
<img border="0" src="RawSerializer/rawSerializer3.PNG" width="404" height="122"></p>
<p>Hmmm.&nbsp; <code>LastName</code> is completely missing.&nbsp; Now what happens when we 
deserialize this into a new <code>DataTable</code>?&nbsp; The result is that the <code>LastName</code> 
field of both rows is set to type <code>DBNull</code>.&nbsp; While this is appropriate in the 
context of serializing a database table, it may not be what you want or expect 
in other contexts!&nbsp; If you test <code>null == DBNull.Value</code>, the result is false!&nbsp; 
(In fact, the complexity of null, <code>DBNull</code>, and empty strings within the context 
of control property values, like a <code>TextBox.Text</code> property, and binding these 
properties directly to <code>DataTable</code> fields, is another article in itself).</p>
<p>This problem is more insidious than you might think.&nbsp; Take for 
example this class:</p>
<pre>public class Test
{
  protected string str=String.Empty;

  public string Str
  {
    get { return str; }
    set { str = value; }
  }
}</pre>
<p>Here the programmer has initialized &quot;str&quot;.&nbsp; However, if &quot;str&quot; is set to 
null at some point and then serialized, the property is not emitted.&nbsp; On 
deserialization, &quot;str&quot; is not assigned and therefore keeps its initial value--in 
this case, an empty string.&nbsp; Again, not what you might be expecting, and 
definitely not something that's so obvious you'd think about it as a possible 
problem.</p>
<p>So, xml serialization is not symmetric because it doesn't handle null 
references.&nbsp; Depending on your needs, you may find this to be an issue.&nbsp; 
More importantly though, xml serialization still results in a large file 
requiring a compression post-process.</p>
<h4>Compression vs. Expansion</h4>
<p>While I'm on the subject, let me give you some benchmarks using #ziplib: a 
27MB file takes 33 seconds to compress down to 5.1MB on my test machine.&nbsp; Clicking &quot;Send To 
compressed (zipped) folder&quot; on XP, it takes 3 seconds to create a 5.6MB 
compressed file.&nbsp; Needless to say, #ziplib isn't very optimized.&nbsp; Nor 
is compression all that valuable with the typically small datasets that we are 
working with, especially when looking at the serialization of data packets being 
sent over a network.&nbsp; However, the expansion of the data resulting from the 
<code>BinaryFormatter</code> is simply unacceptable.&nbsp; If we use 
compression, it's because we want to compress that data itself.&nbsp; If we look 
at the xml serialization, we immediately think &quot;oh, that will compress really 
well&quot;, but that is basically wrong thinking because what we're identifying as 
compressible information is primarily the <i>metadata</i>!&nbsp; The point of 
raw serialization is to get rid of the metadata!</p>
<h3>The Importance Of AcceptChanges</h3>
<p>You'll note in the first xml screenshot above, the &quot;hasChanges&quot; attribute.&nbsp; 
This brings up an important point--make sure you call <code>AcceptChanges</code> on the 
<code>DataTable</code> before you serialize it, otherwise you get these diffgram entries, 
which is probably not what you want.&nbsp; By calling <code>AcceptChanges</code> in the 
BinaryFormatter test above, I save 60 bytes. </p>
<h2>Acknowledgements</h2>
<p>I would like to thank Justin Dunlap for his help in pointing me in the right 
direction with regards to the BinaryReader/Writer and his work on the structure 
marshalling code.</p>
<h2>A Final Note</h2>
<p>While digging around, I came across a third party library from
<a href="http://www.xceedsoft.com/products/ZipNet/index.aspx">XCEED</a> that 
supports compression, encryption, binary serialization, etc.&nbsp; I haven't 
evaluated their product, but anyone interested in a professional solution should 
probably look at what they have done, and I'm sure there must be other packages 
out there as well.&nbsp; There's always a balance though between developer 
licensing, source code access, and additional features that are part of the 
library that you don't need and end up being extra baggage, compared to the 
convenience of a pre-tested, supported product.&nbsp; For &quot;simple&quot; things like 
raw serialization, I tend to prefer rolling my own, as I end up learning a lot 
and frankly, I'd still have to spend the same amount of time writing the test 
routines to validate that a third party package does what I need.&nbsp; And the 
result is, I get a short and simple piece of code that I think is easily 
maintained.</p>

</body>

</html>
