Introduction
This is the second of two articles on optimizing serialization, especially for use in remoting. In this article we will look at a complete, real-world example of how to use the Fast Serialization code introduced in Part 1 to serialize DataSets and DataTables, including Typed variants.
1. Test Results
I have produced some test results using various sizes of source data to give an idea of the reduction in size and time taken that you might expect to achieve.
From these results, it becomes clear that Fast Serialization is always faster and always produces a smaller size. For smaller sets of data, the vanilla .Net serialization may be adequate, but it doesn't scale as well as Fast Serialization: the time and size differences become more apparent as more data is serialized, until the vanilla .Net serializer just cannot cope and throws an exception.
It is important to note that stream compression (via Custom Sinks) can reduce the final data size to less than half of even the smallest size shown here. However, you may find that the smaller sizes generated much more quickly by Fast Serialization are acceptable anyway, and that the overhead of compression is not worthwhile except perhaps for transmission over known slow connections.
2. Using the Code
The code download includes a class called AdoNetHelper. This has a number of static methods to provide serialization and deserialization services for supported ADO.Net objects. There are also a couple of helper methods that are not specific to serialization, which is why I created a single helper class rather than separate classes - I like to keep generic helper code in one place.
In a nutshell, for serialization you pass in the ADO.Net object you want to serialize and get a byte array back. For deserialization, you pass in the byte array (and either an empty ADO.Net object or a Type which will be instantiated) and get back your ADO.Net object fully populated.
Supported ADO.Net objects are:
DataSet
public static byte[] SerializeDataSet(DataSet dataSet)
public static DataSet DeserializeDataSet(byte[] serializedData)
For a plain DataSet, all of the infrastructure will be stored, including Tables, Columns, Rows, Constraints, Extended Properties, Xml namespaces and so on.
Typed DataSet
public static byte[] SerializeTypedDataSet(DataSet dataSet)
public static DataSet DeserializeTypedDataSet(Type dataSetType,
    byte[] serializedData)
public static DataSet DeserializeTypedDataSet(DataSet dataSet,
    byte[] serializedData)
For a Typed DataSet, rather than use the generated class directly, I derive a class from it. As an example, if I have a set of data called XXX, I name the schema file XXXDataSetSchema.xsd, which auto-generates XXXDataSetSchema.cs; I then create XXXDataSet.cs, which contains the derived class.
For deserialization you can either pass in the Type of the Typed DataSet (which will be instantiated) or an empty instance of it.
DataTable
public static byte[] SerializeDataTable(DataTable dataTable)
public static DataTable DeserializeDataTable(byte[] serializedData)
Very similar usage to the plain DataSet.
Typed DataTable
public static byte[] SerializeTypedDataTable(DataTable dataTable)
public static DataTable DeserializeTypedDataTable(DataTable dataTable,
    byte[] serializedData)
public static DataTable DeserializeTypedDataTable(Type dataTableType,
    byte[] serializedData)
It isn't really feasible to subclass a generated Typed DataTable.
Simple DataTable
public static byte[] SerializeSimpleTable(DataTable dataTable)
public static DataTable DeserializeSimpleDataTable(DataTable dataTable,
    byte[] serializedData)
public static DataTable DeserializeSimpleDataTable(Type dataTableType,
    byte[] serializedData)
Usage is identical to Typed DataTable. In fact, the routine will also run with Added and Modified rows (but not Deleted rows - that will throw an exception) but on deserialization, the RowState for all rows will be Unchanged. This set of methods can also be used for other
All of these helper methods, after appropriate parameter validation, use internal, nested classes to perform the actual work and return the result. As they stand, it is possible to use them as-is within remoting by passing around the generated byte array instead of the real object in your service interfaces. However, that doesn't make code easy to read or understand, so let's look at alternative ways of using this code.
3. Usage in Remoting
We now have a way of taking a serializable ADO.Net object and putting it into a byte array, and vice versa. Now we need a way of getting .Net Remoting to use our serialization code rather than letting the DataSet do its XML serialization thing.
Unfortunately, there is no way to make this completely transparent since we can't modify the What we must do is implement the In both cases there will be two methods to implement. A
Let's look at methods for plain DataSets.
FastSerializableDataSet
The simplest way of all is to create a new class derived from DataSet.
Here is a sample class (included in the download):
[Serializable]
public class FastSerializableDataSet: DataSet, ISerializable
{
    #region Constructors
    public FastSerializableDataSet(): base() {}
    public FastSerializableDataSet(string dataSetName): base(dataSetName) {}
    #endregion Constructors

    #region ISerializable Members
    // Deserialization constructor: repopulates this instance from the byte array
    protected FastSerializableDataSet(SerializationInfo info,
        StreamingContext context)
    {
        AdoNetHelper.DeserializeDataSet(this,
            (byte[]) info.GetValue("_", typeof(byte[])));
    }

    // Serialization: stores the whole DataSet as a single byte array
    public void GetObjectData(SerializationInfo info,
        StreamingContext context)
    {
        info.AddValue("_", AdoNetHelper.SerializeDataSet(this));
    }
    #endregion
}
Pretty simple really, as it boils down to two lines - one for serialization and one for deserialization. If you can change your project so that all references to
However, life is rarely as simple as that and it may not be acceptable to change all references to DataSet, so is there a less intrusive way?
WrappedDataSet
If you are not able to change all
Below is a class that 'wraps' a plain DataSet.
Here is the class:
[Serializable]
public class WrappedDataSet: ISerializable
{
    #region Casting Operators
    public static implicit operator DataSet (WrappedDataSet wrappedDataSet)
    {
        return wrappedDataSet.DataSet;
    }
    public static implicit operator WrappedDataSet (DataSet dataSet)
    {
        return new WrappedDataSet(dataSet);
    }
    #endregion Casting Operators

    #region Constructors
    public WrappedDataSet(DataSet dataSet)
    {
        if (dataSet == null) throw new ArgumentNullException("dataSet");
        this.dataSet = dataSet;
    }
    #endregion Constructors

    #region Properties
    public DataSet DataSet { get { return dataSet; } }
    DataSet dataSet;
    #endregion Properties

    #region ISerializable Members
    protected WrappedDataSet(SerializationInfo info, StreamingContext context)
    {
        dataSet = AdoNetHelper.DeserializeDataSet(
            (byte[]) info.GetValue("_", typeof(byte[])));
    }
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("_", AdoNetHelper.SerializeDataSet(dataSet));
    }
    #endregion
}
TypedDataSets
A Typed There is also a private
If you follow my advice above about deriving from the generated class, the derived class can implement ISerializable like this:
#region ISerializable Members
// Deserialization constructor: repopulates this typed instance from the byte array
protected DerivedFromGeneratedDataSet(SerializationInfo info,
    StreamingContext context)
{
    AdoNetHelper.DeserializeTypedDataSet(this,
        (byte[]) info.GetValue("_", typeof(byte[])));
}

// Explicit interface implementation: stores the typed DataSet as a byte array
void ISerializable.GetObjectData(SerializationInfo info,
    StreamingContext context)
{
    info.AddValue("_", AdoNetHelper.SerializeTypedDataSet(this));
}
#endregion
On deserialization, this will call the base parameterless constructor which will run
Surrogates
If neither of the above is suitable, then the only option left is to take control at a lower level, using a surrogate object to perform the serialization/deserialization.
In principle this should be relatively simple. A surrogate object is a class, implementing ISerializationSurrogate, which 'knows' how to perform serialization for a given Type and is 'incorporated' into the remoting/serialization process to actually do the work in place of the normal reflection-based way. This is achieved by a class which implements ISurrogateSelector. If you want to do this directly with a
However, the problem is that Microsoft has only allowed certain parts of Remoting to be public and not others. Some digging in Reflector shows that a new BinaryFormatter is created and a new RemotingSurrogateSelector is then attached to it, but unfortunately neither of these objects is accessible or configurable externally.
All is not lost, however: we can get around this with the use of custom sinks. There are a number of good articles on CodeProject (.NET Remoting Customization Made Easy: Custom Sinks, for example) which describe how to insert custom sinks both before and after the formatter sink, but what we need to do is replace the formatter sink itself.
In the download, I have included some example classes to do this:
They were written by examining the Microsoft code using Reflector and removing all non-Http code (I only use HTTP channels) and non-essential parts such as
To use in your applications, you can either use App.config or write manual code:
App.Config file configuration
<system.runtime.remoting>
    <application>
        <channels>
            <channel ref="http" port="999">
                <clientProviders>
                    <formatter
                        type="SimmoTech.Utils.Remoting.CustomBinaryClientFormatterSinkProvider, SimmoTech.Utils"/>
                </clientProviders>
                <serverProviders>
                    <formatter
                        type="SimmoTech.Utils.Remoting.CustomBinaryServerFormatterSinkProvider, SimmoTech.Utils"/>
                </serverProviders>
            </channel>
        </channels>
    </application>
</system.runtime.remoting>
Then use this line at or near the start of your application:
RemotingConfiguration.Configure(
    AppDomain.CurrentDomain.SetupInformation.ConfigurationFile);
Code configuration
CustomBinaryServerFormatterSinkProvider serverProvider
    = new CustomBinaryServerFormatterSinkProvider();
CustomBinaryClientFormatterSinkProvider clientProvider
    = new CustomBinaryClientFormatterSinkProvider();
IDictionary properties = new Hashtable();
properties["port"] = 999;
HttpChannel channel = new HttpChannel(properties, clientProvider,
    serverProvider);
ChannelServices.RegisterChannel(channel);
The code for
The main code for the
public ISerializationSurrogate GetSurrogate(Type type,
    StreamingContext context, out ISurrogateSelector selector)
{
    // Handle DataSet, DataTable and anything derived from them
    if (typeof(DataSet).IsAssignableFrom(type) ||
        typeof(DataTable).IsAssignableFrom(type))
    {
        selector = this;
        return this;
    }
    else
    {
        selector = null;
        return null;
    }
}
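which just says that if the type is a DataSet or DataTable, or is derived from one of them, then this object is the surrogate to use.
The code for serialization looks like this:
public void GetObjectData(object obj, SerializationInfo info,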
    StreamingContext context)
{
    byte[] data;
    // Exact-type checks come first so that typed subclasses fall through
    if (obj.GetType() == typeof(DataSet) || obj is IModifiedTypedDataSet)
        data = AdoNetHelper.SerializeDataSet(obj as DataSet);
    else if (obj.GetType() == typeof(DataTable))
        data = AdoNetHelper.SerializeDataTable(obj as DataTable);
    else if (obj is DataSet)
        data = AdoNetHelper.SerializeTypedDataSet(obj as DataSet);
    else if (obj is DataTable)
        data = AdoNetHelper.SerializeTypedDataTable(obj as DataTable);
    else
    {
        throw new InvalidOperationException("Not a supported Ado.Net object");
    }
    info.AddValue("_", data);
}
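The order of the checks in the method above matters: an exact comparison with GetType() picks out the plain classes, while the later is tests catch anything derived from them. The following is a minimal sketch of that dispatch only. OrdersDataSet and HelperChooser are hypothetical names invented for illustration, the IModifiedTypedDataSet case is left out, and the method just returns the name of the helper that would be chosen rather than calling AdoNetHelper.

```csharp
using System;
using System.Data;

// Hypothetical typed DataSet standing in for a generated class.
public class OrdersDataSet : DataSet { }

public static class HelperChooser
{
    // Returns the name of the helper the surrogate would pick for obj.
    public static string Choose(object obj)
    {
        if (obj == null) throw new ArgumentNullException("obj");

        // Exact-type checks first: only the plain classes match here.
        if (obj.GetType() == typeof(DataSet)) return "SerializeDataSet";
        if (obj.GetType() == typeof(DataTable)) return "SerializeDataTable";

        // Anything left that is still a DataSet/DataTable must be derived.
        if (obj is DataSet) return "SerializeTypedDataSet";
        if (obj is DataTable) return "SerializeTypedDataTable";

        throw new InvalidOperationException("Not a supported ADO.NET object");
    }
}
```

If the is tests were placed first, a plain DataSet would also satisfy obj is DataSet and would be routed to the typed helper.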
The type is checked and the correct helper method on AdoNetHelper is called.
The code for deserialization looks like this:
public object SetObjectData(object obj, SerializationInfo info,
    StreamingContext context,
    ISurrogateSelector selector)
{
    // Replace the uninitialized instance with a properly constructed one
    obj = createNewInstance(obj);
    byte[] data = (byte[]) info.GetValue("_", typeof(byte[]));
    if (obj.GetType() == typeof(DataSet) || obj is IModifiedTypedDataSet)
        return AdoNetHelper.DeserializeDataSet(obj as DataSet, data);
    else if (obj.GetType() == typeof(DataTable))
        return AdoNetHelper.DeserializeDataTable(obj as DataTable, data);
    else if (obj is DataSet)
        return AdoNetHelper.DeserializeTypedDataSet(obj as DataSet, data);
    else if (obj is DataTable)
        return AdoNetHelper.DeserializeTypedDataTable(obj as DataTable, data);
    else
    {
        throw new InvalidOperationException("Not a supported Ado.Net object");
    }
}
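The createNewInstance call in the method above is needed because a formatter allocates the target object without running any constructor. As far as I know this is done with FormatterServices.GetUninitializedObject, and the sketch below uses that call to show the effect. Probe and UninitializedDemo are hypothetical names used only for this illustration; on recent .NET versions the call compiles with an obsolescence warning.

```csharp
using System;
using System.Runtime.Serialization;

// Hypothetical class that records whether its constructor has run.
public class Probe
{
    public bool ConstructorRan;
    public Probe() { ConstructorRan = true; }
}

public static class UninitializedDemo
{
    // Allocates a Probe without calling any constructor, which is the
    // state in which a surrogate receives the object to populate.
    public static Probe CreateRaw()
    {
        return (Probe) FormatterServices.GetUninitializedObject(typeof(Probe));
    }

    // Creates a Probe the normal way, via its parameterless constructor.
    public static Probe CreateNormally()
    {
        return (Probe) Activator.CreateInstance(typeof(Probe));
    }
}
```

CreateRaw() returns an instance whose ConstructorRan field is still false, whereas CreateNormally() returns one where it is true. An ADO.Net object in the first state has none of its internal collections set up, so it has to be replaced by a properly constructed instance before it can be populated.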
Essentially it is the reverse of serialization with one twist: the object that is passed in is completely uninitialized - not even a default constructor has been called on it. The
This all sounds complicated but really is a one-time setup. After that you can get Fast Serialization without modifying any of your application code.
4. How it works
The serialization/deserialization code uses
BitVector32
A Sections are created using the CreateSection method.
Mask mode is used to store up to 32 boolean values. Creating a mask involves calling BitVector32.CreateMask().
Typically, you will define a set of masks/bit flags (I use the terms interchangeably in this article) as static readonly ints, since their values will not change at runtime, and then use them in your methods by passing them to the BitVector32 indexer.
Here is a sample:
private static readonly int TypeAFirstBitFlag = BitVector32.CreateMask();
private static readonly int TypeASecondBitFlag
    = BitVector32.CreateMask(TypeAFirstBitFlag);
private static readonly int TypeAThirdBitFlag
    = BitVector32.CreateMask(TypeASecondBitFlag);

private static readonly int TypeBFirstBitFlag = BitVector32.CreateMask();
private static readonly int TypeBSecondBitFlag
    = BitVector32.CreateMask(TypeBFirstBitFlag);

public void MyMethod() {
    BitVector32 myFlags = new BitVector32();
    myFlags[TypeAFirstBitFlag] = myBoolValue1;
    myFlags[TypeASecondBitFlag] = true;
    myFlags[TypeAThirdBitFlag] = false;
}
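The sample above can be extended into a small self-contained sketch showing how a chained set of masks packs several booleans into a single int and how they are read back. The flag names and the BitFlagDemo class are hypothetical and are not taken from the download.

```csharp
using System;
using System.Collections.Specialized;

public static class BitFlagDemo
{
    // Hypothetical flags; each CreateMask call chains from the previous
    // mask, so the three values are 1, 2 and 4.
    static readonly int HasName = BitVector32.CreateMask();
    static readonly int HasNamespace = BitVector32.CreateMask(HasName);
    static readonly int IsCaseSensitive = BitVector32.CreateMask(HasNamespace);

    // Packs three booleans derived from the arguments into one int.
    public static int Pack(string name, string ns, bool caseSensitive)
    {
        BitVector32 flags = new BitVector32();
        flags[HasName] = !string.IsNullOrEmpty(name);
        flags[HasNamespace] = !string.IsNullOrEmpty(ns);
        flags[IsCaseSensitive] = caseSensitive;
        return flags.Data;
    }

    // Rebuilds the vector from the packed int and tests a single flag.
    public static bool HasNameFlag(int packed)
    {
        return new BitVector32(packed)[HasName];
    }
}
```

For example, Pack("Orders", "", true) sets the first and third bits and so returns 5, and HasNameFlag(5) returns true.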
Note the linking of flags/masks after the first creation, and note that you create different sets of flags for different object types.
Using Bit Flags
Here are some tips when defining sets of bit flags:
A bit flag can be used in several ways
The last three items in the list sound similar but there are subtle differences. The first of the three is usually a simple comparison to null "
A flag doesn't have to directly relate to a data value. For example, in the UniqueConstraints flag set, I have a flag called "
Another example is in the DataRelation flag set where I have "
Another good example is for a DataColumn which has
It is also possible to use one bit flag as a condition for multiple values. ColumnHasAutoIncrementUnusedDefaults is an example - if false then both AutoIncrementSeed and AutoIncrement are written as a pair.
Analyzing Your Object(s)
I can't honestly say that I sat down, analysed the requirements and wrote the code in one go - there were a number of revisions and 'optimization opportunities'. What I will try to do here is give a bit of general guidance, used in this and other similar projects, which might help if you Fast Serialize your own classes:
Here is the rough overview I started coding from:
Below, I have put the major class types as headings and described a little about how they are coded and optimized.
DataSet
Here is the code for serializing a DataSet:
public byte[] Serialize(DataSet dataSet)
{
    this.dataSet = dataSet;
    writer = new SerializationWriter();

    // The flags are written first; they decide which values follow
    BitVector32 flags = GetDataSetFlags(dataSet);
    writer.WriteOptimized(flags);

    if (flags[DataSetHasName]) writer.Write(dataSet.DataSetName);
    writer.WriteOptimized(dataSet.Locale.LCID);
    if (flags[DataSetHasNamespace]) writer.Write(dataSet.Namespace);
    if (flags[DataSetHasPrefix]) writer.Write(dataSet.Prefix);
    if (flags[DataSetHasTables]) serializeTables();
    if (flags[DataSetHasForeignKeyConstraints])
        serializeForeignKeyConstraints(getForeignKeyConstraints(dataSet));
    if (flags[DataSetHasRelationships]) serializeRelationships();
    if (flags[DataSetHasExtendedProperties])
        serializeExtendedProperties(dataSet.ExtendedProperties);

    return getSerializedBytes();
}
I won't be reproducing the code for all of the methods but this is the top-level code and this pattern tends to repeat for
Deserialization is generally a reverse of this process. It is essential that all values read from the
public DataSet DeserializeDataSet(DataSet dataSet, byte[] serializedData)
{
    this.dataSet = dataSet;
    reader = new SerializationReader(serializedData);

    // Constraints stay off while loading and are restored at the end
    dataSet.EnforceConstraints = false;

    BitVector32 flags = reader.ReadOptimizedBitVector32();
    if (flags[DataSetHasName]) dataSet.DataSetName = reader.ReadString();
    dataSet.Locale = new CultureInfo(reader.ReadOptimizedInt32());
    dataSet.CaseSensitive = flags[DataSetIsCaseSensitive];
    if (flags[DataSetHasNamespace]) dataSet.Namespace = reader.ReadString();
    if (flags[DataSetHasPrefix]) dataSet.Prefix = reader.ReadString();
    if (flags[DataSetHasTables]) deserializeTables();
    if (flags[DataSetHasForeignKeyConstraints])
        deserializeForeignKeyConstraints();
    if (flags[DataSetHasRelationships]) deserializeRelationships();
    if (flags[DataSetHasExtendedProperties])
        deserializeExtendedProperties(dataSet.ExtendedProperties);

    dataSet.EnforceConstraints = flags[DataSetAreConstraintsEnabled];
    throwIfRemainingBytes();
    return dataSet;
}
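The pattern used by the two methods above (write a flag set first, then write only the values whose flag is set, and read everything back in exactly the same order) can be tried out in isolation. The sketch below is a stand-in that uses the framework's BinaryWriter and BinaryReader rather than the SerializationWriter and SerializationReader from Part 1, and the FlaggedRecord class and its two flags are hypothetical.

```csharp
using System;
using System.Collections.Specialized;
using System.IO;

public static class FlaggedRecord
{
    static readonly int HasName = BitVector32.CreateMask();
    static readonly int HasPrefix = BitVector32.CreateMask(HasName);

    // Writes the flags, then only those strings that are actually present.
    public static byte[] Write(string name, string prefix)
    {
        BitVector32 flags = new BitVector32();
        flags[HasName] = !string.IsNullOrEmpty(name);
        flags[HasPrefix] = !string.IsNullOrEmpty(prefix);

        using (MemoryStream stream = new MemoryStream())
        using (BinaryWriter writer = new BinaryWriter(stream))
        {
            writer.Write(flags.Data);
            if (flags[HasName]) writer.Write(name);
            if (flags[HasPrefix]) writer.Write(prefix);
            writer.Flush();
            return stream.ToArray();
        }
    }

    // Reads the flags first, then the values in the order they were written.
    public static void Read(byte[] data, out string name, out string prefix)
    {
        using (MemoryStream stream = new MemoryStream(data))
        using (BinaryReader reader = new BinaryReader(stream))
        {
            BitVector32 flags = new BitVector32(reader.ReadInt32());
            name = flags[HasName] ? reader.ReadString() : string.Empty;
            prefix = flags[HasPrefix] ? reader.ReadString() : string.Empty;

            // Same idea as throwIfRemainingBytes: leftovers mean a mismatch.
            if (stream.Position != stream.Length)
                throw new InvalidOperationException("Unread bytes remain");
        }
    }
}
```

With both strings absent, only the four bytes of the flag set are written. Each string that is present adds its length prefix and its characters, so the output grows only with the data actually held.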
Some objects have an implied order. For example,
DataTable
If we have reached the code to serialize a Some fields on a
DataColumn
Serialization of these pretty much follows the same pattern of retrieving flags and conditionally serializing values according to the flags. We have already mentioned the
DataRow
Serialization begins with writing the number of rows first. Remember that this method won't be called at all if there are no rows (from the DataTable at least) but this method is also used by the helper methods that write the row data only. A
We could have used the ItemArray property to retrieve the values for most of these row versions, but ItemArray throws an exception for Deleted rows. Getting the values of a Deleted row requires the use of an indexer overload that allows the required DataRowVersion to be passed in. Rather than use one method for Deleted rows and another for all other states, I created a helper method which takes a DataRow and a DataRowVersion as parameters and returns an object array (actually ItemArray does the same thing internally anyway - did you know that the data is actually stored within the
We also have to deal with Expression columns to ensure that their values are set to null prior to serialization. To do this, I create an
Here is the code for deserialization:
private void deserializeRows(DataTable dataTable)
{
    ArrayList readOnlyColumns = null;
    int rowCount = reader.ReadOptimizedInt32();

    dataTable.BeginLoadData();
    for(int i = 0; i < rowCount; i++)
    {
        BitVector32 flags = reader.ReadOptimizedBitVector32();
        DataRow row;

        // Unchanged or Added row: only one set of values was stored
        if (!flags[RowHasOldData])
            row = dataTable.LoadDataRow(reader.ReadOptimizedObjectArray(),
                !flags[RowHasNewData]);
        // Deleted row: load the original values, accept, then delete
        else if (!flags[RowHasNewData])
        {
            row = dataTable.LoadDataRow(reader.ReadOptimizedObjectArray(), true);
            row.Delete();
        }
        // Modified row: both original and current values were stored
        else
        {
            /* LoadDataRow doesn't care about ReadOnly columns but ItemArray does
               Since only deserialization of Modified rows uses ItemArray we do this
               only if a modified row is detected and just once */
            if (readOnlyColumns == null)
            {
                readOnlyColumns = new ArrayList();
                foreach(DataColumn column in dataTable.Columns)
                {
                    if (column.ReadOnly && column.Expression.Length == 0)
                    {
                        readOnlyColumns.Add(column);
                        column.ReadOnly = false;
                    }
                }
            }
            object[] currentValues;
            object[] originalValues;
            reader.ReadOptimizedObjectArrayPair(out currentValues,
                out originalValues);
            row = dataTable.LoadDataRow(originalValues, true);
            row.ItemArray = currentValues;
        }

        if (flags[RowHasRowError]) row.RowError = reader.ReadString();
        if (flags[RowHasColumnErrors])
        {
            int columnsInErrorCount = reader.ReadOptimizedInt32();
            for(int j = 0; j < columnsInErrorCount; j++)
            {
                row.SetColumnError(reader.ReadOptimizedInt32(), reader.ReadString());
            }
        }
    }

    // Must restore ReadOnly columns if any were found when deserializing a
    // Modified row
    if (readOnlyColumns != null && readOnlyColumns.Count != 0)
    {
        foreach(DataColumn column in readOnlyColumns)
        {
            column.ReadOnly = true;
        }
    }
    dataTable.EndLoadData();
}
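The way the method above recreates each RowState can be reproduced with nothing but the framework's DataTable. The sketch below is a simplified illustration with hypothetical names and a two-column table: it loads one row per state using the same LoadDataRow, Delete and ItemArray calls, and includes a small helper that reads the values of any row version through the indexer that takes a DataRowVersion.

```csharp
using System;
using System.Data;

public static class RowStateDemo
{
    // Builds a table containing one row in each of the four states.
    public static DataTable Build()
    {
        DataTable table = new DataTable("Demo");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));

        // Unchanged: load the values and accept them immediately.
        table.LoadDataRow(new object[] { 1, "unchanged" }, true);

        // Added: load the values without accepting them.
        table.LoadDataRow(new object[] { 2, "added" }, false);

        // Deleted: load, accept, then delete.
        DataRow deleted = table.LoadDataRow(new object[] { 3, "deleted" }, true);
        deleted.Delete();

        // Modified: load the original values, then apply the current ones.
        DataRow modified = table.LoadDataRow(new object[] { 4, "original" }, true);
        modified.ItemArray = new object[] { 4, "current" };

        return table;
    }

    // Reads all values of one row version; unlike ItemArray this also
    // works for a Deleted row when DataRowVersion.Original is requested.
    public static object[] GetValues(DataRow row, DataRowVersion version)
    {
        object[] values = new object[row.Table.Columns.Count];
        for (int i = 0; i < values.Length; i++)
        {
            values[i] = row[i, version];
        }
        return values;
    }
}
```

After Build(), the four rows report RowState values of Unchanged, Added, Deleted and Modified respectively, and GetValues with DataRowVersion.Original returns the originally loaded values for both the Deleted and the Modified row.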
The indexers on
For Unchanged and Added rows, we just supply the deserialized object array and a bool (obtained from our flag set) to specify whether to AcceptChanges immediately or not (true for Unchanged and false for Added). Deleted rows are similar, but we accept and then immediately call Delete() to get the desired effect.
Modified rows are treated slightly differently because we have two object arrays but LoadDataRow can only take the first one - after that we need to apply the second set to the existing data to 'modify' it. The ItemArray property will allow us to do this but there is a gotcha - an exception will be thrown for ReadOnly columns. To get around this we need to remove the ReadOnly status for all columns (except those with an Expression) whilst we are deserializing the data. To do this optimally, we do it if, and only if, a Modified row is found, and we do it just once, making a note of the columns we changed - when all data is deserialized we make those columns ReadOnly again.
UniqueConstraint
A UniqueConstraint always has a name - it's never null or empty. If you don't specify a particular name, it will default to "Constraintxx" where xx is the next number within the DataSet. We can use this to our advantage - the method that gets the flags associated with a UniqueConstraint uses a regular expression to find a default name. If one is found, we only need to serialize the xx number; otherwise we store the full assigned constraint name.
A UniqueConstraint ultimately needs to store DataColumns on deserialization. There are a number of ways to achieve this - we could store the column names and, since they would already have been stored previously during serialization of the DataColumn, they would only take up the size of a string token.
However, we can do better than that - since we know that the
Further, by using a bit flag indicating whether the constraint is comprised of more than one DataColumn, we don't need to store the column count in the general case of a single-column constraint.
ForeignKeyConstraint
A Although a Therefore we need to identify the table(s) involved, which we can do by using the ordinal of the Typically, the parent-side will be the
A ForeignKeyConstraint also has rules: For the latter two, the options are
DataRelation
A DataRelation uses the same optimization techniques for
ExtendedProperties
This is an instance of A
Final Words and a Caveat
One caveat with Fast Serialization I should point out is that it was originally designed for remoting purposes. Since remoting involves the use of identical code on both the serialization and deserialization sides at any given time, it is impervious to any modifications made to the code. This isn't necessarily the case, however, if you persist the serialized data to a file or database - any changes to flags or storage order would likely make the data unreadable.
Hopefully you will have found the code in this and the previous article useful for your serialization/remoting needs. Please feel free to comment, suggest any changes or report any bugs here on Code Project.
Changes from v1 to v2
History