Introduction
Sometimes we have to serialize objects, e.g. to send them over a network, to store and restore them locally, or for any other reason. After deserialization, it is useful to know whether the object has been restored correctly, especially if your objects have internal state or if you must manage multiple instances of a class. A possible solution to this problem is to use the System.Guid struct to identify the objects. But that way, you cannot be sure that the internal state was deserialized correctly (see Background for an explanation).
A commonly used technique on the Internet is to provide an MD5 hash string so the receiver can check whether the file has been transmitted without any modifications.
Background
The .NET Framework gives us a struct to uniquely identify our objects: the System.Guid struct in mscorlib.dll. This struct can be used to give each class its own identifier, and that is the crux of the matter. What we need is not an identifier for the class but an identifier for each instance of the class. Implicitly, this identifier must also represent internal values such as state. Otherwise the recipient of the object cannot be sure that he has received and deserialized the same object. The recipient also cannot "create" the GUID on his own: once it has been created by the sender, it is not reproducible.
So we must provide functionality that both sender and recipient can execute to identify an object. This identifier must also implicitly take into account the fields that are relevant for this object. And these relevant fields can be different for each class!
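To make the difference concrete, here is a minimal sketch. The names IdentifierSketch, FieldFingerprint and FreshGuidsMatch are made up for illustration and are not part of the article's download. The sketch only shows that a freshly created GUID carries no information about the object, while a value derived from the relevant fields can be recomputed by both sides.

```csharp
using System;
using System.Globalization;

public static class IdentifierSketch
{
    // Hypothetical helper: derives an identifier only from the fields that
    // are relevant for the object, so sender and recipient get the same value.
    public static string FieldFingerprint(string text, int number)
    {
        return text + "|" + number.ToString(CultureInfo.InvariantCulture);
    }

    // Two freshly created GUIDs are independent of any object content,
    // so the recipient cannot recompute the sender's GUID.
    public static bool FreshGuidsMatch()
    {
        return Guid.NewGuid() == Guid.NewGuid();
    }
}
```

A plain concatenation like this is only a placeholder for "something computed from the relevant fields"; the rest of the article replaces it with an MD5 hash of the serialized object.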
The idea I had was to use MD5 hashes for that. Each object has a built-in method called .GetHashCode(). This method returns an int, although from the name of the method you might expect a string. That's because these hash values are intended to be used as keys in, e.g., a Hashtable.
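A small sketch of what .GetHashCode() gives you. The class and method names (HashCodeSketch, CodeOf, LookupDemo) are invented for this example. Because the result is only a 32-bit integer, different objects can share the same value, and the documentation advises against persisting hash codes or relying on them across processes, which is why it is not a good fit for checking a transmitted object.

```csharp
using System;
using System.Collections.Generic;

public static class HashCodeSketch
{
    // GetHashCode() yields a 32-bit integer meant for bucketing keys.
    public static int CodeOf(object value)
    {
        return value.GetHashCode();
    }

    // Typical use: the dictionary calls GetHashCode() on each key internally
    // to find the bucket in which the value is stored.
    public static int LookupDemo()
    {
        Dictionary<string, int> lookup = new Dictionary<string, int>();
        lookup["answer"] = 42;
        return lookup["answer"];
    }
}
```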
Fortunately, there is a class named MD5CryptoServiceProvider in the System.Security.Cryptography namespace. Unfortunately, this class is not easy to use. The main problem for most programmers is probably that the class only accepts a byte array as input and not a reference to an object. So I decided to wrap all the needed functionality in a generator class. This class generates the hash for me, and I have to write just one line of code.
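The "bytes only" restriction can be seen in a minimal sketch. Two assumptions to note: the sketch uses the MD5.Create() factory instead of constructing MD5CryptoServiceProvider directly, and it hashes a UTF-8 encoded string as a stand-in for a serialized object. The names Md5BytesSketch and HashText are hypothetical.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class Md5BytesSketch
{
    // The MD5 API works on bytes, so the text has to be encoded first.
    public static byte[] HashText(string text)
    {
        byte[] input = Encoding.UTF8.GetBytes(text);
        using (MD5 md5 = MD5.Create())
        {
            // The result is again a byte array (16 bytes), not a string.
            return md5.ComputeHash(input);
        }
    }
}
```

For an arbitrary object there is no such ready-made encoding step, which is the gap the generator class fills.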
Using the Code
The code file above contains a class called MD5HashGenerator. This class has a static method, GenerateKey(Object sourceObject), which does the "magic" for you. Include the class in your project and use it as follows.
To use the class (as a publisher), you have to do the following things:
- Mark the class as [Serializable]. Mark all fields which should not be serialized as [NonSerialized].
- Call the static method MD5HashGenerator.GenerateKey(Object sourceObject). You get the MD5 hash for the object as a string.
- Serialize the object, publish / store it and the hash.
If you are the receiver, then:
- Deserialize the received object.
- Call the static method MD5HashGenerator.GenerateKey(Object sourceObject) on the deserialized object.
- Compare the hashes.
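The publisher and receiver steps above can be sketched as follows. This is not the MD5HashGenerator itself: the hypothetical helper HashOf works on bytes that are assumed to be serialized already, and IsUnchanged is an invented name for the receiver's comparison step.

```csharp
using System;
using System.Security.Cryptography;

public static class TransferCheckSketch
{
    // Stand-in for the generator: hashes bytes that are already serialized
    // and returns the digest as a hex string without separators.
    public static string HashOf(byte[] serialized)
    {
        using (MD5 md5 = MD5.Create())
        {
            return BitConverter.ToString(md5.ComputeHash(serialized)).Replace("-", "");
        }
    }

    // Receiver side: recompute the hash and compare it with the published one.
    public static bool IsUnchanged(byte[] received, string publishedHash)
    {
        return string.Equals(HashOf(received), publishedHash,
                             StringComparison.OrdinalIgnoreCase);
    }
}
```

The publisher would call HashOf on the serialized bytes and send the result along with them. The receiver calls IsUnchanged with the bytes it received and the published hash; a single changed byte makes the comparison fail.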
Example
We want to serialize a class which has a string, an int and a DateTime. The DateTime member is set at creation time, so it is different for each instance of the class. As mentioned above, the class must be marked as serializable. It could look like this:
using System;
using System.Runtime.Serialization;

[Serializable]
public class SimpleClass
{
    private string justAString;
    private int justAnInt;
    private DateTime justATime;

    public SimpleClass()
    {
        justAString = "Some useless text";
        justAnInt = 345678912;
        justATime = DateTime.Now;
    }
}
Because we use the system property DateTime.Now to initialize the field justATime, each instance of the class should be different. It is important to mark the class as Serializable, because the MD5HashGenerator class requires it.
The generator class uses the BinaryFormatter for serialization, so all fields (whether they are private or not) are automatically included in the serialization process. But exclude handles and pointers if you are using them. See [1] for details.
The class which "publishes" the object must then do the following things:
...
SimpleClass simpleObject = new SimpleClass();
string simpleObjectHash = MD5HashGenerator.GenerateKey(simpleObject);
...
Now the "consumer" can deserialize the SimpleClass instance and also call MD5HashGenerator.GenerateKey(simpleObject) on the deserialized object. He can then compare the hash strings and decide whether it's the same object.
How It Works
The code of the MD5HashGenerator.GenerateKey(Object sourceObject) method looks like this:
// Requires: using System; using System.Reflection;
public static String GenerateKey(Object sourceObject)
{
    // Reject null early; there is nothing to hash.
    if (sourceObject == null)
    {
        throw new ArgumentNullException("sourceObject", "Null as parameter is not allowed");
    }

    try
    {
        // Serialize the object to bytes, then hash those bytes.
        return ComputeHash(ObjectToByteArray(sourceObject));
    }
    catch (AmbiguousMatchException ame)
    {
        throw new ApplicationException(
            "Could not definitely decide if object is serializable. Message: " + ame.Message, ame);
    }
}
Let's have a deeper look at the following line of code:
hashString = ComputeHash(ObjectToByteArray(sourceObject));
As mentioned above, I used the MD5CryptoServiceProvider class to generate the hash string. I encapsulated its use in the ComputeHash(byte[] objectAsBytes) method. Here's the implementation:
private static string ComputeHash(byte[] objectAsBytes)
{
    MD5 md5 = new MD5CryptoServiceProvider();
    try
    {
        byte[] result = md5.ComputeHash(objectAsBytes);

        // Build the hex string from the resulting hash bytes.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < result.Length; i++)
        {
            sb.Append(result[i].ToString("X2"));
        }
        return sb.ToString();
    }
    catch (ArgumentNullException)
    {
        // objectAsBytes was null, e.g. because serialization failed.
        Console.WriteLine("Hash has not been generated.");
        return null;
    }
}
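The hex formatting used in the loop above can be tried in isolation. The following is a small sketch with the hypothetical names HexSketch and ToHex; it shows that the "X2" format string always produces two uppercase hex digits per byte, including a leading zero where needed.

```csharp
using System;
using System.Text;

public static class HexSketch
{
    // Converts each byte to exactly two uppercase hex digits.
    public static string ToHex(byte[] bytes)
    {
        StringBuilder sb = new StringBuilder(bytes.Length * 2);
        foreach (byte b in bytes)
        {
            sb.Append(b.ToString("X2"));
        }
        return sb.ToString();
    }
}
```

For example, ToHex(new byte[] { 0x5C, 0x07 }) yields "5C07". With the format string "X" alone, the second byte would be rendered as "7", and the resulting string would no longer have a fixed length.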
As the ComputeHash method shows, the MD5CryptoServiceProvider class wants a byte array as input. It does not accept an object directly. What you get out of it is not a string, as we would like, but a byte array. Therefore I added the conversion from byte array to hex. The conversion is done with the Byte.ToString() method, which accepts a format string as input. "X2" here means that each byte is converted into a two-character string (e.g. 01011100 => 5C or 00000111 => 07).
Now there is still the question of how to convert an object into a byte array. We know that our object is serializable. So we can serialize it into memory (using a MemoryStream and a BinaryFormatter) and get the needed byte array out of the memory stream. Because the whole thing should be thread-safe, we lock the serialization of the object.
private static readonly Object locker = new Object();

private static byte[] ObjectToByteArray(Object objectToSerialize)
{
    MemoryStream fs = new MemoryStream();
    BinaryFormatter formatter = new BinaryFormatter();
    try
    {
        lock (locker)
        {
            formatter.Serialize(fs, objectToSerialize);
        }
        return fs.ToArray();
    }
    catch (SerializationException se)
    {
        Console.WriteLine("Error occurred during serialization. Message: " +
            se.Message);
        return null;
    }
    finally
    {
        fs.Close();
    }
}
Conclusion
Generating MD5 hashes can be useful if you need a procedure that both sides can execute to identify objects and to verify that they were serialized and deserialized unchanged. The most difficult part for me was converting an object into a byte array and converting a byte array to a hex string. Using GUIDs is also a possibility. But the GUID is created when the object is initialized, and the consumer cannot "recreate" the GUID to ensure that no changes were made to the object. He just knows that he has received the same object the producer has created.
What I did not cover are the security issues. MD5 hashes alone are not reliable enough. If you need strong security, use RSA-encrypted channels or other encryption methods.
References
History
- V1.2 -- 28.07.2008 -- Refactored the article, after some discussions
- V1.1 -- 25.07.2008 -- Added some modifications according to the post of Adam Tibi
- V1.0 -- 15.11.2007 -- First version of article