Bend the .NET Object to your Will!

By John Batte, 14 Jan 2009
 

Introduction

Have you ever had to implement ICloneable on a complex type? Gets out of hand in a hurry, doesn't it? How about IEquatable<T>? Here's a good one: what happens when you need to serialize an object graph using BinaryFormatter (so it can be transmitted or stored) and somewhere in the tree there's a type you don't control that isn't serializable? XML to the rescue, right? But when you punt the object over to the XmlSerializer, there are read-only properties you don't control that aren't participating. Now what? Create your own surrogate type and handle the marshalling operations in some utility? Sounds like a pain in the butt to me. Which is why I decided to do it one more time, and then never again. :)

In order to clone an object, what do you really need? Ultimately, all you need is the structure of the object, and its simple values. If you know those two things, a new copy of the object can be constructed.

What about deep comparison between objects? Same thing. If an object's structure and each of its simple values equal another object's, then those objects are value-equivalent.

And wouldn't you know it, the process of serializing an unknown type requires that we store the structure of the object and its simple (implicitly serializable) values in a new structure that can be serialized.

Since all three features depend on the same thing happening to your objects, all of the extension methods delivering these features depend on the same class: ObjectGraph.
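
As a rough illustration of that shared idea (this is not the ObjectGraph implementation), the sketch below flattens an object into a name-to-value map of its instance fields and then compares two such maps. `FlatSnapshot`, `Capture`, and the sample `Point3` type are hypothetical names, and the sketch assumes every field holds a simple value such as a number, string, or enum; nested objects need a recursive treatment.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical helper, for illustration only: handles flat objects whose
// fields all hold simple values (numbers, strings, enums).
public static class FlatSnapshot
{
    private const BindingFlags InstanceFields =
        BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

    // Captures the "structure" (field names) and "simple values" of an object.
    public static Dictionary<string, object> Capture(object source)
    {
        if (source == null) throw new ArgumentNullException("source");

        var values = new Dictionary<string, object>();
        foreach (FieldInfo field in source.GetType().GetFields(InstanceFields))
        {
            values[field.Name] = field.GetValue(source);
        }
        return values;
    }

    // Two objects are treated as value-equivalent when they have the same
    // type and every captured field value is equal.
    public static bool ValueEquals(object left, object right)
    {
        if (left == null || right == null) return left == null && right == null;
        if (left.GetType() != right.GetType()) return false;

        Dictionary<string, object> a = Capture(left);
        Dictionary<string, object> b = Capture(right);
        foreach (KeyValuePair<string, object> pair in a)
        {
            if (!object.Equals(pair.Value, b[pair.Key])) return false;
        }
        return true;
    }
}

// Sample type used only to exercise the sketch.
public class Point3
{
    public int X;
    private readonly string _label;
    public Point3(int x, string label) { X = x; _label = label; }
}
```

With this in place, `FlatSnapshot.ValueEquals(new Point3(1, "a"), new Point3(1, "a"))` compares the two instances field by field, private field included, without `Point3` implementing IEquatable<T>.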

Background

This article focuses on a few small extension methods that all make use of a new class called ObjectGraph. This class decomposes objects down to their simplest values while maintaining member association. This enables objects to be analyzed and manipulated in fine-grained ways, regardless of type.

This article makes use of .NET Framework 3.5.

Using the Code

The code is extremely easy to use:

var instance = new ComplexType // this object could be anything at all
{
    Id = 47,
    Name = "My Complex Type",
    ArbitraryValue = ArbitraryEnum.Foo,
    Values = new List<string>(new[]{"Value1", "Value2", "Value3"})
};
 
// extension method:  Clone
ComplexType clone = instance.Clone(); // a true deep copy
 
// extension method:  ToBinaryString
string serializedInstance = instance.ToBinaryString(); // a base-64 encoded byte array 
 
// extension method:  ToObject<T>
var deserializedInstance = serializedInstance.ToObject<ComplexType>(); // another clone! 
 
// extension method : ValueEquals
bool isCloneEqual = instance.ValueEquals(clone); // true
bool isRoundTripEqual = instance.ValueEquals(deserializedInstance); // also true :)

How It Works

The biggest convention breaker here is the idea of being able to serialize any object using the BinaryFormatter, even ones that aren't decorated with [Serializable]. It's a simple trick: the object being serialized isn't your object. It's actually a wrapper class (ObjectGraph) that is 100% serializable, and stores enough information to completely rehydrate your object after being deserialized.

When ObjectGraph wraps an object, several things may take place, depending on the object being wrapped. If the wrapped object is a simple type, i.e. one that the code recognizes as being directly serializable, then the raw value of the object is stored and the wrapping operation is complete. If the object has already been wrapped in the current graph, a pointer to the original wrapper is stored. If the object is an array of other objects, then the array items are individually wrapped and stored. If the object is a complex type, then each of its member variables is wrapped and stored in a name-keyed dictionary.
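
The four cases can be sketched as a small classifier. All names here (`NodeKind`, `NodeClassifier`, `IsSimple`) are hypothetical, and the definition of "simple" is an assumption: the article's own IsValueBased() check is not shown and may recognize a different set of types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names throughout; this mirrors the four cases described
// above, not the actual ObjectGraph source.
public enum NodeKind { Simple, Pointer, Array, Complex }

public static class NodeClassifier
{
    // Assumption: "simple" means null, primitives, enums, strings, and a
    // few common value types.
    public static bool IsSimple(object data)
    {
        if (data == null) return true;
        Type type = data.GetType();
        return type.IsPrimitive || type.IsEnum || data is string
            || data is decimal || data is DateTime || data is Guid;
    }

    // 'alreadyWrapped' holds every reference seen so far in the current graph.
    public static NodeKind Classify(object data, ICollection<object> alreadyWrapped)
    {
        if (IsSimple(data)) return NodeKind.Simple;

        // Linear scan by reference identity; fine for a sketch, slow for
        // large graphs.
        foreach (object seen in alreadyWrapped)
        {
            if (object.ReferenceEquals(seen, data)) return NodeKind.Pointer;
        }
        alreadyWrapped.Add(data);

        return (data is Array) ? NodeKind.Array : NodeKind.Complex;
    }
}
```

Classifying the same list instance twice against one `alreadyWrapped` collection yields `Complex` the first time and `Pointer` the second, which is the behavior that lets a graph with cyclic references terminate.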

Why member variables? This is the key. No matter what the public interface of your class, if the class holds state information at all it will be in a member variable. Automatic properties get their variables generated for them, but it's all the same. Once I have the value of all of an object's variables, I can use Reflection-based instantiation to create an exact copy of the object, or compare them to any other object with a matching type.
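
To make the field-based idea concrete, here is a minimal sketch of a shallow copy that goes through member variables only. `FieldCopier` and `Account` are hypothetical names, and using FormatterServices.GetUninitializedObject to allocate the copy is my assumption about what "Reflection-based instantiation" could look like; the article does not say which call ObjectGraph uses. Newer runtimes also offer RuntimeHelpers.GetUninitializedObject for the same purpose.

```csharp
using System;
using System.Reflection;
using System.Runtime.Serialization;

// Hypothetical helper: a shallow, field-by-field copy that never calls a
// constructor. A deep copy would wrap each field value recursively.
public static class FieldCopier
{
    private const BindingFlags DeclaredInstanceFields =
        BindingFlags.Instance | BindingFlags.Public |
        BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

    public static T ShallowCopy<T>(T source) where T : class
    {
        if (source == null) throw new ArgumentNullException("source");

        Type type = source.GetType();

        // Allocates the object without running any constructor.
        object copy = FormatterServices.GetUninitializedObject(type);

        // Walk up the hierarchy so that private fields declared on base
        // classes are copied as well.
        for (Type current = type; current != null; current = current.BaseType)
        {
            foreach (FieldInfo field in current.GetFields(DeclaredInstanceFields))
            {
                field.SetValue(copy, field.GetValue(source));
            }
        }
        return (T)copy;
    }
}

// Sample type: no parameterless constructor, one read-only property backed
// by a private field, and one automatic property.
public class Account
{
    private readonly int _id;
    public Account(int id) { _id = id; }
    public int Id { get { return _id; } }
    public string Owner { get; set; }
}
```

Copying `new Account(47) { Owner = "Ann" }` this way carries over both the read-only `_id` field and the compiler-generated backing field behind `Owner`, even though neither is settable through the public interface.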

Most of ObjectGraph's code loses its meaning if you try to read individual methods out of context, so I apologize if this doesn't make enough sense, but here's the private ObjectGraph constructor; it should give some clue as to how the ObjectGraph analyzes the object it wraps.

private ObjectGraph(object data, GraphRegistry registry, bool isRootGraph)
{
    // make sure to unhook all pointers created during scan
    using (new DisposableContext(() => { if (isRootGraph) registry.Clear(); }))
    {
        _isValueBased = data.IsValueBased();

        if (_isValueBased) _value = data;
        else
        {
            _pointer = registry.Register(data, this);

            // a null pointer means this object has not been wrapped yet
            if (_pointer == null)
            {
                _isArray = data is Array;

                if (_isArray)
                {
                    _arrayItems = GetItems((Array) data, registry).ToList();

                    // Array type names end with a bracketed suffix such as
                    // "[]"; strip it so that _type holds the item type name.
                    _type = Regex.Replace(
                        data.GetType().AssemblyQualifiedName,
                        @"\[\d*\]", string.Empty);

                    return;
                }

                _state = GetValues(data, registry);
            }
        }

        _type = data != null ? data.GetType().AssemblyQualifiedName : string.Empty;
    }
}
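
The constructor relies on registry.Register returning null the first time an object is seen and a pointer on later visits. GraphRegistry itself is not shown in the article, so the following is only a guess at its shape: `ReferenceRegistry` is a hypothetical stand-in that tracks objects by reference identity and hands back an integer id rather than a wrapper.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for GraphRegistry; the shape and return type here
// are assumptions, not the article's code.
public sealed class ReferenceRegistry
{
    // Compares keys by object identity, ignoring Equals/GetHashCode overrides.
    private sealed class IdentityComparer : IEqualityComparer<object>
    {
        bool IEqualityComparer<object>.Equals(object x, object y)
        {
            return object.ReferenceEquals(x, y);
        }

        int IEqualityComparer<object>.GetHashCode(object obj)
        {
            return RuntimeHelpers.GetHashCode(obj);
        }
    }

    private readonly Dictionary<object, int> _ids =
        new Dictionary<object, int>(new IdentityComparer());

    // Returns null the first time an object is seen (the caller should scan
    // it), or the id assigned on the first visit (the caller stores a pointer).
    public int? Register(object data)
    {
        if (data == null) throw new ArgumentNullException("data");

        int existing;
        if (_ids.TryGetValue(data, out existing)) return existing;

        _ids.Add(data, _ids.Count);
        return null;
    }

    // Drops all tracked references, as the root graph does after a scan.
    public void Clear() { _ids.Clear(); }
}
```

Identity comparison matters here: two distinct instances that happen to be Equals (two `new Version(1, 0)` objects, say) must be registered separately, otherwise the rehydrated graph would merge objects that were separate in the original.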

Points of Interest

Check out the unit tests in the source. They show that in addition to CLR types & your custom types, anonymous types will also participate happily with the ObjectGraph class.

Speaking of which, the unit tests included in the source are not really unit tests; they are integration tests with BDD naming semantics, all of which is completely improper. The only reason they are present is so that I (and you) can quickly debug the code. Please do not think that this article is attempting to address the proper way to implement TDD or BDD. In fact, here's a disclaimer: THIS ARTICLE DEMONSTRATES POOR TESTING HABITS.

Also, since indirect recursion is used in both object scanning and rehydration, I have concerns that a graph of sufficient depth could cause a StackOverflowException to occur. I have not been able to make this happen in practical use, so it may be okay for most scenarios. Fair warning.
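
If the recursion depth ever did become a problem, one common alternative is to walk the graph with an explicit stack. The sketch below is not how ObjectGraph works; `IterativeWalker` and `ChainNode` are hypothetical, and the walker only counts reachable reference-type objects, treating strings and value types as leaves (so references held inside struct fields are not followed).

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;

// Hypothetical sketch: counts the distinct reference-type objects reachable
// from a root using an explicit stack instead of recursion.
public static class IterativeWalker
{
    private sealed class IdentityComparer : IEqualityComparer<object>
    {
        bool IEqualityComparer<object>.Equals(object x, object y)
        {
            return object.ReferenceEquals(x, y);
        }

        int IEqualityComparer<object>.GetHashCode(object obj)
        {
            return RuntimeHelpers.GetHashCode(obj);
        }
    }

    private const BindingFlags DeclaredInstanceFields =
        BindingFlags.Instance | BindingFlags.Public |
        BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

    // Simplification: nulls, strings, and value types are not followed.
    private static bool IsLeaf(object value)
    {
        return value == null || value is string || value.GetType().IsValueType;
    }

    public static int CountReachable(object root)
    {
        var visited = new HashSet<object>(new IdentityComparer());
        var pending = new Stack<object>();
        if (!IsLeaf(root)) pending.Push(root);

        while (pending.Count > 0)
        {
            object current = pending.Pop();
            if (!visited.Add(current)) continue; // already scanned

            Array array = current as Array;
            if (array != null)
            {
                foreach (object item in array)
                {
                    if (!IsLeaf(item)) pending.Push(item);
                }
                continue;
            }

            // Visit fields declared at every level of the type hierarchy.
            for (Type type = current.GetType(); type != null; type = type.BaseType)
            {
                foreach (FieldInfo field in type.GetFields(DeclaredInstanceFields))
                {
                    object value = field.GetValue(current);
                    if (!IsLeaf(value)) pending.Push(value);
                }
            }
        }
        return visited.Count;
    }
}

// Sample singly linked node used to build a very deep graph.
public class ChainNode
{
    public ChainNode Next;
}
```

Because the pending work lives in a heap-allocated Stack<object>, a chain of a hundred thousand `ChainNode` instances, even one whose tail points back to its head, is walked without growing the call stack.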

Finally, I would like to thank the members who quickly responded with some critical feedback that led to this component's current status. Your input is much appreciated!

Enjoy :)

History

  • 12/19/08: Submitted first draft
  • 12/20/08: Submitted second draft & code revision for cyclic reference support
  • 1/11/09: Submitted final draft

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

John Batte
Architect
United States
I'm a professional developer with over 9 years of experience in advanced C# development. I've worked extensively with every phase of the SDLC and have developed and deployed many enterprise solutions using the latest .NET technologies. Please let me know if you have any suggestions or questions.

Comments and Discussions

Somewhat presumptuous to assume such cloning/comparison will work (supercat9, 16 Jan '09 - 7:59)
In some cases, objects may be meaningfully cloned, compared, or serialized by simply processing all of their sub-components. There are many objects which disallow such operations only because the programmer neglected to provide them; using reflection to process such objects may be useful.
 
On the other hand, even within entirely managed code, it's possible for the state of an object to be dependent upon other objects to which it contains no references. For example, consider this class:
Public Class Thingie
  Public Z() As Integer ' Just a nice simple array
End Class
 
An array of integers looks like it should be an easy thing to clone, compare, or serialize, but suppose that a Thingie's "Z" field references an array which some other class is updating once per second. Deep cloning a Thingie by creating a new array would yield a clone whose behavior was very different from the original object; shallow-cloning a Thingie would yield a clone whose sub-content could not be altered without affecting the original.
 
Another issue which poses problems for serialization is that even when there are no untraceable outside references to objects within a class, the meaning of member fields may depend upon the value of shared fields. A serializer would have access to the shared fields, but would have no way of knowing what they mean. For example:
Public Class Thingie2
  Shared ID as String = Guid.NewGuid.ToString
  Public St As String
End Class
 
At the time a Thingie2 is serialized, ID holds a particular value and St holds the same value. Later on, an attempt is made to deserialize when the class' ID value is different. Would the object be restored to its correct logical state by setting St equal to the new ID value, setting it to its old value, or neither? How could one tell?

Re: Somewhat presumptuous to assume such cloning/comparison will work (John Batte, 4 Jun '09 - 8:59)
First of all, I apologize for taking so long to reply. My day job and my girlfriend keep me very busy :-D
 
What you're describing is a business object, in the middle of a business process. It could be small and algorithmic in nature, or it could be large and transactional in nature. Either way, the process of cloning or serializing does not concern business logic, and thus should not impact the logic you've defined. Just remember that when other objects already carry references to your first instance, this means nothing for your second instance. It has simply skipped over the initialization process as you would've normally defined it, and is now ready for consumption in other parts of your code. In no way is it referentially tied to the first instance.
 
In your first example, you have an invalid use case. My code performs a deep clone rather than a shallow one. The goal is to obtain a reference to a second instance that is completely independent of the first, but which also recursively contains all of the field-level values that the first instance had at the time of the snapshot. After that, they are disconnected from one another. Your array updating process will continue, and the second instance (the clone) will remain unaltered. This (in my opinion) is the desired behavior.
 
In your second example, you have another invalid use case. Thingie2 contains no logic to maintain the data integrity between field ID and property St. Clearly, another class maintains this relationship for us, or no such relationship will exist. Cloning Thingie2 takes you from having one class that depends on something else for its integrity, to having two instances of that class, both in the same non-functional boat.
 
Sorry for the C# here, I can tell you're a VB guy...
 
public class Thingie2
{
  protected string _id = Guid.NewGuid().ToString();
 
  public string St{ get{ return _id; } }
}
 
NOW your scenario works. The value of _id is captured and preserved in the second instance. The behavior of the type (which has nothing to do with its shape and/or values) is to mirror the protected field onto the public property. All cloned / deserialized instances now exhibit the behavior you've requested.
 
---if at first you don't succeed, skydiving is not for you.---

Re: Somewhat presumptuous to assume such cloning/comparison will work (supercat9, 4 Jun '09 - 11:29)
Perhaps my examples weren't the best, but my point was that sometimes the fields in objects can have meanings beyond their values. Suppose a class contains an object like a database connection or a reference to a form. One might be able to figure out the connection string and other attributes necessary to create a database connection or form similar to the existing one, but the proper semantics of such a "clone"--even if one could be created--could be rather vague. What should happen if a database connection is cloned between a "BEGIN TRANSACTION" and "COMMIT TRANSACTION"? Is there any sort of handling that would make sense?
 
I think you misunderstood my second example. My design intention wasn't that "St" will always equal ID, but rather that it might or might not happen to equal ID; if a system doesn't know the significance of values in shared and instance fields, it won't know how to deserialize an object if the shared field values can't be set to match the serialized version.

Re: Somewhat presumptuous to assume such cloning/comparison will work (John Batte, 4 Jun '09 - 17:02)
"The proper semantics could be rather vague" -- honestly, it sounds to me like you're having a hard time coming up with a good use for this API. If that's the case, then the API is not for you. The scenarios you are describing do not fit any situation in which I would use this API. If I had a business component with an open database connection, I would not clone it. That would serve no purpose and make no sense. I may serialize it in a catch block for storage, but that's only so I could go back to it later via a log-reading application of some sort and examine the object's state at the time of the error. The deserialized object would then never be used in any procedural capacity, because I know that its state is invalid.
 
I forgot that "shared" in VB is "static" in C#. So your second example makes even less sense than I thought. Cloning and serialization are for instances. Instances are allocated on demand. Static (shared) members are allocated once per AppDomain (or per thread if decorated with [ThreadStatic]). No matter how many instances of a type you create, if it has a static member with an explicitly assigned value, that value will not change until acted upon again, by some process. Creating a clone does not have any bearing on this.
 
Given your examples, I have one observation: YAGNI.
 
---if at first you don't succeed, skydiving is not for you.---

Doesn't work with public fields (andrewducker, 23 Dec '08 - 9:16)
If I have a public field on a class then it doesn't get cloned...
 
(And no, it may not be agreed best practice, but that doesn't help in the real world...)

Re: Doesn't work with public fields (John Batte, 23 Dec '08 - 9:28)
I see what you mean.
 
Quick fix:
In ObjectGraph.cs:
 
Lines 213 & 219: change the two GetFields calls to GetFields(BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public)
 
I've factored this more appropriately into a const that gets used both places. Submitting code update now.
 
Thanks for the notification!

Needlessly O(n^2) (Ed Brey, 22 Dec '08 - 15:10)
The general-purpose serialization concept is a good one, but in many applications serialization is time sensitive. While having slow drop-in serialization available is often just fine, or at least better than none at all, some improvements would make it even better.
 
For example, ObjectGraph.SetValue gets called in a loop for each field of an object, but it itself enumerates the same list a second time in a nested fashion.
 
Because of the heavy use of reflection, I would expect even an optimized version of the library - while still worth optimizing - to be slow compared to typical serialization. This calls into question the API choice of an extension method. I think requiring specific mention of a suitably-named class would be better, e.g. "ReflectionOperation.Clone(someObject)".
 
You want usage of this library to send the message to those reviewing the code, "I can't use traditional cloning, serialization, etc. here, so I'm falling back on a reflection-based approach."

Re: Needlessly O(n^2) (John Batte, 22 Dec '08 - 19:32)
I totally agree. Except for the parts that I completely disagree with :laugh:
 
As I describe in the first paragraph of the article, I wrote this component to get myself around those niche situations where normal serialization / cloning / etc is failing for various reasons. This code should only be used as a fallback when more performant methods are not an option.
 
SetValue gets called in a loop, true, but that's for each field key. The FieldInfo object I need to set the value on the instance is something I have to look up from the Type instance. They are two different lists. I could try to use FieldInfo as my property bag's key, but I'm trying to rely only on the simplest types within ObjectGraph itself. I've seen Type & MethodInfo instances refuse to serialize before based on problems in the underlying type system. So instead of using FieldInfo and circumventing the secondary loop to find the reflection setter I need, I use string instead and at least I know I'll always be able to serialize.
 
That being said, I'm not making a claim that this code is the leanest, most optimized way to do what I'm doing. Feel free to send me code! I'm not proud of the algorithms, just of the idea and the success of the follow-through.
 
I'm not sure if I see the benefit that you do in changing the naming semantic behind the whole thing from something clean and indicative of purpose to something self-deprecating or unnecessarily verbose. I can see putting something in the Xml comments of each making note of the fact that this is a low-performance method to be used when traditional methods are unavailable. I like the extension methods though. It may seem more appropriate to you, but going with the MyCrappyClass.DoSomethingPoorly(unwillingParticipant) syntax seems sorta depressing to me :((

recursion / multiple visitations (sprucely, 19 Dec '08 - 9:35)
What if an object has a child object that refers back to its owning object? Or what about doubly-linked lists where element 1 refers to element 2 and element 2 refers to 1? Does ObjectGraph keep track of objects it has visited so that it does not duplicate them?

Re: recursion / multiple visitations (John Batte, 19 Dec '08 - 10:43)
Excellent! I knew someone would spot a hole in the design. Looks like I'm going to need to crack the source open and refactor ObjectGraph to handle this. Thanks for the input! I'll update soon with the working solution.

Last Updated 14 Jan 2009
Article Copyright 2008 by John Batte