|
Introduction
Dictionary objects take a single key as a look up key. This class simplifies using a Dictionary when you have multiple keys, such as two strings and an int, etc. Use this class when just tacking all the keys into a single string and using that as a key makes you feel dirty.
Here is an example of how this class can be used, with a sample class called TestClass. Define a new class for the key and, in it, implement the abstract method GetKeyValues which will return an array of values to use as the key. In that method, just return the properties you want to use as the key for looking up the object in a Dictionary.
public class TestClass
{
public string Column1 = null;
public string Column2 = null;
public TestClass(string Column1, string Column2)
{
this.Column1 = Column1;
this.Column2 = Column2;
}
}
public class TestClassKey : ClassKey<TestClass>
{
public TestClassKey(TestClass ClassReference) : base(ClassReference) { }
public override object[] GetKeyValues()
{
return new object[] {
ClassReference.Column1,
ClassReference.Column2
};
}
}
And, here is an example of a unit test confirming that it does indeed work:
TestClass model1 = new TestClass("abc", "def");
TestClass model2 = new TestClass("abc", "def");
Assert.AreEqual(new TestClassKey(model1), new TestClassKey(model2));
Assert.IsTrue(new TestClassKey(model1) == new TestClassKey(model2));
model1.Column1 = "xyz";
model2.Column2 = "123";
Assert.AreNotEqual(new TestClassKey(model1), new TestClassKey(model2));
Assert.IsTrue(new TestClassKey(model1) != new TestClassKey(model2));
Using the code
Since this is for use primarily on Dictionary objects, I had to get very familiar with equality overriding and the GetHashCode method. This is something you want to do once and in one place as it is very, very easy to get wrong. I've been using this in production systems for more than a few months, and have added tests for various bugs I've seen - so I'm pretty confident this code works well. I'm posting it here because I'm curious what other people think, or if there is an easier way to do this. Here is the entire code for ClassKey.cs:
public abstract class ClassKey<T> where T : class
{
public T ClassReference
{
get { return _CollectionItem; }
set { _CollectionItem = value; }
}
private T _CollectionItem = null;
public ClassKey() { }
public ClassKey(T CollectionItem)
{
this.ClassReference = CollectionItem;
}
public override bool Equals(object obj)
{
if (obj is ClassKey<T>)
{
return (obj as ClassKey<T>).GetHashCode() == this.GetHashCode();
}
else
return false;
}
public static bool operator ==(ClassKey<T> p1, ClassKey<T> p2)
{
if ((object)p1 == null && (object)p2 == null) return true;
if ((object)p1 == null || (object)p2 == null) return false;
return (p1.Equals(p2));
}
public static bool operator !=(ClassKey<T> p1, ClassKey<T> p2)
{
return !(p1 == p2);
}
public abstract object[] GetKeyValues();
public override int GetHashCode()
{
object[] keyValues = GetKeyValues();
int FinalHashCode = 17;
int OtherCoPrimeNumber = 37;
if(keyValues != null)
foreach (object keyValue in keyValues)
{
if (keyValue != null)
{
FinalHashCode = FinalHashCode *
OtherCoPrimeNumber + keyValue.GetHashCode();
}
}
return FinalHashCode;
}
}
Points of interest
You can also inherit directly from ClassKey if you don't want to use a separate class for comparisons. It is also a handy class to use if you want to override the == or GetHashCode methods so you don't have to remember all the little details involved in overriding all the methods that need to be done any time you touch any one of them. I would be interested to hear from people if the portions dealing with prime and co-prime numbers can be done in a different or better way.
Struct implementation
I added another download file which has a struct only implementation of the key, after reading one of the reader comments. I dislike the implementation because the contents of the key have to be specified multiple times, i.e., each time it is used since there isn't inheritance with structs. That being said, it does appear to be about 15-20% faster for lookups, but on a million rows of lookups, that ends up being 1200ms vs. 1000ms, so I still prefer the class/inheritance method.
Performance measurements on various methods
I did the below by running the various unit tests for the different methods. I tried to keep all of them more or less the same to try and keep it fair. To measure the memory, I simply killed ProcessInvocation.exe (the TestDriven.NET test runner), and ran the perf test which does a million iterations 10 times and averages the results for initialzation and lookup. No matter which way you go, there is a tradeoff. I still prefer the ClassKey method. Though it takes a bit longer (200ms on a million rows), I think it makes bugs far less likely to appear, and is much more intuitive. The Struct implementation takes a bit longer to initialize, but is faster for lookups, but less maintainable. The Dictionary of Dictionaries is the fastest for lookups, but takes longer to initialize and uses twice as much memory as the ClassKey method - presumably because it is creating another dictionary object for each item in the list. I also consider it to be the worst syntax and maintainability-wise. The concatenated string key isn't too terrible for performance, so if you were lazy and not wanting to implement something like this, then I think that would be the way to go as long as you have a common method for constructing the key that can be reused (and not specified on every use). It also took a significantly larger amount of memory though. These numbers aren't guaranteed or perfect, just some back of the envelope measurements I'm using on my system to have some basis for comparison.
Class key
Initialization: 3,018ms
Lookups: 1,144ms
Memory - Never above 313MB
Struct
Initialization: 3,210ms
Lookups: 1,064ms
Memory - Never above 354MB
Dictionary of Dictionaries
Initialization: 4,313ms
Lookups: 919ms
Memory - Never above 555MB
Concatenated string key dictionary
Initialization: 3,305ms
Lookups: 1,039ms
Memory - Never above 460MB
Tuple method
Initialization: 3,810ms
Lookups: 3,241ms
Memory - Never above 316MB
History
- 22-Apr-2008
- Initial version.
- Fought with formatting, got sick of dealing with the CodeProject editor.
- 23-Apr-2008
- Added new zip file with struct implementation and unit test to run 1 million times.
- Added new zip file with straight dictionary implementations and all of the above.
- 30-Apr-2008
- Added code download for unchecked/tuple discussions.
| You must Sign In to use this message board. |
|
| | Msgs 1 to 25 of 29 (Total in Forum: 29) (Refresh) | FirstPrevNext |
|
|
 |
|
|
//first co-prime number int FinalHashCode = 17; //other co-prime number - ask me if I know what //co-prime means, go ahead, ask me. int OtherCoPrimeNumber = 37;
Two integers are considered coprime if their greatest common divisor is 1. It takes two numbers to be coprime. So, your "first co-prime number" and "other co-prime number" comments are technically incorrect. 
Incidentally, the integers don't have to be prime in order for them to be coprime. For example, 12 and 25 are coprime.
Best Regards, Dustin Campbell Developer Express, Inc.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
"Technically you are correct... the best kind of correct" - Futurama
Yeah, basically I didn't really understand that portion but in all my searches and reading that was the one thing that fixed the problem and passed all my tests. You're the first one to ask me, so thanks for the tip.
|
| Sign In·View Thread·PermaLink | 2.00/5 (1 vote) |
|
|
|
 |
|
|
Paul B. wrote: "Technically you are correct... the best kind of correct" - Futurama
Except that I said you were technically incorrect. 
FYI, the Wikipedia article on coprime is accurate.
Best Regards, Dustin Campbell Developer Express, Inc.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
Why not a real tuple? That is what I have used as a dictionary key for multi-keyed dictionaries, because... they aren't really multikeyed. they are keyed by a tuple. This is VERY natural for anyone with python experience.
http://visualstudiomagazine.com/columns/article.aspx?editorialsid=2069[^]
http://diditwith.net/2008/04/03/ApplesAndOranges.aspx[^]
The tuple has other uses too, such as a type param to generic EventArgs class, or even just a list of tuples, or really anytime you don't want to explicitly create a new type.
Also, does AnonymousTypes in C# 3 do the same thing? couldn't you just key the dictionary by "new { A= firstkeyval, B= secondkeyval}"? The key of the dictionary can't be infered by a constructor, but a CreationMethod that wraps the constructor and first element creation might allow the compiler to infer the types.
I couldn't let the above message go as speculation, so I confirmed that the following test does pass:
[TestFixture] public class ExperimentTests { [Test] public void DictionaryWithAnonymousKeyTest() { var x = DictionaryHelper.Create(new { a = 1, b = 2 }, "one"); x.Add(new { a = 2, b = 3 }, "two"); Assert.AreEqual(x[new { a = 2, b = 3 }], "two"); Assert.AreEqual(x[new { a = 1, b = 2 }], "one"); } } public class DictionaryHelper { public static Dictionary<K, V> Create<K, V>(K key, V value) { Dictionary<K, V> retval = new Dictionary<K, V>(); retval.Add(key, value); return retval; } public DictionaryHelper() { } }
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
I'm surprised that works - from looking at your code I would think it would do a reference comparison and not look at all the properties of the anonymous type. I'm definitely going to research that to see how that works.
That being said, it still suffers from the same problem as the other dictionary implementations, namely that you specify what the key is every time it is used. What I like about ClassKey is that the access method is in a single place so when it turns out one more key is needed for uniqueness that isn't a big operation to find and replace all those instances.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
 |
|
|
I read your article, definitely confirms what I learned doing this. I think the syntax for using them for the dictionary is a hassle and if the lookups are three times slower then I don't have a strong incentive to use them. I would be curious if you see anything flawed in my tests that would make them as fast as the ClassKey method I am using.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
My article? I haven't written an article about creating composite keys for dictionaries.
Best Regards, Dustin Campbell Developer Express, Inc.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
I few interesting notes on this - check out the Tuple class in that article and you'll see he is overriding the equals and gethashcode methods similar to ClassKey. That hashcode implementation is wrong though, that was the first one I tried myself - it will return the same value for "one", "two" as "two", "one". Another reason to keep all this stuff in one place.
Your method above does indeed work. The need for the helper class and passing in the delegate to create it is a bit strange though. The first thing I did was just make the dictionary keyed by object and pass the new { } as the key - which did indeed work. I then declared a test class with the same properties and that did not work. Which means that the anonymous types are structs underneath, not classes... So value comparisons work by default, no need to overload hashcode and equals. Good to know!
By making a dedicated struct and a helper method similar to the one I used for strings (shown below) I was able to centralize the location of the key so it doesn't need to be specified every time. I'll update the post with performance numbers, I suspect it will be similar to the struct and the need for boxing/unboxing of the struct will cause slightly worse performance, but if you didn't want to mess with the ClassKey method this would definitely be a way to go for this problem.
private TestClassKey GetTestClassKey(TestClass testClass) { return new TestClassKey { key1=testClass.Column1, key2=testClass.Column2 }; }
private struct TestClassKey { public string key1; public string key2; }
Dictionary<TestClassKey, TestClass> testLookups = new Dictionary<TestClassKey, TestClass>();
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
I'm having problems submitting my article ("Object reference not set", classic) but I did the same perf tests with this method that I did with the others and it was 3 times slower for lookups, by far the worst of all the methods... I even went so far as trying out parameters to avoid making additional copies of the value types. I'll try again tomorrow to upload the files and update the article so you can see for yourself.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
Ok, I'm not getting errors modifying my article today. The last download will give you the tuple style code, it appears to be much worse performance wise. Let me know if you see something I did that is different than the other methods that would account for the difference.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
I just read something that I think answers the performance issue. There is no boxing since generic types are used, but... Value types implement equality checking by using reflection - that is probably why my initial struct implementation that overloaded gethashcode and equals was a bit faster than the ClassKey method, while the anonymous types were quite a bit slower. I don't think I'll use anonymous types for anything other than LINQ queries given this, since I already have this implementation.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
Why don't you put "unchecked" block around the hashcode computations? They will produce overflow with high probability. Currently your code will crash if compiled with /check+ compiler option.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
I've wondered about that but I have yet to see it overflow, even in the tests I'm running with millions of records. At all the hashcode calculations I've looked at nobody seems to handle that case. What should be done to prevent it? I was thinking the XOR'ing was somehow preventing it.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
Try to execute this code:
checked { Random r = new Random(); int i=0, t; try { for (i = 0; i < 100000; i++) { byte[] bs1 = new byte[1000]; byte[] bs2 = new byte[1000]; r.NextBytes(bs1); r.NextBytes(bs2); t = Encoding.ASCII.GetString(bs1).GetHashCode() + Encoding.ASCII.GetString(bs2).GetHashCode(); } } catch (OverflowException) { MessageBox.Show("Overflow on step " + i); } MessageBox.Show("Finished."); }
and see on which step it will receive first overflow. I get it on first 5 steps almost always. Sometimes it goes until step 10. And this is just one specific class - String. You should write your generic classes which use GetHashCode() with assumption they return any valid value - from int.MinValue to int.MaxValue.
As to preventing it - if your code assumes overflow a normal case then you you just must be sure that overflows will be processed in "unchecked" mode no matter which are compiler options. So just use "unchecked{}" block to envelope all the overflow-causing code. Overflow itself in hash calculations is not a bad thing at all.
|
| Sign In·View Thread·PermaLink | 2.00/5 (1 vote) |
|
|
|
 |
|
|
Ah, that is interesting, I've never used that before. I'll try and reproduce the overflow exception with a unit test and see how it reacts to the unchecked keyword. Thanks!
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
This was a pretty interesting adventure... I used the below code to reproduce the situation you are referring to:
Random r = new Random(); for (int i = 0; i < 100000; i++) { byte[] bs1 = new byte[1000]; byte[] bs2 = new byte[1000]; r.NextBytes(bs1); r.NextBytes(bs2);
TestClass t = new TestClass(Encoding.ASCII.GetString(bs1), Encoding.ASCII.GetString(bs2)); new TestClassKey(t).GetHashCode(); }
Originally I put a checked { } section around it and did not get an overflow. After a while I figured out the checked only applies to the code at the top level, i.e. the overflow code was inside a method so that checked keyword wasn't taking effect.So then I added the checked { } section around the actual hashcode calculation and got the overflow exception. Then for grins I added an unchecked section around the test - still got the overflow. Then I removed all the checked and unchecked blocks and added the "check for arithimetic overflow" option in the properties/build/ advanced page for ClassKeyDemo - and got the overflow exception.
Take aways:
Code in VS is unchecked by default - I didn't know this, I think I'll start turning it on by default. This is why it wasn't ever a problem for me.
A checked block still takes effect if an unchecked block is wrapped around it.
A checked block doesn't check the code inside a method called from inside the checked block, only code immediately within the check block.
I'll upload the latest files a bit later once I clean all this up
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
Paul B. wrote: Code in VS is unchecked by default - I didn't know this, I think I'll start turning it on by default.
After you turn it on, be prepared for bugs appearing where there were none before.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
In fact, I just wrote one today to hold a translation table based on two strings to get an int.
private readonly System.Collections.Generic.Dictionary <string,System.Collections.Generic.Dictionary<string,int>> ResponseText = new System.Collections.Generic.Dictionary <string,System.Collections.Generic.Dictionary<string,int>>() ;
No muss, no fuss, no hassle.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
You would need to keep nesting Dictionary objects to support 3,4, etc. keys and the syntax is a bit clunky. You would also need to make changes in quite a few places when the key changed - But it's a good idea.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
I added some tests of the method you propose. I'm not a fan of the syntax and it uses twice as much memory for all the extra dictionary objects. But is is a pretty fast lookup.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
Paul B. wrote: not a fan of the syntax
Eh, only I have to see it.
Paul B. wrote: uses twice as much memory
Not necessarily; it reduces duplication of the outer-most values
e.g. in
"Smith"+"John"=1 "Smith"+"Jane"=2
the "Smith" is only stored once.
Paul B. wrote: pretty fast lookup
Especially if the first value isn't present -- short-cut to failure.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
How do you do a ContainsKey implementation, call it once for every key? Or just try and access it and catch exceptions? Seems like a pain either way.
Have you ever used it for more than two keys?
Just curious.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
if ( MyTree.ContainsKey ( outer ) && MyTree [ outer ].ContainsKey ( inner ) ) ...
or
return ( MyTree.ContainsKey ( outer ) && MyTree [ outer ].ContainsKey ( inner ) ) ;
It could be wrapped in a class, but it's not general-purpose enough for that.
On the other hand, the various keys don't all have to be the same type.
|
| Sign In·View Thread·PermaLink | |
|
| | | |