 |
|
 |
Hi there!
I think you made a major mistake in your article (if I didn't miss anything).
As the domain of the hashcodes is limited, you will unfortunately get different items which have the same key.
The dictionary usually handles that by looking up the list of items of that key and using Equals to get the unique item which was requested.
What you did is to just compare hashcodes in Equals - which may return true for items which are in fact not equal!
And thats the thing why a generic key class can't work - the logic in equals is really something which is dependent on the type of item.
Users would have to derive from your key class and implement a problem-specific Equals override.
|
|
|
|
 |
|
 |
You are the first to notice and are correct that the initial article post had that bug - see the comment "Better equals method" in the coprime comment thread for the solution to that. Since the interface defined will return us all objects we are using for the kye we can call their equals method to compare them.
|
|
|
|
 |
|
 |
//first co-prime number
int FinalHashCode = 17;
//other co-prime number - ask me if I know what
//co-prime means, go ahead, ask me.
int OtherCoPrimeNumber = 37;
Two integers are considered coprime if their greatest common divisor is 1. It takes two numbers to be coprime. So, your "first co-prime number" and "other co-prime number" comments are technically incorrect.
Incidentally, the integers don't have to be prime in order for them to be coprime. For example, 12 and 25 are coprime.
Best Regards,
Dustin Campbell
Developer Express, Inc.
|
|
|
|
 |
|
 |
"Technically you are correct... the best kind of correct" - Futurama
Yeah, basically I didn't really understand that portion but in all my searches and reading that was the one thing that fixed the problem and passed all my tests. You're the first one to ask me, so thanks for the tip.
|
|
|
|
 |
|
 |
Paul B. wrote: "Technically you are correct... the best kind of correct" - Futurama
Except that I said you were technically incorrect.
FYI, the Wikipedia article on coprime is accurate.
Best Regards,
Dustin Campbell
Developer Express, Inc.
|
|
|
|
 |
|
 |
cls2deg wrote: FYI, the Wikipedia article on coprime is accurate.
At least I didn't do all that reading for nothing...
Now the way I understand it, 1 is coprime of every other integer, so does that mean you could use 1 and 2 as the coprime numbers when calculating the hash value or am I missing something? I know it seems arb but int * 2 is the same as int << 1 which will probably not impact performance at all but does make me feel a little better when dealing with perfomance critical code.
Oh, I haven't checked it at all but I think there might be bug in the GetHashCode implementation presented in the article. If I'm right, a ClassKey with 2 keys, one of which is null, will generate the same hash as a second ClassKey that has the values swapped. It's caused by this if statement:
if (keyValue != null)
{
FinalHashCode = FinalHashCode *
OtherCoPrimeNumber + keyValue.GetHashCode();
}
The FinalHashCode = FinalHashCode * OtherCoPrimeNumber part of the equation should happen regardless of the keyValue hash code. Replacing it with this should solve the problem:
FinalHashCode = FinalHashCode *
OtherCoPrimeNumber + (keyValue == null ? 0 : keyValue.GetHashCode());
It was really late when I looked at this, so please correct me if I'm wrong. Another thing to keep in mind is that Microsoft says[^] GetHashCode is not allowed to throw exceptions, so there should probably be some overflow checking in there somewhere.
Thanks for the article, I was completely stumped on how to correctly implement GetHashCode before finding it.
|
|
|
|
 |
|
 |
I'd need to look into the null thing because I'm pretty sure I had a unit test explictly for that, but perhaps it was the actual values swapped and there wasn't a null in the test. Seems like you are correct though.
The definition for co prime is that the numbers aren't divisible by any other number than 1, so I don't think that would work, but to tell you the truth I don't know. I used it from another source I had found and just trusted their math skills.
As far as the overflow, I did test that and it does occur, in fact it is ok for the overflow to occur. By default overflow checking is not on in VS projects so you never see it. I believe one of these comment threads in here gets into that, you just put an "unchecked" block around the code that could overflow if I remember correctly.
The bigger problem with this code (I think) is that I don't have a different Equals implementation, i.e. I tied together gethashcode and equals. When a hashcode collision occurs it needs to be able to tell if the two items are different. I haven't run into this being a problem yet but I've been pondering that for a while waiting for some strange dictionary behavior but haven't seen it yet. I have some plans to change the Equal to just call the object equals on all the items in the object array provided by the interface but haven't gotten around to it yet.
|
|
|
|
 |
|
 |
Here is a better implementation os the Equals method since it should not use the hashcode to determine equality in case of collision - passes all existsing tests:
public override bool Equals(object obj)
{
ClassKey<T> typedObj = (obj as ClassKey<T>);
if (typedObj != null)
{
object[] typedObjKeyValues = typedObj.GetKeyValues();
object[] theseKeyValues = this.GetKeyValues();
if (typedObjKeyValues == null && theseKeyValues == null) return true;
if (typedObjKeyValues == null || theseKeyValues == null) return false;
if (typedObjKeyValues.Length != theseKeyValues.Length) return false;
for (int i = 0; i < theseKeyValues.Length; i++)
{
object oTheseKeyValue = theseKeyValues[i];
object oTypedObjKeyValue = typedObjKeyValues[i];
if (oTheseKeyValue == null && oTypedObjKeyValue == null) continue;
if (oTheseKeyValue == null || oTypedObjKeyValue == null) return false;
if (!oTheseKeyValue.Equals(oTypedObjKeyValue))
return false;
}
return true;
}
else
return false; }
|
|
|
|
 |
|
 |
Why not a real tuple? That is what I have used as a dictionary key for multi-keyed dictionaries, because... they aren't really multikeyed. they are keyed by a tuple. This is VERY natural for anyone with python experience.
http://visualstudiomagazine.com/columns/article.aspx?editorialsid=2069[^]
http://diditwith.net/2008/04/03/ApplesAndOranges.aspx[^]
The tuple has other uses too, such as a type param to generic EventArgs class, or even just a list of tuples, or really anytime you don't want to explicitly create a new type.
Also, does AnonymousTypes in C# 3 do the same thing? couldn't you just key the dictionary by "new { A= firstkeyval, B= secondkeyval}"? The key of the dictionary can't be infered by a constructor, but a CreationMethod that wraps the constructor and first element creation might allow the compiler to infer the types.
I couldn't let the above message go as speculation, so I confirmed that the following test does pass:
[TestFixture]
public class ExperimentTests
{
[Test]
public void DictionaryWithAnonymousKeyTest()
{
var x = DictionaryHelper.Create(new { a = 1, b = 2 }, "one");
x.Add(new { a = 2, b = 3 }, "two");
Assert.AreEqual(x[new { a = 2, b = 3 }], "two");
Assert.AreEqual(x[new { a = 1, b = 2 }], "one");
}
}
public class DictionaryHelper
{
public static Dictionary<K, V> Create<K, V>(K key, V value)
{
Dictionary<K, V> retval = new Dictionary<K, V>();
retval.Add(key, value);
return retval;
}
public DictionaryHelper()
{
}
}
|
|
|
|
 |
|
 |
I'm surprised that works - from looking at your code I would think it would do a reference comparison and not look at all the properties of the anonymous type. I'm definitely going to research that to see how that works.
That being said, it still suffers from the same problem as the other dictionary implementations, namely that you specify what the key is every time it is used. What I like about ClassKey is that the access method is in a single place so when it turns out one more key is needed for uniqueness that isn't a big operation to find and replace all those instances.
|
|
|
|
 |
|
|
 |
|
 |
I read your article, definitely confirms what I learned doing this. I think the syntax for using them for the dictionary is a hassle and if the lookups are three times slower then I don't have a strong incentive to use them. I would be curious if you see anything flawed in my tests that would make them as fast as the ClassKey method I am using.
|
|
|
|
 |
|
 |
My article? I haven't written an article about creating composite keys for dictionaries.
Best Regards,
Dustin Campbell
Developer Express, Inc.
|
|
|
|
 |
|
 |
I few interesting notes on this - check out the Tuple class in that article and you'll see he is overriding the equals and gethashcode methods similar to ClassKey. That hashcode implementation is wrong though, that was the first one I tried myself - it will return the same value for "one", "two" as "two", "one". Another reason to keep all this stuff in one place.
Your method above does indeed work. The need for the helper class and passing in the delegate to create it is a bit strange though. The first thing I did was just make the dictionary keyed by object and pass the new { } as the key - which did indeed work. I then declared a test class with the same properties and that did not work. Which means that the anonymous types are structs underneath, not classes... So value comparisons work by default, no need to overload hashcode and equals. Good to know!
By making a dedicated struct and a helper method similar to the one I used for strings (shown below) I was able to centralize the location of the key so it doesn't need to be specified every time. I'll update the post with performance numbers, I suspect it will be similar to the struct and the need for boxing/unboxing of the struct will cause slightly worse performance, but if you didn't want to mess with the ClassKey method this would definitely be a way to go for this problem.
private TestClassKey GetTestClassKey(TestClass testClass)
{
return new TestClassKey { key1=testClass.Column1, key2=testClass.Column2 };
}
private struct TestClassKey { public string key1; public string key2; }
Dictionary<TestClassKey, TestClass> testLookups = new Dictionary<TestClassKey, TestClass>();
|
|
|
|
 |
|
 |
I'm having problems submitting my article ("Object reference not set", classic) but I did the same perf tests with this method that I did with the others and it was 3 times slower for lookups, by far the worst of all the methods... I even went so far as trying out parameters to avoid making additional copies of the value types. I'll try again tomorrow to upload the files and update the article so you can see for yourself.
|
|
|
|
 |
|
 |
Ok, I'm not getting errors modifying my article today. The last download will give you the tuple style code, it appears to be much worse performance wise. Let me know if you see something I did that is different than the other methods that would account for the difference.
|
|
|
|
 |
|
 |
I just read something that I think answers the performance issue. There is no boxing since generic types are used, but... Value types implement equality checking by using reflection - that is probably why my initial struct implementation that overloaded gethashcode and equals was a bit faster than the ClassKey method, while the anonymous types were quite a bit slower. I don't think I'll use anonymous types for anything other than LINQ queries given this, since I already have this implementation.
|
|
|
|
 |
|
 |
Why don't you put "unchecked" block around the hashcode computations? They will produce overflow with high probability. Currently your code will crash if compiled with /check+ compiler option.
|
|
|
|
 |
|
 |
I've wondered about that but I have yet to see it overflow, even in the tests I'm running with millions of records. At all the hashcode calculations I've looked at nobody seems to handle that case. What should be done to prevent it? I was thinking the XOR'ing was somehow preventing it.
|
|
|
|
 |
|
 |
Try to execute this code:
checked
{
Random r = new Random();
int i=0, t;
try
{
for (i = 0; i < 100000; i++)
{
byte[] bs1 = new byte[1000];
byte[] bs2 = new byte[1000];
r.NextBytes(bs1);
r.NextBytes(bs2);
t = Encoding.ASCII.GetString(bs1).GetHashCode() +
Encoding.ASCII.GetString(bs2).GetHashCode();
}
}
catch (OverflowException)
{
MessageBox.Show("Overflow on step " + i);
}
MessageBox.Show("Finished.");
}
and see on which step it will receive first overflow. I get it on first 5 steps almost always. Sometimes it goes until step 10. And this is just one specific class - String. You should write your generic classes which use GetHashCode() with assumption they return any valid value - from int.MinValue to int.MaxValue.
As to preventing it - if your code assumes overflow a normal case then you you just must be sure that overflows will be processed in "unchecked" mode no matter which are compiler options. So just use "unchecked{}" block to envelope all the overflow-causing code. Overflow itself in hash calculations is not a bad thing at all.
|
|
|
|
 |
|
 |
Ah, that is interesting, I've never used that before. I'll try and reproduce the overflow exception with a unit test and see how it reacts to the unchecked keyword. Thanks!
|
|
|
|
 |
|
 |
This was a pretty interesting adventure... I used the below code to reproduce the situation you are referring to:
Random r = new Random();
for (int i = 0; i < 100000; i++)
{
byte[] bs1 = new byte[1000];
byte[] bs2 = new byte[1000];
r.NextBytes(bs1);
r.NextBytes(bs2);
TestClass t = new TestClass(Encoding.ASCII.GetString(bs1), Encoding.ASCII.GetString(bs2));
new TestClassKey(t).GetHashCode();
}
Originally I put a checked { } section around it and did not get an overflow. After a while I figured out the checked only applies to the code at the top level, i.e. the overflow code was inside a method so that checked keyword wasn't taking effect.So then I added the checked { } section around the actual hashcode calculation and got the overflow exception. Then for grins I added an unchecked section around the test - still got the overflow. Then I removed all the checked and unchecked blocks and added the "check for arithimetic overflow" option in the properties/build/ advanced page for ClassKeyDemo - and got the overflow exception.
Take aways:
Code in VS is unchecked by default - I didn't know this, I think I'll start turning it on by default. This is why it wasn't ever a problem for me.
A checked block still takes effect if an unchecked block is wrapped around it.
A checked block doesn't check the code inside a method called from inside the checked block, only code immediately within the check block.
I'll upload the latest files a bit later once I clean all this up
|
|
|
|
 |
|
 |
Paul B. wrote: Code in VS is unchecked by default - I didn't know this, I think I'll start turning it on by default.
After you turn it on, be prepared for bugs appearing where there were none before.
|
|
|
|
 |
|
 |
In fact, I just wrote one today to hold a translation table based on two strings to get an int.
private readonly System.Collections.Generic.Dictionary
<string,System.Collections.Generic.Dictionary<string,int>> ResponseText =
new System.Collections.Generic.Dictionary
<string,System.Collections.Generic.Dictionary<string,int>>() ;
No muss, no fuss, no hassle.
|
|
|
|
 |
|
 |
You would need to keep nesting Dictionary objects to support 3,4, etc. keys and the syntax is a bit clunky. You would also need to make changes in quite a few places when the key changed - But it's a good idea.
|
|
|
|
 |