Click here to Skip to main content
Click here to Skip to main content

Looking up items in HashTable/Dictionary objects that have multiple keys

By , 1 May 2008
 

Introduction

Dictionary objects take a single key as a look up key. This class simplifies using a Dictionary when you have multiple keys, such as two strings and an int, etc. Use this class when just tacking all the keys into a single string and using that as a key makes you feel dirty.

Here is an example of how this class can be used, with a sample class called TestClass. Define a new class for the key and, in it, implement the abstract method GetKeyValues which will return an array of values to use as the key. In that method, just return the properties you want to use as the key for looking up the object in a Dictionary.

/// <summary>
/// Define test class to use key for
/// </summary>
public class TestClass
{
    public string Column1 = null;
    public string Column2 = null;
    public TestClass(string Column1, string Column2)
    {
        this.Column1 = Column1;
        this.Column2 = Column2;
    }
}

//define key to use for test class
public class TestClassKey : ClassKey<TestClass>
{
    //Init with object
    public TestClassKey(TestClass ClassReference) : base(ClassReference) { }
    //return list of column values we need to use as a key
    public override object[] GetKeyValues()
    {
        return new object[] { 
            ClassReference.Column1, 
            ClassReference.Column2
        };
    }
}

And, here is an example of a unit test confirming that it does indeed work:

TestClass model1 = new TestClass("abc", "def");
TestClass model2 = new TestClass("abc", "def");

Assert.AreEqual(new TestClassKey(model1), new TestClassKey(model2));
Assert.IsTrue(new TestClassKey(model1) == new TestClassKey(model2));

//change side of one and make sure not equal
model1.Column1 = "xyz";
model2.Column2 = "123";

Assert.AreNotEqual(new TestClassKey(model1), new TestClassKey(model2));
Assert.IsTrue(new TestClassKey(model1) != new TestClassKey(model2));

Using the code

Since this is for use primarily on Dictionary objects, I had to get very familiar with equality overriding and the GetHashCode method. This is something you want to do once and in one place as it is very, very easy to get wrong. I've been using this in production systems for more than a few months, and have added tests for various bugs I've seen - so I'm pretty confident this code works well. I'm posting it here because I'm curious what other people think, or if there is an easier way to do this. Here is the entire code for ClassKey.cs:

/// <summary>
/// Defines a common set of operations and functionality for creating concrete 
/// key classes which allow us to lookup items in a collection
/// using one or more of the properties in that collection.
/// </summary>
public abstract class ClassKey<T> where T : class
{
    /// <summary>
    /// The collection item referenced by this key
    /// </summary>
    public T ClassReference
    {
        get { return _CollectionItem; }
        set { _CollectionItem = value; }
    }

    private T _CollectionItem = null;

    /// <summary>
    /// Init empty if needed
    /// </summary>
    public ClassKey() { }

    /// <summary>
    /// Init with specific collection item
    /// </summary>
    /// <param name="CollectionItem"></param>
    public ClassKey(T CollectionItem)
    {
        this.ClassReference = CollectionItem;
    }

    /// <summary>
    /// Compare based on hash code
    /// </summary>
    /// <param name="obj"></param>
    /// <returns></returns>
    public override bool Equals(object obj)
    {
        if (obj is ClassKey<T>)
        {
            return (obj as ClassKey<T>).GetHashCode() == this.GetHashCode();
        }
        else
            return false; //definitely not equal
    }

    public static bool operator ==(ClassKey<T> p1, ClassKey<T> p2)
    {
        //if both null, then equal
        if ((object)p1 == null && (object)p2 == null) return true;
        //if one or other null, then not since above 
        //we guaranteed if here one is not null
        if ((object)p1 == null || (object)p2 == null) return false;
        //compare on fields
        return (p1.Equals(p2));
    }

    public static bool operator !=(ClassKey<T> p1, ClassKey<T> p2)
    {
        return !(p1 == p2);
    }

    //must override to get list of key values
    public abstract object[] GetKeyValues();

       /// <summary>
       /// Implement hash code function to specify 
       /// which columns will be used for the key
       /// without using reflection which may be a bit slow.
    /// </summary>
    /// <returns></returns>
    public override int GetHashCode()
    {
       object[] keyValues = GetKeyValues();
       //use co-prime numbers to salt the hashcode 
       //so same values in different order will 
       //not return as equal - see TestClassKeyXOROrderProblem
       //                      to reproduce problem 
       //http://www.msnewsgroups.net/group/microsoft
       //          .public.dotnet.languages.csharp/topic36405.aspx
       //http://directxinfo.blogspot.com/2007/06/gethashcode-in-net.html

       //first co-prime number
       int FinalHashCode = 17;
       //other co-prime number - ask me if I know what 
       //co-prime means, go ahead, ask me.
       int OtherCoPrimeNumber = 37;
       //get total hashcode to return
       if(keyValues != null)
           foreach (object keyValue in keyValues)
           {
               //can't get hash code if null
               if (keyValue != null)
               {
                   FinalHashCode = FinalHashCode * 
                         OtherCoPrimeNumber + keyValue.GetHashCode();
               }
           }
           return FinalHashCode;
    }
}

Points of interest

You can also inherit directly from ClassKey if you don't want to use a separate class for comparisons. It is also a handy class to use if you want to override the == or GetHashCode methods so you don't have to remember all the little details involved in overriding all the methods that need to be done any time you touch any one of them. I would be interested to hear from people if the portions dealing with prime and co-prime numbers can be done in a different or better way.

Struct implementation

I added another download file which has a struct only implementation of the key, after reading one of the reader comments. I dislike the implementation because the contents of the key have to be specified multiple times, i.e., each time it is used since there isn't inheritance with structs. That being said, it does appear to be about 15-20% faster for lookups, but on a million rows of lookups, that ends up being 1200ms vs. 1000ms, so I still prefer the class/inheritance method.

Performance measurements on various methods

I did the below by running the various unit tests for the different methods. I tried to keep all of them more or less the same to try and keep it fair. To measure the memory, I simply killed ProcessInvocation.exe (the TestDriven.NET test runner), and ran the perf test which does a million iterations 10 times and averages the results for initialzation and lookup. No matter which way you go, there is a tradeoff. I still prefer the ClassKey method. Though it takes a bit longer (200ms on a million rows), I think it makes bugs far less likely to appear, and is much more intuitive. The Struct implementation takes a bit longer to initialize, but is faster for lookups, but less maintainable. The Dictionary of Dictionaries is the fastest for lookups, but takes longer to initialize and uses twice as much memory as the ClassKey method - presumably because it is creating another dictionary object for each item in the list. I also consider it to be the worst syntax and maintainability-wise. The concatenated string key isn't too terrible for performance, so if you were lazy and not wanting to implement something like this, then I think that would be the way to go as long as you have a common method for constructing the key that can be reused (and not specified on every use). It also took a significantly larger amount of memory though. These numbers aren't guaranteed or perfect, just some back of the envelope measurements I'm using on my system to have some basis for comparison.

Class key

Initialization: 3,018ms
Lookups: 1,144ms
Memory - Never above 313MB

Struct

Initialization: 3,210ms
Lookups: 1,064ms
Memory - Never above 354MB

Dictionary of Dictionaries

Initialization: 4,313ms
Lookups: 919ms
Memory - Never above 555MB

Concatenated string key dictionary

Initialization: 3,305ms
Lookups: 1,039ms
Memory - Never above 460MB

Tuple method

Initialization: 3,810ms
Lookups: 3,241ms
Memory - Never above 316MB

History

  • 22-Apr-2008
    • Initial version.
    • Fought with formatting, got sick of dealing with the CodeProject editor.
  • 23-Apr-2008
    • Added new zip file with struct implementation and unit test to run 1 million times.
    • Added new zip file with straight dictionary implementations and all of the above.
  • 30-Apr-2008
    • Added code download for unchecked/tuple discussions.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Paul B.
United States United States
Member
I've been a software developer since 1996 and have enjoyed C# since 2003. I have a Bachelor's degree in Computer Science and for some reason, a Master's degree in Business Administration. I currently do software development contracting/consulting.

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
GeneralEqualsmemberQuimbo12 Feb '10 - 10:23 
Hi there!
 
I think you made a major mistake in your article (if I didn't miss anything).
 
As the domain of the hashcodes is limited, you will unfortunately get different items which have the same key.
The dictionary usually handles that by looking up the list of items of that key and using Equals to get the unique item which was requested.
 
What you did is to just compare hashcodes in Equals - which may return true for items which are in fact not equal!
 
And thats the thing why a generic key class can't work - the logic in equals is really something which is dependent on the type of item.
Users would have to derive from your key class and implement a problem-specific Equals override.
GeneralRe: EqualsmemberPaul B.12 Feb '10 - 10:28 
You are the first to notice and are correct that the initial article post had that bug - see the comment "Better equals method" in the coprime comment thread for the solution to that. Since the interface defined will return us all objects we are using for the kye we can call their equals method to compare them.
GeneralCoprimemembercls2deg1 May '08 - 17:35 
//first co-prime number
int FinalHashCode = 17;
//other co-prime number - ask me if I know what
//co-prime means, go ahead, ask me.
int OtherCoPrimeNumber = 37;
 
Two integers are considered coprime if their greatest common divisor is 1. It takes two numbers to be coprime. So, your "first co-prime number" and "other co-prime number" comments are technically incorrect. Smile | :)
 
Incidentally, the integers don't have to be prime in order for them to be coprime. For example, 12 and 25 are coprime.
 
Best Regards,
Dustin Campbell
Developer Express, Inc.

GeneralRe: CoprimememberPaul B.2 May '08 - 3:29 
"Technically you are correct... the best kind of correct" - Futurama
 
Yeah, basically I didn't really understand that portion but in all my searches and reading that was the one thing that fixed the problem and passed all my tests. You're the first one to ask me, so thanks for the tip. Smile | :)
GeneralRe: Coprimemembercls2deg2 May '08 - 3:32 
Paul B. wrote:
"Technically you are correct... the best kind of correct" - Futurama

 
Except that I said you were technically incorrect. Smile | :)
 
FYI, the Wikipedia article on coprime is accurate.
 
Best Regards,
Dustin Campbell
Developer Express, Inc.

GeneralRe: Coprimememberfire_birdie31 Jan '09 - 1:26 
cls2deg wrote:
FYI, the Wikipedia article on coprime is accurate.

 
At least I didn't do all that reading for nothing...
 
Now the way I understand it, 1 is coprime of every other integer, so does that mean you could use 1 and 2 as the coprime numbers when calculating the hash value or am I missing something? Confused | :confused: I know it seems arb but int * 2 is the same as int << 1 which will probably not impact performance at all but does make me feel a little better when dealing with perfomance critical code.
 
Oh, I haven't checked it at all but I think there might be bug in the GetHashCode implementation presented in the article. If I'm right, a ClassKey with 2 keys, one of which is null, will generate the same hash as a second ClassKey that has the values swapped. It's caused by this if statement:
 
if (keyValue != null)
{
    FinalHashCode = FinalHashCode * 
        OtherCoPrimeNumber + keyValue.GetHashCode();
}
The FinalHashCode = FinalHashCode * OtherCoPrimeNumber part of the equation should happen regardless of the keyValue hash code. Replacing it with this should solve the problem:
 
FinalHashCode = FinalHashCode *
    OtherCoPrimeNumber + (keyValue == null ? 0 : keyValue.GetHashCode());
It was really late when I looked at this, so please correct me if I'm wrong. Another thing to keep in mind is that Microsoft says[^] GetHashCode is not allowed to throw exceptions, so there should probably be some overflow checking in there somewhere.
 
Thanks for the article, I was completely stumped on how to correctly implement GetHashCode before finding it.
GeneralRe: CoprimememberPaul B.1 Feb '09 - 6:38 
I'd need to look into the null thing because I'm pretty sure I had a unit test explictly for that, but perhaps it was the actual values swapped and there wasn't a null in the test. Seems like you are correct though.
 
The definition for co prime is that the numbers aren't divisible by any other number than 1, so I don't think that would work, but to tell you the truth I don't know. I used it from another source I had found and just trusted their math skills. Smile | :)
 
As far as the overflow, I did test that and it does occur, in fact it is ok for the overflow to occur. By default overflow checking is not on in VS projects so you never see it. I believe one of these comment threads in here gets into that, you just put an "unchecked" block around the code that could overflow if I remember correctly.
 
The bigger problem with this code (I think) is that I don't have a different Equals implementation, i.e. I tied together gethashcode and equals. When a hashcode collision occurs it needs to be able to tell if the two items are different. I haven't run into this being a problem yet but I've been pondering that for a while waiting for some strange dictionary behavior but haven't seen it yet. I have some plans to change the Equal to just call the object equals on all the items in the object array provided by the interface but haven't gotten around to it yet.
GeneralBetter Equals methodmemberPaul B.14 Feb '09 - 4:30 
Here is a better implementation os the Equals method since it should not use the hashcode to determine equality in case of collision - passes all existsing tests:
 
        
        /// <summary>
        /// Compare using Equals on all the sub keys
        /// </summary>
        /// <param name="obj"></param>
        /// <returns></returns>
        public override bool Equals(object obj)
        {
            ClassKey<T> typedObj = (obj as ClassKey<T>);
            if (typedObj != null)
            {
                object[] typedObjKeyValues = typedObj.GetKeyValues();
                object[] theseKeyValues = this.GetKeyValues();
 
                if (typedObjKeyValues == null && theseKeyValues == null) return true;
                if (typedObjKeyValues == null || theseKeyValues == null) return false;
                if (typedObjKeyValues.Length != theseKeyValues.Length) return false;
 
                //as soon as one not equal, we know to exit
                for (int i = 0; i < theseKeyValues.Length; i++)
                {
                    object oTheseKeyValue = theseKeyValues[i];
                    object oTypedObjKeyValue = typedObjKeyValues[i];
 
                    if (oTheseKeyValue == null && oTypedObjKeyValue == null) continue;
                    if (oTheseKeyValue == null || oTypedObjKeyValue == null) return false;
 
                    if (!oTheseKeyValue.Equals(oTypedObjKeyValue))
                        return false;
                }
 
                //if haven't existed yet they are equal
                return true;
            }
            else
                return false; //definitely not equal
        }

QuestionWhy not a real Tuple?memberJay R. Wren29 Apr '08 - 3:50 
Why not a real tuple? That is what I have used as a dictionary key for multi-keyed dictionaries, because... they aren't really multikeyed. they are keyed by a tuple. This is VERY natural for anyone with python experience.
 
http://visualstudiomagazine.com/columns/article.aspx?editorialsid=2069[^]
 
http://diditwith.net/2008/04/03/ApplesAndOranges.aspx[^]
 
The tuple has other uses too, such as a type param to generic EventArgs class, or even just a list of tuples, or really anytime you don't want to explicitly create a new type.
 
Also, does AnonymousTypes in C# 3 do the same thing? couldn't you just key the dictionary by "new { A= firstkeyval, B= secondkeyval}"? The key of the dictionary can't be infered by a constructor, but a CreationMethod that wraps the constructor and first element creation might allow the compiler to infer the types.
 

I couldn't let the above message go as speculation, so I confirmed that the following test does pass:

 
[TestFixture]
public class ExperimentTests
{
[Test]
public void DictionaryWithAnonymousKeyTest()
{
var x = DictionaryHelper.Create(new { a = 1, b = 2 }, "one");
x.Add(new { a = 2, b = 3 }, "two");
Assert.AreEqual(x[new { a = 2, b = 3 }], "two");
Assert.AreEqual(x[new { a = 1, b = 2 }], "one");
}
}
public class DictionaryHelper
{
public static Dictionary<K, V> Create<K, V>(K key, V value)
{
Dictionary<K, V> retval = new Dictionary<K, V>();
retval.Add(key, value);
return retval;
}
public DictionaryHelper()
{

}
}
 

AnswerRe: Why not a real Tuple?memberPaul B.29 Apr '08 - 9:33 
I'm surprised that works - from looking at your code I would think it would do a reference comparison and not look at all the properties of the anonymous type. I'm definitely going to research that to see how that works.
 
That being said, it still suffers from the same problem as the other dictionary implementations, namely that you specify what the key is every time it is used. What I like about ClassKey is that the access method is in a single place so when it turns out one more key is needed for uniqueness that isn't a big operation to find and replace all those instances.

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web04 | 2.6.130523.1 | Last Updated 1 May 2008
Article Copyright 2008 by Paul B.
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid