12,396,095 members (65,379 online)
alternative version

56.9K views
111 bookmarked
Posted

# Understanding Generic Dictionary in-depth

, 3 Dec 2012 CPOL
 Rate this:
A deep-dive of what is going on under the hood.

## Introduction

Generic Dictionary is a great instrument to have in your toolset.

The Generic part is keeping us type-safe and helps avoid boxing/unboxing while the Dictionary part allows us to manage Key/Value pairs and access them easily.
It also allows us to add, remove and seek items in a constant time complexity - O(1) - that is, if you know how to use it properly.

Let's start with a simple example of adding items to a dictionary and see what the time complexity for each operation is:

```Dictionary<int,string> myDictionary = new Dictionary<int,string>();

myDictionary.Add(1, "Item no'1"); // O(1)
myDictionary.Add(2, "Item no'2"); // O(1)
myDictionary.Add(3, "Item no'3"); // O(1)
myDictionary.Add(4, "Item no'4"); // O(N)
myDictionary.Add(5, "item no'5"); // O(1)
myDictionary.Add(6, "Item no'6"); // O(1)
myDictionary.Add(7, "item no'7"); // O(1)
myDictionary.Add(8, "item no'8"); // O(N)
myDictionary.Add(9, "item no'9"); // O(1) ```

We can see that while adding the three first items we've got O(1) time complexity – that is what we have expected. Adding the fourth and eighth item has O(N) time complexity (where N represents the amount of items that are already exists in the dictionary).

In order to understand what happened here we need to understand how Generic Dictionary manages its items.

## Background

In order to provide O(1) time complexities in common add/remove/seek operations, Generic Dictionary is built using a hash table.

In short, Hash table is a data structure which consists on an array of "buckets" for storing the elements.The way Hash tables handle insertion of new items is by extracting a Hash code for each object, and using that hash code to determine in which bucket to place the item – usually by performing additional simple calculation to adjust the object's hash code to the size of the buckets array.

## Object.GetHashCode method

One of `System.Object`'s small group of methods is a virtual method called GetHashCode which has the following method signature:

```public virtual int GetHashCode()
```

As `System.Object `is the ultimate base class in .Net, all types (even Value Types that also inherits from System.Object) has built-in hash code generation functionality, which could also be overridden by sub-classes when needed.

This important method is being used by all hash-based collection while objects are being added, removed or accessed.

The default GetHashCode implementation provided by `System.Object` should be sufficient in most cases, but later on this article we will discuss where overriding GetHashCode method and creating your own hashing algorithm is a must.

The reason that the above figure describes a "simple" hash table, is that it does not support collisions.

Hash code in .Net (that being returned from the object's GetHashCode virtual method) are of type Int32 – which means that there are 2^32 possible values which can be returned by the method.

Does it is means that we cannot have more than 2^32 objects in our application? Of course not. In fact, even the default object's GetHashCode is not guaranteed to return unique values for different objects.

In addition, our buckets array is probably much smaller than that. In the figure above the buckets array is set to 30 items.

The obvious question here is - could it be that two items will be getting the same index in the buckets array? Sure. The pigeonhole principle states exactly that.

When two objects are getting the same index in the buckets array it means that we have a collision – we will get to see how .Net Dictionary handles collisions next.

Considering the example above, we've used the Dictionary's default constructor.
`Dictionary<int,string> myDictionary = new Dictionary<int,string>();`

When instantiating a Dictionary with default constructor an array of 3 items is being allocated.

Then, we are adding the first three items to the Dictionary.

```myDictionary.Add(1, "Item no'1"); // O(1)
myDictionary.Add(2, "Item no'2"); // O(1)
myDictionary.Add(3, "Item no'3"); // O(1)
```
After these instructions have been completed, our Dictionary inner-array of buckets is full.

Note - it means that the total number of items added to the dictionary is equal to the size of the array. It does not mean that all of the buckets array slots are occupied – as we seen above, there could be a collision that two items will be having the same index (after the hash code have been adjusted to the buckets array size – in this case using MOD operator).

Let's add another item.

`myDictionary.Add(4, "Item no'4"); // O(N)`

While trying to add the 4th item, the dictionary checks if the number of items is equal to the size of the array, in this case it does, so it and allocates new larger array. In this case, the new size of the array will be 7, and after adding couple of more items the dictionary will be resized again to 17 items.

Three? Seven? Seventeen? Exactly.
Dictionary implementation has an inner pre-calculated list of prime numbers that will be used for the array size.

`internal static readonly int[] primes = {3, 7, 11, 17, 23, 29, 37, 47, 59, 71, 89, 107, 131, 163, 197, 239, 293, 353, 431, 521, 631, 761, 919, 1103, 1327, 1597, 1931, 2333, 2801, 3371, 4049, 4861, 5839, 7013, 8419, 10103, 12143, 14591, 17519, 21023, 25229, 30293, 36353, 43627, 52361, 62851, 75431, 90523, 108631, 130363, 156437, 187751, 225307, 270371, 324449, 389357, 467237, 560689, 672827, 807403, 968897, 1162687, 1395263, 1674319, 2009191, 2411033, 2893249, 3471899, 4166287, 4999559, 5999471, 7199369};`

When there is a need to resize the array, the new size will be the next item in the prime numbers array (above) that is larger than the result of multiplying the old capacity by 2.

For example:

On the first three insertions of items the array size is 3 – next item will cause the Dictionary to resize to 7 by the following calculation:
3 * 2 = 6 –> next larger prime number in the list – 7.

After couple of items, when we will try to insert the 8th item, the Dictionary will be resized to:
7 * 2 = 14 –> next larger prime number in the list – 17.

Note: If will add more items than the maximum number in the table (i.e 7199369), the resize method will manually search the next prime number that is larger than twice the old size.

Note: The reason that the sizes are being doubled while resizing the array is to make the inner-hash table operations to have asymptotic complexity. The prime numbers are being used to support double-hashing.

Going back to the first example – passing 30 as the capacity argument to the Dictionary's constructor will make sure we will have O(1) time complexity while adding our items to the dictionary (up to 30 items in this case).

```Dictionary<int,string> myDictionary = new Dictionary<int,string>(30); // The real size will be 37 – using the prime numbers table

myDictionary.Add(1, "Item no'1"); // O(1)
myDictionary.Add(2, "Item no'2"); // O(1)
myDictionary.Add(3, "Item no'3"); // O(1)
myDictionary.Add(4, "Item no'4"); // O(1)
myDictionary.Add(5, "item no'5"); // O(1)
myDictionary.Add(6, "Item no'6"); // O(1)
myDictionary.Add(7, "item no'7"); // O(1)
myDictionary.Add(8, "item no'8"); // O(1)
myDictionary.Add(9, "item no'9"); // O(1)
//…
myDictionary.Add(30, "item no'30"); // O(1)```

Always set initial size while instantiating a dictionary – even if you only have rough estimation on how many items will be added to the dictionary. It will help avoid re-allocating and copying large arrays

Note: When cloning a dictionary, prefer using the relevant constructor that accepts source Dictionary instead of naively going through the old collection and adding it to the new one. That way there won't be any redundant allocations and array copying.

## Handling Collisions

There are several ways to handle collisions. Dictionary does that by using a method called chaining.

Dictionary actually has two arrays with the same number of items:

• Buckets array – stores the index of which the object is stored in the entries array

• Entries array – stores the actual items within a special data structure (if it's a reference type – the reference to the item will be stored)

Each item in the entries array is a struct with the following fields:

```private struct Entry
{
public int hashCode; // Entry (Key) hash code after being adjusted to the array size using MOD operator
public int next; // Index of which the next item in the collision chain resides in the entries array - if there is no collision in that entry the value is -1
public TKey key; // Entry generic Key
public TValue value; // Entry generic Value
}
```

Note that the size of the struct in memory will be determine by the types of provided Key and Value. For example, for `Dictionary<int, string>`, the size of each entry in the array will be 16 bytes on an x86 operating system (every `int `takes 4 bytes, `string `is actually a reference to the real string which also takes 4 bytes).

The above figure represents that state of the hash table after two objects have been added.
Both objects are getting the same index (4) when calculating the reminder of dividing the hash code with the array size.
Obj1 is stored in the first slot within the entries array.
Obj2 is stored in the second slot within the entries array.
Obj2 is holding Obj1 index.
When trying to get Obj1 (by using Obj1 key), the hash code is getting index 4 in the buckets array (after MOD calculation). In that bucket, there is the index to the last object that got the same index – in that case the value 1 is retrieved – which is Obj2 – the root of a linked list.
In order to find Obj1 (or to determine if it is already exists) a loop thought the linked list is being performed, while checking each node for equality.

Note: Interestingly, Generic Dictionary does not seems to support custom load factors as Hashtable does, and the load ratio will always remain 1:1 – which means that the Dictionary size cannot be less than that amount of items in it (even if not all buckets are being used, all entries array slots are used). For more information on HashTable load factor, please refer to the Wikipedia page.

## Customizing hashing and equality algorithms

Some of Dictionary's constructors contains `IEqualityComparer<T>` that is being used for both equality and hash code purposes.
`IEqualityComparer<T>` interface has two methods, Equals and GetHashCode.

In which cases we should use a custom IEqualityComparer? Consider the following scenarios:

• You are using a 3rd party type as a Dictionary key and you want to replace its GetHashCode implementation (for performance, distribution of data, etc…)
• You want a custom equality mechanism – e.g. case-insensitive string keys so ".txt" and ".TXT" are equal

When EqualityComparer is not being passed to the Dictionary's constructor, `EqualityComparer<TKey>.Default `is being used.

## Automatic distribution adaptation for string keys

Having more than one Dictionary key with the same bucket-array index can be troublesome for performance.
Accessing the Dictionary using this key can cause a lot of array traversals, which calls our expected O(1) time complexity into question.
For string keys only, there is a special optimization-
In order to make sure that each 'get' and 'add' operations will not go over more than 100 items for each bucket, a collision counter is being used.

If while traversing the array to find or add an item the collision counter goes over 100 (limit is hard-coded) and the `EqualityComparer```` ```is of type `EqualityComparer<string>.Default`, a new `EqualityComparer<string>`instance is being generated for alternative string hashing algorithm.

If such provider is found, the Dictionary will allocate new arrays and copy the content to the new arrays using the new hash code and equality provider.

This optimization might be useful for a scenario where somehow your string keys are not being distributed evenly, but could also lead to massive allocations and waste of CPU time for generating new hash codes of what could be a lot of items in the dictionary.

Note: there is another case where such provider can be generated even if the equality comparer is not `EqualityComparer<string>.Default`, by implementing the internal interface `IWellKnownStringEqualityComparer`. As this interface is used internally and not exposed externally it wasn't mentioned above.

## Using custom ValueTypes as Dictionary keys

All C# primitives inherits from System.ValueType (except `string` which is not a ValueType), and all of them overrides GetHashCode to implement their own hashing algorithm.

Hashing algorithms should consist on couple of important principles (actually there are more, you can find them on MSDN):

* Being fast!

* Well-distributed (in our case, distributed to be a valid 32 bit integers)

* If two objects are equal, GetHashCode should return the same value

* GetHashCode should return the same hash code consistently on the same object while it is not modified

For example, GetHashCode for `int` returns the number itself.
GetHashCode for `DateTime` returns the inner tick count, etc…

The reason all primitive types overrides GetHashCode is that the default implementation of ValueType.GetHashCode is relatively slow and based on going through all fields and XORing them, with special treatments to memory gaps between the fields and another special treatment for reference type fields… all of this for getting the object's hash code? Remember the first principle for good hashing algorithm? That's right, it should be fast.

That is the reason that when you are implementing your own custom ValueType that is intended to be used as a dictionary key, make sure to override GetHashCode with a good hashing algorithm that will suit your type's fields – we saw in the previous examples how important it is to have a well distributed hash code to minimize collisions and by that having better add, access and remove time complexities while working with dictionaries.

As the new type you have created is a custom ValueType, IEqualityComparer<T>.Default will get `ObjectEqualityComparer<T>` in return. When the Dictionary will try to compare the keys it will eventually invoke the `Equals(object obj) `method that will box the object. for more information about boxing/unboxing, review MSDN documentation.

For that reason, implementing  IEqualityComparer<T> for your custom value type should be a must.  (Thanks  Jonathan C Dickinson for highlighting this important issue)

Note: When implementing custom Hash code generation algorithms, make sure that the hash code is dependent upon fields that are immutable by their meaning – as changing one of the fields that was used to generate the hash code while the object is being used as dictionary key will make it inaccessible and could lead to duplicated items and unexpected results.

## Synchronization

It should go without saying, but it is important enough to mention it again – use proper synchronization while working with a Dictionary concurrently - Dictionary is not thread-safe for read and write operations simultaneously (only multiple readers are supported).

The most common phenomenon while working concurrently (reading and writing at the same time) is infinite loops (both for the readers and the writer). You can see the following post that described the symptoms.

If you need thread-safe dictionary, consider using `ConcurrentDictionary<K,V>` (Framework 4.0 and above).

## Conclusion

Generic Dictionaries in .Net are probably the best option to use for key/values scenarios and they are widely used almost in all application, but as we seen above, understanding the inner-working can help us get an improved and consistent performance.

## About the Author

 Architect Israel
I'm a Consultant at Sela Group, love planning distributed systems from the ground up, designing and implementing infrastructure components in a performance-oriented approach.
I have a real passion for performance tuning, troubleshooting, .Net internals and multi-threading patterns and an enthusiastic Windows Phone application developer (Author of Windows Phone Bazaar - www.wp-bazaar.com/).

I'm speaking at Sela Developer Practice conference about CLR 4.5 Improvments in May, come to say hello.

Subscribe to my blog:
http://blogs.microsoft.co.il/blogs/OfirMakmal/

## Comments and Discussions

 First PrevNext
 excellent article la_boaz22-Jun-15 20:24 la_boaz 22-Jun-15 20:24
 How entry is deleted from Entries array aldema16-Apr-15 14:07 aldema 16-Apr-15 14:07
 Re: How entry is deleted from Entries array Paulo Zemek16-Apr-15 14:56 Paulo Zemek 16-Apr-15 14:56
 My vote of 5 noshirwan13-Aug-14 3:59 noshirwan 13-Aug-14 3:59
 Great article have a doubt? Thava Rajan4-Jun-14 5:27 Thava Rajan 4-Jun-14 5:27
 Re: Great article have a doubt? Ofir Makmal5-Jun-14 23:20 Ofir Makmal 5-Jun-14 23:20
 Very good keshavgadia13-Feb-14 22:42 keshavgadia 13-Feb-14 22:42
 Removing elements from dictionary .d3vi1h3aRt7-Aug-13 21:59 .d3vi1h3aRt 7-Aug-13 21:59
 Re: Removing elements from dictionary Ofir Makmal7-Aug-13 22:48 Ofir Makmal 7-Aug-13 22:48
 My vote of 5 Maimonides21-Jul-13 22:52 Maimonides 21-Jul-13 22:52
 My vote of 5 Member 977515622-Apr-13 17:17 Member 9775156 22-Apr-13 17:17
 My vote of 5 peteSJ20-Feb-13 11:35 peteSJ 20-Feb-13 11:35
 My vote of 5 Paulo Zemek11-Feb-13 13:04 Paulo Zemek 11-Feb-13 13:04
 Re: My vote of 5 Ofir Makmal11-Feb-13 20:42 Ofir Makmal 11-Feb-13 20:42
 Extremely useful article stokehome3-Jan-13 2:30 stokehome 3-Jan-13 2:30
 My vote of 5 Jim Meadors18-Dec-12 19:35 Jim Meadors 18-Dec-12 19:35
 My vote of 5 Mihai MOGA14-Dec-12 5:10 Mihai MOGA 14-Dec-12 5:10
 My vote of 5 xcchcaptain13-Dec-12 19:27 xcchcaptain 13-Dec-12 19:27
 Thank you ，I have a question xcchcaptain13-Dec-12 19:21 xcchcaptain 13-Dec-12 19:21
 Re: Thank you ，I have a question Ofir Makmal14-Dec-12 6:04 Ofir Makmal 14-Dec-12 6:04
 Re: Thank you ，I have a question xcchcaptain14-Dec-12 17:53 xcchcaptain 14-Dec-12 17:53
 Very Nice CIDev12-Dec-12 5:10 CIDev 12-Dec-12 5:10
 Nice jibesh11-Dec-12 22:48 jibesh 11-Dec-12 22:48
 My vote of 4 blue_developer10-Dec-12 22:39 blue_developer 10-Dec-12 22:39
 My vote of 5 Savalia Manoj M10-Dec-12 20:53 Savalia Manoj M 10-Dec-12 20:53
 Last Visit: 31-Dec-99 18:00     Last Update: 24-Jul-16 8:28 Refresh 12 Next »

General    News    Suggestion    Question    Bug    Answer    Joke    Praise    Rant    Admin

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.