Understanding Dictionaries in .NET

Paulo Zemek

Oct 24, 2013

CPOL

This is a basic article explaining the Dictionary class in .NET.

The Very Basics

If we look at MSDN, the definition of the Dictionary<TKey, TValue> class is this:

"Represents a collection of keys and values."

But in my opinion such a definition is far from useful. If I needed to define a dictionary in a single sentence, I would at least try to define it like this:

"A dictionary is an indexed collection that allows values to be found by user-defined keys."

Such a definition, at least to me, makes clear the most important trait of a dictionary: it is indexed, which also suggests that it is fast. It also says that the keys may be user-defined, suggesting they are not required to be part of the "value".

Yet, I think that trying to define such a class in a single sentence is not enough, so I will try to list the main points:

It is something of a double collection: a collection of keys, a collection of values and, in the end, a collection of key/value pairs;
It binds keys to values, even if they are unrelated, so it is useful for creating a relationship between objects;
It is indexed by keys, which allows very fast searches using the keys, be it to know if a key is present, to find the appropriate value or to remove an entry (that is, to remove a key/value pair).

Why is the Dictionary class important?

The best way I can explain the importance of a dictionary to someone who doesn't know the class yet (be it a new developer or an experienced developer who was focused on another domain) is this: imagine that you have a collection with many items (I mean many, many items, like thousands, millions or even more). I will use a collection of products as the example. Imagine that you want a method to return a product by Name. What do you do?

If we have a normal list or array in memory, we may end up doing something like this:

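```
// Version for a List<Product> field named "list".
public Product GetProductByName(string name)
{
  foreach(Product product in list)
    if (product.Name == name)
      return product;

  // Reached only after every item was compared.
  return null;
}
```
```
// Version for a Product[] field named "array".
public Product GetProductByName(string name)
{
  for(int i=0; i<array.Length; i++)
  {
    Product product = array[i];

    if (product.Name == name)
      return product;
  }

  // Reached only after every item was compared.
  return null;
}
```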

Did you see the problem with this approach?

If not, let me explain: What will happen if you search for a name that's not present in the list or array?

You will iterate the entire collection, only to discover that the item is not there. The bigger the list, the slower the search is. Now imagine that you are doing this during a large import and you don't want to stop if the item is not found; you will only log an error and continue. Items near the end are slow to find, and invalid names are the slowest of all. The only products that are returned really fast are those at the beginning of the collection.

You may think it is not a problem, as you will be iterating in-memory collections, yet it will become slower when there are more items (and with millions of items it will be a problem). So, why not use a solution that's already built to be fast?

In such a situation, you will benefit from using a Dictionary, as it is indexed and, in many situations, checking if an item exists is an O(1) operation (that is, whether there is only 1 item or 10 million items in the dictionary, the search for an item by key [like the Name] takes about as long as if there were a single item).
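
To make the contrast with the loops above concrete, here is a minimal sketch of the same GetProductByName written over a dictionary. The Product class, the ProductIndex helper and its method names are assumptions made only for this illustration, and the sketch assumes product names are unique.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Product type; the article never shows its definition.
public sealed class Product
{
    public long Id { get; set; }
    public string Name { get; set; }
}

public static class ProductIndex
{
    // Builds the index once with a single pass over the products.
    public static Dictionary<string, Product> BuildByName(IEnumerable<Product> products)
    {
        var byName = new Dictionary<string, Product>();
        foreach (Product product in products)
            byName[product.Name] = product; // assumption: names are unique; a repeated name replaces the earlier entry

        return byName;
    }

    // Each lookup hashes the name instead of comparing it against every product.
    public static Product GetProductByName(Dictionary<string, Product> byName, string name)
    {
        Product product;
        return byName.TryGetValue(name, out product) ? product : null;
    }
}
```

The cost moves to BuildByName, which still visits every product once. After that, a search for a missing name no longer walks the whole collection.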

Another presentation

I don't know if you are already interested in the Dictionary class or not. But I know that when I used Delphi, I was usually forced to use the TList class. Then I started to use C++ Builder at work and, thanks to templates (in C++ we have templates, not generics, but the main idea is the same), I created the TypedList class, which had methods almost identical to those of the TList class, but properly typed, and everyone loved it.

That is, in fact, the same reason many people love the List<T> type in .NET. It is a typed list that avoids boxing, prevents items of the wrong type from being added and makes errors more apparent (at compile time instead of at run-time, and at the caller's site instead of as an exception in some future use). The dictionary has all of those benefits. And it is usually faster for searches!

The problem is: Many people don't know about the existence of dictionaries and, if they do, they think it is simply a "funny" list that puts keys and values together without seeing its indexing capability.

So, if you want a very fast solution to find your items, consider using dictionaries. If you have a common problem (like finding items by name), consider using a Dictionary<string, YourItemType> to store such a collection.

Using a Dictionary - The Basic Methods

Probably the most important members of the Dictionary class are:

Add: Adds a key/value pair. The key must be unique and can't be null but there aren't any constraints for the value;
The indexer: You can use the indexer (the dictionary[someKey] construct) to get or to set a value of the dictionary. When setting a value, if another value with the same key already exists, it is replaced (instead of an exception being thrown, as happens with the Add method). When reading, if the key does not exist an exception is thrown so, in most cases, it is preferable to use the TryGetValue method to get the value for a key;
TryGetValue: The TryGetValue method is the best way to check if an item exists and also get its value. Its only problem is the use of an out parameter, which makes it a little messy to use. So, if you want to get the value for a key when you are not sure if there's a value for that key in the dictionary, you should do something like this:
```
Product product;
if (productsByNameDictionary.TryGetValue(productName, out product))
{
  // Here the product was found.
}
else
{
  // There's no product with such name.
  // In fact, the product value will have default(Product), which is null for reference types.
}
```
To me, the messy part of TryGetValue is having to declare a local variable just before the call. I usually don't add null values, so a null result is enough to tell me there's no such key in the dictionary. So, for my situation, I created an extension method like this:
```
public static TValue GetValueOrDefault<TKey, TValue>(this Dictionary<TKey, TValue> dictionary, TKey key)
{
  TValue result;
  dictionary.TryGetValue(key, out result);
  return result;
}
```
And, as the TryGetValue will fill the result with default(TValue) if the item is not found (that is, null for reference types) and I never add null values, I can use it like this:
```
  Product product = productsByNameDictionary.GetValueOrDefault(productName);

  if (product != null)
  {
    // here a product was found.
  }
  else
  {
    // if it is appropriate to do something if an item was not found, you can do it here.
  }
```
ContainsKey: This method is useful if you want to check if a key was already added to the dictionary or not when you don't want to get the value. For example, you may be checking if such key was already added before calling Add() and you also don't want to replace the value if there's one already, so using the indexer is not appropriate. There are some people that use the ContainsKey for reading purposes, like this:
```
if (productsByNameDictionary.ContainsKey(productName))
  DoSomething(productsByNameDictionary[productName]);
```
But it is a bad practice to do this. Use the TryGetValue (or at least the GetValueOrDefault extension method) in place of such ContainsKey and indexer read. You will replace a double search by a single search when the item is there.
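
The differences between the members listed above are easy to see side by side. The following is only a sketch: the DictionaryBasics class and the stock dictionary are hypothetical names, and the log it returns exists just to show what each call did.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryBasics
{
    // Returns a short log describing what each member did (illustration only).
    public static List<string> Run()
    {
        var log = new List<string>();
        var stock = new Dictionary<string, int>();

        stock.Add("apple", 10);               // Add: the key must not exist yet
        try
        {
            stock.Add("apple", 20);           // same key again
        }
        catch (ArgumentException)
        {
            log.Add("Add threw for a duplicate key");
        }

        stock["apple"] = 20;                  // indexer set: replaces the value silently
        log.Add("apple=" + stock["apple"]);

        try
        {
            log.Add("pear=" + stock["pear"]); // indexer get on a missing key
        }
        catch (KeyNotFoundException)
        {
            log.Add("indexer read threw for a missing key");
        }

        int found;
        if (!stock.TryGetValue("pear", out found))
            log.Add("TryGetValue returned false, value=" + found); // found is default(int)

        if (!stock.ContainsKey("pear"))       // existence check without reading the value
            stock.Add("pear", 7);

        log.Add("count=" + stock.Count);
        return log;
    }
}
```

Note that the only ContainsKey call guards an Add, not a read, which is the use the text recommends for it.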

Using a Dictionary - The Planning

Maybe the biggest constraint in using the Dictionary class is that you must know the search criteria at creation time. I just used the example of Products. It is easy to load the Products into memory and create a list with them. Later, searching that list by Name, Price or Id (considering that products usually have such an Id in the database) is easy. But how do you create a dictionary?

Well, if we try to transform a Product list into a dictionary, we may end up with many of them. The products, for sure, are a collection. But each search criterion that we consider valid for indexing must have its own Key type. So, to search for products by Name we need a Dictionary<string, Product>, and to search for products by Id we need a Dictionary<long, Product> (considering Id to be a long). But, if you have a method that returns a collection and you want to return an indexed collection, which return type will you use? Will it be a dictionary indexed by Name, by Id or by what?

I am not going to say it is an easy decision. In fact, using a dictionary is usually a matter of how many inserts and searches are expected. If you have many different inserts, and the number of searches is about the same as the number of inserts, then a list is usually OK. But in many cases you add some items (be it 10, be it 10 million) and then you search for them many, many, many times.

Using a collection of products as the example (again), a method that returns such a collection indexed by Id and by Name will, in fact, need to return another object, which should contain the appropriate dictionaries as members and should only publish methods to find the items by the appropriate fields (like FindByName, FindById etc).
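
One possible shape for such an object is sketched below. The ProductCatalog name, the Product definition and the choice to reject duplicates with Add are all assumptions for the example; only FindByName and FindById come from the text.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Product type; the article never shows its definition.
public sealed class Product
{
    public long Id { get; set; }
    public string Name { get; set; }
}

// Wraps one dictionary per search criterion and exposes only the finders.
public sealed class ProductCatalog
{
    private readonly Dictionary<long, Product> _byId = new Dictionary<long, Product>();
    private readonly Dictionary<string, Product> _byName = new Dictionary<string, Product>();

    public ProductCatalog(IEnumerable<Product> products)
    {
        foreach (Product product in products)
        {
            // Add throws if an Id or a Name repeats, so bad data shows up at load time.
            _byId.Add(product.Id, product);
            _byName.Add(product.Name, product);
        }
    }

    // Returns null when there is no product with the given Id.
    public Product FindById(long id)
    {
        Product product;
        _byId.TryGetValue(id, out product);
        return product;
    }

    // Returns null when there is no product with the given Name.
    public Product FindByName(string name)
    {
        Product product;
        _byName.TryGetValue(name, out product);
        return product;
    }
}
```

Both dictionaries point to the same Product instances, so the extra memory is for the index entries only, not for copies of the products.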

Using dictionaries to bind unrelated objects

I was using the example of a class Product which you probably expect to have an Id and a Name, so building a dictionary to search products by Id or by Name is only useful to speed-up searches, yet you will be capable of finding such Id or Name if you already have the Product instance.

But what about "adding properties" to existing objects? I am not talking about changing the source code of those objects to add new properties as I am considering those objects to be created elsewhere. The purpose is really to add new information to already created instances.

For example, imagine that you want to add help descriptions to all the controls that are inside a Window. Those controls are the standard ones and you can't add new properties. Also, you don't want to use the Tag property, as it isn't typed or because it may already be used by something else. So, you can create a Dictionary<Control, string> and add the help string for each control, like this:

helpStrings.Add(buttonOK, "This button closes the window and applies all the modifications");

And to get the help string, instead of doing something like:

string helpString = buttonOK.HelpString;

As such property doesn't really exist, you can do:

string helpString = helpStrings.GetValueOrDefault(buttonOK);

Simple, isn't it?

Note: In this example, the Dictionary must be inside the Window/Form to work properly. If you use a static dictionary you will keep all controls in memory, even after they are disposed. If you want to "add properties" to objects without keeping them alive longer than needed, check the ConditionalWeakTable (which is a kind of dictionary that allows the keys to be collected).
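
As a rough illustration of that last point, the help strings could be kept in a ConditionalWeakTable instead. This is a sketch only: FakeControl stands in for a real UI control type and HelpStrings is a hypothetical helper, neither of which exists in any framework.

```csharp
using System;
using System.Runtime.CompilerServices;

// Stand-in for a UI control; a real Window/Form would use its own control types.
public sealed class FakeControl
{
    public string Name { get; set; }
}

public static class HelpStrings
{
    // Keys are compared by reference and the table does not keep the controls alive.
    private static readonly ConditionalWeakTable<FakeControl, string> _table =
        new ConditionalWeakTable<FakeControl, string>();

    public static void Set(FakeControl control, string help)
    {
        _table.Remove(control);    // Add throws if the key is already present
        _table.Add(control, help);
    }

    // Returns null when no help string was attached to this exact instance.
    public static string Get(FakeControl control)
    {
        string help;
        return _table.TryGetValue(control, out help) ? help : null;
    }
}
```

Because the table is static here, the weak keys are what keep it from holding every control in memory, which is the situation the note warns about with a static Dictionary.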

A warning: LINQ and Dictionaries don't combine

I recently wrote an article about Easiness duality. Maybe there I didn't write a good situation to explain the problem, but I think that LINQ has such problem, at least when combined with Dictionaries.

LINQ is a general-purpose solution to deal with collections. It really helps in many situations by allowing us (the developers) to write less code to achieve the same result and, in many situations, with better speed or memory consumption than by writing "basic" solutions. For example, when filtering collections we can simply enumerate the results instead of creating new lists that represent the filtered result, so we avoid unnecessary memory allocations, and if we want to stop at the first item that matches the criteria we don't spend time populating an entire list.

But LINQ is terrible when combined with dictionaries. The truth is that LINQ usually iterates over the entire collection to do its job. It may create its own dictionaries (or similar structures) when doing joins, but it will not take advantage of an already existing dictionary (it simply can't, as a Dictionary<long, Product> doesn't tell if it is indexed by Id or by some other long property). So, if you only want to search for a single item in a dictionary (like verifying if a key exists) and you use Keys.Where(someCondition).Single(), you will iterate over all keys.

That's not a LINQ bug. That's not a Dictionary bug either. The Dictionary allows you to access it as a collection of key/value pairs or to access the keys or values as independent collections because you may have such a need. You may really want to iterate over all items, be it to debug, to serialize or the like.

Then LINQ is there to add capabilities to collections, so it ends up adding methods to the dictionary itself and to its Keys and Values properties. But if you use LINQ with a dictionary you will be losing the most important trait of a dictionary: its indexing. So, unless you have a very specific need and you know what you are doing, don't use LINQ over dictionaries.
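
The cost of the Keys.Where(someCondition).Single() pattern can be made visible by counting how many keys the condition is evaluated for. The LinqVersusIndex class and its method names are hypothetical, written only for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqVersusIndex
{
    // Returns how many keys the LINQ query had to inspect to find a single key.
    public static int CountKeysInspectedByLinq(Dictionary<long, string> dictionary, long wantedKey)
    {
        int inspected = 0;

        dictionary.Keys
            .Where(key => { inspected++; return key == wantedKey; })
            .Single(); // Single must also prove there is no second match, so it never stops early

        return inspected;
    }

    // The indexed alternative: one hash lookup, however many keys there are.
    public static bool ExistsUsingIndex(Dictionary<long, string> dictionary, long wantedKey)
    {
        return dictionary.ContainsKey(wantedKey);
    }
}
```

For a dictionary with 1,000 entries that contains the wanted key, CountKeysInspectedByLinq returns 1000, while ContainsKey answers the same question without enumerating anything.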

Another warning: The TKey must have a consistent Equality

The Dictionary class allows you to pass to its constructor the comparer used to compare keys for equality but, by default, it uses EqualityComparer<TKey>.Default, which will use the GetHashCode and Equals methods of the key. This may be a little advanced because in most cases, like when the key type doesn't override such methods or when you use primitive types as keys, it will simply work.
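
A small sketch of what passing a comparer changes: the same two keys count as different under the default comparer and as one key under StringComparer.OrdinalIgnoreCase. The KeyComparers class and CountDistinctKeys method are hypothetical names for this example.

```csharp
using System;
using System.Collections.Generic;

public static class KeyComparers
{
    // Stores every key in a dictionary built with the given comparer
    // and reports how many distinct keys the dictionary ended up with.
    public static int CountDistinctKeys(IEqualityComparer<string> comparer, params string[] keys)
    {
        var dictionary = new Dictionary<string, int>(comparer);

        foreach (string key in keys)
            dictionary[key] = 0; // indexer set: an "equal" key replaces instead of adding

        return dictionary.Count;
    }
}
```

Calling CountDistinctKeys(EqualityComparer<string>.Default, "Apple", "apple") returns 2, while the same call with StringComparer.OrdinalIgnoreCase returns 1.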

But it is important to be aware of this, as if you use a Key type that has a mutable hashcode (for example, a HashCode based on the actual property values) or if you want to extend a specific instance of a type that overrides the Equals method you can have strange bugs. So, if you think your dictionary is "bugged", check the type used as Key.
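
Here is a minimal sketch of the kind of "strange bug" a mutable hash code causes. MutableKey and MutableKeyDemo are hypothetical types invented for the example; the key's hash code is based on its Id, which is changed after the key was added.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type whose equality and hash code depend on a mutable property.
public sealed class MutableKey
{
    public int Id { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as MutableKey;
        return other != null && other.Id == Id;
    }

    public override int GetHashCode()
    {
        return Id;
    }
}

public static class MutableKeyDemo
{
    // Returns: found before the change, found after it, found by the old Id, entry still stored.
    public static bool[] Run()
    {
        var key = new MutableKey { Id = 1 };
        var dictionary = new Dictionary<MutableKey, string>();
        dictionary.Add(key, "value");

        bool foundBefore = dictionary.ContainsKey(key);

        key.Id = 2; // the hash code changes, but the entry stays filed under the old one

        bool foundAfterChange = dictionary.ContainsKey(key);
        bool foundByOldId = dictionary.ContainsKey(new MutableKey { Id = 1 });
        bool entryStillStored = dictionary.Count == 1;

        return new[] { foundBefore, foundAfterChange, foundByOldId, entryStillStored };
    }
}
```

Run returns true, false, false, true: the entry is still in the dictionary, but neither the changed key nor a key with the old Id can reach it anymore.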

Kinds of Dictionaries

The name Dictionary is related to the class itself and to the kind of solution it gives (that is, searching values that are indexed by keys).

There are many classes that provide IDictionary implementations, and even dictionary-like implementations that don't share an interface. Some examples are SortedList, ConcurrentDictionary, ImmutableDictionary and ConditionalWeakTable.

You can check every implementation if you want, but I consider the main points to be:

The Dictionary itself doesn't guarantee the items to be in any particular order (well, it initially keeps the items in the order they are added, but if you remove an item from the middle, the next added item will take its place... so, the general idea is that it doesn't keep any order) and, even if it supports multiple concurrent readers, it is not thread-safe, so it is up to you to do the synchronization between threads if you need to;
The ConcurrentDictionary is the thread-safe equivalent to the Dictionary class and it has many multi-action methods (like AddOrUpdate, TryRemove, GetOrAdd etc). Personally I don't like how its GetOrAdd works, as two values may be created in parallel and only one will be used, so in most cases I prefer a normal dictionary with a lock. If you want to understand this particular case better, see my article: Dictionary + Locking versus ConcurrentDictionary;
The SortedList keeps all the entries ordered by their Keys. It is slower than the normal Dictionary class for searches and it can be much slower on inserts, as it may need to reposition all the items to accommodate the new one. Personally, I only tried to use it back when I believed it was fast. Since then, I have stuck with the normal Dictionary;
The ImmutableDictionary, well, there's an entire concept related to immutable objects. They are thread-safe by default. There's the guarantee that if you store an immutable dictionary, no one else will change its contents (whether you are the one that created the dictionary and gave it to some other method or the one that received it), but its usage is very different (each Add, Remove etc will return a new dictionary instead of changing the actual object). Also, there are many speed and memory consumption differences, so I think it is better to check for yourself;
Finally, the ConditionalWeakTable is the best one if you want to "add" properties to objects at run-time, as it will only keep values alive as long as the keys (the original objects) are still alive. Considering garbage collections may happen at any moment, this class is also thread-safe. And considering its purpose is to add properties to instances, it doesn't use an equality comparer for the keys. Only instances are considered.
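
A few of the behaviors listed above can be checked with very little code. What follows is a sketch under assumptions: DictionaryKinds and LockedCache are hypothetical names, LockedCache is just one simple way to do the "normal dictionary with a lock" mentioned for GetOrAdd, and ImmutableDictionary requires the System.Collections.Immutable library (a separate package on older frameworks).

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;

public static class DictionaryKinds
{
    // SortedList enumerates by key order, whatever the insertion order was.
    public static string SortedKeys()
    {
        var sorted = new SortedList<int, string>();
        sorted.Add(3, "three");
        sorted.Add(1, "one");
        sorted.Add(2, "two");
        return string.Join(",", sorted.Keys);
    }

    // Add on an immutable dictionary returns a new instance; the original keeps its contents.
    public static int[] ImmutableCounts()
    {
        ImmutableDictionary<string, int> empty = ImmutableDictionary<string, int>.Empty;
        ImmutableDictionary<string, int> withOne = empty.Add("one", 1);
        return new[] { empty.Count, withOne.Count };
    }
}

// Dictionary + lock: the factory runs at most once per key, because the
// existence check and the insert happen under the same lock.
public sealed class LockedCache<TKey, TValue>
{
    private readonly object _lock = new object();
    private readonly Dictionary<TKey, TValue> _items = new Dictionary<TKey, TValue>();

    public TValue GetOrAdd(TKey key, Func<TKey, TValue> factory)
    {
        lock (_lock)
        {
            TValue value;
            if (!_items.TryGetValue(key, out value))
            {
                value = factory(key);
                _items.Add(key, value);
            }

            return value;
        }
    }
}
```

SortedKeys returns "1,2,3" and ImmutableCounts returns 0 and 1. The trade-off in LockedCache is that the factory runs while the lock is held, so a slow factory blocks every other caller, including those asking for other keys.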

Some "places" that already use dictionaries

If you program ASP.NET applications you have probably used dictionaries, even if they don't implement the IDictionary interface. The ViewState, Session, Application and even Cookies are dictionary-like classes. I don't know their internal implementation, but that may be enough to understand how useful they are.

But if you don't program ASP.NET applications, don't worry, there are plenty of places that use dictionaries. In fact, I can't imagine how to write caching solutions without a dictionary. Caches are a great example of data that's "found by a key", be it a name or an Id. So, a dictionary will help by doing a fast search.

But if you don't care about these, what about Dependency Properties? In particular, attached properties are properties that you can "add" to other objects... how do you think they are implemented internally? I am not saying you will see a real dictionary there, but the concept (that in fact comes from "hash tables") is there.

Finally, I will say that any good IoC (Inversion of Control) container will use a dictionary to decide which "constructor" to use to create instances of a requested type. So, it's time to explore these amazing classes.
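
To give an idea of what that looks like, here is a deliberately tiny sketch of the lookup at the heart of such a container: a dictionary from the requested type to a factory delegate. TinyContainer is a hypothetical class written for this example and does not reflect the API of any real IoC library.

```csharp
using System;
using System.Collections.Generic;

public sealed class TinyContainer
{
    // The requested type is the key; the value knows how to create an instance.
    private readonly Dictionary<Type, Func<object>> _factories =
        new Dictionary<Type, Func<object>>();

    public void Register<TService>(Func<TService> factory) where TService : class
    {
        // Indexer set: registering the same type again replaces the previous factory.
        _factories[typeof(TService)] = () => factory();
    }

    public TService Resolve<TService>() where TService : class
    {
        Func<object> factory;
        if (!_factories.TryGetValue(typeof(TService), out factory))
            throw new InvalidOperationException("No registration for " + typeof(TService).FullName);

        return (TService)factory();
    }
}
```

A call such as container.Register<string>(() => "hello") followed by container.Resolve<string>() returns "hello", found through a single dictionary lookup by type.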