Click here to Skip to main content
15,868,016 members
Articles / Programming Languages / C#

GetOrCreateValueDictionary

Rate me:
Please Sign up or sign in to vote.
4.90/5 (33 votes)
3 Feb 2018CPOL25 min read 41.9K   407   33   17
A dictionary implementation optimized for caches and the GetOrCreateValue method, supporting real parallelism while reading and avoiding some problems encountered in the ConcurrentDictionary

Code Update

After discussing with a friend about concurrent value generation, without duplicates, I created a new ConcurrentGetOrCreateValueDictionary.

If you are interested in it, download the new sample, or read a brief explanation of the changes at the end of the article.

Background

Last year I published the article Dictionary + Locking versus ConcurrentDictionary in which I explained why using normal dictionaries + some locking technique may be better than using the ConcurrentDictionary<TKey, TValue> and I finished by presenting a ThreadSafeDictionary<TKey, TValue> that uses the same lock-free reading technique used by the ConcurrentDictionary<TKey, TValue> while avoiding the problems of such dictionary and even having some more methods.

Well, at this moment I am focusing in writing small solutions, so people can simply copy one file and use the solution and the ThreadSafeDictionary<TKey, TValue> I presented doesn't fit that category. In fact, when I use such dictionary for caches I always use the GetOrCreateValue() and sometimes the setter, the Remove() or the Clear() but I don't use any of the other methods. So, why not write a class specialized to do only that, making it very small and optimized for such use case?

Well, that's what this article is all about.

Purpose

The GetOrCreateValue() is the method ideal for any kind of cache. You pass a key (for example, a record's Id) and, if the value (the record) is there, you get it directly. If not, a delegate is invoked that's capable of creating such value (loading it from the database), then it is cached so it doesn't need to be loaded again in a future call and finally it is returned to the user.

Note that even if I used a database record as example, this applies to any "function" that always generates the same value if it is provided with the same keys. So, any time consuming computation may benefit from such kind of solution (in fact, I always expected some kind of automatic cache for functional languages).

Possible implementations

If you need a GetOrCreateValue() method, you can actually look for an existing solution.

The Dictionary<TKey, TValue> class is actually the "default" solution. It doesn't come with a GetOrCreateValue() method but it is possible to write an extension method by using the TryGetValue() and Add() methods. Actually having such method as part of the dictionary could help improve performance, as the Add() method actually does a new search before adding the item, but as such search is very fast and we don't add an item twice, it is not a real problem. The real problem with this solution is that it is not thread-safe and, in many situations, that's exactly what we need.

As I presented in the other article, actually doing a full lock during the search (independently if we add the value or not) may be faster than using a ReaderWriterLockSlim, as such lock is not as slim as the name suggests. But a full lock really forbids two (or more) threads from accessing the dictionary at the same time. One thread must wait the other to finish. Also the locks clear the CPU caches, so the next calls (even unrelated to the dictionary) may need to access the main memory again, becoming slower (something extremely hard to measure).

This is where the ConcurrentDictionary<TKey, TValue> is a winner. Actually its method is named GetOrAdd(), but we can say it is the GetOrCreateValue with a different name. It really allows two or more threads to access its contents in parallel by using lock-free techniques, so when the contents are already in the dictionary no lock is needed and two or more threads can really access the same item in parallel.

There are two downsides with its implementation. The first one is that it uses a Thread.MemoryBarrier(), which completely clears the CPU caches and which is not really needed considering the .NET strong memory model (and, if it was needed, it is bugged as it is done in the wrong place). The second downside, and this is a big one, is that when the values aren't in the cache they are generated outside of a lock, which means that two or more threads can generate a value for the same key. Even if every thread receives the same instance in the end, the other values generated in parallel are lost. This can be very problematic if such values must be disposed or if they actually can't be created in parallel.

So, the solution was to create a lock-free dictionary (for reads) like the ConcurrentDictionary<TKey, TValue>, but one that does locks while creating the values. That's what I presented in the other article, but it was a big dictionary if we only need that cache functionality. In fact, because of the other functionalities it was not really optimized for such use case, doing an UpgradeableLock (which only affects the other methods) and then upgrading it. This means that a class that doesn't have those other methods could avoid that UpgradeableLock (as it is effectively working as a full lock) and go to a full/write lock directly.

The ThreadSafeGetOrCreateValueDictionary

The ThreadSafeGetOrCreateValueDictionary<TKey, TValue> is the new implementation optimized for real parallelism, CPU cache utilisation and to avoid creating two values for the same key. It does lock-free reads and when a value needs to be added it uses a full-lock (the normal Monitor lock) to create and add it, so it is impossible to create two values for the same key and lose one. And, by using the normal lock, it is also capable of being used recursively (this may happen if the delegate used to create the value needs to read another value of the dictionary, like in a recursive Fibonacci implementation).

It also has methods to Set(), Remove() and Clear() the dictionary, but nothing else. It is not intended to be seen as a "collection", it is intended to be used as a cache.

The code for this optimized dictionary is this:

C#
public sealed class ThreadSafeGetOrCreateValueDictionary<TKey, TValue>
{
  private sealed class _Node
  {
    internal _Node(int hashCode, _Node nextNode, TKey key, TValue value)
    {
      _hashCode = hashCode;
      _nextNode = nextNode;
      _key = key;
      _value = value;
    }

    internal readonly int _hashCode;
    internal _Node _nextNode;
    internal readonly TKey _key;
    internal readonly TValue _value;
  }

  private readonly object _lock = new object();
  private readonly Func<TKey, TValue> _creator;
  private readonly IEqualityComparer<TKey> _comparer;
  private _Node[] _baseNodes;
  private int _count;

  public ThreadSafeGetOrCreateValueDictionary(Func<TKey, TValue> creator):
    this(creator, 31, null)
  {
  }
  public ThreadSafeGetOrCreateValueDictionary(Func<TKey, TValue> creator, int capacity, IEqualityComparer<TKey> comparer)
  {
    if (creator == null)
      throw new ArgumentNullException("creator");

    capacity = DictionaryHelper.AdaptSize(capacity);
      
    if (comparer == null)
      comparer = EqualityComparer<TKey>.Default;

    _creator = creator;
    _comparer = comparer;
    _baseNodes = new _Node[capacity];
  }

  public TValue GetOrCreateValue(TKey key)
  {
    if (key == null)
      throw new ArgumentNullException("key");

    int hashCode = _comparer.GetHashCode(key) & int.MaxValue;
    var baseNodes = _baseNodes;
    int bucketIndex = hashCode % baseNodes.Length;

    var node = baseNodes[bucketIndex];
    while(node != null)
    {
      if (hashCode == node._hashCode)
        if (_comparer.Equals(key, node._key))
          return node._value;

      node = node._nextNode;
    }

    lock(_lock)
    {
      bucketIndex = hashCode % _baseNodes.Length;
      node = _baseNodes[bucketIndex];
      while(node != null)
      {
        if (hashCode == node._hashCode)
          if (_comparer.Equals(key, node._key))
            return node._value;

        node = node._nextNode;
      }

      if (_count >= _baseNodes.Length)
      {
        _Resize();
        bucketIndex = hashCode % _baseNodes.Length;
      }

      TValue value = _creator(key);
      var newNode = new _Node(hashCode, _baseNodes[bucketIndex], key, value);
      _baseNodes[bucketIndex] = newNode;
      _count++;
      return value;
    }
  }

  private void _Resize()
  {
      
    int newSize;
    checked
    {
      newSize = DictionaryHelper.AdaptSize(_baseNodes.Length * 2);
    }

    var newNodes = new _Node[newSize];

    foreach(var baseNode in _baseNodes)
    {
      var oldNode = baseNode;
      while(oldNode != null)
      {
        int hashCode = oldNode._hashCode;
        int bucketIndex = hashCode % newSize;
        var newNode = new _Node(hashCode, newNodes[bucketIndex], oldNode._key, oldNode._value);
        newNodes[bucketIndex] = newNode;

        oldNode = oldNode._nextNode;
      }
    }

    _baseNodes = newNodes;
  }

  // Note: If you download the sample you will see that it has
  // the methods Clear(), Set() and Remove(), which I am not presenting
  // here to make the code smaller.
}

And you use it like this:

C#
var dictionary = new ThreadSafeGetOrCreateValueDictionary<int, int>(DelegateToGenerateTheValue);

// And when needed:
int value = dictionary.GetOrCreateValue(57);

Why does the ThreadSafeGetOrCreateValueDictionary<TKey, TValue> receive the delegate in its constructor?

There are two reasons:

  1. It is expected that we always use the same logic to generate values, so giving it in the constructor guarantees that it is always the same logic;
  2. When we write anonymous lambdas the compiler generates a new delegate instance per call instead of creating a single delegate instance, but as we give it to the constructor, it will be created only once instead of being created at each call to the GetOrCreateValue() method.

In fact the first item is the important one. The second is only a good consequence, but those allocations are so fast that I can't consider it important.

Multi "column" keys

A problem I often see is related to multi-column keys. Normal dictionaries only support a single key and the solution I am presenting is in the same situation. I often see people writing specialized dictionaries that try to deal with those multiple columns, many times using a .NET dictionary where the key if the first column and the value is another dictionary, using the second column as the key for that inner dictionary, but the simplest and better performing solution is to use a KeyValuePair<TKey, TValue> or a Tuple<AnyNumberOfArguments> as the key. So, don't worry, even if you can only use a single object as the key, such object can be composed of other objects allowing you to have as many "columns" as needed as the key.

KeyValuePair<TKey, TValue> problem

For years I used the KeyValuePair<TKey, TValue> as a dictionary key when I needed a 2-column key. The worst is that I did performance tests and it worked great but, well, I was doing a bad thing for years.

I always believed the KeyValuePair<TKey, TValue> implemented the GetHashCode() and the Equals() methods, but it doesn't!

This means that only the first field is used to generate the hashcode (so, only the Key of such pair is used to generate the hash code). In my cases it was usually not a performance problem because I didn't usually had too many repetitions of the first column, but this actually means that every item that has the same value for the first column will end-up in the same node.

So, to solve this problem I created the Pair<TKey, TValue> struct. It is very similar to the KeyValuePair<TKey, TValue> but it implements the GetHashCode() and the equality methods (the one coming from the System.Object type and the one from the IEquatable<T> interface). So, this is the ideal type to use when you want two column keys.

The code for the struct is this:

C#
public struct Pair<TKey, TValue>:
  IEquatable<Pair<TKey, TValue>>
{
  public Pair(TKey key, TValue value)
  {
    _key = key;
    _value = value;
  }

  private readonly TKey _key;
  public TKey Key
  {
    get
    {
      return _key;
    }
  }

  private readonly TValue _value;
  public TValue Value
  {
    get
    {
      return _value;
    }
  }

  public override int GetHashCode()
  {
    int hashCode = 0;

    if (_key != null)
    {
      hashCode = _key.GetHashCode();
      hashCode = (hashCode << 16) | (hashCode >> 16);
    }

    if (_value != null)
      hashCode ^= _value.GetHashCode();

    return hashCode;
  }

  public override bool Equals(object obj)
  {
    if (obj is Pair<TKey, TValue>)
      return Equals((Pair<TKey, TValue>)obj);

    return false;
  }
  public bool Equals(Pair<TKey, TValue> other)
  {
    return
      EqualityComparer<TKey>.Default.Equals(_key, other._key) &&
      EqualityComparer<TValue>.Default.Equals(_value, other._value);
  }

  public override string ToString()
  {
    return string.Concat(_key, ", ", _value);
  }
}

And you can test the performance difference by using this sample:

C#
var stopwatch = new Stopwatch();
stopwatch.Start();
var hashsetOfPair = new HashSet<Pair<int, int>>();
for(int i=0; i<1000; i++)
  for(int j=0; j<1000; j++)
    hashsetOfPair.Add(new Pair<int,int>(i, j));

stopwatch.Stop();
Console.WriteLine(stopwatch.Elapsed);

stopwatch.Reset();
stopwatch.Start();
var hashsetOfKeyValuePair = new HashSet<KeyValuePair<int, int>>();
for(int i=0; i<1000; i++)
  for(int j=0; j<1000; j++)
    hashsetOfKeyValuePair.Add(new KeyValuePair<int,int>(i, j));

stopwatch.Stop();
Console.WriteLine(stopwatch.Elapsed);

Actually, the version with Pair<int, int> takes less than a second to complete in my computer while the other version takes more than 40 seconds.

Lesson learned: Always check if a struct implements the GetHashCode() method. And this only remembers me that I always considered the GetHashCode() to be an error in the System.Object class, it would be better if it only existed in an interface (inside the IEquatable<T> would be a good place) that should be implemented by types that can be used as a hash-table key. If that was the case, a KeyValuePair<TKey, TValue> would not be a valid type for a dictionary key and I wouldn't have committed such a mistake (and I imagine many other people wouldn't either).

Static Caches?

Usually I use this kind of solution to create static caches.

I know, some people immediately see a problem there. Everything that's put in the cache will stay alive for the entire life of the application. So, is it right?

I can say that I usually use these caches to keep delegates to access field and properties in memory and, as there is a limit of types and properties I will use, I don't see a problem if they are kept alive for the entire application's lifetime. We can say the code of the application itself almost fits this category.

But for other caches it may be a problem. This is why I also created a weak version of it. The WeakGetOrCreateValueDictionary<TKey, TValue>.

The principle is the same. A class with only the methods Clear(), GetOrCreateValue(), Set() and Remove(). But the generated values are stored in WeakReferences, so they can be collected.

The thing that complicates this solution is the fact that when items are collected the dictionary is not notified. I thought about using one of my old solutions so it gets notified but I decided to do such analysis on the _Resize() method. Every time the dictionary must be resized it actually counts how many items are still alive (recalculating the _count) and, depending on the situation, it simply removes the collected nodes. So, there's a small risk of keeping a large array of WeakReferences in memory if you put lots of items in the dictionary (which are eventually collected) and never use your dictionary again but I really think it works for situations where it is going to be used, and WeakReferences are probably much smaller than the items that can still be collected.

I am not going to put its code here in the article's text, but you can download the sample and check the WeakGetOrCreateValueDictionary<TKey, TValue>.

The GCKeepAliver

There's an extra class that you will probably like to use with the WeakGetOrCreateValueDictionary<TKey, TValue>. The purpose of the GCKeepAliver is to keep items alive during the next collection (usually the next full collection). The reason of this is simple: It is not important how often you use a cached item, if at the moment of a garbage collection it is not in use, it will be collected.

Now let's be extremist. Imagine that you have an important object that takes 1 minute to be generated. After its generation you use it once per second, but you keep a strong reference to it for only a millisecond. If a collection happens in any of the other 999 milliseconds that exist in a second, it will be collected. I am sure you don't want that.

Even if I am using an extreme situation, this is something that happens when using weak-references and so, if you want to guarantee that recently used objects are kept alive for some time, call a GCKeepAliver.KeepAlive() giving such object and it will survive the next collection. After surviving the next collection, if it is used again, you will say that it will survive yet another collection. This means that those objects don't have a specific time to live, yet they will be kept in memory if they are used frequently and at the same time they will be collected if they aren't used between two garbage collections. The best is that by being collection related, not time related, they are allowed to be collected if the computer is running really short on memory and doing garbage collections frequently.

The WeakGetOrCreateValueDictionary doesn't call the GCKeepAliver.KeepAlive() by default on purpose, as you may want to use a different technique to keep items alive, if you want to do that at all.

Sample

The sample application is included only to show how to use the dictionary methods and to show that it is really fast, but it is not doing anything useful.

It is up to you to use it in your applications if you need a cache that's weak or one that can be accessed in parallel without having the problems of the ConcurrentDictionary<TKey, TValue> (if it is a problem to you or even if you are not sure).

Understanding the code - Advanced

If you stop reading the article here you can already use the dictionary to cache your data. This topic is more advanced than the rest of the article but it can be useful if you want to understand how it works.

To understand this code we must understand what is a hash code, how it can be used to index values and how lock-free programming works. So...

The indexing

Imagine that you have this index for a Phonebook:

A Page 57
B Page 50
C Page 20
D Page 15
...
P Page 57
...

I know it is not ordered and this is not how phonebooks are organised. Yet it is valid for my purpose. Also consider that inside a page the names aren't ordered either, but all the names for a given starting letter are always put in a single page.

So, if you want to look for my name (Paulo), will you look for my name at page 2? Or page 30? I am pretty sure that you will look directly at page 57. You may lose time if there are too many names in page 57 and you may need to read the entire page if my name is not there, only to be sure it is really not there. You will also see that there are names that start with A in this page, yet you don't need to lose time reading the page 50, 20, 15 etc.

So we can say that this is very similar to how hash codes work. A hash code doesn't guarantee the data will be ordered, and if there are many items with the same hash code (or that end-up in the same page [known as bucket]) we may spend too much time looking for the right one, effectively having to look for all the items on that "page" if the key is not there.

But differently from letters (26 if we use English letters) we have 32-bit hash codes, which gives us more than 4 billion values, so collisions are expected to be less common. But this is one of the reasons that make the hash code very important: If only the first letter was used to generate the hash codes of strings we will only generate 26 values for all the possible names, effectively having a bad indexing (don't worry, .NET strings have a good hash code).

The HashCode and the Bucket

I put the letter A and the letter P on the same page for a reason. When we use dictionaries those pages (buckets) are pre-allocated, but we don't expect to pre-allocate 4 billion buckets for those 32-bit values. In the implementation already presented I start with 31 buckets. That's why the bucketIndex is calculated as hashCode % _baseNodes.Length. That _baseNodes.Length is the number of buckets/pages that we have. So, values ranging from 0 to 30 will have a bucketIndex equalling their hash code. But a value of 31 % 31 gives 0, so items with a hash code of 31 will also be in page index 0 (and items with 62 or any multiple of 31 will also be in bucket/page 0).

For example, if our keys are ints (ints return themselves as the hash codes) that increase by 10 we will have the following result:

Value Bucket
0 0
10 10
20 20
30 30
40 9
50 19
60 29
70 8

I think it is already possible to see that even if we jump values from 10 to 10, the tendendy is to use all the buckets (you can see that we did not have a single collision yet). Eventually we will have to increase the number of buckets (after all, with 31 buckets if we have 100 items well distributed we expect to have 3 items per bucket, but increase the number of items to 10000 and we expect to have 300 items per bucket).

Increasing the number of buckets

Actually, every time the _count becomes bigger than the number of buckets we resize the buckets array. Such process actually requires all nodes to be visited to recalculate the node index by the new number of buckets.

It is not mandatory to recalculate the number of buckets when the _count becomes bigger than the buckets length. I do this because to very well distributed hash codes (for example, when we use int values that increase by one each time) this means that we will never have a bucket collision. But in fact in some situations it could be better to use different resize rules, like only resizing when the count is bigger than the 2 times the number of buckets, but this is something my code is not prepared to deal with.

The primes

When resizing the nodes I try to use prime numbers. This actually helps to create a good distribution of values when the hash codes are multiples of some value. For example, Imagine that instead of 31 we had 30 nodes.

If the hash codes were increasing 10 by 10, then the values 0, 30, 60 etc will end-up in the same bucket (the same applies to the values 10, 40, 70) while all the buckets that aren't multiple of 10 will be empty. This is bad.

Of course we will have problems when we have 31 buckets if all the values are multiples of 31, yet it is more common to have values that are multiples of something smaller (usually 2, 10 or 16). Also, even with hash codes multiples of 31 we will solve the problem when resizing, as the next prime will not be a multiple of 31.

But if you look at the code you will see that I am not limiting the results to primes as this will take too much time and goes against the purpose of having a fast dictionary. So I only try to search a value that's not divisible by the primes from 2 to 31.

Why is there that hashCode & int.MaxValue?

If we have a negative hashCode % someValue we finish with a negative number, which is invalid for our bucket index. We could use Math.Abs() or we can use the & int.MaxValue to lose the negative sign. I didn't test it again, but the last time I checked using that & operator was faster.

Comparing the hash codes before comparing the items

If you look at the code you will see that I compare the hash code of the node with the hash code of the key I am searching before comparing the key itself. I also store the hash code in the node itself.

This is done for performance reasons. We don't know how much time the key will take to calculate the hash code, so by storing it in the node we avoid a possible performance hit. Even if calculating the hash code is fast we avoid a virtual call.

We also don't know how much time the Equals() method actually takes to compare the objects. For an int we can say that first checking the hash code to then check the item is slower than comparing the value directly, but that's the fast case. If the equals is slow we don't want to lose time comparing objects that actually have a different hash code as they should be different.

Lock-free reading

If you arrived here you probably understand why the hash codes are important and why I did those "strange calculations" to get the bucketIndex and the number of buckets when resizing. But now we have the other part of the problem, the lock-free reading.

For example, if you look at the _Resize() method you will see that new nodes are created. If we didn't have lock-free readers we could simply change the _nextNode of our existing nodes instead of creating new ones (the garbage collector will love that). In fact, we could do something more similar to what the normal Dictionary<TKey, TValue> does by using the nodes as structs instead of classes, so all the nodes are stored in a single array and count as a single object, with is again better for the garbage collector.

But we actually can't change our objects by using "non-atomic" operations and even if changing the _nextNode reference is an atomic operation, we must be sure that all items that enter the new node are changed together and, as we can't do that, we create new objects that will not be seen until we do the:

C#
_baseNodes = newNodes;

Many people believe that we must mark our objects as volatile to do lock-free reads or that we should use those Volatile methods, but we don't. The .NET has a strong memory model that means it is impossible for a different thread to see the _baseNodes referencing a new array without seeing the right contents that were put inside it before doing the _baseNodes = newNodes;.

Also some people are worried that by not doing a volatile read we may not see an item that was just added by another thread. And that's true, but that's why we do a first read and, if we don't find an item, we do a lock and search for the item again. If the item was just added, our lock guarantees that we will see the latest version of the items in memory and also that no-one else will be able to add an item at the same time.

So, what do we need to do the lock-free read?

  • We must access the _baseNodes only once. That's why I put the _baseNodes into a local variable to then get its length and to access its contents;
  • We must guarantee that items don't change in a non-atomic manner. Actually the Set() method could change the _value of nodes directly for reference types and for primitives like int or smaller (long is safe only on 64-bit processors), but it can't for other value types, so we simply consider we don't have such a guarantee;
  • We must guarantee that changes to the _baseNodes contents are atomic. That's why our _Node type is a class (a reference type) and not a struct. .NET will never allocate an array of references with the wrong alignment (which would cause the reads and writes to be non-atomic);
  • We must also guarantee that when we resize the nodes we only publish the new nodes array when it is completely filled. That's why I keep the new array in a local variable until the resize is finished and only then I set the _baseNodes to the newNodes;
  • All changes that can be seen by the lock-free read must also be atomic. This affects all the methods that can actually change the contents of the dictionary, independently if those methods use locks (as a lock only protects other code that uses uses the same lock). Considering our methods we have such a guarantee because:

    • The Clear() method only replaces nodes with null (an atomic operation) and the _count is never seen by the lock-free reads;
    • The Remove() method only changes the base node or the _nextNode field of a previous node to reference the _nextNode of the removed one and replacing the value of a reference is an atomic operation;
    • The Set() method guarantees that the new item is completely filled before making it accessible;
    • And as those changes can be problematic related to each other (for example, between the search and the addition of a new node, another one could've been added), they all use a lock to protect from each other.

Thread.MemoryBarrier()

In the beginning of the article I said that the ConcurrentDictionary<TKey, TValue> has a downside of using the Thread.MemoryBarrier() and that it is in the wrong place.

To explain it in details, the GetOrAdd() method first uses the TryGetValue method. Such method gets the first item in the bucket, then calls the Thread.MemoryBarrier() and then processes the entire bucket.

The problems are:

  • The Thread.MemoryBarrier() does a full-fence. This means that all data that was modified by the actual thread (and consequently the CPU) must be sent to the main memory and also that local caches for read must be cleared. In .NET 4.5 we have the Volatile class that allows us to use half-fences (in this case, a Volatile.Read() would do a better job);
  • Even using the Volatile.Read() is not necessary by the .NET strong memory model. We are not reading two or more fields or accessing two or more array items in a specific order that must be enforced because those fields are changed by other threads in the same specific order. We are accessing an item and then the contents of such item and, if the item is becoming available for this thread/CPU for the first time, it is the responsibility of .NET to guarantee that its contents are all OK. If .NET doesn't guarantee this then we could actually read real garbage from memory, not only "uninitialized" objects, which goes against .NET being safe. So, it may be an implementation detail, but this is a guarantee of the Microsoft .NET implementation and I really believe all other implementations should give the same guarantee;
  • If you still believe my code is bugged by not using a barrier or the Volatile.Read(), then you should know that the ConcurrentDictionary<TKey, TValue> is bugged too. It actually only uses a single Thread.MemoryBarrier() before starting to process a bucket, not between one item and the other and, well, it is possible to change values for existing items. This actually can change the intermediate nodes and so, if such barrier was needed, it should be inside the loop, not outside it.

Finally, if you want, you can check this link http://joeduffyblog.com/2007/11/10/clr-20-memory-model/. To me the most important one is:

  • Rule 1: Data dependence among loads and stores is never violated.

This rule is actually everything that I need in my implementation as I don't try to change more than one field in any particular order. The only thing important is to publish a new object and have its contents correctly visible to other threads.

And if you still don't believe me, then change the code to use the Volatile.Read() and Volatile.Write() methods available in .NET 4.5 (or the Thread.VolatileRead()/Thread.VolatileWrite() available in previous versions) when reading the bucket array, one of its items or the _nextNode. Note that simply making the array as volatile will not guarantee that items will be accessed as volatile.

ConcurrentGetOrCreateValueDictionary

The ConcurrentDictionary has a problem. It allows the execution of a value generation by more than one thread, and then discards the result of those that finished later. Such double (triple or more) execution is not only bad for performance, it can have very bad consequences on resource utilization.

The ThreadSafeGetOrCreateValueDictionary doesn't have such a problem, but it doesn't allow the generation of values in parallel. They are lock-free for reads, but if you really need generation of values in parallel, you are stuck.

The ConcurrentGetOrCreateValueDictionary effectively solves both problems. It is based on the idea of allowing a lazy object to be generated, possibly in parallel and discarded. But it doesn't matter. Discarding a Lazy<T> object doesn't really hurt performance or resource consumption.

Then, we immediatelly get the value of such a lazy object, which guarantees that no more than one thread will generate the value. That is, different key will generate different lazy objects and are allowed to run them in parallel, but the same key will always use the same lazy object. My problem with this approach is that we kept a lazy object alive forever, and always needed an extra indirection.

So, to solve that issue, as soon as a value is obtained, it is set into the dictionary, replacing the lazy object. If many threads try to do this in parallel, there's no issue, as all of them will put the same value for the same key. The only thing that remains is that we should check if we already have a value or only a lazy object on reads. But, guess what, even the Lazy<T> has such a cast, so we are simply avoiding the indirection through the lazy object.

In the end, we only have an added cast for when values are added, and the creation/execution of the lazy object when doing things in parallel, with no chance of double expensive executions for the same key... and the read performance is still astonishing good when the values are already in the dictionary.

Take a look at the ConcurrentGetOrCreateValueDictionary (for reference-type values) and ConcurrentGetOrCreateStructValueDictionary (for value-type values... that is, structs).

Version History

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior) Microsoft
United States United States
I started to program computers when I was 11 years old, as a hobbyist, programming in AMOS Basic and Blitz Basic for Amiga.
At 12 I had my first try with assembler, but it was too difficult at the time. Then, in the same year, I learned C and, after learning C, I was finally able to learn assembler (for Motorola 680x0).
Not sure, but probably between 12 and 13, I started to learn C++. I always programmed "in an object oriented way", but using function pointers instead of virtual methods.

At 15 I started to learn Pascal at school and to use Delphi. At 16 I started my first internship (using Delphi). At 18 I started to work professionally using C++ and since then I've developed my programming skills as a professional developer in C++ and C#, generally creating libraries that help other developers do their work easier, faster and with less errors.

Want more info or simply want to contact me?
Take a look at: http://paulozemek.azurewebsites.net/
Or e-mail me at: paulozemek@outlook.com

Codeproject MVP 2012, 2015 & 2016
Microsoft MVP 2013-2014 (in October 2014 I started working at Microsoft, so I can't be a Microsoft MVP anymore).

Comments and Discussions

 
QuestionA question on GCKeepAliver and the processor-cache-consumer pattern Pin
wmjordan7-Aug-16 23:23
professionalwmjordan7-Aug-16 23:23 
AnswerRe: A question on GCKeepAliver and the processor-cache-consumer pattern Pin
Paulo Zemek8-Aug-16 5:15
mvaPaulo Zemek8-Aug-16 5:15 
PraiseRe: A question on GCKeepAliver and the processor-cache-consumer pattern Pin
wmjordan8-Aug-16 14:53
professionalwmjordan8-Aug-16 14:53 
GeneralRe: A question on GCKeepAliver and the processor-cache-consumer pattern Pin
Paulo Zemek8-Aug-16 16:52
mvaPaulo Zemek8-Aug-16 16:52 
GeneralMy vote of 5 Pin
Avelino Ferreira4-Aug-16 6:20
professionalAvelino Ferreira4-Aug-16 6:20 
GeneralRe: My vote of 5 Pin
Paulo Zemek4-Aug-16 6:41
mvaPaulo Zemek4-Aug-16 6:41 
QuestionMy vote of 5 Pin
TeoSh25-Sep-14 22:42
professionalTeoSh25-Sep-14 22:42 
AnswerRe: My vote of 5 Pin
Paulo Zemek26-Sep-14 3:47
mvaPaulo Zemek26-Sep-14 3:47 
GeneralMy vote of 5 Pin
XIA Gao28-Feb-14 17:19
XIA Gao28-Feb-14 17:19 
GeneralRe: My vote of 5 Pin
Paulo Zemek1-Mar-14 2:22
mvaPaulo Zemek1-Mar-14 2:22 
QuestionI could have done with this a while back. Pin
Pete O'Hanlon13-Feb-14 9:20
subeditorPete O'Hanlon13-Feb-14 9:20 
AnswerRe: I could have done with this a while back. Pin
Paulo Zemek13-Feb-14 15:46
mvaPaulo Zemek13-Feb-14 15:46 
QuestionGreat article about a dictionary implementation optimized for caches and the GetOrCreateValue method Pin
Volynsky Alex12-Feb-14 0:21
professionalVolynsky Alex12-Feb-14 0:21 
AnswerRe: Great article about a dictionary implementation optimized for caches and the GetOrCreateValue method Pin
Paulo Zemek12-Feb-14 1:33
mvaPaulo Zemek12-Feb-14 1:33 
Thanks!
GeneralRe: Great article about a dictionary implementation optimized for caches and the GetOrCreateValue method Pin
Volynsky Alex12-Feb-14 9:50
professionalVolynsky Alex12-Feb-14 9:50 
GeneralMy vote of 5 Pin
Florian Rappl11-Feb-14 21:04
professionalFlorian Rappl11-Feb-14 21:04 
GeneralRe: My vote of 5 Pin
Paulo Zemek12-Feb-14 1:33
mvaPaulo Zemek12-Feb-14 1:33 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.