Performance Tests Between Multiple "multi-key" Dictionary Implementations





5.00/5 (1 vote)
Performance tests between multiple "multi-key" Dictionary implementations
In relation to my post about the MultiKeyDictionary
object that I created (you can read about it on CodeProject here, or on my blog here), I created some performance tests to illustrate the difference between my method of creating this class, vs. some alternatives.
For instance, you could use Tuple<int, string, string>
, or even Dictionary<int, KeyValuePair<string, string>>
as the underlying dictionary
, iterating over the collections using LINQ. The functionality would be there, no doubt about that.
Unfortunately, when you use LINQ for much of anything, the performance immediately goes downhill. But just to make sure that I wasn't crazy, before I started writing this blog entry, I put together a performance comparison using the aforementioned objects for storage - of course, benchmarking them against my MultiKeyDictionary
.
The results are pretty cut & dry - the MultiKeyGenericDictionary
outperforms the other two methods by a factor of about 1000x for the Dictionary<int, KeyValuePair<string, string>
, and is about 400x faster than the Tuple<int, string, string>
. Also, once you factor in synchronizing massively multi-threaded access to the underlying objects, you quickly find out that the comparison between the methods are not even close.
Take a look at these code snippets on the access to the underlying objects - keep in mind that LINQ is incredibly expensive, and doesn't parallelize well when you have multiple threads running multiple LINQ queries on the same object, all trying to parallelize their queries. In fact, you're better off not using the AsParallel()
extension.
This is the equivalent of the MultiKeyDictionary Associate
method:
genericDictionaryLock.EnterUpgradeableReadLock();
try
{
if (genericDictionary.ContainsKey(i))
{
try
{
genericDictionaryLock.EnterWriteLock();
// Notice that you have to create a new KeyValuePair
// in order to associate the sub-key,
// since KVPs can only have their values set in the constructor.
genericDictionary[i] = new KeyValuePair<string,
string>(string.Format("Sub Key {0}", i), genericDictionary[i].Value);
}
finally
{
genericDictionaryLock.ExitWriteLock();
}
}
else
{
// Realistic error checking/handling would usually go here
throw new Exception();
}
}
finally
{
genericDictionaryLock.ExitUpgradeableReadLock();
}
And here is a lookup by sub-key:
genericDictionaryLock.EnterReadLock();
try
{
var val = (from d in genericDictionary.Values
where d.Key == string.Format("Sub Key {0}", i) select d.Value).Single();
Debug.Assert(val == string.Format("Value {0}", i));
}
finally
{
genericDictionaryLock.ExitReadLock();
}
What you should notice is that the lookups and assignments are very expensive, creating many more objects that you really need.
The Tuple
method is similar to the above snippets, and while it is faster than the Dictionary<int, KeyValuePair<string, string>>
, it is still nowhere near as fast as indexing directly into the collection to the desired value based on a hashed key, as is done in the MultiKeyDictionary
class.
As easy as it might be to do, the poor performance of the other options precludes actually using them in any application that requires speedy access to the underlying data. The following image is from the most average 5 runs out of about 20, but you can always run the code for yourself (see the bottom of this post).
Also, keep in mind that these tests were done with data sets of 100 records. When you start going to larger data sets, say around 1000000, the differences in timing appear to grow exponentially. For instance, using a data set of 1000 records, it takes the generic dictionary 140 seconds to complete, it takes the tuple 45 seconds, and the MultiKeyDictionary .12 seconds.
Times below are in ticks. (Click to enlarge)
The full code for my performance test is below... If you want to use it, you'll need to have my MultiKeyDictionary
class.
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
using System.Linq;
using Aron.Weiler;
using System.Collections.Generic;
namespace MultiKeyGenericDictionarySpike
{
class Program
{
// Instantiate classes with a value (string),
// primary key (int) and sub key (string)
static MultiKeyDictionary<int, string, string> mkDictionary =
new MultiKeyDictionary<int, string, string>();
static List<Tuple<string, string, string>> tuples =
new List<Tuple<string, string, string>>();
static ReaderWriterLockSlim tupleLock = new ReaderWriterLockSlim();
static Dictionary<int, KeyValuePair<string, string>> genericDictionary =
new Dictionary<int, KeyValuePair<string, string>>();
static ReaderWriterLockSlim genericDictionaryLock = new ReaderWriterLockSlim();
static CountdownEvent countDown;
static void Main(string[] args)
{
perfTest:
Console.WriteLine("------------------");
Console.WriteLine("");
Console.WriteLine("Performance Test Running...");
Console.WriteLine("");
// Slowest to fastest ;)
RunTuplePerfTest();
RunGenericDictionaryPerfTest();
RunMultiKeyDictionaryPerfTest();
Console.WriteLine("Press any key to quit, or P to run perf test again...");
if (Console.ReadKey().Key == ConsoleKey.P)
goto perfTest;
}
private static void RunTuplePerfTest()
{
Task[] tasks = SetUpPerfTask(TuplePerfTest);
RunTasks(tasks, "Tuple");
}
private static void RunMultiKeyDictionaryPerfTest()
{
Task[] tasks = SetUpPerfTask(MultiKeyDictionaryPerfTest);
RunTasks(tasks, "MK Dictionary");
}
private static void RunGenericDictionaryPerfTest()
{
Task[] tasks = SetUpPerfTask(GenericDictionaryPerfTest);
RunTasks(tasks, "Generic Dictionary");
}
private static void RunTasks(Task[] tasks, string name)
{
Stopwatch sw = ResetStopWatchAndCountdown();
sw.Start();
foreach (Task task in tasks)
{
task.Start();
}
countDown.Wait();
sw.Stop();
Console.WriteLine("Done in {0} ticks ({1} seconds) for {2}",
sw.ElapsedTicks, sw.Elapsed.TotalSeconds, name);
}
private static Task[] SetUpPerfTask(Action<int, int> perfTest)
{
int seed = 100;
int count = 99;
Task[] tasks = new Task[10];
for (int i = 0; i < 10; i++)
{
int temp = i;
tasks[i] = new Task(() => perfTest(seed * temp, count));
}
return tasks;
}
private static Stopwatch ResetStopWatchAndCountdown()
{
Stopwatch sw = new Stopwatch();
countDown = new CountdownEvent(10);
return sw;
}
private static void MultiKeyDictionaryPerfTest(int seed, int count)
{
Parallel.For(seed, seed + count, (i) =>
{
mkDictionary.Add(i, string.Format("Sub Key {0}", i),
string.Format("Value {0}", i));
});
Parallel.For(seed, seed + count, (i) =>
{
Debug.Assert(mkDictionary[i] == (string.Format("Value {0}", i)));
});
Parallel.For(seed, seed + count, (i) =>
{
mkDictionary.Associate(string.Format("Sub Key {0}", i), i);
});
Parallel.For(seed, seed + count, (i) =>
{
Debug.Assert(mkDictionary[string.Format("Sub Key {0}", i)] ==
(string.Format("Value {0}", i)));
});
Parallel.For(seed, seed + count, (i) =>
{
mkDictionary.Remove(i);
});
countDown.Signal();
}
private static void GenericDictionaryPerfTest(int seed, int count)
{
Parallel.For(seed, seed + count, (i) =>
{
genericDictionaryLock.EnterWriteLock();
try
{
genericDictionary.Add(i, new KeyValuePair<string, string>
(string.Format("Sub Key {0}", i), string.Format("Value {0}", i)));
}
finally
{
genericDictionaryLock.ExitWriteLock();
}
});
Parallel.For(seed, seed + count, (i) =>
{
// Have to create a new KVP each time
KeyValuePair<string, string> kvp = new KeyValuePair<string, string>
(string.Format("Sub Key {0}", i), string.Format("Value {0}", i));
genericDictionaryLock.EnterReadLock();
try
{
// == doesn't appear to work...
Debug.Assert(genericDictionary[i].Equals(kvp));
}
finally
{
genericDictionaryLock.ExitReadLock();
}
});
// Associate
Parallel.For(seed, seed + count, (i) =>
{
genericDictionaryLock.EnterUpgradeableReadLock();
try
{
if (genericDictionary.ContainsKey(i))
{
try
{
genericDictionaryLock.EnterWriteLock();
genericDictionary[i] = new KeyValuePair<string, string>
(string.Format("Sub Key {0}", i), genericDictionary[i].Value);
}
finally
{
genericDictionaryLock.ExitWriteLock();
}
}
else
{
// Realistic error checking/handling would usually go here
throw new Exception();
}
}
finally
{
genericDictionaryLock.ExitUpgradeableReadLock();
}
});
// Lookup by subkey
Parallel.For(seed, seed + count, (i) =>
{
genericDictionaryLock.EnterReadLock();
try
{
var val = (from d in genericDictionary.Values where d.Key ==
string.Format("Sub Key {0}", i) select d.Value).Single();
Debug.Assert(val == string.Format("Value {0}", i));
}
finally
{
genericDictionaryLock.ExitReadLock();
}
});
Parallel.For(seed, seed + count, (i) =>
{
genericDictionaryLock.EnterUpgradeableReadLock();
try
{
// Realistic error checking
if (genericDictionary.ContainsKey(i))
{
try
{
genericDictionaryLock.EnterWriteLock();
genericDictionary.Remove(i);
}
finally
{
genericDictionaryLock.ExitWriteLock();
}
}
else
{
throw new Exception();
}
}
finally
{
genericDictionaryLock.ExitUpgradeableReadLock();
}
});
countDown.Signal();
}
/// <summary>
/// For the purpose of actually checking the performance of this list,
/// I will have a string primary key, forcing me to use LINQ to look up my values.
/// </summary>
/// <param name="seed"></param>
/// <param name="count"></param>
private static void TuplePerfTest(int seed, int count)
{
// Add all of the primary keys, sub keys and values
Parallel.For(seed, seed + count, (i) =>
{
tupleLock.EnterWriteLock();
try
{
tuples.Add(new Tuple<string, string, string>(i.ToString(),
string.Format("Sub Key {0}", i), string.Format("Value {0}", i)));
}
finally
{
tupleLock.ExitWriteLock();
}
});
// Verify that the correct value exists for each item
Parallel.For(seed, seed + count, (i) =>
{
tupleLock.EnterReadLock();
try
{
Debug.Assert((from t in tuples where t.Item1 ==
string.Format("{0}", i) select t).Count() > 0);
}
finally
{
tupleLock.ExitReadLock();
}
});
// Re-associate the sub-key
Parallel.For(seed, seed + count, (i) =>
{
tupleLock.EnterUpgradeableReadLock();
try
{
// Can't just associate with this tuple, as it is readonly,
// have to create a new one.
// Separating the reading and writing here
Tuple<string, string, string> item = (from t in tuples
where t.Item1 == string.Format("{0}", i) select t).Single();
item = new Tuple<string, string, string>(item.Item1,
string.Format("Sub Key {0}", i), item.Item3);
try
{
tupleLock.EnterWriteLock();
tuples.Remove(item);
tuples.Add(item);
}
finally
{
tupleLock.ExitWriteLock();
}
}
finally
{
tupleLock.ExitUpgradeableReadLock();
}
});
// Verify that I can get the correct values using the subkey
// as the key into the tuple
Parallel.For(seed, seed + count, (i) =>
{
tupleLock.EnterReadLock();
try
{
Debug.Assert((from t in tuples where t.Item2 ==
string.Format("Sub Key {0}", i) select t).Count() > 0);
}
finally
{
tupleLock.ExitReadLock();
}
});
Parallel.For(seed, seed + count, (i) =>
{
tupleLock.EnterUpgradeableReadLock();
try
{
// Separating the reading and writing here
Tuple<string, string, string> item =
(from t in tuples where t.Item1 == string.Format("{0}", i)
select t).Single();
try
{
tupleLock.EnterWriteLock();
tuples.Remove(item);
}
finally
{
tupleLock.ExitWriteLock();
}
}
finally
{
tupleLock.ExitUpgradeableReadLock();
}
});
countDown.Signal();
}
}
}