(untagged)

Implementing a super-fast, size-constrained generic cache

Alexander Mossin

0.00/5 (No votes)

23 May 2008

The article shows how to combine two of the most common data structures into a high-performance, self-organizing cache with a maximum size.

Download source - 8.11 KB

Introduction

I work with performance-critical software that often requires some form of cache to function efficiently. However, there is also often a need to constrain memory usage, so I set out to create a high-performance cache with a maximum size.

Requirements

We want the cache size to be constrained, which means that when the cache is full, either we would want it to throw an exception, or something will have to be removed. For most scenarios, I imagine throwing an exception is not the best approach, since the developer then would have to handle this himself. So, I want the cache to automatically remove the least used item instead. This way, it would always be an up to date cache, elegantly phasing out the least useful data.

There is one other major concern we need to address before starting the implementation, namely performance. I always try to make the code I write as efficient as possible, and when designing general-purpose data structures, I find this even more important. For this cache, I would love it to perform in constant-time (O(1)), both on insert and lookup. Think that sounds too ambitious? Keep reading...

Data Structures

My first thought on choosing an underlying data structure for my cache was either an array or a hash table. An array is the most basic constant-time structure, and a hash table has a best case performance of constant time as well (worst-case performance in a hash table is insertion and lookup in linear time). But one question remains, how can I keep track of the usage data so that I can throw away the least recently used item when needed?

We need a structure that can keep track of the usage of each item. As an item is accessed, it would have to be moved to the top, so that it is marked as the most recently used. This means that the structure has to be flexible, unlike an array. A linked list is flexible, both in terms of memory allocations and in terms of logical access. You can easily move an item from one position to another, especially with a doubly linked list. So, if we represent our usage data with a linked list, the head can represent the most recently used item, and the tail represent the least recently used. When an item is accessed, it should be the new head. When the cache is full and the least recently used item has to go, we simply update our tail pointer to the previous item, set Next to null, and let the garbage collector clean the dishes.

Still, a linked list alone can't give us constant-time lookup for every value. In fact, the lookup time in a linked list is linear, and with an extremely large cache (how about 50 million items?), linear time is obviously unacceptable. This is why we need to combine the linked list with a constant-time lookup in another data structure. For a cache, it makes sense to use a hash table, since one would expect a cache to use some sort of key to identify each item.

Implementation

With the information we have so far, here is my initial proposal to the signature of my cache class;

public class Cache<TKey, TValue>
{
    public Cache();
    public Cache(int maxSize);
    public Cache(int maxSize, int initialSize);
    public void Add(TKey key, TValue value);
    public bool ContainsKey(TKey key);
    public TValue this[TKey key];
    public bool TryGet(TKey key, out TValue value);
}

As you can see, this closely resembles parts of the IDictionary interface. I believe this is a good thing; when designing APIs, I always try to make them as familiar as possible, to let users intuitively understand how to use them. If you have ever used the standard Dictionary, you know that trying to access a non-existing key throws an exception. You also probably know that TryGet provides a convenient way of testing for existence and getting the item in one operation. I often use this pattern myself, so I wanted to be able to use my cache that way as well. You may also note the overloaded constructor, which provides a few convenient ways of creating the cache. The most efficient of these is the one with an initialSize, because then the internal hash table can be initialized with a correct size, eliminating the need for it to grow later.

When it comes to the combination of the two data structures, we may store the actual data (which may or may not be a pointer) in one, and a pointer in the other. In this case, I am going to put the data in the linked list, and a pointer to the linked list node in the hash table. That way, we can do a lookup in the hash table first, then update the linked list afterwards, before finally returning the data. So, the pseudocode for an add operation will be as follows:

if cache is full
    remove tail
    tail = tail.previous
    tail.next = null

node = new node(key, value)
head.previous = node
node.next = head
head = node
add node to hash table

The beauty of the linked list in this scenario is that if we always promote the most recently used item to the front (head) of the linked list, the end (tail) of the list will always contain the least recently used item. The usage is updated on two occasions; when inserting an item, and when accessing an item. When inserting into a full cache, we also need to remember to cut the tail of the list in order to constrain the total size. Without further ado, let's jump into the implementation of the most important method, Add.

public void Add(TKey key, TValue value)
{
    if (IsFull)
    {
        //Remove the least used node - the tail
        _hashtable.Remove(_tail.Key);
        
        //Move the tail pointer
        _tail = _tail.Previous;
        
        //Clear the new tail's Next-pointer to allow GC
        if (_tail != null) 
            _tail.Next = null;
        
        //Maintain the total count
        _count--;
    }

    //Create the new node, and put it at the head of the list
    CacheNode<TKey> node = new CacheNode<TKey>(key, value);
    PromoteOrInsertNode(node);
    _hashtable.Add(key, node);
    _count++;
}

An important operation is the promotion that occurs when accessing a node. This operation is almost identical to what is done when inserting an item, so we combine the two and call this method when needed.

private void PromoteOrInsertNode(CacheNode<TKey>)
{
    if (_head == null)
    {
        _head = _tail = node;
    }
    else
    {
        if (node.Previous != null)
            node.Previous.Next = node.Next;

        //bugfix - courtesy of zavitax
        if (node.Next != null)
            node.Next.Previous = node.Previous;

        node.Next = _head;
        _head.Previous = node;
        _head = node;
    }
}

Using the Code

Using the cache is very intuitive, and as I mentioned, it closely resembles the usage of a standard IDictionary:

//Create a cache with both initial size and maximum size of 100.
Cache<int, string> cache = new Cache<int, string>(100, 100);

for (int i = 1; i <= 100; i++)
    cache.Add(i, i.ToString());

//Retrieve one of the items - this will now be the most recently used    
string eight = cache[8];

//The cache was full, now the least recently used item will disappear.
cache.Add(Int32.MaxValue, "big number");

string one;

if (cache.TryGetValue(1, out one))
    Console.WriteLine("This shouldn't happen...");
else
    Console.WriteLine("1 is no longer there, as expected!");

Please bear in mind that the memory consumed by the cache varies greatly depending on what you put into it. The maxSize is the number of objects in the cache, and has nothing to do with the bytes of memory that is consumed.

Performance

As I briefly mentioned, and as eagerly debated in the forum, the performance of this cache can have great variations. The critical variable is the the hash performance — from O(n) in the worst case, to O(1) in the best case. The actual performance you will attain when using the cache is totally dependent on the hashing algorithm (the GetHashCode() implementation in the class of the value) and the distribution of keys. These variables are out of control for the cache, and cannot be predicted in this article. Therefore, I strongly encourage readers to test the cache thoroughly with actual data before implementing it in a performance-critical scenario.

Conclusion

I won't go into detail on the other methods, as they are quite straightforward. I have also implemented some unit tests for a few of the basic usage scenarios. I also ran a test on my machine which showed that insertion and lookup of one million integers/strings took less than 1 second, which I think is a reasonably good performance for a generic, self-organizing data structure. However, note that the relative speed is not linear to the number of items once you get to a certain amount, because then the OS will need to start paging the memory, slowing down the cache considerably.

This concludes my first CodeProject article, I hope you like it. Please don't hesitate to comment on bugs or possible improvements. Happy coding!

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here