|
Introduction
I work with performance-critical software that often requires some form of cache to function efficiently. However, there is also often a need to constrain memory usage, so I set out to create a high-performance cache with a maximum size.
Requirements
We want the cache size to be constrained, which means that when the cache is full, either we would want it to throw an exception, or something will have to be removed. For most scenarios, I imagine throwing an exception is not the best approach, since the developer then would have to handle this himself. So, I want the cache to automatically remove the least used item instead. This way, it would always be an up to date cache, elegantly phasing out the least useful data.
There is one other major concern we need to address before starting the implementation, namely performance. I always try to make the code I write as efficient as possible, and when designing general-purpose data structures, I find this even more important. For this cache, I would love it to perform in constant-time (O(1)), both on insert and lookup. Think that sounds too ambitious? Keep reading...
Data Structures
My first thought on choosing an underlying data structure for my cache was either an array or a hash table. An array is the most basic constant-time structure, and a hash table has a best case performance of constant time as well (worst-case performance in a hash table is insertion and lookup in linear time). But one question remains, how can I keep track of the usage data so that I can throw away the least recently used item when needed?
We need a structure that can keep track of the usage of each item. As an item is accessed, it would have to be moved to the top, so that it is marked as the most recently used. This means that the structure has to be flexible, unlike an array. A linked list is flexible, both in terms of memory allocations and in terms of logical access. You can easily move an item from one position to another, especially with a doubly linked list. So, if we represent our usage data with a linked list, the head can represent the most recently used item, and the tail represent the least recently used. When an item is accessed, it should be the new head. When the cache is full and the least recently used item has to go, we simply update our tail pointer to the previous item, set Next to null, and let the garbage collector clean the dishes.
Still, a linked list alone can't give us constant-time lookup for every value. In fact, the lookup time in a linked list is linear, and with an extremely large cache (how about 50 million items?), linear time is obviously unacceptable. This is why we need to combine the linked list with a constant-time lookup in another data structure. For a cache, it makes sense to use a hash table, since one would expect a cache to use some sort of key to identify each item.
Implementation
With the information we have so far, here is my initial proposal to the signature of my cache class;
public class Cache<TKey, TValue>
{
public Cache();
public Cache(int maxSize);
public Cache(int maxSize, int initialSize);
public void Add(TKey key, TValue value);
public bool ContainsKey(TKey key);
public TValue this[TKey key];
public bool TryGet(TKey key, out TValue value);
}
As you can see, this closely resembles parts of the IDictionary interface. I believe this is a good thing; when designing APIs, I always try to make them as familiar as possible, to let users intuitively understand how to use them. If you have ever used the standard Dictionary, you know that trying to access a non-existing key throws an exception. You also probably know that TryGet provides a convenient way of testing for existence and getting the item in one operation. I often use this pattern myself, so I wanted to be able to use my cache that way as well. You may also note the overloaded constructor, which provides a few convenient ways of creating the cache. The most efficient of these is the one with an initialSize, because then the internal hash table can be initialized with a correct size, eliminating the need for it to grow later.
When it comes to the combination of the two data structures, we may store the actual data (which may or may not be a pointer) in one, and a pointer in the other. In this case, I am going to put the data in the linked list, and a pointer to the linked list node in the hash table. That way, we can do a lookup in the hash table first, then update the linked list afterwards, before finally returning the data. So, the pseudocode for an add operation will be as follows:
if cache is full
remove tail
tail = tail.previous
tail.next = null
node = new node(key, value)
head.previous = node
node.next = head
head = node
add node to hash table
The beauty of the linked list in this scenario is that if we always promote the most recently used item to the front (head) of the linked list, the end (tail) of the list will always contain the least recently used item. The usage is updated on two occasions; when inserting an item, and when accessing an item. When inserting into a full cache, we also need to remember to cut the tail of the list in order to constrain the total size. Without further ado, let's jump into the implementation of the most important method, Add.
public void Add(TKey key, TValue value)
{
if (IsFull)
{
_hashtable.Remove(_tail.Key);
_tail = _tail.Previous;
if (_tail != null)
_tail.Next = null;
_count--;
}
CacheNode<TKey> node = new CacheNode<TKey>(key, value);
PromoteOrInsertNode(node);
_hashtable.Add(key, node);
_count++;
}
An important operation is the promotion that occurs when accessing a node. This operation is almost identical to what is done when inserting an item, so we combine the two and call this method when needed.
private void PromoteOrInsertNode(CacheNode<TKey>)
{
if (_head == null)
{
_head = _tail = node;
}
else
{
if (node.Previous != null)
node.Previous.Next = node.Next;
if (node.Next != null)
node.Next.Previous = node.Previous;
node.Next = _head;
_head.Previous = node;
_head = node;
}
}
Using the Code
Using the cache is very intuitive, and as I mentioned, it closely resembles the usage of a standard IDictionary:
Cache<int, string> cache = new Cache<int, string>(100, 100);
for (int i = 1; i <= 100; i++)
cache.Add(i, i.ToString());
string eight = cache[8];
cache.Add(Int32.MaxValue, "big number");
string one;
if (cache.TryGetValue(1, out one))
Console.WriteLine("This shouldn't happen...");
else
Console.WriteLine("1 is no longer there, as expected!");
Please bear in mind that the memory consumed by the cache varies greatly depending on what you put into it. The maxSize is the number of objects in the cache, and has nothing to do with the bytes of memory that is consumed.
Performance
As I briefly mentioned, and as eagerly debated in the forum, the performance of this cache can have great variations. The critical variable is the the hash performance — from O(n) in the worst case, to O(1) in the best case. The actual performance you will attain when using the cache is totally dependent on the hashing algorithm (the GetHashCode() implementation in the class of the value) and the distribution of keys. These variables are out of control for the cache, and cannot be predicted in this article. Therefore, I strongly encourage readers to test the cache thoroughly with actual data before implementing it in a performance-critical scenario.
Conclusion
I won't go into detail on the other methods, as they are quite straightforward. I have also implemented some unit tests for a few of the basic usage scenarios. I also ran a test on my machine which showed that insertion and lookup of one million integers/strings took less than 1 second, which I think is a reasonably good performance for a generic, self-organizing data structure. However, note that the relative speed is not linear to the number of items once you get to a certain amount, because then the OS will need to start paging the memory, slowing down the cache considerably.
This concludes my first CodeProject article, I hope you like it. Please don't hesitate to comment on bugs or possible improvements. Happy coding!
| You must Sign In to use this message board. |
|
| | Msgs 1 to 25 of 43 (Total in Forum: 43) (Refresh) | FirstPrevNext |
|
|
 |
|
|
For those interested in more information about this type of cache, perform a search for LRU (Least Recently Used). Such structures are used in everything from operating systems to history lists. They are perhaps most critical in paging systems, which are an important aspect of modern OSes and Database Systems.
Not to take away from the author of this article, but the linked list / hash table combination is the standard way of implementing a basic LRU. There are many variations and enhancements as well.
For instance, it has been observed that in many applications, Least Recently Used by itself is not optimal because occasionally accessed items are pushed to the top of the list along with commonly accessed items. For this reason, it has been demonstrated that introducing an element of frequency will improve most uses. Several algorithms have been developed for this purpose, such as LRU(k) and 2Q.
Best,
-- Nathan Allan [Alphora]
|
| Sign In·View Thread·PermaLink | 5.00/5 (2 votes) |
|
|
|
 |
|
|
 |
|
|
Nice article, was lookung for something like this.
I extended the code a little: added ability to expire cached items when specified amount of time elapsed since an item was last accessed. Here are the changes:
[1] Constructor - added Expiration parameter:
private int _expirationTreshold; public Cache(int maxSize, int initialCapacity, int expirationTresholdSeconds) { MaxSize = maxSize; _hashtable = new Dictionary<TKey, CacheNode<TKey, TValue>>(initialCapacity); _expirationTreshold = expirationTresholdSeconds; }
[2] Added TimeStamp property to CacheNode class:
private DateTime _timeStamp; public DateTime TimeStamp { get { return _timeStamp; } set { _timeStamp = value; } }
public CacheNode(K key, V value) { _key = key; _value = value; _timeStamp = DateTime.Now; }
[3] When a node is promoted to the head, its time stamp is updated:
private void PromoteOrInsertNode(CacheNode<TKey, TValue> node) { if (_head == null) { _head = _tail = node; } else { if (node.Previous != null) node.Previous.Next = node.Next;
node.Next = _head; _head.Previous = node; _head = node; node.TimeStamp = DateTime.Now; } }
[4] Check for expired nodes when retrieving from cache:
public bool TryGetValue(TKey key, out TValue value) { CacheNode<TKey, TValue> node;
if (_hashtable.TryGetValue(key, out node)) { TimeSpan ts = DateTime.Now - node.TimeStamp;
if (ts.Seconds > _expirationTreshold) { Expire(node); value = default(TValue); return false; }
if (node.Previous != null) node.Previous.Next = node.Next;
if (node.Next != null) node.Next.Previous = node.Previous;
PromoteOrInsertNode(node); value = node.Value; return true; } else { value = default(TValue); return false; } }
[5] And finally, implementation of the Expire function:
private void Expire(CacheNode<TKey, TValue> node) { // System.Diagnostics.Debug.WriteLine("expiring " + node.Key.ToString()); while (node.Next != null) { Expire(node.Next); }
//remove the tail node _hashtable.Remove(node.Key); _tail = node.Previous; node.Previous.Next = null; _count--;
// System.Diagnostics.Debug.WriteLine("removed " + node.Key.ToString()); }
|
| Sign In·View Thread·PermaLink | 2.00/5 (1 vote) |
|
|
|
 |
|
|
Thanks for liking the code! It's a good idea you have there repque, and it seems to be well implemented. In my humble opinion though, I believe it might have made for a better design if you had inherited the original Cache, or maybe made a new implementation altogether.
The reason I'm saying this is that now the Cache-class has two quite different ideas merged together in one version of the code, and it's no longer quite as clear and readable as before. The user of the Cache may easily be confused, is this a size-constrained or a time-constrained cache? If I'm to write a function which accepts this cache as an argument, how could I possibly predict it's behaviour? I like to have a clearly defined mission for a class, adhering to the principle of High Cohesion[^].
This being said, design is a quite subjective term, and you are of course entitled to disagree with me as much as you'd like...
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
Alexander,
I agree completely. It should be a new implementation, to preserve the original design. I just wanted to share a possibility of extending the original functionality to provide an additional feature. My post wasn't meant to alter the original design but rather spark an idea and share it. I think it's a good thing. However, from the design perspective, you make a very valid point and I concur. Please continue the good work you've being doing.
Regards,
George
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
Considering also the question of making the class support the iDictionary interface, would it be practical to construct the class so that element expiration (or deletion as a result of over-growth) would be suppressed temporarily, or would only occur when explicitly requested via "cleanup" call? Then any routine which expects a dictionary object could be used, with the knowledge/understanding that entries would not disappear until the routine was done.
To be sure, if the foreign code keeps a reference to the dictionary after it returns (e.g. for continuing display on a form), it could be surprised if items disappear from it, but that problem would exist with any dictionary implementation.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
Good point, in cases where the behaviour of excessive elements needed to be more controlled, this could be a good solution. If the two mechanisms were merged like the original proposal by repque, it could even be overloaded so that the caller could specify which elements were purged; the least recently used, or the ones not accessed for a specific amount of time.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
Alexander Mossin wrote: If the two mechanisms were merged like the original proposal by repque, it could even be overloaded so that the caller could specify which elements were purged; the least recently used, or the ones not accessed for a specific amount of time.
How easy is it in .net for a program to get a sense of the whether memory is 'tight'. I would think that the program using the cache might want to dynamically vary the cache characteristics so that things would remain in the cache longer when there was plenty of memory available (without excessive paging), and would be purged faster when memory was more tight.
Actually, what would be ideal if .net supported it (maybe in 4.0?) would be if the garbage collector could be informed of which cache entries were newer and which were older, so that entries would be grouped in pages by age. Under such a scenario, one could afford to keep many more old entries in cache; even if on a system with one gig of RAM there were a gig worth of entries that were never accessed, those entries would sit peaceably on the paging file without causing thrashing or other performance degradation.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
One of the purposes of managed code (.NET) is to hide the complexity of dealing with memory directly. For this reason, there are not many good API's in the framework which lets you do these things; the rationale may even be that you shouldn't do these things, but rather design the code so that it is not needed.
Some of the properties you specify are already a part of the garbage collector. For instance, garbage collection is run more often if the amount of available memory is low. This behaviour can be forced by the program by calling GC.AddMemoryPressure, in case you want increased GC activity even when there are lots of available memory. The age of objects are also taken into account by the GC, as it divides all referenced objects into generations 0, 1, or 2. Each time GC is run on a specific generation, 'survivors' are promoted to the next generation. GC is run most often on gen0, more rarely on gen1, and even more rarely on gen2.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
Certainly there are many circumstances where letting the system's GC do its thing is the simplest and best approach. On the other hand, one of the general assumptions with a cache is that items in the cache have a finite 'value' (not zero, and not infinite). The system garbage collector has no clue how much a block of memory is 'worth', and thus can't possibly make optimal decisions.
BTW, I wonder why there's no way to push a chunk of memory into generation zero when performing a .dispose; while it's true that references to it may still exist (in which case it would persist into generation 1 or 2) I would think that most of the times .dispose is performed on an object it will be suitable for reclamation on the next GC.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
What happens with the _head.Next property when you get twice in a row one and the same node, sth. like:
string temp = cache[5]; temp = cache[5];
Take a look at the _head.Next property it becomes the same as the head one. This should be avoided.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
Thanks for the heads up! 
I have fixed the bug and submitted an updated version of the code. It may take a day or two before the code download is actually updated.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
 |
|
|
> For this cache, I would love it to perform in constant-time (O(1)), both on insert and lookup. [...] > a hash table with a good hash function gives you nearly constant time [...] _hashtable.Remove(_tail.Key); [...] _hashtable.Add(key, node); > I won't go into detail on the other methods, as they are quite straightforward.
Here _hashtable.Add() and _hashtable.Remove() belong to those other methods considered straightforward.
Please note, that the distribution of the keys to be stored in the hashtable is as unknown as a "good" hash function provided by a potential user. Therefore the hash function choosen permanently for use in the hash table may be "good" by chance. But there is no evidence, that the hash function will be "good" in the general case.
Please note, that hash tables are fast for inserting and retrieving keys, but not for deleting keys. But if the cache is full every insert operation is accompanied by a delete operation. There is no evidence, that the implementation of _hashtable.Remove() will be straightforward.
> There is even a test which shows that insertion and lookup of one million integers/strings > takes less than 1.5 seconds.
Please note, that the test provided by the author, works on a cache which is _not_ filled up to its maximum size. The test stops as soon as the critical limit is reached and no comparison is given to the performance in the case of a cache filled to its maximum size.
A minor issue is, that the author falls back on a foreign implementation for the hash table. There is no falsification given, that the implemenation may have a lower bound of Omega( log(CacheSize)) for each operation. This may hide a severe performance degradation, when indeed _using_ a hashtable.
|
| Sign In·View Thread·PermaLink | 2.00/5 (1 vote) |
|
|
|
 |
|
|
I realize that the performance of the cache is greatly dependent on the hash function, and I'm not trying to hide that. However, it is up the user of this cache to specify the hash function in case of a custom class implementation, by overriding Object.GetHashCode(). No library can guarantee against a poor implementation of this, and no attempt is made to do so in the .NET framework either. Most classes in the framework however, already have reasonably good hash algorithm implementations.
Why do you say that hashtables are not fast for deleting items? I can't see any obvious reason for this, if the Dictionary is implemented over an underlying array, with a reasonable collision detection algorithm. The documentation[^]on MSDN states the following for Dictionary.Remove: "This method approaches an O(1) operation".
I completely agree that the simple performance test I provide is not good for a thorough measure of the cache's general performance. I will try to implement a more realistic test later, but for now I agree with you that it may be misleading to prove performance using only this test. If you or any of the other readers have suggestions for better tests, I'll appreciate if you post comments about it. And by all means, attempts to 'break' the performance of the cache are also appreciated, I'm not trying to convince anyone that my implementation is perfect.
P.S.
svv1999 wrote: Here _hashtable.Add() and _hashtable.Remove() belong to those other methods considered straightforward.
By saying the other methods are straightforward, I was aiming at my own method implementations, not those in the framework.
|
| Sign In·View Thread·PermaLink | 1.00/5 (1 vote) |
|
|
|
 |
|
|
Alexander Mossin wrote: is up the user of this cache to specify the hash function in case of a custom class implementation
This is hiding complexity too. If you are able to describe an efficient algorithm any user can follow to establish a "good" hash function for his use case, then: why don't you make that algorithm part of your class? Answer: there is no such efficient algoritm in existence, at least there is none known to you.
Alexander Mossin wrote: I can't see any obvious reason for this, if the Dictionary is implemented over an underlying array, with a reasonable collision detection algorithm.
Not being able to see may be a reason to miss the obvious. Please note: hash functions distribute elements out of a large Universe Ul over a small universe Us. Because of this at least Omega(#Ul#/Us) elements of Us map on one element of Us. The same holds for the collision detecting and resolving routine: Omega(#Ul/#Us)Elements of Us will map on one element in Us in the first probe in case of a collsion.
Now constructing a bad example: Phase 1 Choose an element p of Us arbitrary. Let x be one of the elements of UL that hash to p. Let n be the maximum size of the cache. Choose at least b=Omega(n/2) Elements of Ul so that the hash function maps them onto b/2 elements of Us, and the collision detection maps the first probe on p. Phase 2 Insert x into the hash table. Insert all b elements from above in the hash table. Fill the hashtable randomly. Remove x from the hashtable. Analysis Removing x needs Omega(n) time, because the collision detection probed p for at least b/2 Elements. By construction of the example the collision detection couldn't place the colliding element into p--- and promoted it to another Element of Us. Because no information is hold, which elements probed p, all elements in the hashtable must be checked whether to undo the promotion over p. Because the hash table is fully occupied with Omega(n) elements, deleting x takes Omega(n) checks.
Alexander Mossin wrote: attempts to 'break' the performance of the cache are also appreciated
A sequence of n elements has about n! permutations. By being able to move an arbitrary element to the front of the sequence any permutation may be obtained. The remove-operation must handle any of the n! = Omega(n log(n)) permutations in a cache of size n. There is no operation that can help to reduce this complexity because all operations must be executed in constant time. The only way to reduce the time complexity is to distribute the work over several remove operations. This is called amortized analysis. ( not "approaching" as the MSDN documentation suggests) But n is the limit for the number of consecutive remove operations, because one cannot remove from an empty cache.
So the best one can reach for a seqeunce of n remove-operation is Omega(n log(n)) in total, giving at least Omega(log(n)) for a single remove operation.
Because in a full cache every add-operation is accompanied by a remove-operation one can as well spend O(log n) time for the add-operation.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
svv1999 wrote: If you are able to describe an efficient algorithm any user can follow to establish a "good" hash function for his use case, then: why don't you make that algorithm part of your class?
Of course, as you say, this is impossible. If I were to devise a hashing algorithm for my class, I would have to apply whatever principles, logic or math necessary to make it good. It would be my responsibility if I wanted to efficiently use my class in a hashtable. I could not later blame the hashtable implementation for poor performance caused by my terrible hashing algorithm.
svv1999 wrote: Not being able to see may be a reason to miss the obvious.
I'll ignore the sarcasm here, and stick to the topic...the cache I have implemented utilizes a Dictionary, with all its the benefits and limitations. If you put n objects into the cache, and they all hash to the same value, of course it would give a quite poor performance - so would any other structure using a hash table.
svv1999 wrote: Because in a full cache every add-operation is accompanied by a remove-operation one can as well spend O(log n) time for the add-operation.
Can being the keyword here - a worst-case scenario which is highly unlikely, and largely dependent on the hashing algorithm involved.
|
| Sign In·View Thread·PermaLink | 1.00/5 (1 vote) |
|
|
|
 |
|
|
Alexander Mossin wrote: I could not later blame the hashtable implementation
One of the principles of software engeneering: if it fails, try to blame the innocent.
Alexander Mossin wrote: quite poor performance - so would any other structure using a hash table
Obvious solution: do not use a hash table, if you want O(1) time for the essential operations and do not know anything about the distribution of the keys.
Alexander Mossin wrote: worst-case scenario which is highly unlikely
How are you able to prove any probability if you do not know any about the distribution of the elements? And if you want O(1) in the average case only, then you are to define what such average cases are and provide a documentation for them as well as a description of the behaviour of your implementation if a non average case comes up.
Dont want me to put up an example in which you are to develop the "good" hash function for an unknown conflict resolution strategy, an unknown percentage of "unlikely" cases but an O(1) time bound for the add, refresh and remove operation in the "likely" cases.
I wouldn't do that, because even if I see you fail, that wouldn't change my status of beeing retarded.
|
| Sign In·View Thread·PermaLink | 1.00/5 (2 votes) |
|
|
|
 |
|
|
I agree that the article didn't clearly convey the great variations in performance that may occur, so I have updated the article in order to clarify that there is no guarantee for performance better than the worst case of O(n).
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
thanks to share this good thoughts.
maybe i can interest you in distributed caching for your application's. I'm developing a project on codeplex for distributed and replicated caching which is available under the following url: http://www.SharedCache.com[^]
regards, roni schuetz
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
|
What you just created is called a LRU cache. The implementation you created is very good for single threaded implementations; however in multi-threaded apps, locks are required during all reads and writes which turns the cache into a bottleneck.
Updating age flags in you node instead of physically moving nodes to first item in list is a thread safe mechanism which does not require locks on cache reads. Cache cleanup is then batched removing multiple items at a time. The cleanup operation tends to be slower than your method because node traversal is needed to locate older items. But it is still pretty quick.
I have a good reference implementation, but have been too lazy to write the code project article for it. If there is interest, I will do it.
|
| Sign In·View Thread·PermaLink | 4.71/5 (4 votes) |
|
|
|
 |
|
|
Yes, please.
Interest is there!
Alexander's article is great and it stirred the pot. Let's have a series of articles on this topic.
Regards, Radu
|
| Sign In·View Thread·PermaLink | 4.00/5 (1 vote) |
|
|
|
 |
|
|
+1
Would be great to link to it from this article as well...or at least in these comments so we can easily find it.
Thank you!
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
| | |