TagCache - A .NET cache that allows you to tag (aka label) items and invalidate based on tags





0/5 (0 vote)
Explains the usage and inner working of TagCache.
Introduction
TagCache is a full fledged, high performance cache which allows associating values with tags (a.k.a. labels). Individual tags or a combination of tags can be used to invalidate all cache items that match the tag specification. The highlight here is that the tag invalidation operation is ~O(1). For the latest information, visit project home.
Background
At work I come across lots of scenarios where a single change (e.g., hierarchical data) would require a bunch of cache items to be invalidated (e.g., a whole hierarchy). I couldn't find any free .NET cache library that could do this efficiently. The sub optimal approach is to iterate over the cache entries and invalidate those that match the criterion. This vacuum led to the creation of TagCache which is a full fledged cache implementation that supports annotating cache items with tags (a.k.a. labels) and enables efficient invalidation using tags. In this article, I explain the usage and the implementation details of TagCache.
Example usage
Let us directly jump into an example usage to get a feel of caching with tags.
TagCache tagCache = new TagCache();
Action fillCacheItems = delegate
{
tagCache.Set("honda", new object(),
new List<string> { "Vehicle", "Car", "Economy" });
tagCache.Set("lexus", new object(),
new List<string> { "Vehicle", "Car", "Luxury" });
tagCache.Set("harley", new object(),
new List<string> { "Vehicle", "Bike", "Luxury" });
tagCache.Set("yamaha", new object(),
new List<string> { "Vehicle", "Bike", "Economy" });
};
fillCacheItems();
tagCache.Invalidate("Bike"); // invalidates all bikes
fillCacheItems();
tagCache.Invalidate("Luxury"); // invalidates all luxury vehicles
fillCacheItems();
tagCache.Invalidate("Vehicle"); // invalidates all vehicles
fillCacheItems();
tagCache.Invalidate(new List<string> { "Car", "Luxury" });
// invalidates all luxury cars
fillCacheItems();
tagCache.Invalidate(new List<string> { "Bike", "Economy" });
// invalidates all economy bikes
fillCacheItems();
tagCache.Invalidate(new List<List<string>> {
new List<string> { "Bike", "Luxury" },
new List<string> { "Car", "Economy" }
}); // invalidates all luxury bikes, economy cars
Sample scenario
Imagine a system where we are dealing with a hierarchy of configuration information. This is similar to how web.config works in the case of ASP.NET where the hierarchy is typically a tree structure that descends from the root web.config all the way down to the sub folder level web.config.
However, our system is much simpler than ASP.NET, and we don't need to unload/reload the app-domain during configuration changes, we'll just have to re-compute the new configuration and move on. Also, we'll cache the computed configuration for performance reasons.
Now, what happens when a configuration update occurs while the system is in operation and the configuration values have been cached? Ideally, we expect the cached values corresponding to the node of change and all its descendents to be invalidated. With existing caching systems, the only approach would be to iterate through all the descendents and invalidate their cached configuration one by one, which is cumbersome and very inefficient.
Enter TagCache... you'll be able to invalidate the whole bunch of items in one go and in an efficient manner. You achieve this by associating each configuration value with a list of tags. The list of tags map to the list of nodes in the ancestral hierarchy of the node whose configuration has to be cached. So, the root level configuration will have only one tag - that of itself. The configuration of an Nth level node will have N tags associated to it. So, when a configuration of node K changes, you invalidate tag K which in turn would ensure that the cached configuration values of node K and all its descendents are removed from the cache!
The code for the above scenario follows. You'll find a fully working sample application attached. The code below is self explanatory...
static TagCache _TagCache = new TagCache();
static List<int> GetNodeAncestorList(int nodeId)
{
List<int> list = new List<int>();
while (nodeId > 0)
{
list.Add(nodeId);
nodeId = GetParentNode(nodeId);
}
return list;
}
static NodeConfig GetMergedNodeConfig(int nodeId)
{
// This method computes and caches
// the config (and associates with tags)
if (nodeId == 0)
{
return null; // no config available
}
string cacheKey = String.Format("config-{0}", nodeId);
NodeConfig mergedNodeConfig = (NodeConfig)_TagCache.Get(cacheKey);
if (mergedNodeConfig == null)
{
NodeConfig mergedParentConfig =
GetMergedNodeConfig(GetParentNode(nodeId));
NodeConfig nonMergedNodeConfig = ReadNodeConfigFromXml(nodeId);
mergedNodeConfig =
nonMergedNodeConfig.MergeWith(mergedParentConfig);
// create a tag list with tags corresponding
// to this node and all its ancestors
// basically a tag is like a dependency indicator;
// in this case this cached item has to be invalided
// when any of its ancestors change
List<string> tagList =
GetNodeAncestorList(nodeId).ConvertAll(
(nid) => String.Format("node-{0}", nid));
_TagCache.Set(cacheKey, mergedNodeConfig, tagList);
}
return mergedNodeConfig;
}
static void UpdateNodeConfig(int nodeId)
{
// some update logic
// invalidate the tag corresponding to this node
// note that this will invalidate cached config
// of this node and its descendent nodes at one go!
// the ancestor node configs or node configs
// in other hierarchies won't be affected
_TagCache.Invalidate(String.Format("node-{0}", nodeId));
}
Implementation details
Before getting into the implementation details, I would recommend that you download the sample application and dig through the code a bit.
Layered cache
Point to note is that TagCache is a full fledge cache which supports most of the cache functionalities like simple key value storage,
absolute and sliding time expiry, scavenging etc. I could have built tagging functionality upon any existing cache library but I chose
to implement the cache myself. One reason is that it is so much fun to do it and the other is that it helped me to leverage the
ConcurrentDictionary
to implement a high performance cache with minimal locks (many cache libraries I checked don't yet use ConcurrentDictionary
).
The classes forming the layered design are listed below...
CacheStore
- basic key value map that usesConcurrentDictionary
ExpirableCacheStore
- built uponCacheStore
and supports expiration of stored itemsScavengedExpirableCacheStore
- built uponExpirableCacheStore
and supports scavenging of stored items when a configurable limit is reachedTaggedScavengedExpirableCacheStore
- adds tagging functionality toScavengedExpirableCacheStore
CacheItem
- a common object used by all the layers and holds data that may be required in one or more layersCacheOptions
- a common object that holds configuration information for different layers of the cacheTagCache
- the public facing cache API that encapsulatesTaggedScavengedExpirableCacheStore
Please note that I was only experimenting with this sort of layered design and I am not advocating it. However, I am pretty pleased with the simplicity and elegance of this layered implementation.
From here, I'll focus my discussion on TaggedScavengedExpirableCacheStore
as that is the unique piece here.
Tagging
The approach is simple as sequenced below...
- For each tag, maintain a
TagInfo
bookkeeping object. TagInfo
will holdValidAfterTimeStamp
which is the latest invalidation timestamp for the tag.- Whenever a tag is invalidated, its
TagInfo.ValidAfterTimeStamp
is updated to the current timestamp. - Each cache item that has one or more tags associated with it that will capture the tag related information in a
TagExpirationSpecifier
object. - The
TagExpirationSpecifier
will maintain the cache item creation timestamp in itsCreationTimeStamp
field. - Whenever a cached item is looked up,
TagExpirationSpecifier
will iterate through the list of tags and ensure that allTagInfo.ValidAfterTimeStamp
s are behind theCreationTimeStamp
. - The cache item, if it passes the timestamp validation, would be considered valid, otherwise treated as invalid (and removed).
This is super easy, isn't it? As usual, the devil is in the details...
- While
ValidAfterTimeStamp
is good for single tag invalidations, there is no straightforward approach to invalidate efficiently on a combination of tags. - The
TagInfo
object map needs to be pruned of thoseTagInfo
s which are no longer used by any of the cached items.
We will delve deeper into the above two concerns.
Invalidating tag combinations
One approach is to maintain a tag-combination TagInfo
for all possible tag combinations and deal with invalidations similar to how we deal with
single tag invalidations. This approach would severely increase the bookkeeping memory overhead and will be impractical in most cases. The other approach is to
iterate through all the cache items and invalidate those items that match the tag combination. While this will work, it is very inefficient. Imagine iterating over
tens of thousands of objects each time a tag combination needs to be invalidated.
In the TagCache
implementation, we do check against every cache item. However, it happens in an on-demand fashion, thereby making the actual invalidation API to return quickly.
Here is what we do...
- In the Invalidate API, for a tag-combination, we capture the tag-combination and timestamp in the
TagInvalidationInfo
object. - The next step is to add the
TagInvalidationInfo
object to aTailList
maintained inTagInfo
of any one of the tags in the combination. - The list is called
TailList
because it tracks only the tail of the singly linked list. - The
TagExpirationSpecifier
object of each cache item would maintain the point-in-time head of theTailList
. - Note that the point-in-time head maintained by
TagExpirationSpecifier
may be different for different cache items. - While checking for cache item validity,
TagExpirationSpecifier
would iterate from the point-in-time head till the tail checking whether theTagInvalidationInfo
would invalidate the cache item. - At the end of the check,
TagExpirationSpecifier
would update the point-in-time head to the current tail ofTailList
. - So effectively, each cache item will iterate once through the
TagInvalidationInfo
objects related to each tag in the cache item's tag list (during the Get request or during expiry poll). - In the course of time, after all related cache items have iterated past a particular
TagInvalidationInfo
object, it'll be automatically available for GC. That is the beauty ofTailList
.
The validation logic for the tag combination available in TagExpirationSpecifier
is shown below...
// check whether any tag combinations have been invalidated
for (int i = 0; i < _TagInfoList.Count; i++)
{
TagInfo tagInfo = _TagInfoList[i];
object token = _TagInvalidationTokenList[i];
object newToken = tagInfo.TagInvalidationInfoList.ListHeadToken;
// take note of the new token before enumerating
foreach (TagInvalidationInfo invalidationInfo in
tagInfo.TagInvalidationInfoList.GetEnumerable(token))
{
// first: time stamp should be older
if (_CreationTimeStamp < invalidationInfo.ValidAfterTimeStamp)
{
// second: check if the invalidation tag
// combination matches with the available tags
if (_IsMatchedByTagSet(invalidationInfo.TagInfoList))
{
// this means that this item is captured
// by the invalidation and needs to expire, RIP
return true;
}
}
}
// update the list head token to point to the new token
_TagInvalidationTokenList[i] = newToken;
}
Pruning unused TagInfo objects
The TagInfoMap
object maintains a ConcurrentDictionary
of tag to TagInfo
mapping. The map will keep growing as
more and more tags are associated with cache items. However, in order to prevent the indefinite growth of TagInfoMap
, we need to
figure out those tags that are no longer associated with any object and remove them from the map. One way to do this is to iterate through all the cache
items from time to time and capture a list of all tags that are used and use it thereby removing unused tags from the map. This approach is very
inefficient as well as prone to race conditions (thereby requiring long standing locks).
The approach we follow in the TagCache
implementation is described below:
TagInfoMap
would maintain a map between a tag and aWeakReference
toTagInfo
.- When all cache items that are associated with a tag get removed eventually, the
TagInfo
would be garbage collected as well. - We scavenge the
TagInfoMap
at intervals to identify and remove thoseWeakReference
objects that have anull
target.
Conclusion
TagCache
helped solve a cache consistency pain-point that was haunting me at work for a long time. I sincerely hope that some of you may benefit from this piece of work.
History
- 9/26/2011 - First publication.