Introduction
Microsoft .NET provides some excellent data caching mechanisms. They are fairly easy to implement and integrate into your application. In this article we will discuss
specific scenarios in which a caching mechanism alone is not enough because the data you want to cache is very expensive to retrieve.
Background
If you want to get your hands dirty with caching in general, you can start with the CodeProject articles on the subject:
http://www.codeproject.com/search.aspx?q=Caching&doctypeid=1%3b2%3b3 .
What makes a good caching mechanism?
- It is fast: it sounds silly to mention, but this is definitely the primary goal.
- It is clearable: users must have the ability to clear the cache on demand. Who wants to work with old data when they know fresh
data is out there?
- It is time constrained: it’s unlikely that cached data will never ever change. So a good cache mechanism must provide a way to set content expiration.
You get these benefits and a few more like automatic invalidation through monitors when using
System.Runtime.Caching.MemoryCache .
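As a rough illustration of those three properties, here is a minimal sketch built on MemoryCache.Default. The class name, the cache key and the load delegate are hypothetical and exist only for this example; the project is assumed to reference System.Runtime.Caching.

```csharp
using System;
using System.Runtime.Caching; // assumes a reference to System.Runtime.Caching

public static class MemoryCacheSketch
{
    // Hypothetical cache key, used only for this illustration.
    const string Key = "expensive-data";

    // Returns the cached value, loading it with the supplied delegate on a miss.
    public static string GetOrLoad(Func<string> load, TimeSpan lifetime)
    {
        var cache = MemoryCache.Default;
        var cached = cache.Get(Key) as string;
        if (cached != null) return cached;   // fast: served from memory

        var fresh = load();                  // the caller waits here on a miss
        var policy = new CacheItemPolicy
        {
            // time constrained: the entry is dropped once the lifetime has elapsed
            AbsoluteExpiration = DateTimeOffset.UtcNow.Add(lifetime)
        };
        cache.Set(Key, fresh, policy);
        return fresh;
    }

    // Clearable: removes the entry so that the next read reloads it.
    public static void Clear()
    {
        MemoryCache.Default.Remove(Key);
    }
}
```

Note that on a miss the calling thread waits for load() to return, which is exactly the cost the rest of this article is concerned with.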
Why is it (sometimes) not good enough?
Under certain circumstances populating the cache is very expensive (data provided by third-party systems via web services, the result of a heavy calculation process, etc.).
In such a scenario whenever the cache is invalidated (by automatic expiration or human invalidation) your application is blocked while getting fresh data. I have some web applications
in production for which 3rd parties take more than two minutes to provide me with their data!
How to improve that situation?
At initial population of the cache there is not much to do. Data must be retrieved
at whatever cost. The best solution you can offer is to load all your cached data
at startup of the application, in parallel threads if possible.
But then at runtime, when the cache is invalidated, why not repopulate your cached data in a background thread and hide the cost of data retrieval
from the end users?
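The start-up advice above (load everything in parallel) could be sketched as follows. StartupWarmUp and LoadAll are hypothetical names; each loader stands for whatever call populates one of your caches.

```csharp
using System;
using System.Threading.Tasks;

public static class StartupWarmUp
{
    // Runs every loader on its own task and waits until all of them are done.
    public static void LoadAll(params Action[] loaders)
    {
        var tasks = new Task[loaders.Length];
        for (int i = 0; i < loaders.Length; i++)
        {
            tasks[i] = Task.Factory.StartNew(loaders[i]);
        }
        // Blocks until every loader has finished; rethrows loader failures
        // wrapped in an AggregateException.
        Task.WaitAll(tasks);
    }
}
```

When the loaders really do run side by side, start-up takes roughly as long as the slowest one rather than the sum of all of them.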
A slightly different pattern for data caching
A standard cache mechanism requires the developer to populate it, and then it invalidates itself when the content expires. With the approach described here, the cache mechanism
auto-populates as well as auto-invalidates.
From a coder's perspective, using this pattern requires you to provide a method that retrieves the data and, optionally, a method that returns its lifetime. In other words the mechanism translates
into an abstract class with one abstract method and one overridable method that has a default:
protected abstract T GetData();
protected virtual TimeSpan GetLifetime() { return TimeSpan.FromMinutes(10); }
You will then access the cached data through a static property with the following signature:
public static T Data { get ;}
and you can force invalidation with the following static method:
public static void Invalidate();
Read this carefully
I am going to introduce you to an implementation for that pattern but please read this important note on the intrinsic limitation of the pattern:
You might not get “fresh” data out of the cache right after cache invalidation!
The reason is simple: the pattern assumes that the data takes a long time to retrieve, so we fetch it in the background to avoid blocking the UI.
If you call the Invalidate() method, the GetData() method is then called asynchronously in the background (in the implementation below, this happens on the next read of Data). This means that while the data is repopulating, your
only option is to serve the “old” data until the fresh data arrives.
That said there are a few things you have to take care of when providing an implementation for the pattern:
- When the cache is populated for the first time: all client threads must be blocked until the data is fully there.
- To prevent blocking: when the cached data is expired or invalidated, the cache should still serve the old content until
fresh content is available.
- You must make sure that one and only one thread is launched to refresh data in the background.
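The last requirement (one and only one refresh at a time) can be met in several ways. The sketch below uses Interlocked.CompareExchange as a guard. It is not the lock-based approach of the implementation in this article, and SingleRefreshGuard and TryStart are hypothetical names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public class SingleRefreshGuard
{
    int refreshing; // 0 = idle, 1 = a refresh is running

    // Starts the refresh on a background task unless one is already running.
    // Returns the started task, or null when the call was a no-op.
    public Task TryStart(Action refresh)
    {
        // Only the caller that flips 0 -> 1 gets to launch the refresh.
        if (Interlocked.CompareExchange(ref refreshing, 1, 0) != 0) return null;

        return Task.Factory.StartNew(() =>
        {
            try { refresh(); }
            finally { Interlocked.Exchange(ref refreshing, 0); } // allow the next refresh
        });
    }
}
```

Resetting the flag in a finally block means a failed refresh does not lock out later attempts.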
A possible implementation for that pattern
This one is the most basic you could implement: it stores data in memory, and each time the data is requested it checks whether
the lifetime has expired. Expiration is therefore only detected on a read: if the cache expiration is set
to 10 minutes and you have no call for one hour, then the stale data remains in the cache for at least that hour, plus the time the background refresh needs to complete.
It is left to the reader to provide a different implementation, which might deal with timers or anything else.
This one serves my purpose very well as I don’t cache data that is barely used (do you?)...
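using System;
using System.Threading.Tasks;

public abstract class Cache<U, T>
    where U : Cache<U, T>, new()
    where T : class
{
    // Retrieves the expensive data. Must not return null.
    protected abstract T GetData();

    // Lifetime of the cached data; override to change the 10-minute default.
    protected virtual TimeSpan GetLifetime() { return TimeSpan.FromMinutes(10); }

    protected Cache() { }

    enum State
    {
        Empty,      // never populated
        OnLine,     // populated and fresh
        Expired,    // stale, no refresh started yet
        Refreshing  // stale, refresh running in the background
    }

    static readonly U Instance = new U();
    static T InMemoryData { get; set; }
    static volatile State CurrentState = State.Empty;
    // When both locks are needed, DataLock is always taken before StateLock.
    static readonly object StateLock = new object();
    static readonly object DataLock = new object();
    static DateTime RefreshedOn = DateTime.MinValue;

    public static T Data
    {
        get
        {
            switch (CurrentState)
            {
                case State.OnLine:
                    var timeSpentInCache = (DateTime.UtcNow - RefreshedOn);
                    if (timeSpentInCache > Instance.GetLifetime())
                    {
                        lock (StateLock)
                        {
                            if (CurrentState == State.OnLine) CurrentState = State.Expired;
                        }
                    }
                    break;
                case State.Empty:
                    // First population: every reader blocks here until the data is there.
                    lock (DataLock)
                    {
                        lock (StateLock)
                        {
                            if (CurrentState == State.Empty)
                            {
                                var data = Instance.GetData();
                                if (data == null)
                                    throw new InvalidOperationException("GetData() must not return null.");
                                InMemoryData = data;
                                RefreshedOn = DateTime.UtcNow;
                                CurrentState = State.OnLine;
                            }
                        }
                    }
                    break;
                case State.Expired:
                    lock (StateLock)
                    {
                        // Only the thread that moves Expired -> Refreshing starts the refresh.
                        if (CurrentState == State.Expired)
                        {
                            CurrentState = State.Refreshing;
                            Task.Factory.StartNew(() => Refresh());
                        }
                    }
                    break;
            }
            // Serve whatever is in memory; during a refresh this is the old content.
            lock (DataLock)
            {
                return InMemoryData;
            }
        }
    }

    static void Refresh()
    {
        T fresh;
        try
        {
            // Runs outside of any lock so that readers are never blocked.
            fresh = Instance.GetData();
        }
        catch (Exception ex)
        {
            System.Diagnostics.Trace.WriteLine("Cache refresh failed: " + ex.Message);
            fresh = null;
        }
        lock (DataLock)
        {
            lock (StateLock)
            {
                if (fresh == null)
                {
                    // Keep serving the old content; the next read starts a new attempt.
                    CurrentState = State.Expired;
                    return;
                }
                InMemoryData = fresh;
                RefreshedOn = DateTime.UtcNow;
                CurrentState = State.OnLine;
            }
        }
    }

    public static void Invalidate()
    {
        lock (StateLock)
        {
            // While Empty there is nothing to invalidate; while Refreshing a refresh is
            // already running, and starting a second one would break the
            // one-refresh-at-a-time rule.
            if (CurrentState == State.OnLine)
            {
                RefreshedOn = DateTime.MinValue;
                CurrentState = State.Expired;
            }
        }
    }
}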
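The text above leaves a timer-based variant to the reader. One possible shape, under stated assumptions, is sketched here: TimerRefreshedValue is a hypothetical, instance-based class (not the static Cache<U,T> of this article) that reloads its value on a fixed schedule instead of checking the lifetime on each read.

```csharp
using System;
using System.Threading;

// Hypothetical timer-driven variant: refreshes on a schedule instead of on read.
public sealed class TimerRefreshedValue<T> : IDisposable where T : class
{
    readonly Func<T> load;
    readonly Timer timer;
    T current;       // replaced as a whole by each successful refresh
    int refreshing;  // 0 = idle, 1 = a refresh is running

    public TimerRefreshedValue(Func<T> load, TimeSpan lifetime)
    {
        this.load = load;
        current = load(); // the first load blocks the caller
        timer = new Timer(_ => Refresh(), null, lifetime, lifetime);
    }

    // Never blocks: returns the most recently loaded value.
    public T Value { get { return Volatile.Read(ref current); } }

    public void Refresh()
    {
        // Skip this tick if the previous refresh is still running.
        if (Interlocked.CompareExchange(ref refreshing, 1, 0) != 0) return;
        try
        {
            var fresh = load();
            if (fresh != null) Volatile.Write(ref current, fresh);
        }
        catch (Exception)
        {
            // Keep the old value; the next tick tries again.
        }
        finally
        {
            Interlocked.Exchange(ref refreshing, 0);
        }
    }

    public void Dispose() { timer.Dispose(); }
}
```

The trade-off is that data nobody reads is still reloaded on every tick, which the read-driven implementation above avoids.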
Example of usage
A cache holding a very expensive list of strings which takes three seconds to be built.
using System;
using System.Collections.Generic;
using System.Threading;

public class MyExpensiveListOfStrings : Cache<MyExpensiveListOfStrings, List<string>>
{
    protected override List<string> GetData()
    {
        System.Diagnostics.Trace.WriteLine("Getting fresh data...");
        Thread.Sleep(3000); // simulates the expensive retrieval
        List<string> result = new List<string>();
        for (int i = 0; i < 10000; i++)
        {
            result.Add("Data - " + i.ToString());
        }
        return result;
    }

    protected override TimeSpan GetLifetime()
    {
        return TimeSpan.FromSeconds(30);
    }
}
Example of program using the cache
The program will create 10 threads accessing the cached list of strings.
All threads will be blocked the first time the cache is populated.
Then the cache will refresh in the background every 30 seconds or will be invalidated when you press the space bar. No threads will be blocked anymore.
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main(string[] args)
    {
        // Start 10 readers that poll the cache every 50 ms.
        for (int i = 0; i < 10; i++)
        {
            Task.Factory.StartNew(() =>
            {
                while (true)
                {
                    System.Diagnostics.Trace.WriteLine("Looping " +
                        Thread.CurrentThread.ManagedThreadId + " -> " +
                        MyExpensiveListOfStrings.Data.Count);
                    Thread.Sleep(50);
                }
            });
        }

        // Space bar invalidates the cache; any other key ends the program.
        ConsoleKeyInfo key = Console.ReadKey();
        while (key.Key == ConsoleKey.Spacebar)
        {
            MyExpensiveListOfStrings.Invalidate();
            key = Console.ReadKey();
        }
    }
}
A matter of point of view
I know it sounds weird to build a program which operates on data which is not the most up to date...
Now take a helicopter view and you'll see that it is the end user who is working with slightly outdated data; the program does not care...
Push your view a bit further: whichever way you cache data, if the user needs the most up-to-date data and the data takes
two minutes to load, then your user will have to wait two minutes.
Now the question becomes: Is there any good reason to block all threads that rely on cached data in your application while one user who needs the latest
version is waiting for it?
If your answer is:
- undoubtedly yes! then forget about this article
- it depends on the kind of data! then I hope you have read something of interest here.
Guys, your feedback is most welcome.