
Ten Caching Mistakes that Break your App

By Omar Al Zabir, 11 Jun 2011
 
Prize winner in Competition "Best ASP.NET article of October 2010"

Introduction

Caching frequently used objects that are expensive to fetch from the source makes an application perform faster under high load and helps it scale under concurrent requests. But some hard-to-notice mistakes can make the application suffer under high load instead of performing better, especially when you are using a distributed cache, where a separate cache server or cache application stores the items. Moreover, code that works fine with an in-memory cache can fail when the cache is made out-of-process. Here I will show you some common distributed caching mistakes, so that you can make better decisions about when to cache and when not to cache.

Here are the top 10 mistakes I have seen:

  1. Relying on .NET’s default serializer
  2. Storing large objects in a single cache item
  3. Using cache to share objects between threads
  4. Assuming items will be in cache immediately after storing them
  5. Storing entire collection with nested objects
  6. Storing parent-child objects together and also separately
  7. Caching Configuration settings
  8. Caching Live Objects that have open handle to stream, file, registry, or network
  9. Storing same item using multiple keys
  10. Not updating or deleting items in cache after updating or deleting them on persistent storage

Let’s see what they are and how to avoid them.

I am assuming you have been using the ASP.NET Cache or the Enterprise Library Cache for a while and are satisfied with it, but now you need more scalability and have moved to an out-of-process or distributed cache like Velocity or memcached. After that, things have started to fall apart, and so the common mistakes listed below apply to you.

Relying on .NET’s Default Serializer

When you use an out-of-process caching solution like Velocity or memcached, items in cache are stored in a separate process from the one where your application runs. Every time you add an item to the cache, the client library serializes the item into a byte array and sends the byte array to the cache server to store it. Similarly, when you get an item from the cache, the cache server sends the byte array back to your application, and the client library deserializes it into the target object. .NET’s default serializer is not optimal, since it relies on Reflection, which is CPU intensive. As a result, storing items in cache and getting items from cache add high serialization and deserialization overhead that results in high CPU usage, especially if you are caching complex types. This high CPU usage happens on your application, not on the cache server. So, you should always use one of the better approaches shown in this article so that the CPU consumption in serialization and deserialization is minimized. I personally prefer the approach where you serialize and deserialize the properties yourself by implementing the ISerializable interface and the deserialization constructor.

using System;
using System.Runtime.Serialization;

[Serializable]
public class Customer : ISerializable
{
    public string FirstName;
    public string LastName;
    public int Salary;
    public DateTime DateOfBirth;

    public Customer()
    {
    }

    // Deserialization constructor: reads each field back by name
    public Customer(SerializationInfo info, StreamingContext context)
    {
        FirstName = info.GetString("FirstName");
        LastName = info.GetString("LastName");
        Salary = info.GetInt32("Salary");
        DateOfBirth = info.GetDateTime("DateOfBirth");
    }

    #region ISerializable Members

    // Writes each field explicitly, so the formatter does not have to discover them
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("FirstName", FirstName);
        info.AddValue("LastName", LastName);
        info.AddValue("Salary", Salary);
        info.AddValue("DateOfBirth", DateOfBirth);
    }

    #endregion
}

This prevents the formatter from using reflection. With large objects, this approach can sometimes be up to 100 times faster than the default implementation. So, I strongly recommend that, at least for the objects that are cached, you always implement your own serialization and deserialization code and not let .NET use Reflection to figure out what to serialize.

Storing Large Objects in a Single Cache Item

Sometimes we think large objects should be cached because they are too expensive to fetch from the source. For example, you might think caching a 1 MB object graph will give you better performance than loading that object graph from a file or database. You would be surprised how poorly that scales. It will certainly work a lot faster than loading the same thing from the database when you have only one request at a time. But under concurrent load, frequent access to that large object graph will drive up the server’s CPU usage. This is because caching has high serialization and deserialization overhead. Every time you get a 1 MB object graph from an out-of-process cache, it consumes significant CPU to build that object graph in memory.

var largeObjectGraph = myCache.Get("LargeObjectGraph");
var anItem = 
    largeObjectGraph.FirstLevel.SecondLevel.ThirdLevel.FourthLevel.TheItemWeNeed;

The solution is not to cache the large object graph as a single item under a single key. Instead, you should break that large object graph into smaller items and cache those smaller items individually. You should retrieve from cache only the smallest item you need.

// store smaller parts in cache as individual item
var largeObjectGraph = new VeryLargeObjectGraph();
myCache.Add("LargeObjectGraph.FirstLevel.SecondLevel.ThirdLevel", 
  largeObjectGraph.FirstLevel.SecondLevel.ThirdLevel);
...
...
// get the smaller parts from cache
var thirdLevel = myCache.Get("LargeObjectGraph.FirstLevel.SecondLevel.ThirdLevel");
var anItem = thirdLevel.FourthLevel.TheItemWeNeed;

The idea is to look at the items that you need most frequently from the large object (say, the connection strings from a configuration object graph) and store those items separately in the cache. Always make sure that the item you retrieve from cache is small, say 8 KB at most.

Using Cache to Share Objects Between Multiple Threads

Since you can access the cache from multiple threads, it is sometimes used as a convenient way to pass data between threads. But a cache, like static variables, can suffer from race conditions. They are even more common when the cache is distributed, since storing and reading an item requires out-of-process communication, and your threads get more chances to overlap with each other than with an in-memory cache. The following example shows how an in-memory cache rarely demonstrates the race condition but an out-of-process cache almost always shows it:

myCache["SomeItem"] = 0;

var thread1 = new Thread(new ThreadStart(() =>
{
    var item = (int)myCache["SomeItem"]; // Most likely 0
    item++;
    myCache["SomeItem"] = item;
}));
var thread2 = new Thread(new ThreadStart(() =>
{
    var item = (int)myCache["SomeItem"]; // Most likely 1
    item++;
    myCache["SomeItem"] = item;
}));
var thread3 = new Thread(new ThreadStart(() =>
{
    var item = (int)myCache["SomeItem"]; // Most likely 2
    item++;
    myCache["SomeItem"] = item;
}));

thread1.Start();
thread2.Start();
thread3.Start();
.
.
.

Most of the time, the above code produces the values shown in the comments when you are using an in-memory cache. But when you go out-of-process or distributed, it will almost always fail to produce them, because each thread can read the item before another thread has written its update back. You need to implement some kind of locking here. Some caching providers allow you to lock an item. For example, Velocity has a locking feature, but memcached does not. In Velocity, you can lock an item:

// get an item and lock it
DataCacheLockHandle handle;
SomeClass someItem = _defaultCache.GetAndLock("SomeItem", 
   TimeSpan.FromSeconds(1), out handle, true) as SomeClass;
// update an item
someItem.FirstName = "Version2";
// put it back and get the new version
DataCacheItemVersion version2 = _defaultCache.PutAndUnlock("SomeItem", 
    someItem, handle);

You can use locking to reliably read and write to cache items that get changed by multiple threads.
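When the threads that share the item all live in one process, the same read-modify-write problem can be shown and fixed with an ordinary lock. The following is only a minimal sketch under that assumption: LockedCounterCache and LockingDemo are hypothetical stand-ins written for this example, not part of Velocity or any other cache library, and a process-local lock does not protect an item that other servers also update (for that you need the cache's own locking, such as GetAndLock above).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical in-process stand-in for a cache client (not a real library type).
public class LockedCounterCache
{
    private readonly Dictionary<string, int> _items = new Dictionary<string, int>();
    private readonly object _sync = new object();

    public void Set(string key, int value)
    {
        lock (_sync) { _items[key] = value; }
    }

    public int Get(string key)
    {
        lock (_sync) { return _items[key]; }
    }

    // Read, increment and write back as one atomic step,
    // so no other thread can interleave between the read and the write.
    public int Increment(string key)
    {
        lock (_sync)
        {
            int next = _items[key] + 1;
            _items[key] = next;
            return next;
        }
    }
}

public static class LockingDemo
{
    // Starts threadCount threads that each increment the same item once
    // and returns the final value stored in the cache.
    public static int Run(int threadCount)
    {
        var cache = new LockedCounterCache();
        cache.Set("SomeItem", 0);

        var threads = new List<Thread>();
        for (int i = 0; i < threadCount; i++)
        {
            var thread = new Thread(() => cache.Increment("SomeItem"));
            threads.Add(thread);
            thread.Start();
        }
        foreach (var thread in threads)
        {
            thread.Join();
        }
        return cache.Get("SomeItem");
    }
}
```

Because the whole increment happens inside one lock, LockingDemo.Run(3) ends with the item at 3 regardless of how the threads are scheduled.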

Assuming Items will be in Cache Immediately After Storing Them

Sometimes you store an item in cache on a submit button click and assume that upon the page postback, the item can be read from cache because it was just stored in cache. You are wrong.

private void SomeButton_Clicked(object sender, EventArgs e)
{
  myCache["SomeItem"] = someItem;
}

private void OnPreRender()
{
  var someItem = myCache["SomeItem"]; // It's gone dude!
  Render(someItem);
}

You can never assume an item will be in cache for sure, even if you store the item on Line 1 and read it on Line 3. When your application is under pressure and physical memory is scarce, the cache will flush out items that aren’t frequently used. So, by the time the code reaches Line 3, the item could already have been flushed. Never assume you can always get an item back from cache. Always have a null check and fall back to retrieving the item from persistent storage.

var someItem = myCache["SomeItem"] as SomeClass ?? GetFromSource();

You should always use this format when reading an item from cache.
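The null check and fallback can be wrapped in one helper so that every read goes through the same path and a miss also repopulates the cache. This is a minimal sketch, not an API of any particular cache product: ISimpleCache, DictionaryCache and GetOrLoad are hypothetical names introduced for illustration, and the dictionary only stands in for a real cache client.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal cache abstraction used only for this sketch.
public interface ISimpleCache
{
    object Get(string key);               // returns null on a miss
    void Put(string key, object value);
}

// Dictionary-backed stand-in so the sketch runs without a cache server.
public class DictionaryCache : ISimpleCache
{
    private readonly Dictionary<string, object> _items = new Dictionary<string, object>();

    public object Get(string key)
    {
        object value;
        return _items.TryGetValue(key, out value) ? value : null;
    }

    public void Put(string key, object value) { _items[key] = value; }

    // Simulates the cache flushing an item under memory pressure.
    public void Evict(string key) { _items.Remove(key); }
}

public static class CacheExtensions
{
    // Returns the cached item, or loads it from the source and caches it on a miss.
    public static T GetOrLoad<T>(this ISimpleCache cache, string key, Func<T> loadFromSource)
        where T : class
    {
        var cached = cache.Get(key) as T;
        if (cached != null)
        {
            return cached;
        }

        T fresh = loadFromSource();
        if (fresh != null)
        {
            cache.Put(key, fresh);
        }
        return fresh;
    }
}
```

A call then looks like cache.GetOrLoad("SomeItem", () => GetFromSource()), and the caller never has to care whether the item was still in the cache.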

Storing Entire Collection with Nested Objects

Sometimes you store an entire collection in a single cache item because you need to access the items in the collection frequently. Thus every time you try to read an item from the collection, you have to load the collection first and then read that particular item. Something like this:

var products = myCache.Get("Products");
var product = products[1];

This is inefficient. You are unnecessarily loading an entire collection just to read a certain item. You will have absolutely no problem when the cache is in-memory, as the cache will just store a reference to the collection. But in a distributed cache, where the entire collection is deserialized every time you access it, it will result in poor performance. Instead of caching a whole collection, you should cache individual items separately.

// store individual items in cache
foreach (Product product in products)
  myCache.Add("Product." + product.Index, product);
...
...
// read the individual item from cache
var product = myCache.Get("Product.0");

The idea is simple: you store each item in the collection individually, using a key that can be guessed easily, for example by using the index as a suffix.

Storing Parent-child Objects Together and Also Separately

Sometimes you store an object in cache that has a child object, which you also store separately in another cache item. For example, say you have a customer object that has an order collection. When you cache the customer, the order collection gets cached as well. But then you separately cache the individual orders. So, when an individual order is updated in cache, the copy of the same order inside the customer object’s order collection is not updated, and you get inconsistent results. Again, this works fine when you have an in-memory cache but fails when your cache is made out-of-process or distributed.

var customer = SomeCustomer();
var recentOrders = SomeOrders();
customer.Orders = GetCustomerOrders();
myCache.Add("RecentOrders", recentOrders);
myCache.Add("Customer", customer);
...
...
var recentOrders = myCache.Get("RecentOrders");
var order = recentOrders["ORDER10001"];
order.Status = CANCELLED; 
...
...
...
var customer = myCache.Get("Customer");
var order = customer.Orders["ORDER10001"];
order.Status = PROCESSING; // Inconsistent. The order has already been cancelled

This is a hard problem to solve. It requires clever design so that you never end up having the same object stored twice in the cache. One common approach is not to store child objects in cache, but to store the keys of the child objects instead, so that they can be retrieved from cache individually. So, in the above scenario, you would not store the customer’s order collection in cache. Instead, you would store the OrderID collection with the Customer, and when you need to see the orders of a customer, you load the individual order objects using the OrderIDs.

var recentOrders = SomeOrders();
foreach (Order order in recentOrders)
   myCache.Add("Order." + order.ID, order);
...
var customer = SomeCustomer();
customer.OrderKeys = GetCustomerOrders(); // Store keys only
myCache.Add("Customer", customer);
...
...
var order = myCache.Get("Order.10001");
order.Status = CANCELLED; 
...
...
...
var customer = myCache.Get("Customer");
// Assumes OrderKeys is a List<string> of order IDs
var customerOrders = customer.OrderKeys.ConvertAll
   (key => (Order)myCache.Get("Order." + key));
var order = customerOrders.Find(o => o.ID.ToString() == "10001"); // Correct object from cache

This approach ensures that a certain instance of an entity is stored in the cache only once, no matter how many times it appears in collections or parent objects.

Caching Configuration Settings

Sometimes you cache configuration settings. You use some cache expiration logic to ensure the configuration is refreshed periodically or refreshed when the configuration file or database table changes. Since configuration settings are accessed very frequently, reading them from cache adds significant CPU overhead. Instead you should just use static variables to store configurations.

var connectionString = myCache.Get("Configuration.ConnectionString");

You should not follow such an approach. Getting an item from cache is not cheap. It may not be as expensive as reading from a file or the registry, but it’s not very cheap either, especially if the item is a custom class that adds some serialization overhead. So, you should instead store the configuration settings in static variables. But you might ask, how do we refresh the configuration without restarting the appdomain when it’s stored in a static variable? You can use some expiration logic, like a file listener that reloads the configuration when the configuration file changes, or some database polling to check for database updates.
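One way to combine a static variable with a file listener is to use FileSystemWatcher to reload the value whenever the file is written. The sketch below is illustrative only: AppSettings is a hypothetical class, and it assumes, purely to stay short, that the configuration file contains nothing but the connection string.

```csharp
using System;
using System.IO;

// Hypothetical settings holder; assumes the file contains only the connection string.
public static class AppSettings
{
    // volatile so that readers on other threads see the reloaded value
    private static volatile string _connectionString;
    private static FileSystemWatcher _watcher;

    // Reading the setting is a plain field read: no cache call, no deserialization.
    public static string ConnectionString
    {
        get { return _connectionString; }
    }

    // Loads the file once and then watches it for changes.
    public static void Initialize(string configFilePath)
    {
        string fullPath = Path.GetFullPath(configFilePath);
        Load(fullPath);

        _watcher = new FileSystemWatcher(
            Path.GetDirectoryName(fullPath), Path.GetFileName(fullPath));
        _watcher.NotifyFilter = NotifyFilters.LastWrite;
        _watcher.Changed += (sender, e) =>
        {
            try
            {
                Load(fullPath);
            }
            catch (IOException)
            {
                // The writer may still hold the file open; keep the old value.
            }
        };
        _watcher.EnableRaisingEvents = true;
    }

    // Reads the file and replaces the static value.
    public static void Load(string configFilePath)
    {
        _connectionString = File.ReadAllText(configFilePath).Trim();
    }
}
```

After a single AppSettings.Initialize(path) call at application start, every request reads AppSettings.ConnectionString directly, and the value is replaced when the file changes without restarting the appdomain.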

Caching Live Objects that have Open File, Registry or Network Handle

I have seen developers cache instances of classes that hold an open connection to a file, the registry or an external network resource. This is dangerous. When items are removed from cache, they aren’t disposed automatically. Unless you dispose such an object, you leak system resources. Every time such an instance is removed from cache due to expiration or some other reason without being disposed, it leaks the resources it was holding onto.

You should never cache objects that hold open streams, file handles, registry handles or network connections just because you want to avoid opening the resource every time you need it. Instead, you should use a static variable or an in-memory cache that is guaranteed to give you an expiration callback, so that you can dispose them properly. Out-of-process caches and session stores do not give you expiration callbacks consistently. So, never store live objects there.
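As an illustration of an in-memory cache with a removal callback, the sketch below uses MemoryCache from System.Runtime.Caching (available from .NET 4). That choice is an assumption made for this example; the article does not name a specific cache. DisposableCache and TrackedResource are hypothetical helper names.

```csharp
using System;
using System.Runtime.Caching;

// Hypothetical resource that records whether it has been disposed.
public sealed class TrackedResource : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        IsDisposed = true;
    }
}

// Thin wrapper over the in-process MemoryCache that disposes items on removal.
public static class DisposableCache
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    public static void Add(string key, IDisposable resource, TimeSpan slidingExpiration)
    {
        var policy = new CacheItemPolicy
        {
            SlidingExpiration = slidingExpiration,
            // Called when the entry expires, is evicted, or is removed or overwritten.
            RemovedCallback = args =>
            {
                var disposable = args.CacheItem.Value as IDisposable;
                if (disposable != null)
                {
                    disposable.Dispose();
                }
            }
        };
        Cache.Set(key, resource, policy);
    }

    public static object Get(string key)
    {
        return Cache.Get(key);
    }

    public static void Remove(string key)
    {
        Cache.Remove(key);
    }
}
```

One caveat with this design: the callback disposes the object even if another thread obtained it from the cache a moment earlier and is still using it, so callers need to be prepared for an already disposed resource.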

Storing Same Item using Multiple Keys

Sometimes you store objects in cache using the key and also by index because you not only need to retrieve items by key but also need to iterate through items using index. For example,

var someItem = new SomeClass();
myCache["SomeKey"] = someItem;
.
.
myCache["SomeItem." + index] = someItem;
.
.

If you are using in-memory cache, the following code will work fine:

var someItem = myCache["SomeKey"];
someItem.SomeProperty = "Hello";
.
.
.
var someItem = myCache["SomeItem." + index];
var hello = someItem.SomeProperty; // Returns Hello, fine, when In-memory cache
/* But fails when out of process cache */

The above code works when you have an in-memory cache. Both of the items in the cache refer to the same instance of the object. So, no matter how you get the item from cache, it always returns the same instance. But in an out-of-process cache, especially in a distributed cache, items are stored after serializing them. Items aren’t stored by reference. You store copies of items in cache, never the item itself. So, if you retrieve an item using a key, you get a freshly made copy of that item, because the item is deserialized and created anew every time you get it from cache. As a result, changes made to the object are never reflected back in the cache unless you overwrite the item in the cache after making the changes. So, in a distributed cache, you will have to do the following:

var someItem = myCache["SomeKey"];
someItem.SomeProperty = "Hello";
myCache["SomeKey"] = someItem; // Update cache
myCache["SomeItem." + index] = someItem; // Update all other entries
.
.
.
var someItem = myCache["SomeItem." + index];
var hello = someItem.SomeProperty; // Now it works in out-of-process cache

Once you update the cache entry using the modified item, it works as the items in the cache receive a new copy of the item.

Not Updating or Deleting Objects from Cache when Items are Updated or Deleted from Data Source

This again works in in-memory cache, but fails when you go to out-of-process/distributed cache. Here’s an example:

var someItem = myCache["SomeItem"];
someItem.SomeProperty = "Hello Changed";
database.Update(someItem);
.
.
.
var someItem = myCache["SomeItem"];
Console.WriteLine(someItem.SomeProperty); // "Hello Changed"? Nope.

This works fine in an in-memory cache, but fails when it’s out-of-process or distributed cache. The reason is you changed the object but never updated the cache with the latest object. Items in cache are stored as a copy, not the original instance.

Another mistake is not deleting items from cache when the item is deleted from the database.

var someItem = myCache["SomeItem"];
database.Delete(someItem);
.
.
.
var someItem = myCache["SomeItem"];
Console.WriteLine(someItem.SomeProperty); // Works fine. Oops!

When you delete an item from the database, a file or some other persistent store, don’t forget to delete it from the cache as well, under every key it has been stored with.
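A simple way to avoid forgetting either step is to route every update and delete through one class that touches the persistent store and the cache together. The following is a minimal sketch under stated assumptions: Product, ProductStore and ProductRepository are hypothetical names, and plain dictionaries stand in for both the database and the cache client.

```csharp
using System;
using System.Collections.Generic;

public class Product
{
    public string Id;
    public string Name;
}

// Hypothetical stand-in for the persistent store.
public class ProductStore
{
    private readonly Dictionary<string, Product> _rows = new Dictionary<string, Product>();

    public void Update(Product product) { _rows[product.Id] = product; }
    public void Delete(string id) { _rows.Remove(id); }
}

// Keeps the cache in step with the store on every write.
public class ProductRepository
{
    private readonly ProductStore _database;
    private readonly Dictionary<string, Product> _cache; // stand-in for a cache client

    public ProductRepository(ProductStore database, Dictionary<string, Product> cache)
    {
        _database = database;
        _cache = cache;
    }

    private static string Key(string id)
    {
        return "Product." + id;
    }

    public void Update(Product product)
    {
        _database.Update(product);
        // Overwrite the cached copy so later reads see the change.
        _cache[Key(product.Id)] = product;
    }

    public void Delete(string id)
    {
        _database.Delete(id);
        // Remove the cached copy so later reads do not return a deleted item.
        _cache.Remove(Key(id));
    }
}
```

If the same item is cached under several keys, Update and Delete would have to touch each of those keys, which is one more reason to store every item under a single key.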

Conclusion

Caching requires careful planning and a clear understanding of the data being cached. Otherwise, when the cache is made distributed, it not only performs worse but can also break your code. Keeping these common mistakes in mind while caching will help you cash out from your code.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Omar Al Zabir
Architect, BT, UK (ex British Telecom)
United Kingdom


Comments and Discussions

 
Question | My vote of 5 | RC_Sebastien_C (member) | 24 Nov '11 - 8:59
Thank you
General | My vote of 5 | Anurag Gandhi (member) | 17 Oct '11 - 4:50
Nice piece of information.
General | My vote of 5 | jj taylor (member) | 14 Jun '11 - 4:36
A clear, concise and sobering summary of the pros and cons of caching.
General | Partial item in cache | Jinx101 (member) | 7 Feb '11 - 6:26
Have you ever seen this behavior? I understand that a cached item may be set on line 1 and removed by the time you execute line 3. However, have you ever seen an item partially exist? I'm having a sporadic issue where, say, on line 1 I put a DataTable object with 100 rows in the cache. On line 3, it still exists, but it only has 15 of those rows. I would have thought that the object either would have existed, or not. Am I somehow catching this object while it's being cleared, and that's why I'm receiving a partial table? I have verified that when it goes into the cache it has more rows than when I get it from the cache. Like I said though, this does not happen often, but it has happened enough that I've seen it. It cannot be recreated with certainty (I assume due to memory pressure and state on the server at the time it happens).
General | Re: Partial item in cache | Omar Al Zabir (mvp) | 7 Feb '11 - 7:46
As per the laws of physics this should not happen. Maybe you have got some code that's overwriting that cache entry using the same key.
(regards) => "Omar AL Zabir"
+ "C#, ASP.NET MVP"
+ "http://omaralzabir.com";

General | My vote of 5 | Bocochi (member) | 26 Dec '10 - 17:36
Great article
General | Congratulations, Omar! | Marcelo Ricardo de Oliveira (member) | 3 Dec '10 - 0:10
For people learning caching, this is a must read. Keep it up! :)
cheers,
marcelo
Take a look at WPF Grand Prix here in The Code Project.

General | My vote of 5 | thmo (member) | 20 Nov '10 - 1:16
Great article - Nice explanations and code examples
General | My vote of 5 | ranjan_namitaputra (member) | 18 Nov '10 - 9:56
Great work
General | SqlCacheDependency has performance hit on my Database | happyspider (member) | 27 Oct '10 - 3:12
I save a lot of small objects in my cache and each entry has a SqlCacheDependency associated with it.
This works ok but now I got too much load on my db resulting from the SqlCacheDependency.
 
I dont see any way to change the poll time of the query notification system. I use Sql server 2008.
 
Do you have any idea on how to limit the number of queries resulting from SqlCacheDependency?
 
SqlCacheDependency
 

public SqlCacheDependency CacheControllerDependency(CacheControllerKeys cacheKey)
{
    SqlConnection conn = new SqlConnection(System.Configuration.ConfigurationManager.ConnectionStrings["SQLOrderB2CConnString"].ConnectionString);
    try
    {
        SqlCommand command = new SqlCommand("getCacheController");
        command.CommandType = CommandType.StoredProcedure;
        command.Connection = conn;

        SqlParameter paraCacheName = new SqlParameter("@CacheName", SqlDbType.NVarChar, 100);
        paraCacheName.Value = cacheKey.ToString();
        command.Parameters.Add(paraCacheName);

        SqlParameter paraApplicationKey = new SqlParameter("@ApplicationKey", SqlDbType.NVarChar, 100);
        paraApplicationKey.Value = B2CShop.Model.Configuration.ApplicationConfiguration.ApplicationCountry;
        command.Parameters.Add(paraApplicationKey);

        SqlCacheDependency dependency = new SqlCacheDependency(command);
        conn.Open();
        command.ExecuteNonQuery();
        return dependency;
    }
    catch (System.Exception ex)
    {
        _log.Error(ex);
        throw;
    }
    finally
    {
        if (conn != null)
        {
            conn.Close();
        }
    }
}

 
Usage


 
completeListRim = ProductRespository.GetRimsAL(myKey);

SqlCacheDependency rimALCacheDependency = new CacheController().CacheControllerDependency(CacheController.CacheControllerKeys.RimALCacheDependency);
HttpRuntime.Cache.Add(myKey, completeListRim, rimALCacheDependency, DateTime.MaxValue, TimeSpan.FromDays(10), CacheItemPriority.Default, RemovedCallback);
 

Question | Is it good to cache XpathDocument object in asp.net cache? | karthik reddy chintaparthi (member) | 23 Oct '10 - 4:40
Hi Omar,
 
One of the points says "Caching live objects that has open file, registry or network handle". Does it mean we should not cache an XpathDocument instance in the asp.net in-memory cache?
 
Can you please give an example on this scenario ?
 
Finally, the article gives some very nice pointers on how to use the caching mechanism :)
 
Cheers, Karthik
Question | Cache needs to (de)serialize ?? | Xmen W.K. (member) | 17 Oct '10 - 3:22
Well, I don't know about how velocity and other works internally. But Cache doesn't need to serialize, it stores live reference of object. It sort of like static object. Am I right ?



Answer | Re: Cache needs to (de)serialize ?? | Omar Al Zabir (mvp) | 17 Oct '10 - 12:38
Only in-memory ones. Any out of process or distributed cache always serializes.
(regards) => "Omar AL Zabir"
+ "C#, ASP.NET MVP"
+ "http://omaralzabir.com";

General | My vote of 5 | JFergulbops (member) | 14 Oct '10 - 0:19
awesome article
General | My vote of 4 | tec-goblin (member) | 11 Oct '10 - 10:47
Very good, but what about DataContextSerializer? It is supposed to be faster, and provides an opt-in mode of declaring properties to serialize that's quite fine-grained.
General | Cache recycling | EhsanShemirani (member) | 9 Oct '10 - 19:11
Hi, its a great article and i learned many things. i'm one of your fan and i'll eat up your article whereever i find. If we store collection items separately, what's happen if cache recycle some of them, the same problem for parent-child objects, I think if cache recycle the whole object it is more reliable but the performance as you've told would be degrade, whats your suggestion on this problem ?
General | Re: Cache recycling | Omar Al Zabir (mvp) | 10 Oct '10 - 8:58
Good observation. The general rule of thumb is, you should never assume you will get an item from cache for sure. Any item can be missing from cache. So, no matter how you store items, always ensure you have the fallback option.
(regards) => "Omar AL Zabir"
+ "C#, ASP.NET MVP"
+ "http://omaralzabir.com";

General | Re: Cache recycling | EhsanShemirani (member) | 10 Oct '10 - 18:40
Yes you are right but in case of a large object, if one of its properties recycled from cache, we should reload it and save to cache again, but other properties of old object is still in cache. I don't know how much this is probable but (Hit) count of cache is degrade and it would be a bad use case.
Thanks
General | My vote of 5 | thatraja (member) | 6 Oct '10 - 20:55
Good points
General | "Storing same item using multiple keys" alternative... | Andrew Rissing (member) | 5 Oct '10 - 4:40
You could just also not store an object multiple times and just store the string used to store the item in the cache (I know - confusing, but an example would explain better). Ex:
myCache["SomeKey"] = "SomeItem." + index; // Provides a 'pointer' to the desired item.
myCache["SomeItem." + index] = someItem; // The single location the item is stored.
 
So, if you need to have a reference to an item, you don't duplicate it. You just have one additional 'hop' in the cache to get your item. By doing this, it should reduce memory contention in the cache and improve CPU usage as you're not serializing/deserializing as much.
General | Re: "Storing same item using multiple keys" alternative... | Omar Al Zabir (mvp) | 5 Oct '10 - 4:51
Very good!
(regards) => "Omar AL Zabir"
+ "C#, ASP.NET MVP"
+ "http://omaralzabir.com";

General | Vote of 5 and 2 questions | ignatandrei (member) | 5 Oct '10 - 3:31
1. Please share code for GetAndLock . I have had many problems of putting items in cache for ASP.NET applications.
 
2. Please tell how do you implement cache per user and cache per application ( like Application and Session in ASP.NET)
General | My vote of 5 | cwienands (member) | 5 Oct '10 - 3:30
Excellent article. Very founded!
General | My vote of 5 | gorgias99 (member) | 5 Oct '10 - 2:26
Very instructive
General | Cache details | federico.strati (member) | 4 Oct '10 - 22:03
May you add the details of how the cache is instantiated
and normally used in the introduction? Otherwise, some people
like me not used to ASP.Net may be at loss.

Last Updated 11 Jun 2011
Article Copyright 2010 by Omar Al Zabir