Introduction
The purpose of this article is to show common mistakes in lazy load implementations and how to implement a fast, concurrent lazy load cache. I’m writing this after seeing many incorrect implementations. Hopefully, the detailed explanations will help some readers better understand concurrent execution of code and how locking fits in.
The article also includes an app for simulating different lazy load implementations and measuring their performance.
If you are sure you know this stuff, check out Code sample 5: good lazy load – just to be sure.
According to Wikipedia:
“Lazy loading is a design pattern commonly used in computer programming to defer initialization of an object until the point at which it is needed. It can contribute to efficiency in the program's operation if properly and appropriately used.”
I think of the lazy load pattern as a simple code convention: when a property or method is asked to return some data, it first checks whether the data is already loaded in a local cache; if not, it loads the data, and then returns it to the caller. The cached data is loaded only once, and besides the performance gain there is simplicity and a delayed performance impact from the data load. Below is a simple example of a lazy load implementation for Products.
Code sample 1: simplest lazy load
private IList<Product> localCache;

public override IList<Product> GetProducts()
{
    if (this.localCache == null)
    {
        this.localCache = this.LoadProductsFromDatabase();
    }
    return this.localCache;
}

public override void Reset()
{
    this.localCache = null;
}
Above is the simplest possible implementation of lazy load, but it has two problems:
- It might load data more than once (multi loading)
- It is not thread safe
The table below demonstrates a case when the above code will load data more than once. For simplicity, only two threads are shown, but there could be more simultaneous threads resulting in more than two data loads. Our goal is to load the data only once.
Table 1: Multi loading
T  | Thread 1               | Thread 2
---|------------------------|------------------
t1 | LL1                    |
t2 |                        | LL1
t3 |                        | LL2 (loads data)
t4 | LL2 (second data load) |
Loading data more than once might not be an issue for some apps, but for others it can be fatal. It depends on the data loading speed and the size of the data: the longer the data load takes, the higher the probability of this error occurring and the higher the performance impact. The above example hides the fact that the LL2 line of code is not atomic, and the longer it takes to complete, the higher the chance of a multi load occurring. Here is another example of the problem.
Table 2: Second scenario for multi loading
T  | Thread 1                                | Thread 2
---|-----------------------------------------|------------------
t1 | LL1                                     |
t2 | LL2 (started but cache not initialized) |
t3 |                                         | LL1
t4 |                                         | LL2 (loads data)
t5 | LL2 (finished – second data load)       |
If your application is such that multiple simultaneous requests at application start are likely, you have a higher chance of experiencing terribly slow (freezing) cold starts. Luckily, all these problems can easily be solved.
Another problem with the above example is that it is not thread safe, and that is something that is rarely acceptable. Multithreading bugs are hard because:
- They are hard to reproduce
- They cannot be unit tested (at least not easily with deterministic unit tests)
- Most developers are not good at multithreading
Because of all the above reasons, in my team we are not trying to unit test multithreading issues. We fight them with simple and predictable code and detailed code reviews.
The above example is not thread safe because another thread might reset the cache after the cache was loaded, but before it was returned to the code that requested it. In this case, the code will break or return invalid data depending on the concrete implementation. See following table for a detailed example of the problematic case.
Table 3: Compromised thread safety
T  | Thread 1 | Thread 2 | State after
---|----------|----------|------------
t1 | LL1      |          | null
t2 | LL2      |          | Products
t3 |          | R1       | null
t4 | LL3      |          | null
When people become aware that their code should be concurrent, they know there should be some lock somewhere in code, so they might end up with something like the following.
Code sample 2: naive locking
private IList<Product> localCache;
private readonly object syncLock = new object();

public IList<Product> GetProducts()
{
    if (this.localCache == null)
    {
        lock (this.syncLock)
        {
            this.localCache = this.LoadProductsFromDatabase();
        }
    }
    return this.localCache;
}

public void Reset()
{
    this.localCache = null;
}
This code has all the same problems as the previous example, but for now we will just try to solve the multi loading issue. Here is the execution sample showing how multi loading could still occur.
Table 4: Multi load in spite of locking
T  | Thread 1       | Thread 2 | Description
---|----------------|----------|------------------------------------------------------------------
t1 | LL1            |          | Thread 1 detects that the cache is not loaded
t2 | LL2            |          | Thread 1 acquires the lock
t3 | LL3 (started)  |          | Thread 1 starts loading the cache but has not yet initialized it
t4 |                | LL1      | Thread 2 detects that the cache is not loaded
t5 |                | LL2      | Thread 2 waits on the lock (held by Thread 1)
t6 | LL3 (finished) |          | Thread 1 initializes the cache and releases the lock
t7 |                | LL2      | Thread 2 acquires the lock
t8 |                | LL3      | Thread 2 loads data (MULTI LOAD)
It should be obvious from the above table where the problem is: while a thread is waiting on the lock because of some condition, that same condition might change. Here is a corrected example using the IF-LOCK-IF approach.
Code sample 3: IF-LOCK-IF
private IList<Product> localCache;
private readonly object syncLock = new object();

public IList<Product> GetProducts()
{
    if (this.localCache == null)
    {
        lock (this.syncLock)
        {
            if (this.localCache == null)
            {
                this.localCache = this.LoadProductsFromDatabase();
            }
        }
    }
    return this.localCache;
}

public void Reset()
{
    this.localCache = null;
}
The above code solves the problem in moment t8 because Thread 2 detects that the data is initialized and will not load it again.
Remember that when you see IF-LOCK-IF in your code, it is not just repeated code as it seems, there’s a deeper meaning behind it all.
With this sample, we have solved the multi loading issue, but our code is still not thread safe because of cache resetting. If you never reset your lazy loaded cache, or you are sure no one will access the data while it is being refreshed, then you are good to go.
Here is an example I have seen many times in code.
Code sample 4: locking on every data read
private IList<Product> localCache;
private readonly object syncLock = new object();

public IList<Product> GetProducts()
{
    lock (this.syncLock)
    {
        if (this.localCache == null)
        {
            this.localCache = this.LoadProductsFromDatabase();
        }
        return this.localCache;
    }
}

public void Reset()
{
    lock (this.syncLock)
    {
        this.localCache = null;
    }
}
The irony is that this example is thread safe and will not multi load. However, concurrent code is not the same as thread safe code. In the above example, all threads access the cache sequentially: while Thread 1 reads the cache, Thread 2 waits. The result is that the data is already sitting in memory and you still have to wait to read it.
Of course, depending on the cache usage, all these things might not be a problem. For instance, in our example data reading is very fast because GetProducts() only returns a reference to a list of products, so the waiting time is very low. But if the cache had some product search logic integrated into it, a read could take a while, which would further lower performance for simultaneous users.
This performance issue can be addressed by using a reader lock for accessing data for reading and a writer lock for cache resetting. With this implementation the data can be read concurrently, but while the cache is being reset all readers are blocked. The consequence is that performance is not compromised and the implementation is thread safe. On the other hand, we still have the multi loading issue. Also, implementations relying on reader/writer locks tend to be more complex and harder for the average developer to understand.
Another thing I do not like in this example is that the Reset method now competes for the lock with other threads that are reading the data. This is very unlikely to ever be an issue, but I still do not like code whose behavior is not easy to predict.
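To make the reader/writer idea concrete, here is a minimal sketch of how it could look using .NET's ReaderWriterLockSlim. This is only an illustration of the approach described above, not code from the article or its simulator; the Product type and LoadProductsFromDatabase are stand-ins for the ones used in the earlier samples.

```csharp
using System.Collections.Generic;
using System.Threading;

public class Product { }

public class ReaderWriterProductCache
{
    // Allows many concurrent readers but only one writer at a time.
    private readonly ReaderWriterLockSlim cacheLock = new ReaderWriterLockSlim();
    private IList<Product> localCache;

    public IList<Product> GetProducts()
    {
        this.cacheLock.EnterReadLock();
        try
        {
            // Many threads can hold the read lock simultaneously,
            // so concurrent reads do not block each other.
            if (this.localCache != null)
            {
                return this.localCache;
            }
        }
        finally
        {
            this.cacheLock.ExitReadLock();
        }

        // Cache is empty: take the exclusive write lock and load.
        // Several threads can reach this point together and take the
        // write lock in turn, so this version can still multi load,
        // exactly as noted above.
        this.cacheLock.EnterWriteLock();
        try
        {
            this.localCache = this.LoadProductsFromDatabase();
            return this.localCache;
        }
        finally
        {
            this.cacheLock.ExitWriteLock();
        }
    }

    public void Reset()
    {
        // The writer lock waits until all current readers are done.
        this.cacheLock.EnterWriteLock();
        try
        {
            this.localCache = null;
        }
        finally
        {
            this.cacheLock.ExitWriteLock();
        }
    }

    private IList<Product> LoadProductsFromDatabase()
    {
        // Placeholder for the real database call.
        return new List<Product>();
    }
}
```
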
It is finally time to show an example of a good implementation. See the comments in the code for a detailed explanation.
Code sample 5: good lazy load
private IList<Product> localCache;
private readonly object syncLock = new object();

public IList<Product> GetProducts()
{
    // Copy the shared field into a local variable so that a concurrent
    // Reset() cannot null it out between the check and the return.
    IList<Product> result = this.localCache;
    if (result == null)
    {
        lock (this.syncLock)
        {
            // Re-check inside the lock (IF-LOCK-IF): another thread may
            // have loaded the cache while we were waiting for the lock.
            result = this.localCache;
            if (result == null)
            {
                result = this.LoadProductsFromDatabase();
                this.localCache = result;
            }
        }
    }
    // Return the local copy, never the shared field, so the caller always
    // gets a fully loaded list even if Reset() runs concurrently.
    return result;
}

public void Reset()
{
    this.localCache = null;
}
This implementation is thread safe. It supports concurrent reading and will not multi load.
The downside is an increase in complexity, but it is still less than 10 lines of code, and if it confuses your coworkers you can always paste the above comments into your code.
A very important fact is that resetting is very simple and it can occur at any time. This could be very important if you have a cache implementation that will reset on some event or after some time interval that your code does not explicitly control (reset just happens).
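As a hypothetical illustration of such a time-based reset (the class name, method, and interval here are made up for this example, not part of the article's code), a System.Threading.Timer can simply call Reset() periodically. Because Reset() in Code sample 5 only assigns null, it is safe for it to fire at any moment, even while other threads are reading:

```csharp
using System;
using System.Threading;

public partial class ProductRepository
{
    private Timer resetTimer;

    // Illustrative only: expire the cache every five minutes.
    // The timer callback runs on a thread-pool thread, so Reset()
    // may run concurrently with readers – which the lazy load in
    // Code sample 5 tolerates by design.
    public void StartCacheExpiration()
    {
        TimeSpan interval = TimeSpan.FromMinutes(5);
        this.resetTimer = new Timer(_ => this.Reset(), null, interval, interval);
    }
}
```
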
While I was writing this article, I found out about the Lazy&lt;T&gt; class introduced in .NET Framework 4.0, so I was eager to implement the previous code sample using this class. Here is the code:
Code sample 6: good lazy load using Lazy<T>
private Lazy<IList<Product>> lazyCache;

public Constructor() // the class constructor – use your class name here
{
    this.Reset();
}

public IList<Product> GetProducts()
{
    return this.lazyCache.Value;
}

public void Reset()
{
    this.lazyCache = new Lazy<IList<Product>>(this.LoadProductsFromDatabase);
}
Thread safe, supports concurrent reading, will not multi load and all that without conditions and locks. Perfect!
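One Lazy&lt;T&gt; detail worth knowing (my addition, not part of the original sample): the constructor used above defaults to LazyThreadSafetyMode.ExecutionAndPublication, which is exactly what guarantees the single load. A side effect of that mode with a value factory is that an exception thrown by LoadProductsFromDatabase is cached and rethrown on every subsequent access to Value, until Reset() replaces the Lazy&lt;T&gt; instance. The explicit equivalent looks like this:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public void Reset()
{
    // Equivalent to the one-argument constructor used above, with the
    // default thread safety mode spelled out explicitly. If the factory
    // throws, Lazy<T> caches the exception and rethrows it on every
    // access to Value until Reset() creates a fresh instance.
    this.lazyCache = new Lazy<IList<Product>>(
        this.LoadProductsFromDatabase,
        LazyThreadSafetyMode.ExecutionAndPublication);
}
```
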
I have implemented a simple simulator (attached to this article) of all the cache implementations mentioned here (the good ones and the bad ones) so I can demonstrate what I have stated in this article. See the table below for interesting simulation results I got on my PC.
Note that:
- Multi loads don’t realistically affect the speed of the simulator because Thread.Sleep() is used to simulate DB access. This is why the simulator scales indefinitely.
- The only purpose of 5a (Good with 1 ms delay) is to compare it with 4a (Locking On Data Read with 1 ms delay).
- The only difference between 5c (Custom Lazy&lt;T&gt;) and 5b (Lazy&lt;T&gt;) is that 5c uses my own implementation of Lazy&lt;T&gt;. The purpose of 5c is to demonstrate that Lazy&lt;T&gt; doesn’t use any “magic”.
Table 5: Simulation results
#  | Cache implementation                 | Number of threads | Number of reads per thread | Time to load cache (ms) | Reset cache after reads | Result/Comment
---|--------------------------------------|-------------------|----------------------------|-------------------------|-------------------------|---------------
1  | Simple Lazy Load                     | 100    | 100    | 100 | 50 | Application breaks.
2  | Naive Locking                        | 100    | 100    | 100 | -  | Elapsed time: 3.9 s; data loads: 39. Data should have been loaded only once.
3  | If-Lock-If                           | 100    | 100    | 100 | 50 | Application breaks.
4  | Locking On Data Read                 | 100    | 100    | 100 | 50 | Elapsed time: 25.8 s; data loads: 200. Seems OK, just a bit slower than the good implementation (5). But compare 4a and 5a.
4a | Locking On Data Read with 1 ms delay | 100    | 100    | 100 | 50 | Elapsed time: 120 s; data loads: 200. Very bad performance.
5  | Good                                 | 100    | 100    | 100 | 50 | Elapsed time: 19.9 s; data loads: 200.
   |                                      | 10,000 | 10,000 | 100 | -  | Elapsed time: 42.6 s; data loads: 1.
5a | Good with 1 ms delay                 | 100    | 100    | 100 | 50 | Elapsed time: 10.4 s; data loads: 102.
5b | Lazy&lt;T&gt;                              | 100    | 100    | 100 | 50 | Elapsed time: 11.5 s; data loads: 200.
   |                                      | 10,000 | 10,000 | 100 | -  | Elapsed time: 35.2 s; data loads: 1.
5c | Custom Lazy&lt;T&gt;                       | 100    | 100    | 100 | 50 | Elapsed time: 10.4 s; data loads: 200.
   |                                      | 10,000 | 10,000 | 100 | -  | Elapsed time: 35.3 s; data loads: 1.
Caching is something you will probably need in your applications. I hope I have managed to clarify some edge cases and save you from some nasty production bugs.