Understanding weak references in .NET

Paulo Zemek

Oct 6, 2013

CPOL

Understand what a weak reference is and how it can help you in caching scenarios or in avoiding memory leaks.

The very basics

In .NET, any normal reference to another object is a strong reference. That is, when you declare a variable of a type that's not a primitive/value type, you are declaring a strong reference.

If you hold a strong reference to an object directly in a static variable or in a local variable, it can't be collected. Also, if that object holds references to other objects, those other objects can't be collected either. That is, an entire graph of objects reachable from "roots" is considered alive and can't be collected. This is why the objects you are using aren't collected while in use: they are reachable, directly or indirectly, through static variables or through local variables on the call stack.

Knowing what a strong reference is, we can understand a weak reference.

A weak reference is a reference to an object that still allows that object to be collected. If that happens, the weak reference's target becomes null.
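To make the difference concrete, here is a minimal sketch using the WeakReference class. The StrongVersusWeak class and its method names are made up for illustration. The first method keeps a strong local reference alive across a full collection, so the weak reference is guaranteed to still see the object; the second keeps only a weak reference, so the object is usually, but not necessarily, collected.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical demo class: contrasts a strongly held object with a weakly held one.
public static class StrongVersusWeak
{
    // Returns true when an object that is still strongly referenced survives a full collection.
    // This outcome is guaranteed: the local `strong` is a root until GC.KeepAlive runs.
    public static bool StrongReferenceSurvives()
    {
        object strong = new object();
        WeakReference weak = new WeakReference(strong);

        GC.Collect();
        bool alive = weak.IsAlive;
        GC.KeepAlive(strong);
        return alive;
    }

    // Returns true when the weakly held object was collected.
    // This is the usual outcome, but the runtime does not guarantee it.
    public static bool WeakOnlyWasCollected()
    {
        WeakReference weak = CreateWeaklyHeld();
        GC.Collect();
        return !weak.IsAlive;
    }

    // NoInlining keeps the temporary strong reference inside this method,
    // so no hidden local in the caller keeps the object alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference CreateWeaklyHeld()
    {
        return new WeakReference(new object());
    }
}
```

In a typical Release build WeakOnlyWasCollected returns true, but code should never depend on that.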

Usefulness

The biggest challenge with weak references is knowing in which scenarios to use them; understanding how a weak reference works is not really hard.

When I learned about WeakReferences I immediately thought they were useful for caching. The MSDN example of WeakReference (at least at that time) was a Windows Forms application that loaded a background image and kept only a weak reference to it. So, if memory was needed, .NET could remove the background from memory; after all, the code could reload the background whenever it needed to draw it.

But the documentation of the WeakReference class itself says that WeakReferences aren't good for caching. So, what's wrong?

Well, the first problem was the example itself. If a WeakReference is not good for caching, why was there an example of a background image stored as a weak reference? Apparently they noticed the error, as I can't find that example anymore.

But, setting that aside, what's the problem with using WeakReferences for caching? The documentation still says that caches need "clever" memory management, yet it doesn't explain what's wrong with using weak references as a cache, which makes them look simply useless for caching scenarios. That's not true.

Weak references may be collected too often

The main reason weak references aren't good for caching (when used alone) is that you may be using the weak reference very often, yet its target will be collected at the next collection if nothing holds a strong reference to it. That is, you may have just used the object, but when .NET runs a collection your weak reference becomes null.

For example, to force the situation, we can use this code:

using System;
using System.Threading;

WeakReference reference = new WeakReference(null);
reference.Target = new object();

// Read the reference.Target many times if you want, only
// to represent that you are using it.
Console.WriteLine(reference.Target);

GC.Collect();
Thread.Sleep(1000);

// Now check if the reference.Target is null. It will probably be.
Console.WriteLine(reference.Target);

Of course, in this sample we are forcing a garbage collection, but note that .NET garbage collections aren't triggered only when Windows is running short on memory. Your computer may still have lots of free memory, but as new allocations are made, .NET will first try to release unreachable objects and reuse their memory before requesting more memory from Windows. So, if we store a background in a weak reference, opening a new form may trigger a collection that removes that background. When we move that new window, we need to load the background again. Depending on what you do in that form, the background may be collected again, only to be reloaded right after. So, a solution meant to free memory and speed things up by removing an unused background from memory ends up making everything slower, as it keeps loading and collecting the same background over and over.

So, are weak references useless?

The answer is no. There are still many situations that may benefit from weak references and some of those are still related to caching data.

For example, imagine that you create a caching system using normal references and some timers to remove old items from the cache (for example, an item is removed after one minute without being used).

If one minute is only a minimum lifetime, not a hard limit that must be enforced, we may still want to keep weak references to items after they go unused for more than one minute.

Consider this situation: you load lots of data into memory. That data is kept in memory for at least one minute. After that minute, you no longer hold any strong references to it. But there's no guarantee that the data will be collected immediately; depending on the situation (such as the object's generation and the rate of memory allocations), it may take many minutes or even hours before the item is really removed from memory.

Now, after some time, the item that was "expired" from the cache only to free memory is requested again. If you don't have a weak reference to it, you will end up using more memory to load a second copy (and will probably lose time loading it). But what if you were using a weak reference in this case? You still use the one-minute period as the time during which "it must stay alive" but, after that, you keep only the weak reference. If there's a collection, the item is gone. But if there hasn't been a collection and you request the item again, you can still find it in memory and recreate a strong reference to it. That is, you get a faster solution and you even use less memory, by reusing an object that was still in memory even after you "let it die".

So, the clever memory management needed by caches doesn't mean we should avoid weak references; it only means that we need something else to guarantee that we don't remove recently used objects (or any object that we expect to use very soon) from memory. Yet, after we decide an object is not needed anymore, since .NET takes some time to really remove it from memory, we may still want to recover a reference to that object instead of loading a new copy of it.
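A minimal sketch of that idea follows, assuming a cache that is trimmed periodically (for example, by a timer) and an injectable clock so the behavior can be checked without waiting a minute. ExpiringWeakCache, Trim, and IsStronglyHeld are hypothetical names, and this is a simple time-based illustration, not the author's collection-based solution.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: an item stays strongly referenced for at least `minimumLifetime`
// after its last use. After that only a WeakReference is kept, so the item can still be
// recovered if no garbage collection has reclaimed it yet.
public sealed class ExpiringWeakCache<TKey, TValue> where TValue : class
{
    private sealed class Entry
    {
        public TValue Strong;       // null once the entry has expired
        public WeakReference Weak;  // kept after expiration
        public DateTime LastUsed;
    }

    private readonly Dictionary<TKey, Entry> _entries = new Dictionary<TKey, Entry>();
    private readonly object _lock = new object();
    private readonly Func<TKey, TValue> _load;
    private readonly TimeSpan _minimumLifetime;
    private readonly Func<DateTime> _clock;

    public ExpiringWeakCache(Func<TKey, TValue> load, TimeSpan minimumLifetime,
                             Func<DateTime> clock = null)
    {
        _load = load;
        _minimumLifetime = minimumLifetime;
        _clock = clock ?? (() => DateTime.UtcNow);
    }

    public TValue Get(TKey key)
    {
        lock (_lock)
        {
            Entry entry;
            TValue value = null;
            if (_entries.TryGetValue(key, out entry))
                value = entry.Weak.Target as TValue;  // read Target only once

            if (value == null)
            {
                // Never cached, or already collected: load a new copy.
                value = _load(key);
                entry = new Entry { Weak = new WeakReference(value) };
                _entries[key] = entry;
            }

            entry.Strong = value;  // every use restores the strong reference
            entry.LastUsed = _clock();
            return value;
        }
    }

    // Intended to be called periodically, for example from a timer.
    public void Trim()
    {
        lock (_lock)
        {
            DateTime now = _clock();
            var collected = new List<TKey>();
            foreach (KeyValuePair<TKey, Entry> pair in _entries)
            {
                Entry entry = pair.Value;
                if (entry.Strong != null && now - entry.LastUsed >= _minimumLifetime)
                    entry.Strong = null;        // expired: keep only the weak reference
                else if (entry.Strong == null && !entry.Weak.IsAlive)
                    collected.Add(pair.Key);    // target collected: drop the entry
            }
            foreach (TKey key in collected)
                _entries.Remove(key);
        }
    }

    // Diagnostic helper: is the item currently protected by a strong reference?
    public bool IsStronglyHeld(TKey key)
    {
        lock (_lock)
        {
            Entry entry;
            return _entries.TryGetValue(key, out entry) && entry.Strong != null;
        }
    }
}
```

The important detail is in Get: the weak reference is read once and, if the object is still in memory, it becomes strongly referenced again and gets another full minimum lifetime instead of being loaded a second time.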

In my particular case, I created a solution that's not time based, but collection based. I guarantee that used objects will survive the next full collection. So, if an object is used now, it will survive the next collection, be it in 10 milliseconds or in 10 hours. If it is not used after that, it will die at the following collection but, if it is used again, it will survive one more collection. But any solution that keeps recently used objects alive will work; the weak reference then helps in reusing the objects that are still in memory, even if you thought they shouldn't be there anymore. If you want to know about that particular solution, I explained it in one of my first articles here at CodeProject: WeakReferences as a Good Caching Mechanism.

Using a Weak Reference is unsafe! Using a finalized object is dangerous!

Many times, when I was talking about weak references, someone said that weak references are unsafe because they revive finalized objects, which is extremely dangerous. But that's not true.

Well, it can be true, but only if we ask for it. If we pass true to the trackResurrection parameter when creating the WeakReference, then yes, we will be able to recreate a strong reference to an object after its finalizer has run.

But that parameter defaults to false. With the default, a WeakReference's target is cleared as soon as a garbage collection finds the object unreachable, before its finalizer runs. So the normal WeakReference can only recreate a strong reference to objects that no collection has yet identified as garbage: they were available for collection, but the collector hadn't noticed it yet. So, don't worry: in normal situations you will not be "reviving" a finalized object. And, if you wrote the finalizer yourself, you may know how to check whether the object was finalized and, if necessary, restore a valid state. But don't use resurrection-tracking weak references to instances that have a finalizer if you don't know what that finalizer does.
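The difference between the two modes can be sketched like this. Finalizable and ResurrectionDemo are made-up names; the comments describe the typical outcome, which the runtime does not guarantee, because it decides when collections and finalizers actually run.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical type with a finalizer, used only for this demonstration.
public sealed class Finalizable
{
    public static int FinalizedCount;

    ~Finalizable()
    {
        Interlocked.Increment(ref FinalizedCount);
    }
}

public static class ResurrectionDemo
{
    public static WeakReference Short { get; private set; }  // trackResurrection: false (default)
    public static WeakReference Long { get; private set; }   // trackResurrection: true

    // NoInlining keeps the only strong reference inside this method's frame.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void Create()
    {
        var obj = new Finalizable();
        Short = new WeakReference(obj);
        Long = new WeakReference(obj, trackResurrection: true);
    }

    public static void CollectAndFinalize()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();

        // Typical outcome at this point (not guaranteed):
        // - Short is no longer alive: it was cleared before the finalizer was queued.
        // - Long may still be alive; reading Long.Target now would hand back an
        //   already-finalized instance, which is the risky case described above.
        Console.WriteLine("Finalized: " + Finalizable.FinalizedCount);
        Console.WriteLine("Short alive: " + Short.IsAlive);
        Console.WriteLine("Long alive: " + Long.IsAlive);
    }
}
```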

Avoiding memory leaks with weak references

In this situation we have the opposite of caching. Instead of trying to reuse an object that should be considered dead, we want to use an object as long as it is still alive, but we don't want to keep it alive.

Many events suffer from this strong reference problem. You may create an object, register it with an event, use it for some time, and then drop all your references to it... but it will not die while the object that owns the event is alive, because the event still references it.
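The leak is easy to reproduce. In this sketch (Publisher, Subscriber, and LeakDemo are hypothetical names), the only reference we keep to the subscriber is a weak one, yet it survives full collections because the static event's delegate still references it.

```csharp
using System;
using System.Runtime.CompilerServices;

// A long-lived publisher: a static event keeps every subscriber reachable.
public static class Publisher
{
    public static event EventHandler Changed;

    public static void Raise()
    {
        EventHandler handler = Changed;
        if (handler != null)
            handler(null, EventArgs.Empty);
    }
}

public sealed class Subscriber
{
    public int Notifications;

    public Subscriber() { Publisher.Changed += OnChanged; }

    public void Unsubscribe() { Publisher.Changed -= OnChanged; }

    private void OnChanged(object sender, EventArgs e) { Notifications++; }
}

public static class LeakDemo
{
    // Our only remaining reference to the new subscriber is the weak one.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference CreateAndForget()
    {
        return new WeakReference(new Subscriber());
    }

    // Returns true if the "forgotten" subscriber survived full collections.
    public static bool SubscriberSurvivesCollection()
    {
        WeakReference weak = CreateAndForget();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        var subscriber = weak.Target as Subscriber;  // read Target once
        if (subscriber == null)
            return false;

        subscriber.Unsubscribe();  // the fix: the event no longer references it
        return true;
    }
}
```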

Well, in this particular case, we must remember to always unregister objects from every event they were registered with. But, considering that we may be creating components for others to use, we may want to guarantee that our events will not keep other objects alive. For that, there's a pattern called the Weak Event Pattern, which I will not explain here, as you can easily find information about it. Instead, I will only point out the situations that may benefit from weak references:

Events, as I just said, are a very common case.
Component groups or manager components. There are many situations in which we have a "master component" responsible for creating many sub-components, which may have different lifetimes. But if we decide to destroy the group, we should immediately destroy all the inner components. The Dispose() method already helps us "kill" objects individually or as a group; yet, if we rely only on Dispose and users don't dispose the inner components, those components will never be removed from their manager component. If the manager component references its children through weak references, that problem goes away: children can die at any moment through normal collections, and Dispose can still be used to kill them quickly.
Parallel but independent components. I usually implement these with weak dictionaries but, as weak dictionaries are themselves built on weak references, we end up using weak references directly or indirectly. If you are unsure what I mean by parallel but independent components, think of WPF's rendered data templates as the visual parallel of some data object (an object that knows nothing about those data templates). There are many situations in which you want the parallel object to stay the same for a given instance but, when the instance dies, its parallel should be able to die too. If that's what you need, see the ConditionalWeakTable class.
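For the manager case, a minimal sketch, assuming hypothetical ComponentManager and ChildComponent types, could look like this. The manager holds only WeakReferences to its children, prunes the collected ones when new children are created, and disposes whichever children are still alive when it is disposed itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical child component.
public sealed class ChildComponent : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose() { IsDisposed = true; }
}

// Hypothetical manager: references its children only weakly, so children that users
// forget to dispose can still be collected normally.
public sealed class ComponentManager : IDisposable
{
    private readonly List<WeakReference> _children = new List<WeakReference>();
    private readonly object _lock = new object();

    public ChildComponent CreateChild()
    {
        var child = new ChildComponent();
        lock (_lock)
        {
            _children.RemoveAll(w => !w.IsAlive);  // prune children already collected
            _children.Add(new WeakReference(child));
        }
        return child;
    }

    public int TrackedCount
    {
        get { lock (_lock) return _children.Count; }
    }

    // Disposing the group kills every child that is still alive.
    public void Dispose()
    {
        List<WeakReference> snapshot;
        lock (_lock)
        {
            snapshot = new List<WeakReference>(_children);
            _children.Clear();
        }
        foreach (WeakReference weak in snapshot)
        {
            var child = weak.Target as ChildComponent;  // read Target once
            if (child != null)
                child.Dispose();
        }
    }
}
```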

Some things to be aware of

In this article I used the terms "weak references" and "WeakReferences" as if they were the same but, in fact, WeakReference is a class that gives you access to weak references. A weak reference only means that you have a reference that, well, is weak, without specifying how that weak reference was achieved.

The WeakReference class itself is only one of the ways to get weak references. Internally, the WeakReference class uses a GCHandle. The difference is that GCHandles are value types but, if you forget to Free them, you will leak the handles (not the target objects, as they are weakly referenced, but the handles themselves). WeakReference is a reference type, so when you allocate a WeakReference you are allocating an entire object (one that has a finalizer, too) just to reference another object, and only that other object is "weakly referenced". That's why it is usually not recommended to use WeakReferences to reference small data.

Yet the most important thing is that, regardless of whether you use WeakReferences, GCHandles, or some other weak solution, you should always have a plan to remove the weak references whose targets have been collected. Thousands of WeakReferences whose targets were collected surely use less memory than thousands of large arrays, yet if you never remove them you will face problems (be it in memory consumption or in performance, as you may end up iterating very large collections in which only two or three references are still alive).
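Using a GCHandle directly looks roughly like this (WeakHandleDemo is a made-up name). GCHandle.Alloc with GCHandleType.Weak creates the weak handle, and the try/finally makes sure Free is called on every path.

```csharp
using System;
using System.Runtime.InteropServices;

public static class WeakHandleDemo
{
    // Reads `value` back through a weak GCHandle. The parameter is still used after
    // the read, so it keeps `value` strongly referenced and Target returns it here.
    public static bool ReadsBackSameInstance(object value)
    {
        GCHandle handle = GCHandle.Alloc(value, GCHandleType.Weak);
        try
        {
            object target = handle.Target;  // would be null if the object had been collected
            return ReferenceEquals(target, value);
        }
        finally
        {
            handle.Free();  // forgetting this leaks the handle (not the object)
        }
    }
}
```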
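One simple plan, sketched here with a hypothetical WeakList<T> class, is to prune dead entries explicitly. This sketch isn't thread-safe (a real implementation would need a lock), and deciding when to call Purge (on every Add, every N additions, or periodically) is left open.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: a list of weak references that can discard entries whose targets were collected.
public sealed class WeakList<T> where T : class
{
    private readonly List<WeakReference> _items = new List<WeakReference>();

    public int Count { get { return _items.Count; } }

    public void Add(T item) { _items.Add(new WeakReference(item)); }

    // Removes references whose targets are gone and returns how many were removed.
    // Checking IsAlive on its own is fine here, because we never read Target afterwards.
    public int Purge() { return _items.RemoveAll(w => !w.IsAlive); }

    // Copies the live targets into strong references, reading each Target only once.
    public List<T> GetAlive()
    {
        var result = new List<T>();
        foreach (WeakReference weak in _items)
        {
            var target = weak.Target as T;
            if (target != null)
                result.Add(target);
        }
        return result;
    }
}
```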

Thread-safety

Garbage collections can happen at any moment, so weak references are thread-safe. Yet that thread safety only covers individual calls on a weak reference. For example, the following code is not thread-safe:

if (weakReference.Target != null)
  DoSomething(weakReference.Target);

This code is not thread-safe because the first read of weakReference.Target may return an object (so execution enters the if) but, just before DoSomething(weakReference.Target) is called, the target may be collected, so the second read returns null.

The solution is to read weakReference.Target only once, storing it in a variable, like this:

object target = weakReference.Target;
if (target != null)
  DoSomething(target);

In this case, we read weakReference.Target only once, immediately storing it in a strong reference, so the object is no longer eligible for collection. Then we check the variable, which is not going to change, so it is either null or a value that stays alive during the call to DoSomething.

This logic is very important, as many people get confused by the IsAlive property. That property should not be used in a situation like this:

if (weakReference.IsAlive)
  DoSomething(weakReference.Target);

The reason is the same as in the first example: between the if and the call to DoSomething, the target may be collected. It is rare, but it can happen. So the IsAlive property should only be used on its own, when its result is not followed by a read of Target. In my own code, I usually check that property only when I want to remove, from a collection, the weak references whose targets were already collected, so I check IsAlive only once per reference.
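.NET 4.5 also added a generic WeakReference<T>, which has no IsAlive property and exposes its target only through TryGetTarget, so the read-once pattern is built into the API. The sketch below (ReadOnce is a made-up name) shows the same pattern with both classes.

```csharp
using System;

// Hypothetical helpers showing the read-once pattern with both WeakReference flavors.
public static class ReadOnce
{
    // Non-generic WeakReference: read Target once into a local, then test the local.
    public static string Describe(WeakReference weak)
    {
        object target = weak.Target;
        return target != null ? target.ToString() : "(collected)";
    }

    // WeakReference<T>: TryGetTarget reads the target once and returns it as a strong reference.
    public static string Describe<T>(WeakReference<T> weak) where T : class
    {
        T target;
        return weak.TryGetTarget(out target) ? target.ToString() : "(collected)";
    }
}
```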

Some types that give you weak references

There are three main types in .NET that give us access to weak references:

WeakReference class: This is usually the best option if you want a weak reference to a single item;
GCHandle struct: We can see it as an "unsafe" reference. In fact, the GCHandle may be a strong reference, a pinned reference, or a weak reference (with or without trackResurrection);
ConditionalWeakTable class: This class was introduced in .NET 4. It is a dictionary-like collection in the sense that you associate a value with an existing "key" instance. While the key is alive, the value is kept alive. Unlike a dictionary, though, it doesn't use hash codes or an equality comparer to find items; it uses reference identity (which is also why keys can't be value types). So, two different instances that are considered equal can still have different values. This is usually the best class to use if you want to extend existing instances with new properties or fields at run time.
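A short sketch of ConditionalWeakTable attaching extra data to existing instances. Coordinate, Tag, and AttachedTags are hypothetical types; GetValue creates the value the first time a given key instance is seen. Two Coordinate instances that are equal by Equals still get different Tag objects, because lookups use reference identity.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical key type whose instances compare equal by value.
public sealed class Coordinate : IEquatable<Coordinate>
{
    public readonly int X;
    public readonly int Y;

    public Coordinate(int x, int y) { X = x; Y = y; }

    public bool Equals(Coordinate other)
    {
        return other != null && X == other.X && Y == other.Y;
    }

    public override bool Equals(object obj) { return Equals(obj as Coordinate); }

    public override int GetHashCode() { return X * 31 + Y; }
}

// Hypothetical extra data attached to Coordinate instances at run time.
public sealed class Tag
{
    public string Text;
}

public static class AttachedTags
{
    private static readonly ConditionalWeakTable<Coordinate, Tag> Tags =
        new ConditionalWeakTable<Coordinate, Tag>();

    // Returns the Tag attached to this exact instance, creating it on first use.
    // The table does not keep the key alive; once the Coordinate is collected, its Tag can be too.
    public static Tag For(Coordinate key)
    {
        return Tags.GetValue(key, _ => new Tag());
    }
}
```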