Premature .NET garbage collection, or "Dude, where's my FooBar?"

Phil Atkin

Aug 26, 2011

CPOL

A surprising and potentially destructive 'feature' of the .NET garbage collector

Introduction

Here's the scenario: I was trying to debug a large, multi-threaded, multi-module .NET scientific imaging application that combined managed and unmanaged (including COM) code modules. Some customers were reporting a crash at a particular point in the app which had generally taken them at least half an hour of work to get to - we were not winning friends. But the logs showed that the crash was happening in a well-tested piece of code, and further stress testing showed no reason to doubt its correctness. Certain customers seemed to see the crash a lot, whereas others never did. Last - and worst - we could never reproduce it in house. All we could do, it seemed, was to add more instrumentation, let the users run it, and hope that would shed some light - a nightmare.

Fortunately, I noticed that another app - a simpler one - was failing in a similar way. Since this app was on one of our company's machines - albeit one in Lisbon running Portuguese Windows 7 - I was able to take direct control over it and start to track down the bug. Nevertheless, it took me a long time, because (a) nothing I could find using debuggers and diagnostics gave any pointer to the root cause, and (b) there seems to be very little written about this phenomenon.

If you think you know about .NET garbage collection, read on - you may be surprised.

The Nub

Here is a piece of code distilled from this experience. I started with tens of thousands of lines of code, multiple threads, camera hardware, COM servers etc., but this is all I needed to reproduce the bug:

Foo a = new Foo();
while (true)
{
    FooBar b = new FooBar();
    b.WorkWith(a);
}

See anything wrong with this? Any reason why the WorkWith method should crash horribly, given that it's been tried and tested many times? No, nor could I. The exception I saw was a c5 (0xC0000005) exception, reported by the CLR as an AccessViolationException with the message "Attempted to read or write protected memory. This is often an indication that other memory is corrupt". The problem is compounded by the fact that it's generally very difficult to provoke the crash under a debugger. However, further instrumentation revealed that at the point of the exception the program was still inside the WorkWith method, but b had already been destroyed.

The final, crucial information you need is that Foo and FooBar are implemented in a mixed-mode (managed and unmanaged) class library, and that WorkWith invokes unmanaged code. In fact, it seems that is all you need to know to predict that the code above will sometimes fail with a c5 exception.
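To make the mechanism concrete, here is a minimal sketch of the same race using nothing but Marshal.AllocHGlobal. It is not the original Foo/FooBar code: NativeBlock, Work and FinalizedDuringWork are hypothetical names, and the explicit GC.Collect / GC.WaitForPendingFinalizers pair merely stands in for a collection that happens to run while unmanaged code is executing.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for FooBar: it owns a block of unmanaged memory
// and frees that block in its finalizer.
sealed class NativeBlock
{
    private IntPtr _ptr = Marshal.AllocHGlobal(64);

    // Set by the finalizer if it runs while Work is still executing.
    public static volatile bool FinalizedDuringWork;
    private static volatile bool _working;

    ~NativeBlock()
    {
        if (_working) FinalizedDuringWork = true;
        Marshal.FreeHGlobal(_ptr);
        _ptr = IntPtr.Zero;
    }

    public void Work()
    {
        IntPtr p = _ptr;             // last use of 'this' in managed code
        _working = true;
        Marshal.WriteByte(p, 0, 1);  // the "unmanaged" work begins

        // Assumption: simulates a GC that happens to run mid-call.
        GC.Collect();
        GC.WaitForPendingFinalizers();

        // If the finalizer ran, any further use of p touches freed memory.
        _working = false;
    }
}

static class Program
{
    static void Main()
    {
        new NativeBlock().Work();
        Console.WriteLine(NativeBlock.FinalizedDuringWork
            ? "Finalized while Work was still running"
            : "Survived this time");
    }
}
```

Which message appears depends on how the method was compiled: optimized code is free to treat 'this' as dead after its last use, whereas unoptimized builds and debugger sessions generally extend object lifetimes, which is consistent with how hard the crash was to provoke under a debugger.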

Foo a = new Foo();
while (true)
{
    FooBar b = new FooBar();
    b.WorkWith(a);
    GC.KeepAlive(b);   // b stays reachable until this call
}

The code above does not fail. The call to GC.KeepAlive simply tells the garbage collector that b is required up to that point and must not be collected (finalized) before then.

Why is this necessary? In optimized code, the JIT compiler and the garbage collector treat a reference as live only up to its last use in managed code, not until the end of its enclosing scope. It turns out that as soon as execution enters the unmanaged portion of the WorkWith method, and provided nothing in managed code refers to b after that point, b becomes eligible for garbage collection (finalization). If b is finalized, of course, the entire FooBar is destroyed while its unmanaged code is still running, and almost anything might happen (although I've only seen a c5 exception).

It is generally understood that it is good practice to implement IDisposable on classes that contain unmanaged resources. There is also a prevailing view that providing a finalizer in such a class ensures safe behaviour if the programmer fails to call Dispose. In fact, the combination of forgetting to call Dispose and using unmanaged code can be disastrous, in a way that is very difficult to track down.

It is clear that any 'use' of b following the call to WorkWith will keep it alive. An article I refer to below claims that a call to Dispose, either explicit or implicit through using, does not count as a 'use' of b in this context and therefore doesn't prevent the finalization of b. Testing has shown this claim to be incorrect, however, as Luc Pattyn (who clearly does know about .NET garbage collection) asserted in a comment on the original version of this tip. So if your object implements IDisposable, then disposing of it when you're finished with it is a better way to prevent premature finalization.

It is worth pointing out that it's not only mixed-mode objects that may suffer from this issue. The basic principle is that executing an object's member that enters unmanaged code (mixed-mode is one way, P/Invoke is another) does not, of itself, keep that object alive beyond the point of entry to the unmanaged code. The object is therefore potentially immediately eligible for finalization (assuming it has a finalizer - and many such objects should have one), and this can lead to 'difficult' behaviour.

So in summary, I recommend:

If your class encapsulates unmanaged resources or code, make sure you implement IDisposable.
If you are using a class that implements IDisposable, make sure you call Dispose; the best way to do that in C# is through using.
Do not regard the use of a finalizer as an acceptable way to make your code safe. Treat the execution of a finalizer as a bug. Search proactively for such cases, by detecting any calls to the finalizer during testing.
You can make your class safer for its users (who might forget to call Dispose) by calling GC.KeepAlive(this) after each point at which it enters unmanaged code.
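The recommendations above can be sketched together in one small class. This is an illustration under stated assumptions, not the original library code: SafeNativeBlock, Fill and MissedDisposeCount are hypothetical names, and Marshal.WriteByte stands in for a real call into unmanaged code.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

// Hypothetical wrapper around a block of unmanaged memory.
sealed class SafeNativeBlock : IDisposable
{
    private IntPtr _ptr;
    private readonly int _size;

    // Incremented whenever the finalizer, rather than Dispose, cleans up.
    // A test run can treat any non-zero value as a bug.
    public static int MissedDisposeCount;

    public SafeNativeBlock(int size)
    {
        _size = size;
        _ptr = Marshal.AllocHGlobal(size);
    }

    public void Fill(byte value)
    {
        if (_ptr == IntPtr.Zero)
            throw new ObjectDisposedException("SafeNativeBlock");

        IntPtr p = _ptr;
        for (int i = 0; i < _size; i++)
            Marshal.WriteByte(p, i, value);  // stand-in for unmanaged work

        GC.KeepAlive(this);  // 'this' cannot be finalized before this point
    }

    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this);  // cleanup is done; skip the finalizer
    }

    ~SafeNativeBlock()
    {
        Interlocked.Increment(ref MissedDisposeCount);
        Release();
    }

    private void Release()
    {
        if (_ptr != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_ptr);
            _ptr = IntPtr.Zero;
        }
    }
}

static class Program
{
    static void Main()
    {
        using (var block = new SafeNativeBlock(16))
        {
            block.Fill(0xFF);
        }  // Dispose runs here

        GC.Collect();
        GC.WaitForPendingFinalizers();
        Console.WriteLine(SafeNativeBlock.MissedDisposeCount);  // prints 0
    }
}
```

The counter is one simple way to detect finalizer calls during testing; logging or asserting inside the finalizer would serve the same purpose.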

Points of Interest

This problem is one of the most slippery I've ever worked on. Under any sort of debugger, it generally failed to crash at all. The app's propensity to fail seemed to vary according to memory load and the behaviour under Win64 seemed markedly different to that under Win32.

It seems that a contributory factor was the fact that the WorkWith method generally allocated a lot of (unmanaged) memory: I presume that this allocation - which is associated with a call to AddMemoryPressure - tends to 'encourage' the GC to run.
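For readers unfamiliar with memory pressure, the following is a minimal sketch of how a class might report a large unmanaged allocation to the garbage collector. The real WorkWith code is not shown in this tip, so PressuredBlock and its members are hypothetical; only GC.AddMemoryPressure and GC.RemoveMemoryPressure are actual framework calls.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical owner of a large unmanaged allocation.
sealed class PressuredBlock : IDisposable
{
    private readonly long _bytes;
    private IntPtr _ptr;

    public PressuredBlock(int bytes)
    {
        _ptr = Marshal.AllocHGlobal(bytes);
        _bytes = bytes;
        // Tell the GC that this small managed object is holding a lot of
        // unmanaged memory, which makes collections more likely.
        GC.AddMemoryPressure(_bytes);
    }

    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this);
    }

    ~PressuredBlock()
    {
        Release();
    }

    private void Release()
    {
        if (_ptr == IntPtr.Zero) return;
        Marshal.FreeHGlobal(_ptr);
        _ptr = IntPtr.Zero;
        // Every AddMemoryPressure must be balanced by a matching removal.
        GC.RemoveMemoryPressure(_bytes);
    }
}

static class Program
{
    static void Main()
    {
        for (int i = 0; i < 10; i++)
        {
            using (var block = new PressuredBlock(64 * 1024 * 1024))
            {
                // Unmanaged work on the block would go here.
            }
        }
        Console.WriteLine("Gen 0 collections so far: " + GC.CollectionCount(0));
    }
}
```

The number of collections reported will vary from run to run and machine to machine, which fits the observation that the failure rate depended on memory load.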

I suspect that the most difficult aspect of the problem was the sheer unexpectedness of it: why would the garbage collector finalize an object that's still in use?

I want to acknowledge the contribution of this article, which was the only one I could find (after days of searching) that referred to this phenomenon - although as mentioned above, I believe it to be mistaken in some respects.

Finally, I'd like to thank Luc Pattyn for challenging my assertions, which has led to a more accurate Tip.

History

25 August, 2011: First version of this tip
3 September, 2011: Second version of this tip