How to write friendlier code for the Garbage Collector and gain a performance boost

Cohen Shwartz Oren


Feb 9, 2006


Learn how to create objects in a way that diminishes the GC performance cost.

Introduction

Memory management is a very important issue, even now, in the world of managed programming languages where the .NET Framework encapsulates it entirely. In my humble opinion, it is mandatory for developers to understand the memory management process in a managed environment. The old school of developers knew much more about memory management than the novice ones, mainly because they had to. The .NET Framework manages the memory for us. It is great! It saves us the headaches generated by problems like leaks and memory overwrites. The problem starts when a developer gets lazy and forgets about the issue altogether. Some developers mistakenly think that they cannot influence the .NET Framework memory management process. But the truth is that the current .NET Framework still needs the developer's help to perform better. The memory management process in .NET is transparent and well encapsulated. It almost looks like there is nothing that the developer can do to harm or improve it, but in fact it performs well only if the developer knows how to use it correctly.

Disclaimer

I am not an official expert on the garbage collection mechanism. The sources of what I write here are books, articles, and field experience. Please also keep in mind that no one, except for the people in the CLR group at Microsoft, knows exactly how the GC works. Even the papers officially released by Microsoft leave out some of the information, mainly due to copyright reasons.

Why write this article?

I thought it would be nice to share with the CodeProject community some knowledge about working correctly with memory allocation in managed code. At the least, I hope the article will get the reader interested in this important topic. Its purpose is to shed some light on managed memory management. The reader will find tips on how to allocate and use memory better, an explanation of the main differences between managed and unmanaged memory management, and finally a glance at the future of memory management in the .NET Framework.

Who should read this article?

Anyone who writes managed code, in any of the .NET languages, and is keen on writing better performing code. I expect the reader to have some background knowledge about the GC.

What you will not find in the article

There are many good articles about the .NET GC that describe in detail the algorithms and mechanisms of managed memory management (I have added some good links at the end of the article). This article focuses only on the performance cost of allocation and how to diminish it.

Garbage Collector assumptions

It is important to read the following section before you continue. The reason for this recommendation is that the rest of the article leans on this information. Designing a memory management mechanism requires a set of assumptions about memory usage. These assumptions eventually translate to a set of rules.

Here are the published rules which the .NET GC complies with:

Objects are allocated contiguously.
The heap is divided into several parts called generations.
Referenced (hence, used) objects are moved from one generation to a higher one during a collection.
Objects in the lowest generation level are the youngest.
Recently created objects tend to have a short life.
The objects in the highest generation level are the oldest and are also known as the survivors.
The older an object gets, the more likely it is to be needed, and the longer it is assumed to live.
The age of objects in each generation is pretty much the same.
There are no gaps between objects (due to compacting).
The order of the objects in memory corresponds to the order of their creation.
The garbage collection engine determines the best time to perform a collection. The GC runs in response to allocation, and it starts collecting only if there is not enough space.
Collection of a certain generation is done only when the relevant heap portion does not have enough free space.
The GC moves on to collect the next generation level only if there is still a lack of memory after collecting the previous generation.
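
To make the promotion rule concrete, here is a minimal sketch that tracks the generation of one object across two forced collections. The class and method names are hypothetical, and GC.Collect is called only for the sake of the demonstration; the exact generation numbers you see may vary with the runtime version and GC settings.

```csharp
using System;

// Hypothetical demo class; not part of the .NET Framework.
public static class GenerationDemo
{
    // Returns the generation of a single object right after allocation
    // and after each of two forced collections.
    public static int[] TrackPromotion()
    {
        object survivor = new object();
        int[] seen = new int[3];

        seen[0] = GC.GetGeneration(survivor);   // freshly allocated
        GC.Collect();                           // demonstration only
        seen[1] = GC.GetGeneration(survivor);   // survived one collection
        GC.Collect();                           // demonstration only
        seen[2] = GC.GetGeneration(survivor);   // survived two collections

        GC.KeepAlive(survivor);                 // keep the object referenced until here
        return seen;
    }
}
```

Printing the returned array (for example with string.Join(", ", GenerationDemo.TrackPromotion())) shows how an object that is still referenced climbs toward GC.MaxGeneration each time it survives a collection.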

The main differences between managed and unmanaged code in terms of memory management and performance

This section is important for the experienced developer who is used to developing in unmanaged languages like C/C++ and, God forbid, Assembly. The reason is that the rules of thumb for creating objects on the heap in managed and unmanaged environments are simply inverted. In an unmanaged environment, the cost of allocation is negligible as long as the memory is free (with little or no fragmentation). When the memory is fragmented, the allocator has to search for free space, and that is very costly.

In an unmanaged environment, there is no direct connection between the amount of memory you are trying to allocate and the cost of the allocation; the cost depends on the condition of the memory. Therefore, if the memory is heavily fragmented, allocating 1 KB or 1 MB may take approximately the same time. In a managed environment, the cost depends on the size of the allocation. In the normal case, where there is enough available memory, the managed memory manager allocates space sequentially, so there is no need to spend time searching for free space. The performance cost comes when you run out of memory. Then the GC is activated and starts clearing, compacting, and restructuring the memory, which is extremely costly.

The following factors affect how hard the GC will work:

How many objects you allocate.
The size of the objects.
The lifetime of the objects.
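
A rough way to watch these factors at work is to count generation 0 collections while allocating. The following is a minimal sketch, not a benchmark: the class and method names are hypothetical, and the numbers it returns depend on the runtime, the machine, and the GC mode.

```csharp
using System;

// Hypothetical demo class; not part of the .NET Framework.
public static class AllocationPressureDemo
{
    // Allocates `count` short-lived byte arrays of `size` bytes each and
    // returns how many generation 0 collections ran in the meantime.
    public static int CountGen0Collections(int count, int size)
    {
        int before = GC.CollectionCount(0);

        for (int i = 0; i < count; i++)
        {
            byte[] temp = new byte[size];   // becomes garbage on the next iteration
            temp[0] = (byte)i;              // touch the array so it is actually used
        }

        return GC.CollectionCount(0) - before;
    }
}
```

Calling CountGen0Collections with the same count but a larger size, or the same size but a larger count, will typically report more collections, which is the point of the list above.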

How to write friendlier code for the GC

Implement a destructor only when needed
An object with a destructor is marked in the GC as a finalizable object. There is a real performance cost in terms of GC operation for this kind of object. Finalizable objects take longer to allocate and to reclaim.
- Use a destructor mainly when you use P/Invoke.
- Gather unmanaged resources, such as handles, in one object.
- Avoid using finalizable objects in big arrays.
- Avoid holding references to regular objects from finalizable objects; everything a finalizable object references stays alive until it has been finalized.
Avoid calling the GC.Collect method
Calling the method without parameters causes all generations to be collected. It is the same as calling the method with GC.MaxGeneration as the parameter.

It is a good rule of thumb to count to 10 before calling this method. On second thought, counting to 1000 is even better. Keep in mind that during the collection, the process threads are practically suspended. Do you need any more reasons than that?
Prefer creating small amounts of memory for long lived objects
Higher generations are collected rarely, as an optimization. Large long-lived objects fill them up quickly, which forces those expensive collections to run sooner and hurts performance.
Allocate only the exact amount of required memory
The smaller the object, the less we pay in terms of performance, because reclaiming the memory space is faster and the process threads are suspended for a shorter time.
Avoid references to temporary objects which might mistakenly survive
If you allocate small temporary objects all the time, and these objects live for a short period and then die, that is fine. You just need to make sure that nothing references them later. When the system runs out of memory, the GC will clean them up smoothly, without having to move them to a higher generation and run the compacting operation, which is extremely costly.
Avoid using pools
Pools are an obsolete technique in the managed environment. In the past, in the unmanaged environment, we used pools to avoid allocation during the process lifetime, to reuse objects, and to guarantee, by allocating up front, that memory would be available to the process for its whole execution time. In the managed environment, allocation is very fast (when there is available space), and it is better to allocate only when required.
Avoid middle range object allocation
As stated above, short-lived objects are good for performance because we don't 'pay' the compaction price. A long-lived object is fine only if it is really needed throughout the execution time. Long-lived objects survive GC rounds: they are marked as survivors, and the GC skips them and deals with the temporary, short-lived objects instead. This improves performance. Middle-range objects, discussed later in the article, fall between the two and get neither benefit.
Avoid heavy objects
Know that heavy objects (roughly 85,000 bytes and above) are not compacted. They are allocated in a special area of the heap, the large object heap. The reason is that moving them in memory would put too much load on the CPU.
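
As an illustration of the tip "Allocate only the exact amount of required memory", the sketch below fills a List<int> twice: once with the capacity given up front and once with the default constructor. The class and method names are hypothetical. A list grows its backing array as items are added, so the default-constructed list allocates several intermediate arrays that immediately become garbage and can end up with more capacity than it needs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class; not part of the .NET Framework.
public static class CapacityDemo
{
    // One backing array of exactly n elements is allocated.
    public static List<int> FillPresized(int n)
    {
        List<int> list = new List<int>(n);
        for (int i = 0; i < n; i++)
        {
            list.Add(i);
        }
        return list;
    }

    // The backing array is reallocated and copied each time it fills up.
    public static List<int> FillDefault(int n)
    {
        List<int> list = new List<int>();
        for (int i = 0; i < n; i++)
        {
            list.Add(i);
        }
        return list;
    }
}
```

Comparing the Capacity property of the two results shows the difference: the presized list holds exactly what was asked for, while the other one holds whatever the last growth step produced.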

Finalizable object cost

First of all, you should know that there is no real destructor in .NET languages. Real destructors would contradict the concept of an automatic memory management mechanism like the .NET GC. The CLR team actually wanted to avoid the common destructor syntax altogether. Initially, the way to clean up unmanaged resources was to implement the Finalize method, but because that required developers to add some boilerplate code to the method, Microsoft decided to let the compiler generate it. The C# compiler, for example, moves the developer's destructor body into a Finalize method. So if you look at your compiled code with the ILDASM tool, you will find that your destructor is not there; a Finalize method is in its place.

The GC handles objects that have a Finalize method differently. When such an object is created, the GC places a pointer to it in the finalization queue. During a collection, the GC checks, for each object that is a candidate for deletion, whether there is a pointer to it in the finalization queue. If there is, it removes the pointer from that queue and places it in the freachable (finalization-reachable) queue instead of reclaiming the object.

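Later on, a dedicated CLR finalizer thread walks this queue and uses each pointer to call the object's Finalize method. Only after that can a subsequent collection reclaim the object, so a finalizable object always survives at least one collection more than a regular object would.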
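
The advice to gather unmanaged resources in one object can be sketched as follows. NativeBuffer is a hypothetical wrapper written for this illustration; it holds a single block of unmanaged memory, offers Dispose for deterministic cleanup, and keeps the destructor only as a safety net. Calling GC.SuppressFinalize in Dispose tells the GC that the Finalize call is no longer needed, so a properly disposed object avoids the extra finalization work described above.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper that owns exactly one unmanaged resource.
public sealed class NativeBuffer : IDisposable
{
    private IntPtr handle;

    public NativeBuffer(int bytes)
    {
        // Unmanaged allocation; the GC knows nothing about this memory.
        handle = Marshal.AllocHGlobal(bytes);
    }

    public bool IsDisposed
    {
        get { return handle == IntPtr.Zero; }
    }

    // Deterministic cleanup, called by the owner of the object.
    public void Dispose()
    {
        ReleaseHandle();
        GC.SuppressFinalize(this);   // the destructor no longer needs to run
    }

    // Safety net: compiled into Finalize, runs only if Dispose was never called.
    ~NativeBuffer()
    {
        ReleaseHandle();
    }

    private void ReleaseHandle()
    {
        if (handle != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(handle);
            handle = IntPtr.Zero;    // makes a second call harmless
        }
    }
}
```

In calling code, wrapping the object in a using statement (using (NativeBuffer b = new NativeBuffer(64)) { ... }) guarantees that Dispose runs. Note that the wrapper holds no references to other managed objects, so nothing else is kept alive if it does end up being finalized.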

Object lifetime and its impact on GC performance

The life of an object is the period of time during which it survives in the managed heap. A good rule of thumb is that the age of an object should correspond to its necessity. This means that it is completely OK to have a long-lived, elderly object, but only if it is really required for all that time. For example, when you use MS Paint to draw a diagram, the previously drawn shapes are supposed to be kept in memory for a long time. The older objects will reach the highest generation. The GC checks the higher generations at a significantly lower rate. The reason is quite logical: if an object has survived a long time, there is a good chance that the process will need it in the future. This rule marks the elderly objects as bad candidates for reclaiming. This is also why it is a bad idea to call GC.Collect: it causes the GC to run over all the generations, including the higher ones, and that is a waste of time, mainly because the process threads are suspended during the operation.

To summarize, necessary long-lived objects do not harm performance in terms of memory management, provided that they are not 'heavy' in terms of bytes. If they are, there could be problems, because the highest generation will run out of space and the GC will have to check these objects and try to reclaim memory. Short-lived, temporary objects are also fine. Young objects that get in and out of generation 0 are wonderful. Since the cost of allocating objects is negligible, this behavior does not hurt performance significantly. In fact, this is the optimal case for the GC. Clearing generation 0 is also very fast and doesn't have a significant impact on performance.

Middle-range objects are the problem. What are they? Why and how do they hurt performance?

Middle-range objects live long enough to be moved out of generation 0 and might even get to generation 2 (the highest in the current .NET Framework version). These objects fall in between the definitions of short-lived and long-lived objects. The problem is that, right after they arrive in a higher generation, they are no longer needed. Since the GC checks the higher generations less frequently, they might stay there for a long time, and when the GC finally does have to check and reclaim space in the higher generations, we pay the performance cost of heavy compaction operations.

To illustrate this cost, take a look at the following example:

Object A is allocated and placed in Generation 0. After a certain period of time, Generation 0 is garbage collected. Object A is still required at this stage, so it moves to Generation 1. Until now, there are no problems with this scenario. A short time later (after Generation 0 has been garbage collected), the object becomes useless. It will stay in Generation 1 until that generation is collected, which does not happen often. Now, let's say that a Generation 0 collection is activated and there is still not enough memory space. In this case, the GC also collects Generation 1. Object A, which is no longer required, will be deleted, and the Generation 1 space will be compacted.

In some ways, the middle-range object problem is a side effect of the successful optimization of the generations mechanism. We win by avoiding the frequent checking of elderly objects but lose in the case of middle-range objects. These objects arrive at the long-term area only to die there. They are mistakenly treated as elderly objects although they do not really behave as such.

What can we do to diminish the problem of middle range objects?

Cache it!

If you have objects that go through the cycle of being created, living for a few minutes, and then dying, you should consider caching them. Instead of creating an object (which enters Generation 0), ceasing to need it a short while later (when it is already in Generation 1 or 2), and then creating a new one and repeating the cycle over and over, cache the object: rather than discarding it and creating a new one, just change a state that marks it as available.

For example:

Creating an instance of class A (Generation 0):

A obj = new A();

After a few minutes, the instance is positioned in Generation 1. Then we don’t need it any more:

obj = null;

The object will stay in Generation 1 although it is dead. After a few seconds, we create a new instance of class A, and so on. To improve this code, we can cache the first instance and, in the second snippet, instead of assigning null to the reference, just change the object's state to mark it as available. That way, we return it to the cache instead of discarding it. In the long run, this diminishes the middle-range problem.
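
One possible shape for such a cache is sketched below. Everything in it is an assumption made for illustration: the members of class A (IsAvailable, Payload, Reset) and the ACache class with its Acquire and Release methods are hypothetical names, and the cache is neither thread-safe nor bounded in size.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the class A used in the snippets above.
public class A
{
    public bool IsAvailable = true;   // the state that marks the object as free
    public int Payload;               // example of per-use state

    public void Reset()
    {
        Payload = 0;                  // clear per-use state before the next owner
    }
}

// Hypothetical cache: hands out available instances and creates one only when needed.
public class ACache
{
    private readonly List<A> items = new List<A>();

    public A Acquire()
    {
        foreach (A item in items)
        {
            if (item.IsAvailable)
            {
                item.IsAvailable = false;   // reuse instead of allocating
                return item;
            }
        }

        A created = new A();
        created.IsAvailable = false;
        items.Add(created);                 // the cache keeps the instance alive
        return created;
    }

    public void Release(A item)
    {
        item.Reset();
        item.IsAvailable = true;            // replaces "obj = null"
    }
}
```

With this in place, the first snippet becomes A obj = cache.Acquire(); and the second becomes cache.Release(obj);. The cached instances turn into genuinely long-lived objects, which the generations mechanism handles well, instead of a stream of instances that die shortly after being promoted.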

A glance at the future

There is a good chance that CLR and GC performance will improve in the future. The jump from Assembly language to C and C++ initially came with a performance penalty compared to Assembly. Nowadays, the performance difference between them on Windows is negligible. It looks like Microsoft is working hard to close the performance gap between managed and unmanaged code as well.

I have been working with Microsoft technologies for the past seven years. From what I have seen, meeting deadlines and time-to-market considerations are a top priority for Microsoft's business. This business philosophy, without criticizing it, sometimes means that fewer features and less functionality are delivered, with the rest deferred to the next delivery wave, such as the next .NET Framework version.

It is reasonable to say here that Microsoft wants to stay the master of its domain and works hard to optimize the .NET Framework. For instance, right now the .NET collection library is better in terms of performance than STL, ASP.NET is better than ASP, and ADO.NET brought significant improvements over the legacy ADO, which leans on COM.

Taking memory management away from developers helps not only in avoiding memory leaks but also in other areas. For example, using pointers to manage memory causes exceptions or mishaps when memory is overwritten or no longer exists. The experts at Microsoft will improve this management and test it for us programmers. The CLR and GC can look for trends and fragmentation, check the memory state, etc. In terms of dynamic tuning, the GC will need to become more dynamic: decreasing and increasing the size of generations, changing the number of generations, adapting different algorithms to a given process's activity, and even assigning different algorithms per generation. As we have seen, the generations algorithm has drawbacks.

Currently, the CLR has two modes for the garbage collector, one for workstations and one for servers. The GC in server mode makes use of more RAM and CPU resources, and runs faster. Assigning more heap space to processes and using the CPU more sensibly will reduce performance hits. As such resources become cheaper and cheaper, the CLR can make the most of them.
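
To check which of the two modes a process is actually running in, the GCSettings class in the System.Runtime namespace can be queried. The sketch below is a small, hypothetical helper built on it; the class and method names are my own.

```csharp
using System;
using System.Runtime;

// Hypothetical helper; not part of the .NET Framework.
public static class GcModeDemo
{
    // Describes the GC mode of the current process and the number of generations.
    public static string Describe()
    {
        string mode = GCSettings.IsServerGC ? "server" : "workstation";
        int generations = GC.MaxGeneration + 1;   // generations are numbered from 0
        return "GC mode: " + mode + ", generations: " + generations;
    }
}
```

The mode is chosen when the process starts and cannot be switched from code. In the .NET Framework it is typically selected with the gcServer element in the runtime section of the application configuration file.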