A Generational Garbage Collector in C++






2.57/5 (15 votes)
Dec 18, 2001
5 min read

126292

2504
A Garbage Collector framework that is based upon Generational Copying
Introduction
The inspiration of writing this code came from a paper titled "A Real Time Garbage Collector Based on the Lifetimes of Objects" by H. Liberman and C. Hewitt. In the paper, the authors have described a heap storage framework that makes storage for short-lived objects cheaper than storage for long-lived objects. Although the algorithm was presented for LISP and similar systems, the same concept is implemented in C++. The paper is available at: http://lieber.www.media.mit.edu/people/lieber/Lieberary/GC/Realtime/Realtime.html
Dynamic memory management is often referred to as making memory requests from the
operating system during different courses of program execution. All dynamic memory
requests are satisfied through an area of memory called heap. In C++ this is
done through the new
operator. The operator is implemented by a call to
malloc
,
which grants a pointer to the object after allocating its memory on the heap. However
this flexibility of acquiring memory dynamically comes at a price: i.e. it becomes the
responsibility of the programmer to return dynamically allocated memory to the free
pool, by using the delete
operator. The delete operator in turn calls
free
, which
reclaims memory allocated on the heap. When delete has been called, the object is
destroyed and its destructor has been called. Further access to the pointer data can cause
unpredictable results.
The term "Garbage Collection" is an automated process of finding previously allocated memory that is no longer reachable by the program and then regaining that memory for future use. The garbage collector does this by several ways, one of which is traversing all pointers on the heap and finding weak pointers (pointer that allows the object memory to be recovered). In simple terms, use of Garbage Collectors leverages the programmer of worrying about calling delete every time new is called. Automated Garbage Collectors can reduce development cycles for a large-scale software by approximately 30% and additionally reduce the memory leaks, resulting in a more stable system.
Some systems also use reference counting for implementing garbage collection, however they have unnerving disadvantages of their own:
- The inability to reclaim circular structures i.e. circular structures can have non-zero counts, even when garbage.
- Often results in memory fragmentation.
- Its expensive since every allocation/freeing requires addition/subtraction.
Due to the above-mentioned problems, it is not a viable option to use reference counting as a primary answer to memory management problems especially when program code begins to increase. Nevertheless, there have been very few implementations of garbage collectors available in the public domain. One of them is by Silicon Graphics. This is intended to be a general purpose, garbage collecting storage allocator for gcc compiler. The algorithms used are described in:
- Boehm, H., and M. Weiser, "Garbage Collection in an Uncooperative Environment", Software Practice & Experience, September 1988
- Boehm, H., A. Demers, and S. Shenker, "Mostly Parallel Garbage Collection", Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, SIGPLAN Notices 26, 6 (June 1991), pp. 157-164.
Design Approach
The heap that this garbage collector maintains is made up of several generations. The user
creates pointers by a wrapper pointer class Pointer<T>
. Once memory allocation
request is made, the garbage collector returns a pointer to the object (created in its own heap space)
and also records its address (remember the pointer is itself created on the stack) in a
vector<void**>
. Similarly the size of object is also recorded for future generational copying.
Once the wrapper pointer runs out of scope, or a pointer assignment is made, the garbage collection
algorithm is run, to verify integrity of all pointers. The algorithm of garbage collection involves
moving all accessible objects out of current generation,
evacuating them to another generation and then iterating the heap for resolving all pointers to the
object that has been recently relocated. Finally the memory of current generation is reclaimed. If there
is any unreachable data in the generation, it is also recovered.
The garbage collector functionality is implemented by a class named GC
. It has all static member
functions. At any time during program execution, the process of garbage collection can be forcefully initiated by a
call to GC::Collect()
. The maximum number of generations can be queried by a call to
GC::GetMaxGeneration()
. If you are curious to find out the total number of bytes allocated on
the memory managed by the garbage collector, you can call GC::GetTotalBytesAllocated()
.
class GC { private: //Array of pointers to pointers that are made on the stack static std::vector< void** > _PointersOnStack; // Holds the size of objects that are made on the stack static std::vector<unsigned int > _SizeOfObjects; //Holds all the generations static std::vector< Generation* > _Generations; // Holds total bytes allocated on the heap static int BytesAllocated; public: // Invokes the GC for all generations static void Collect(); // Invokes the GC only upto and including the generation specified static void Collect( Generation*& pGeneration ); // Call this to allocate memory from the garbage collector static void* operator new( size_t, void** pStackPtr ); // Gets maximum number of generations that have been made static int GetMaxGeneration(); // Gets the total memory (bytes) that has been allocated on the heap static int GetTotalBytesAllocated() { return BytesAllocated; } // Returns the total number of generations in the GC static int GetGenerationCount() { return _Generations.size(); } // Sets the total bytes that have been allocated by the garbage collector static void SetTotalBytesAllocated( int Value ) { BytesAllocated = Value; } };
The
pointers that are iterated during the garbage collection process must point to objects of type Pointer<T>
.
This class implements the functionality of smart pointers. It overloads several operators including assignment
operator and automatic conversion operators. Garbage collection process is invoked whenever either the Pointer
object runs out of scope or an assignment is made.
template <class T> class Pointer { private: // Invoked on assignment and destruction void Destroy() { p = NULL; GC::Collect(); } public: // Wrapped pointer T* p; // Constructor Pointer( T* p_ = NULL ) : p( p_ ) {} // Destructor ~Pointer() { GC::SetTotalBytesAllocated( GC::GetTotalBytesAllocated() - sizeof( *p ) );p->~T(); // Explicitely call the destructor Destroy(); } // Assignment operator 1 Pointer& operator = ( Pointer<T> & p_ ) { return operator = ( ( T* ) p_); } // Assignment operator 2 Pointer& operator = ( T* p_ ) { Destroy(); p = p_; return *this; } // Automatic type conversion to T* operator T*() { return p;} // Dereferencing operator T& operator*() { return *p; } // Pointer indirection operator T*operator->() { return p; } // For automatic type conversion during new call operator void**() { return ( void** ) & p; } };
The various generations of heap are implemented by a class Generation
. Each generation wraps a table
of contiguous memory locations, so by having newly created objects close
together you can have fewer page faults and the objects will also reside in the
processor cache. There are many generations and generation with is the highest
Generation number contains the objects most recently created. Each generation
has certain capacity and when the objects on heap overrun that capacity, a new
generation is automatically created. The process of condemning a generation and
collecting memory lost as a result of weak references is called scavenging.
class Generation { private: // The generation number int _GenerationNumber; // Pointers to the objects in the generation std::vector<void* > _Pointers; // Points to the top of memory available in the generation void* _pTopOfMemory; // The generation allocates memory from this location void* _pNextObjPtr; // Returns maximum size available for one generation static int MaxSize; // Table of memory inside generation BYTE MemoryTable[MaxSize]; public: // Gets the remaining memory of the Generation int GetRemainingMemory() const; // Returns maximum memory that can be allocated for one generation int GetTotalMemory() const { return MaxSize; } // Allocates memory for an object and returns its void* void* Allocate( size_t Size ); // Gets the generation number int GetGenerationNumber() const { return _GenerationNumber; } // Performs bitbybit copying static void* CopyBitByBit( const void* pSourceLocation, size_t Size, Generation* pTargetGeneration ); // delete operator void operator delete( void* v ) { free( v ); } public: Generation( int GenNumber = 0 ); };
How to use it
Objects allocated with the built-in "::operator new
" are uncollectable. Only objects allocated with overloaded
new operator that takes address of pointer as the second argument are collectable. For ease of programming,
I have also written automatic conversion function of Pointer
to void**
:
void* operator new( const size_t sz, void** pVoid ) { return GC::operator new( sz , pVoid ); }
For example the following code demonstrates differences in object creation and usage by above garbage collector:
int* pInt = new int; // Traditional approach to create pointers - memory leaks !!! Pointer<int> pInt = new(pInt) int; // My approach - no memory leaks
Run the demo program and see memory allocated in the task manager of WinNT/2k. You would observe considerable difference in performance of the system with and without garbage collector. In order to prove the quality of garbage collection, I have also overloaded global new and delete operator to increment and decrement memory allocation counter. This counter can be a valuable indicator for detecting memory leaks in program.
Earlier I had many problems with a call to virtual functions in pointers allocated on GC's heap, but now all those have been solved. Right now memory allocation for arrays of objects has not been implemented, but that would be done in near future. I would love to have any suggestions for improving the design and functionality of the garbage collector.
Revision History
26 Jun 2002 - Resized and reformatted code to prevent scrolling