Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles
(untagged)

A garbage collection framework for C++

0.00/5 (No votes)
17 Jan 2001 1  
An article on using garbage collection in C++ through the use of smart pointers.
  • Download demo project - 6 Kb
  • Download source - 3 Kb

    Introduction

    One of the biggest complaints that is leveled at C++ is its lack of garbage collection. Programmers used to more modern languages such as Java find that programming in C++ is difficult and error prone because you have to manually manage memory. To some extent they are right, though most die hard C++ programmers would hate to give in entirely to a garbage collecting system.

    The idea behind garbage collection is that a runtime system will manage memory for you. When you need a new object you explicitly create it, but you never have to worry about destroying it. When there are no more references left to the object it is eventually automatically destroyed by the runtime system. This makes complex systems that must dynamically create objects frequently much easier to code, since you never have to worry about destroying the objects. This sounds very appealing, so why did I say "most die hard C++ programmers would hate to give in entirely to a garbage collection system?"

    The problem is that memory is just one type of resource. There are numerous other resource types, such as thread or database locks, file handles, Win32 GDI resources, etc. A GC (garbage collection) system only addresses memory, totally ignoring the other resource types. In most modern languages that support GC the programmer is still left to manually manage these other resources. In contrast, the C++ programmer can easily manage these other resources by wrapping the resource in an object that uses the RAII (Resource Acquisition Is Initialization) idiom. When the resource wrapper goes out of scope the destructor will automatically release the wrapped resource.

    At first you might think that adding destructors to objects in a GC langauge would be enough to allow the best of both worlds. This is almost true. Most GC languages have no concept of "stack data", so all objects are created on the heap and are destroyed only when the collector runs. This is non-deterministic, meaning that you have no idea *when* the collector will run. The result is that you won't know when the wrapped resource is released, which can be critical for many resource types. For instance, a thread lock should be held for the absolute shortest possible time. Hold it longer and you may starve other threads, or even theoretically cause a deadlock. You simply can't allow the lock to be released in a non-deterministic manner. So what you really want is a language that supports GC, stack data and destructors. That's why I said "most die hard C++ programmers would hate to give in entirely to a garbage collecting system."

    Smart Pointers

    C++ programmers have become accustomed to using the RAII idiom to manage memory. They wrap a pointer to an object created with operator new up in a class that emulates a regular pointer by defining certain operators such as '->' and '*'. These wrappers are known as "smart pointers" because they emulate a regular pointer while adding more intelligent behavior. A classic example of such a smart pointer comes with the standard C++ library and goes by the name of std::auto_ptr. This smart pointer was specifically designed to insure that memory is freed if an exception occurs, but it's not really usable for more complex memory management. With std::auto_ptr there can be only one "owner", or pointer referring to an object instance. Copying or assigning an std::auto_ptr transfers ownership, setting the original std::auto_ptr to NULL in the process. This prevents the std::auto_ptr from being used in a standard container such as std::vector, for instance.

    Many other smart pointer implementations attempt to allow multiple "owners", or pointers referring to the same object instance, through reference counting. A reference count is maintained that's incremented when the smart pointer is copied and decremented when the smart pointer is destroyed. If the ref-count ever reaches zero the object is finally destroyed. CodeProject includes a couple of smart pointer implementations that use this idea.1,2 There's also a version that's undergone thorough peer review that can be found in the Boost library.3

    Reference counted smart pointers are quite handy and work for many situations, but they aren't as flexible as true garbage collection. The problem is with circular references. If object A has a ref-counted pointer that points at object B which has a ref-counted pointer that points back at A then neither object will ever be destroyed since their ref-counts will never reach zero. It can be difficult to prevent such circular references.

    One approach to fixing this problem is to use a "weak pointer". A weak pointer is a pointer that references an object, but does not increment or decrement the ref-count. A regular C++ pointer is a "weak pointer" but it has a draw back to use. When the ref-count goes to zero the object is deleted which may leave regular pointers used as weak pointers "dangling". For this reason another smart pointer type is often used for a weak pointer. This other smart pointer can cooperate with the ref-counting smart pointer so that when the object is deleted the "weak pointer" will be automatically set to NULL, thus preventing dangling pointers.

    This works, but still leaves the programmer with the responsibility of spotting circular references and correcting them. This is not always easy, or even possible, to do. So, in some cases the C++ programmer is still left wishing that C++ had a GC system on top of the other facilities it already has.

    Conservative GC Libraries

    There are several free and commercial implementations of garbage collectors written for C/C++ that replace malloc and new respectively. All of these collectors are known as "conservative" collectors. This means that they err on the side of caution when trying to determine if there are any remaining pointers referring to an object. The reason for this is that in C and C++ it's possible to manipulate a pointer by storing it in other data types, such as unions or integral types.4 Unfortunately, this means that sometimes they fail to collect objects that are really no longer in use.

    Another problem with such libraries is that they only work on objects created with special forms of malloc and/or new. Memory that was allocated by third party libraries, for instance, won't be collected.

    gc_ptr

    So, is there a better solution for C++? This article and the accompanying code attempts to define a better solution by marrying the concept of smart pointers with the concept of traditional garbage collection. The result is a class template called gc_ptr.

    The gc_ptr class addresses the first issue with traditional GC libraries, namely the ability to be non-conservative. The only type that needs to be evaluated as a candidate for referring to an object is the gc_ptr itself. All other types of references to the object are considered to be "weak pointers" and are not considered. Another benefit to the smart pointer approach here is that the implementation is made much simpler since the collector doesn't have to try and find references to the objects, since all such references can be registered with the system at the time of creation.

    The second issue is harder to deal with. The implementation of gc_ptr was originally based on the implementation for a similar class, circ_ptr, submitted for consideration to Boost by Thant Tessman.3,5 This original implementation allowed the circ_ptr to be created with a pointer to any object allocated by operator new. Unforunately this implementation left a serious hole open for misuse of the class. The size of the object was registered with the first creation of a circ_ptr that referred to the object. This size was used to determine if an object contained other circ_ptrs in order to find circular references. If a circ_ptr<base> were to be initialized with a pointer to a derived type because it registered the size as sizeof(base) the code may fail to recognize that the object contains another circ_ptr and thus prematurely collect an object. Using a templatized constructor in the original implementation would have made it less likely for this to occur since the size could be calculated from the type of pointer passed to the constructor, but this would not have eliminated the possibility for error as illustrated by the following code:

    base* factory(int i)
    {
    	switch (i)
    	{
    	case 1:
    		return new derived1;
    	case 2:
    		return new derived2;
    	default:
    		return 0;
    	}
    }
    
    circ_ptr<base> ptr(factory(1));
    

    To actually eliminate this possibility we need to provide a custom way of allocating objects that are to be garbage collected. The implementation for gc_ptr does this by providing an overloaded new operator that takes an instance of gc_detail::gc_t in addition to the size_t. In this overload the size of the object allocated is stored to enable proper detection of whether or not an object contains another gc_ptr. With this overload the above example can be coded safely as (note the unusual syntax for new):

    base* factory(int i)
    {
    	switch (i)
    	{
    	case 1:
    		return new(gc) derived1;
    	case 2:
    		return new(gc) derived2;
    	default:
    		return 0;
    	}
    }
    
    gc_ptr<base> ptr(factory(1));
    

    Objects pointed to by gc_ptr now must be allocated only through the syntax "new(gc)". The variable "gc " passed to operator new using this syntax is declared in an unnamed namespace by the library as a convenience and to prevent violations of the "one definition rule." Objects allocated through the standard new operator that are used to construct a gc_ptr will throw an immediate std::invalid_argument exception. This makes it easy to find such errors at run time and prevents misuse. Because of this requirement a gc_ptr is not a direct replacement for regular pointers, but it's still a relatively simple replacement. All that's needed is for you to change the code that allocates the object.

    Interface

    The interface for using gc_ptr is relatively simple. There are two global methods that are used to control garbage collection and the gc_ptr class itself.

    void gc_collect()

    This is the heart of the collector. When invoked a mark-and-sweep algorithm is run to find all objects which are currently no longer in use and then destroys them. The programmer is free to make use of this method to invoke garbage collection at any point. However, there isn't a requirement that the programmer ever call this method. The new(gc) operator will invoke this method automatically if either a programmer specified threshold (see below) of memory has been allocated since the last call to gc_collect or if memory has been exhausted. In most cases this automatic collection should be enough.

    void gc_set_threshold(size_t bytes)

    This method sets the threshold at which gc_collect is automatically called by operator new(gc). Initially the threshold is set to 1024 bytes but may be changed by this method to optimize performance. There are two interesting values that may be used in a call to this method. Passing in zero will result in memory being collected for every invocation of operator new(gc). This will degrade performance but will insure optimal use of memory. Passing std::numeric_limits<size_t>::max() will have the opposite effect, turning automatic collection off entirely, though collect will still be called if memory is exhausted. This will give the best possible performance but will result in the worst possible use of memory.

    An interesting technique could be applied to create a "parallel garbage collector" by passing in std::numeric_limits<size_t>::max() to this method to shut off automatic collection and then coding a thread that periodically calls gc_collect itself. This may result in optimal performance and memory use in multithreaded programs.

    gc_ptr::gc_ptr()

    Constructs a gc_ptr with a NULL reference.

    explicit gc_ptr::gc_ptr(T* p)

    Constructs a gc_ptr to point at an object constructed with operator new(gc).

    template <typename U> explicit gc_ptr::gc_ptr(const gc_ptr<U>& other)

    Constructs a gc_ptr from another gc_ptr. The template definition allows related types to be copied, though only single inheritance relationships may be used safely with this implementation. If a solution can be found for multiple inheritance a future version will do so. For now you should avoid using types that use multiple inheritance in gc_ptr.

    gc_ptr<T> gc_ptr::operator= (T* p)

    Points a gc_ptr to a new object allocated by operator new(gc).

    template <typename U> gc_ptr<T> gc_ptr::operator= (const gc_ptr<U>& other)

    Points a gc_ptr to the object pointed to by another gc_ptr. Again the template definition allows related types to be copied but is unsafe for use on types that use multiple inheritance.

    T* gc_ptr::get() const

    Returns a real C++ pointer to the object pointed to by this gc_ptr. This is an explicit cast. For safety reasons there is no implicit cast to the real pointer.

    T& gc_ptr::operator*() const

    Allows the gc_ptr to be derefenced in the same manner that a real C++ pointer would be.

    T*gc_ptr::operator->() const

    Allows normal "arrow notation" to be used on the gc_ptr in the same manner that a real C++ pointer would.

    Pros

    Provides simple garbage collection to C++ programmers. The code that implements the gc_ptr is short and contained in only two files that are easily included in your projects.

    Eliminates memory management errors that occur with traditional ref-counted smart pointers when circular references are encountered.

    Provides a safe implementation that's not restricted to "conservative garbage collection".

    Provides a flexible interface that allows the programmer to control when garbage is to be collected, including a simple way to implement "parallel garbage collection".

    Easy to use.

    Thread safe. When used in a multithreaded program calls to gc_collect, and thus calls to operator new(gc), may block if another thread is currently collecting garbage. In practice this shouldn't cause any noticable speed difference, since both operations are (relatively) slow any way.

    Cons

    Not as efficient as manual memory management or ref-counted smart pointers.

    Requires the use of an overloaded operator new(gc) for allocation of the objects that are to be garbage collected. This syntax is non-standard, but the implementation conforms to the C++ Standard.

    Will not work properly with types that use multiple inheritance. This is the most serious hole in the implementation and simply results in undefined behavior, not in a compile time error. A solution to this problem is needed so if any programmers know of one I'd love to hear it.

    Pointers to objects returned from operator new(gc) that are never pointed to by a gc_ptr will leak. There's no way for the collector to know how to delete such objects since the object's type is not registered until the first assignment of a gc_ptr to the object.

    Possible Future Enhancements

    Some enhancements that may be provided in the next version include the ability to use types that use multiple inheritance and to include ref-counting.

    The multiple inheritance problem will be hard to solve. The garbage collector must save a void pointer to the object and the gc_ptr uses only this void pointer internally. Casting the void pointer to the templated pointer type using static_cast<> is safe for types that don't use inheritance or that use only single inheritance, but it's not safe for types that use multiple inheritance. If any programmers know of a solution to this problem I would love to hear from them.

    The ref-counting would help to optimize memory usage by destroying objects that don't participate in circular references immediately instead of only when gc_collect is called. This shouldn't be too difficult to add to this implementation and was left out only because such uses should be coded with ref-counted smart pointers instead of gc_ptr in order to optimize the performance of both. This makes ref-counting in gc_ptr only a minor benefit.

    Part Two of this article is now available.

    Special Acknowledgement

    The code presented here was based on the implementation of circ_ptr5 written by Thant Tessman and submitted to Boost.3 The implementation was refactored to minimize compile time dependencies, address the usage issue that led to new(gc), and to provide a more efficient collection algorithm. I'm indebted to Thant not only for the original circ_ptr but for input during the refactoring for my implementation. The code presented here is original, but would not exist with out the original efforts and later input given by him.

    Footnotes

    1 A Simple Smart Pointer, an article found on CodeProject.

    2 A Smart Pointer Capable of Object Level Thread Synchronization and Reference Counting Garbage Collection, an article found on CodeProject.

    3 Boost, an organization of programmers working to produce quality libraries that undergo extensive peer review to the public domain. Many of these libraries are hoped to be considered for inclusion in later C++ standards.

    4 The standard doesn't gaurantee which, if any, integral type a pointer can be placed into, but in practice most platforms allow this, and it's a common technique used.

    5 circ_ptr.zip. This link requires you to be a member of the Boost eGroup mailing list. See the Boost3 web site for instructions on joining.

  • License

    This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

    A list of licenses authors might use can be found here