Contents
This article demonstrates a method to try and solve a common problem hated by most C++ programmers: memory leaks and memory overruns. The method used in this article is to track all memory allocated by the program. It also has basic protection checks whether the memory written to the allocated block has overrun the number of bytes actually allocated.
This method also lets you organize and group allocated memory by ID or by name. Giving a name to a memory allocation gives the advantage that you can get the memory allocated by name while not needing to keep a pointer running around. Giving a group ID to an allocation helps the programmer to keep memory allocations grouped and thus can call a single function to deallocate all memory of a certain group.
Tracking memory is a very efficient way to keep an eye on all allocated memory. This gives you the ability to later enumerate all memory blocks allocated and also to deallocate all memory allocated. This is a garbage collection implementation.
Garbage collection is used very much in modern programming languages like all .NET languages and Java. Having a garbage collection implementation in C++ does not come without its price, but it is always a safe way to remove memory leaks completely.
When allocating memory, it has to be deallocated at some point or another. It is very easy not to deallocate the memory, either by programming errors or by the program logic. Example:
void ServePizzas(int pizzacount)
{
char* p = new char[pizzacount];
if (g_remainingpizzas == 0) return;
delete [] p;
}
In the code above, a memory block is allocated with the char* p = new char[pizzacount]; statement. The conditional statement in the middle has a return statement and causes the function to exit and all the code after it will not be executed. Since the delete [] p; statement will not be executed, the memory allocated will not be freed thus resulting in a memory leak.
The solution presented in this article involves overriding the C++ new and delete operators. The new operator is normally called like this:
char* p = new char[256];
This allocates a 256 byte array in pointer p. When using this library, the call remains as-is, but internally, the memory has been marked and tracked, allowing control over the allocated memory.
The new operator can also have parameters, and in this library, the new operator has three overloads with parameters. Let's look at some examples:
char* p = new char[256];
char* p1 = new(1) char[256];
char* p2 = new("my array") char[256];
char* p3 = new(1, "his array") char[256];
These were all the ways to allocate memory. To deallocate the memory allocated in the previous example, we normally do the following:
delete [] p;
delete [] p1;
delete [] p2;
delete [] p3;
Now, let's say that we lost the pointers p, p1, p2, and p3, how can we deallocate the memory? There are many ways with the memory tracker library. Let's find out how:
char* p = Null;
char* p1 = Null;
char* p2 = hmt_getnamed("my array");
delete [] p2;
char* p3 = hmt_getnamed("his array");
delete [] p3;
In this method, we could recover most of the pointers but could not recover pointer p and p1 since we did not assign a name to them. There is still a way to deallocate them though. Let's look at method 2:
hmt_garbagecollect(0);
hmt_garbagecollect(1);
In this method, we deallocate group 1 since p1 and p3 were assigned to group 1, and group 0 since p and p2 were not assigned to any group and thus they belong to group 0. Let's have a look at a more brute-force approach:
hmt_garbagecollect();
In this method, we force all memory allocated by the new operator to be deallocated, even p which was allocated using the normal new operator. This makes sure that there are no memory leaks.
If you need to know how much memory the program has allocated, you can use the following code:
size_t memallocated = hmt_getallocated();
printf("Memory allocated using new: %d", memallocated);
If you need to print debug information on all the memory allocated:
hmt_debugprint();
If you want to print debug information to see if any memory block has been overrun or trashed:
hmt_debugcheck();
Since we override the new statement, we have control on how much memory is allocated. Some more memory than actually requested is allocated to make space for tracking data.
The overhead for each memory allocation is 32 bytes for 32-bit processors, and 64 bytes for 64-bit processors. The following table represents the internal memory structure of an allocated block when the new statement is called. This also indicates how the memory tracker knows about all the memory allocated.
| Number of bytes |
Field |
Description |
| 4 bytes |
Trash UID |
Used to determine if memory has been overwritten. |
| 4 bytes |
Group ID |
Group ID of allocated block. |
| 4 bytes |
Hash of name |
If item is named, this will contain a hash of the name. |
| 4 bytes |
Flags |
Internal flags to describe memory block. |
| Pointer * |
Size |
Size of block including overhead. |
| Pointer * |
Previous block |
Linked list pointer to previous allocated block. |
| Pointer * |
Next block |
Linked list pointer to next allocated block. |
| n bytes |
Allocated memory ** |
Actual memory requested by the new statement. |
| 4 bytes |
Trash UID |
Used to determine if memory has been overwritten. |
* 4 bytes for 32-bit processor, 8 bytes for 64-bit processor.
** The pointer returned by the new statement points to this memory block.
With this structure, all memory blocks are known and linked together.
This module is part of a larger, more complex, memory pool module which should be published later on. There is, for sure, room for improvement in this module, one of them being the lookup of names, which is a linear search, and thus slow when there are a large number of memory allocations. In the memory pool module, a binary tree is used to speed up things, but that is for later.
This module is by no means the only method to do this kind of memory tracking, but it is a pretty efficient way to remove memory leaks.
- 16-04-2008: Original article.
|
|
 |
 | Seriously flawed [modified] T Atlicker | 16:46 24 Jul '09 |
|
 |
// B to the U to the G to the G to the Y static const double overall_rating = 0.5f;
The article alludes to tracking memory leaks and performing garbage collection but the implementation provides neither. As a whole the approach is ill conceived, poorly implemented, extremely inefficient and buggy.
Most competent developers will probably tell you that the best place to start looking is at the point of allocation. This generally requires tracking both the name of the file and line number where the allocation occurred. For some reason the implementation omits this type of functionality.
To combat leaks and reclaim memory the article instructs the developer to instead use the supplied "garbage collector". This does not solve the problem it simply hides it. Consider the following block of code:
void drain_the_lizard() { b = new object(); u = new object(); g = new object();
do_something(b, u, g);
delete b; delete u;
return; } Both "b" and "u" are deleted but "g" remains. When the application terminates it calls garbagecollect() to release all memory allocated by the tracker (see the source). The leak remains and continues throughout the execution of the application so you end up with a serious memory leak and the tracker has no idea where it occurs.
The author also doesn't mention that the garbage collection is only useful on plain old data types. If the memory reclaimed by the garbage collector contains anything other than POD the destructor is NEVER called. Unlike the garbage collectors found in the Java and .NET runtimes this one has no way of determining if the object is still in use or has been marked for later deletion or if it's even an object. Contrary to what the article states this is NOT a safe way to solve memory leaks and in my opinion this approach promotes extremely bad software development practices.
Other issues with the implementation...
Nasty typecasting. heapmemtracker::m_ptail and heapmemtracker::m_phead are declared as void pointers instead of their actual type. I still have no clue what the motivation for this is but given the code as a whole my guess is it was originally written in C then retooled for use with C++. If not. YIKES!
Hard coded size for the allocation header. Seriously. WHY? Both MEMBLOCK_ALIGN and MEMBLOCK_OVERHEAD are declared and used in a non type safe and bug prone manner. Unless there is a very compelling requirement I see no reason to set the alignment boundary to a value greater than 8 bytes. Having these values hard coded rather than based (at compile time) on the size of data used to create a tracked chuck of memory is a bug waiting to happen. Keep in mind that the memory returned by the allocator need only be suitably aligned for use with any C or C++ data type. Instead you might try something like the following which will provide a more memory efficient (and less buggy) determination of alignment:
static const size_t MEMBLOCK_ALIGNMENT = sizeof(__int64); static const size_t MEMBLOCK_ALIGN = MEMBLOCK_ALIGNMENT - 1; static const size_t MEMBLOCK_OVERHEAD = (sizeof(memblock_s) + MEMBLOCK_ALIGN) & ~MEMBLOCK_ALIGN;
named types. Boolean, True and False are provided by the language - use them.
Uint32/int32 are incorrectly defined - see the specs on long.
Lack of encapsulation. The functionality in memblockhelper are class methods of what you would likely find in memblock_s. It's a simple matter of moving the methods into memblock_s and using placement new for initialization.
new/alloc() fails to throw an exception (std::bad_alloc) on an out of memory condition.
delete/free() fails when a null pointer is passed.
new and delete operators missing support for std::nothrow_t.
new[] and delete[] operators missing. The tracker will not be called when deleting arrays "objectArray = new object[100]; delete[] objectArray;". [EDIT]
No way to detect hash collisions. Trivial I know but a source for potential bugs.
There are at least 8 bytes of unused space for each allocation on 64 bit targets. 24 bytes IF you fix the Int32/Uint32 typedefs.
No type or size information for MEMBLOCK_TRASHID. Assumed to be 32 bits.
Memory chunks are cleared (via memset) whenever they are allocated or freed. My opinion - pointless except for debug builds and very specialized cases.
modified on Saturday, July 25, 2009 3:04 PM
|
|
|
|
 |
|
 |
Thanks for your criticism, although I think with all due the respect that with the effort you have put into this post you could have changed the code and would have made an article of your own with the proposed fixes. I'm sure you have some good points but I'm also sure that many of your comments are flawed in many ways. The code has been written in C++, even so if you are saying that it was written in C but of course you cannot prove that. Everyone has his own opinions but it does not mean that if it is not done your way it is bug prone. I'm sure you're not for open-source in that case...
|
|
|
|
 |
|
 |
I stand by my original assessment of both your article and source code. It is inefficient, incomplete, contains bugs, includes and encourages unsafe operations, lacks type safety and doesn't even bother to track allocation points for locating memory leaks.
Any article containing only bug fixes and/or a proposal to fix bugs for an existing submission would likely get rejected. They explicitly request that such content be submitted to the original author first so those issues can be addressed in the original article and code.
If "many" of *my* comments are "flawed in many ways" why not cite the problems? The only issue I noticed was concerning new[] and delete[] operators. I have corrected it for clarity and tagged it with [EDIT]. Feel free to let me know of any "flaws" you think are in my comments and I will be more than happy to give you clarification. As to whether the code was originally written in C or not you should probably go back and re-read my original comment. It was stated as an opinion based on my personal observations - nothing more nothing less.
It's not about my way, your way, the bad way or the freeway. It IS however partly about implementing behavior that is not only expected but defined by the C++ standard. Throwing exceptions (technically you should reference or augment std::set_new_handler() ), accepting null pointers in delete, support for std::nothrow_t, etc. are expected of the allocator. Any existing code that expects this behavior is going to fail miserably.
BTW. What exactly does open source have to do with my comments or the bugs in your code?
|
|
|
|
 |
|
 |
All code has its target audience and for sure you are not one of them. For whatever reason you are complaining on this code only you know. Everyone has his opinion and this is going into a coding wars topic. So I'll just end it here. Inform me when you post your first article.
|
|
|
|
 |
 | My vote of 1 Achilleas Margaritis | 7:40 17 Feb '09 |
|
 |
The article says 'garbage collector', but it is not a garbage collector! it's an inefficient memory heap with tagged blocks.
|
|
|
|
 |
 | Very useful crazywew | 12:49 2 Jan '09 |
|
 |
Thank you for your article! In my opinion is a very well-written article with good examples and this solution was really beneficial for me, because I try writing bigger projects which always need high amount of memory, so optimization is very important. A word than a hundred, congratulations!
|
|
|
|
 |
|
 |
Thank you for your kind comments, very appreciated!
|
|
|
|
 |
 | That is no garbage collection Achilleas Margaritis | 7:31 4 Sep '08 |
|
 |
The code does not implement garbage collection!
|
|
|
|
 |
|
 |
As its name states, it is a memory allocation tracking system which can also garbage collect when instructed to do so. For sure it is not a full garbage collection implementation but it does help on memory allocation in a low-level language like C++. If you want a full garbage collector you can use the .NET languages or Java which have full garbage collection implementations.
|
|
|
|
 |
|
 |
It can not collect garbage. Collecting garbage means to identify which objects are reachable and discard the rest.
I don't have any problem with your article, I like your code, but please remove the references to garbage collections and collecting garbage. It has nothing to do with garbage collection.
|
|
|
|
 |
|
 |
Depends on how you see it. In C++ it is difficult to make good garbage collection without changing all memory allocations. With this method you can mark the items to collect with their own ID so in the end it does have some elements of a garbage collector. As I said previously, I did not make a fully fledged garbage collector system.
|
|
|
|
 |
|
 |
Having some elements of garbage collector does not make a garbage collector!
|
|
|
|
 |
|
 |
Then what is it? A part-time Garbage Collector? Point in view is that this is a starting point for a Garbage Collection implementation and the title is there to show that.
|
|
|
|
 |
|
 |
You can't do a garbage collector with your algorithm.
It's a memory heap of named blocks, not a garbage collector.
It's also very inefficient, with 32 bytes wasted as block headers.
|
|
|
|
 |
 | std::vector<char*> dict.push_back(); is also getting overridden ... ArtificialIntelligence123 | 3:43 1 Jul '08 |
|
 |
Hi, first of all, thanks for this wonderful article. I am testing it with vector. Vector is not getting added ...It is giving error like Null pointer can't be deleted and program sending exception. Following is my code ..
hmt_debugprint(); char* sd = new(1) char[40]; sd = "asdff"; std::vector<char*> dict; dict.push_back(sd); hmt_garbagecollect(1); hmt_debugprint();
Please help me. Thanks,
|
|
|
|
 |
|
 |
I've not tested your code, but ran into something that might be related lately: when compiling with optimizations, basically the compiler detects that between the first hmt_debugprint() and the hmt_garbagecollect, the code does nothing, so it discard compiling it for nothing.
Now, if you actually do something on the dict vector, like printf its size/content, the compiler will see it is of use, and will compile it.
Just a thought.
|
|
|
|
 |
 | cool ideals,but how to download the code? christanxw | 5:22 18 Jun '08 |
|
 |
cool ideals,but how to download the code? I cannt download the code, not open source?
|
|
|
|
 |
|
 |
At the beginning of this article there is the download link. You only have source code to download.
|
|
|
|
 |
 | Nice article LeonardoTorres | 12:19 14 May '08 |
|
 |
Hello I think this article is very nice and simple implementation.... for me is neccesary
Thanks..
Leonardo Torres
|
|
|
|
 |
|
 |
Thank you for your kind words
|
|
|
|
 |
 | Suggestion JTAnderson | 9:26 1 May '08 |
|
 |
First off, would like to say very clean, nice simple code. I like it. If you are opened to suggestions, I'd like to see:
- Multiple Heap Support. - STL Allocator support. - Being able to set a break on a certain allocation number.
Anyways, great work!
|
|
|
|
 |
 | Help to find mis-match between new[] operator and delete operator Steve (axa) Miller | 3:31 29 Apr '08 |
|
 |
Though this code does not override the new[] and delete[] operators it found the mis-match because it found the pointer given to the delete call did not have valid tracking information
Thanks
|
|
|
|
 |
|
 |
I have just tested the code with the new[] operator and it does track the memory allocated. Your problem could be that you have overwritten either at the start or at the end of the allocated memory. When you try to deallocate memory, the Heap Memory Tracker checks if you have overwritten more memory than allocated.
new int[256]; new(1, "test") int[256]; hmt_debugprint();
With this code it showed me that it did track both the memory allocations.
Make sure that you include the 'heapmemorytracker.h' header file or else the new and delete operators will not be overridden in that particular source file.
|
|
|
|
 |
 | An alternative approach using boost Ted Dunlop | 6:24 22 Apr '08 |
|
|
 |
|
 |
I personally find the boost library very bloated and so I avoid it most of the times. The only reason I have used boost is for regular expressions.
Smart pointers are very powerful but also non-trivial until you learn how to use them. For this I wanted to keep things as simple as possible.
Every developer has his way of writing code and this is one way, but the concept of this code is different than what you are trying to suggest, it tracks, groups, names and collects memory allocations.
|
|
|
|
 |
|
|