Contents
I never stop to be amazed by what you can find by browsing the MFC source code files. My latest discovery is rather spectacular. It consists of a small utility class that allows modifying how objects for a given class are dynamically allocated. This class is CFixecAlloc
. It is used inside the famous CString
class to allocate string buffers. Using CFixedAlloc
with the existing code requires very minimal changes and strives surprising improvements in performance. In this article, I will discuss CRT dynamic allocation overheads, usual alternatives to CRT shortcomings, the anatomy of the CFixedAlloc
class, how to use CFixedAlloc
in your programs, and finally, I will present a demo application so you can see by yourself what CFixedAlloc
can do for your programs.
It is a well known fact that dynamic memory allocation is expensive. It incurs performance overhead both in execution time and in space. If you need to be convinced, just take a peak in the malloc.c file provided with the CRT source code files. malloc()
function is a complex function that takes some time to be executed. Concerning the space overhead, the subject is a bit complex. Here is an overview. Depending on the version of VC++ you are using when you compile your program, and the Windows version where your program is running, a different version of heap will be used. This is out of scope for this article but if you are curious about how CRT chooses the heap version, you can find the function __heap_select()
in the heapinit.c file. In the CRT winheap.h header file, three heap versions are defined:
__SYSTEM_HEAP
__V5_HEAP
__V6_HEAP
V5 and V6 heap versions map to heap implementations private to the CRT library while the system heap maps directly to the Win32 heap services (HeapCreate()
, HeapAlloc()
, etc.). What is interesting about the V5 and V6 versions is that you can know exactly what is the space overhead for these heaps by looking at the CRT source code. For example, you can find this statement for the V6 heap:
sizeEntry = (intSize + 2 * (int)sizeof(int) +
(BYTES_PER_PARA - 1)) & ~(BYTES_PER_PARA - 1);
BYTES_PER_PARA
equals 16 bytes. This is a huge overhead. Consider the case where you would request twelve bytes. The CRT would reserve 32 bytes for this request. This is more than the double of what you requested! Unfortunately (or fortunately, it depends on your perspective), the V5 and the V6 heap are nowadays rarely used and we cannot know for sure what is the HeapAlloc()
overhead because we do not have access to this source code. However, it is a very good assumption that there is an overhead. Microsoft states this fact in the HeapCreate()
documentation:
The system uses memory from the private heap to store heap support structures, so not all of the specified heap size is available to the process. For example, if the HeapAlloc
function requests 64 kilobytes (K) from a heap with a maximum size of 64K, the request may fail because of system overhead.
Some people have written memory pool classes to workaround the CRT heap overhead. The excellent article from Uri Twig is an example of such a class. However, most of them require massive code changes to replace all new
and delete
invocations to some calls to the buffer pool class member functions. Another common potential limitation (potential because it depends on your needs) is that they lack thread safety. CFixedAlloc
addresses these limitations and it will be explained how in the next section.
Let's start by viewing the class declaration:
class CFixedAlloc
{
public:
CFixedAlloc(UINT nAllocSize, UINT nBlockSize = 64);
UINT GetAllocSize() { return m_nAllocSize; }
public:
void* Alloc();
void Free(void* p);
void FreeAll();
public:
~CFixedAlloc();
protected:
struct CNode
{
CNode* pNext;
};
UINT m_nAllocSize;
UINT m_nBlockSize;
CPlex* m_pBlocks;
CNode* m_pNodeFree;
CRITICAL_SECTION m_protect;
};
As you can see, it is a very simple class. The first thing that we can notice is that the class provides thread safety with a critical section. Next, m_AllocSize
contains the object size of the class that will be using its service, and m_blockSize
specifies the number of objects each fixed block of memory can contain. Both data members are set at the construction. The remaining unknown is the mysterious CPlex
pointer. I will not go in to the details of this class but just know that it is the class that does the actual CRT dynamic allocation calls. It only contains a pointer to another CPlex
object in order to create a linked list of CPlex
so its size is only 4 bytes. When allocating memory, it requests:
m_allocSize*m_blockSize+sizeof(CPlex)
If you are interested in knowing more about CPlex
, I recommend you to read the book MFC Internals which discusses CPlex
in more details. With this information, we can already compare the memory overhead difference between CRT and CFixedAlloc
. By using CRT, you will have the CRT overhead for each object, while with CFixedAlloc
, it is the CRT overhead plus the size of CPlex
(4 bytes) divided by the number of allocated objects. When the number of objects is big, the memory overhead is very close to zero, and as you will see with the demo, it can make a huge difference of many MBs. Now, let's look more closely at the two more important CFixedAlloc
functions:
void* CFixedAlloc::Alloc()
{
EnterCriticalSection(&m_protect);
if (m_pNodeFree == NULL)
{
CPlex* pNewBlock = NULL;
TRY
{
pNewBlock = CPlex::Create(m_pBlocks,
m_nBlockSize, m_nAllocSize);
}
CATCH_ALL(e)
{
LeaveCriticalSection(&m_protect);
THROW_LAST();
}
END_CATCH_ALL
CNode* pNode = (CNode*)pNewBlock->data();
(BYTE*&)pNode +=
(m_nAllocSize * m_nBlockSize) - m_nAllocSize;
for (int i = m_nBlockSize-1; i >= 0; i--,
(BYTE*&)pNode -= m_nAllocSize)
{
pNode->pNext = m_pNodeFree;
m_pNodeFree = pNode;
}
}
ASSERT(m_pNodeFree != NULL);
void* pNode = m_pNodeFree;
m_pNodeFree = m_pNodeFree->pNext;
LeaveCriticalSection(&m_protect);
return pNode;
}
void CFixedAlloc::Free(void* p)
{
if (p != NULL)
{
EnterCriticalSection(&m_protect);
CNode* pNode = (CNode*)p;
pNode->pNext = m_pNodeFree;
m_pNodeFree = pNode;
LeaveCriticalSection(&m_protect);
}
}
In Alloc()
, the code checks if the pool of free memory is empty, if it is then it creates a new block of memory (CPlex
) and sets up a list of free nodes. It is interesting to note that there is no wasted space to place node information because it is placed directly into each block. Of course, in order for this scheme to work, m_nAllocSize
must be bigger than sizeof(CNode)
. This is why it is checked in the constructor:
ASSERT(nAllocSize >= sizeof(CNode));
If you want to see the CRT code, with a glimpse of an eye, you can tell that CFixedAlloc
is much lighter. From an algorithmic point of view, the main factor for this simplicity is because the memory block's size is fixed (thus the name of the class). In a heap that allows variable size allocation, after many allocations and deallocations, the heap gets fragmented and the heap code has to scan the heap once in a while to merge adjacent small free blocks into big ones to make an efficient use of memory. With my demo program, I got the allocation time shred by a factor of 4 to 5.
And finally, let me introduce to you macros that help to use CFixedAlloc
with your classes:
#define DECLARE_FIXED_ALLOC(class_name) \
public: \
void* operator new(size_t size) \
{ \
ASSERT(size == s_alloc.GetAllocSize()); \
UNUSED(size); \
return s_alloc.Alloc(); \
} \
void* operator new(size_t, void* p) \
{ return p; } \
void operator delete(void* p) { s_alloc.Free(p); } \
void* operator new(size_t size, LPCSTR, int) \
{ \
ASSERT(size == s_alloc.GetAllocSize()); \
UNUSED(size); \
return s_alloc.Alloc(); \
} \
protected: \
static CFixedAlloc s_alloc \
#define IMPLEMENT_FIXED_ALLOC(class_name, block_size) \
CFixedAlloc class_name::s_alloc(sizeof(class_name), block_size) \
The DECLARED_FIXED_ALLOC()
macro overloads the new
and delete
operators of your class and thus allows you to use CFixedAlloc
with absolutely no code change. With the IMPLEMENT_FIXED_ALLOC()
, this is where you specify the block size. To complete this section, I will add that you can find more information on this type of allocation scheme in the excellent book of Scott Meyer, Effective C++.
- Include "fixalloc.h" in the header file that contains the class definition that you want to modify.
- Add the
DECLARE_FIXED_ALLOC()
macro in the class declaration.
- Add the
IMPLEMENT_FIXED_ALLOC()
macro in the CPP file that contains the class definition.
- Since
CFixedAlloc
is a private
MFC class, you have to add an additional include directory in the compiler options to point to the MFC source code directory. (I am assuming that you installed it during Visual Studio setup. It is installed by default, I think.)
- Recompile.
- Fine tune the size of the block size.
Now, you are done. Your class is upgraded for using the MFC fixed allocation service. One word of caution, if the define _DEBUG
is present during the compilation, the CFixedAlloc
macros will expand to nothing and the result will be that your class will behave as if you did no change to it. During the development of the demo program, I got this small problem where because I choose to link with the debug version of the run-time library, the _DEBUG
macro was implicitly set. Be aware of that. To figure that out, I added garbage in this block that MFC automatically adds when it creates a new file:
#ifdef _DEBUG
#define new DEBUG_NEW
#undef THIS_FILE
static char THIS_FILE[] = __FILE__;
#endif
Also, here is a small comment on the last step. It is very important that you provide a good value for the block size. If it is too big, you will be wasting memory, and if it is too small then you will not get as much performance improvements as you could get. However, even if the block size value is too small, you will still reduce the number of CRT allocation calls by a factor equal to the block size which is non negligible. The ideal value for the block size is the exact number of objects that will be allocated like in the demo program, but of course it is usually not possible to know that value.
It has been reported that when compiling the demo program with VC++2005, the user is getting the following warnings:
.\CFixedAllocDemoDlg.cpp(237) : warning C4995:
'CFixedAlloc': name was marked as #pragma deprecated
.\CFixedAllocDemoDlg.cpp(240) : warning C4995:
'CFixedAlloc': name was marked as #pragma deprecated
Since that I am not aware of any MFC class that would have superseded CFixedAlloc
and after reading this article, you should know what the class CFixedAlloc
does and how, my recommendation, if using CFixedAlloc
in your program makes a difference, is that you can safely ignore the warnings. If you feel extra cautious and you really want to make sure that your program will compile with future versions of MFC in case Microsoft removes the CFixedAlloc
class from MFC, nothing stops you from making a private copy of the CFixedAlloc
files.
This is true that Microsoft is slowly moving away from CFixedAlloc
. In MFC6, CFixedAlloc
is used with every temporary handle maps objects classes (i.e. CWnd
and CGdiObject
). Here is an example:
class CTempGdiObject : public CGdiObject
{
DECLARE_DYNCREATE(CTempGdiObject)
DECLARE_FIXED_ALLOC(CTempGdiObject);
};
Such classes have been removed in MFC7 (VC++2003). I am really curious about what motivates the MFC team to do so and if someone knows, I would really appreciate if you could share that information.
CFixedAlloc
returns a node in a "free list" when an object is deleted but the "free list" is inside the same allocated CPlex
. Since plexes are single linked, they will remain allocated for the entire duration of the program (until their head is deleted). In fact, CFixedAlloc
is more than an allocator, it is a recycler: it allocates on need but does not release. It just keeps apart for eventual subsequent allocation requests.
This results in better performance speed in case of frequent alternate new
/ delete
/ new
calls (since you will mostly reuse the system resource you already own), but may create problems in those situations when a huge amount of memory is required only for a limited time: the program will retain the memory until the head of the plexes is deleted (and since it is static
... at the end of the program) unless the user explicitly releases the memory with CFixedAlloc::FreeAll()
once he is done with the memory. You can call CFixedAlloc::FreeAll()
anytime to free yourself the memory if you have previously deleted all the allocated objects. Otherwise, the memory will be freed when the CFixedAlloc
object is destroyed. Here is a little reminder: Global objects are created in startup code before entering WinMain()
. Global objects are destroyed after the program has exited WinMain()
and right before the program gets unloaded.
When using CFixedAlloc
with the derived class, there are a few things that you must be aware of. Let's say you have:
class Base
{
DECLARE_FIXED_ALLOC(Base);
};
class Child : public Base
{
private:
};
If you do:
Child *p = new Child;
Base::operator new()
will be called and because the class Child
does not have the same size as the class Base
, CFixedAlloc
will not work. This problem is discussed with great details in the book Effective C++ and to prevent that problem, this is also why there is an ASSERT()
statement in the overloaded operator new in the DECLARE_FIXED_ALLOC()
macro. However, do not rely on the ASSERT()
macros to catch potential errors in your code because they are useless! Since the ASSERT()
macro is active only in debug mode they are of no use because in debug mode the macro DECLARE_FIXED_ALLOC()
also expands to nothing. That being said, nothing stops you from using CFixedAlloc
in a Base
class *and* its derived classes:
class Base
{
DECLARE_FIXED_ALLOC(Base);
};
class Child : public Base
{
DECLARE_FIXED_ALLOC(Child);
};
Of course, it doesn't make sense to declare the Base
class as FIXED_ALLOC
if the class is abstract
. One thing to be aware about using CFixedAlloc
in class hierarchy is that the operator delete is static
and not virtual
. Consider the following example:
Base *p = new Child;
delete p;
In this case, Base::operator delete()
would be called (I haven't tested it, but I'm 99% confident that this is what would happen) and that would be a serious problem. To workaround that problem here is a potential solution:
class Base
{
public:
virtual void Destroy() {delete this;}
DECLARE_FIXED_ALLOC(Base);
};
class Child : public Base
{
public:
virtual void Destroy() {delete this;}
DECLARE_FIXED_ALLOC(Child);
};
Base *p = new Child;
p->Destroy();
The demo program is very simple. It just declares two almost identical classes: one without CFixedAlloc
and the other using it. Then, a bunch of objects is created. This operation is timed, and once completed, the result is displayed. For computing the size overhead of the class without CFixedAlloc
, I used the overhead you would have if your program was using the __V6_HEAP
which is very unlikely and thus the computed value might not be totally exact. However, from experience, it seems to be quite accurate:
#define ITER_NUM 1000*1024
class A
{
public:
A( A *next ) : m_next(next) {}
A *m_next;
int dummy1;
int dummy2;
};
class B
{
public:
B( B *next ) : m_next(next) {}
B *m_next;
int dummy1;
int dummy2;
DECLARE_FIXED_ALLOC(B);
};
IMPLEMENT_FIXED_ALLOC(B,ITER_NUM);
void CCFixedAllocDemoDlg::OnTestTimebutton()
{
A *curAObj, *firstAObj = NULL;
B *curBObj, *firstBObj = NULL;
DWORD startA, endA, startB, endB;
register int i;
{
CWaitCursor wait;
startA = GetTickCount();
for( i = 0; i < ITER_NUM; i++ )
{
firstAObj = new A(firstAObj);
}
while( firstAObj )
{
curAObj = firstAObj->m_next;
delete firstAObj;
firstAObj = curAObj;
}
startB = endA = GetTickCount();
for( i = 0; i < ITER_NUM; i++ )
{
firstBObj = new B(firstBObj);
}
while( firstBObj )
{
curBObj = firstBObj->m_next;
delete firstBObj;
firstBObj = curBObj;
}
endB = GetTickCount();
}
displayResult( endA-startA,endB-startB );
}
#define BYTES_PER_PARA 16
void CCFixedAllocDemoDlg::displayResult( DWORD timeA,
DWORD timeB )
{
TCHAR buf[1024];
int overheadA = (32-sizeof(A))*ITER_NUM;
int overheadB =
(8+sizeof(B)*ITER_NUM + sizeof(CPlex) + (BYTES_PER_PARA-1))
& ~(BYTES_PER_PARA - 1);
overheadB -= sizeof(B)*ITER_NUM;
wsprintf( buf, __TEXT("Creating and destroying %d objects\n")
__TEXT("without CFixedAlloc\t: %4d ms\n")
__TEXT("with CFixedAlloc\t: %4d ms\n")
__TEXT("You saved %d bytes with CFixedAlloc"),
ITER_NUM, timeA, timeB,
overheadA - overheadB );
MessageBox(buf,__TEXT("Results"));
}
You could make a single thread version of CFixedAlloc
just by making a private copy of the class and remove the critical section stuff to further increase the performance gain. Johan Strand pointed out that the single thread support has already been added to MFC7 (VC++.NET). If you want to use the single thread version of CFixedAlloc
, you can use the following macros:
#define DECLARE_FIXED_ALLOC_NOSYNC(class_name) \
public: \
void* operator new(size_t size) \
{ \
ASSERT(size == s_alloc.GetAllocSize()); \
UNUSED(size); \
return s_alloc.Alloc(); \
} \
void* operator new(size_t, void* p) \
{ return p; } \
void operator delete(void* p) { s_alloc.Free(p); } \
void* operator new(size_t size, LPCSTR, int) \
{ \
ASSERT(size == s_alloc.GetAllocSize()); \
UNUSED(size); \
return s_alloc.Alloc(); \
} \
protected: \
static CFixedAllocNoSync s_alloc \
#define IMPLEMENT_FIXED_ALLOC_NOSYNC(class_nbame, block_size) \
CFixedAllocNoSync class_name::s_alloc(sizeof(class_name), block_size) \
There is a mistake in the IMPLEMENT_FIXED_ALLOC_NOSYNC()
macro. The macro parameter class_nbame
should be class_name
. These macros do not appear to be used by MFC which makes sense since there is a syntax error in the macro. This means that the no sync version has probably not been tested by the MFC team. However, once the syntax error is fixed and since the change was trivial, it should work fine. You can fix the syntax error and use the fixed fixalloc.h in your program without having to recompile MFC (the macro isn't used anywhere in MFC). I have modified the demo program so that you can test out the single thread version. All you have to do is:
- Fix the syntax error in your fixalloc.h file.
- Uncomment the following
define
statement:
#ifndef _USE_SINGLE_THREAD
class B
{
public:
B( B *next ) : m_next(next) {}
B *m_next;
int dummy1;
int dummy2;
DECLARE_FIXED_ALLOC(B);
};
IMPLEMENT_FIXED_ALLOC(B,ITER_NUM);
#else
class B
{
public:
B( B *next ) : m_next(next) {}
B *m_next;
int dummy1;
int dummy2;
DECLARE_FIXED_ALLOC_NOSYNC(B);
};
IMPLEMENT_FIXED_ALLOC_NOSYNC(B,ITER_NUM);
#endif
My testing of the no sync version has shown a speedup improvement of 40% over the original CFixedAlloc
version. The bottom line is use the no sync version if you are not developing a multithread application.
That is it! I hope you enjoyed this article, and if you did and found it useful, please take a few seconds to rank it. You can do so right at the bottom of the article. Also, I invite you to visit my website to check for any updates of this article.
- Platform SDK: Memory Management MSDN Library.
- Uri Twig, Buffer Pool\Object Pool
- Charles Petzold, Programming Windows, Fifth Edition, Microsoft Press, 1999
- Jeff Prosise, Programming Windows with MFC, Second Edition Microsoft Press, 1999
- George Shepherd, Scot Wingo, MFC Internals, Addison Wesley, 1996
- Scott Meyer, Effective C++ (3rd edition), Addison Wesley, 2005
- 01-27-2006
- Included comments in VC++2005 warnings reported by Allan Braun.
- Table of content added.
- 01-20-2006
- Expanded the "How to use" section based on the comments from readers.
- Expanded the "Suggestions" section based on the comments from Johan Strand.
- 11-28-2005