
Sophisticated memory allocation

22 May 2004
How to manage the heap in an extremely effective way.

Introduction

In this article, I would like to discuss how the performance of a program can be improved dramatically by slightly more sophisticated memory allocation management.

Types of memory allocation

Variables are used to store data. The question is how they are managed during program execution. Generally speaking, there are three types of allocation:

  1. Global (or static) variables and TLS (thread-local storage, in a multitasking environment). The memory is allocated for the whole duration of the program's (or, for TLS, the thread's) execution. The allocation itself takes place at startup only, but only a fixed set of variables, known at compile time, can be allocated this way.
  2. Automatic variables, the most common ones. These are temporary variables declared in functions (or inner scopes), parameters in function calls, etc. The memory is taken from the stack. Taken, not allocated, because the memory for the stack has already been allocated (strictly speaking, it is sometimes just reserved, and then committed on demand). This is a very effective, fast and simple way of memory allocation. You don't need to free it; the compiler does it automatically by updating the stack pointer upon exit from the variable's scope. Note also that in a multitasking environment, each thread of execution has its own stack, so threads don't share automatic variables.

    In other words, this method should be used when possible. But, there are cases when this is not possible:

    • The allocation size is unknown at compilation time.
    • The variables must be allocated and freed in an order that is unpredictable at compilation time.
    • The allocation is too big to fit on the thread's stack. This can be worked around by increasing the thread's maximum stack size, but that is not always ideal, because the memory allocated for a thread's stack is never freed until the thread exits (at least on Win32 systems).
  3. Dynamic variables, the so-called on-heap allocation, usually obtained by a call to an operator (new) or a function (malloc). It doesn't have the limitations of on-stack allocation.

    However, it has the following disadvantages:

    • It's up to you to track the lifetime of each such variable and deallocate it when necessary (again, by calling a function or an operator).
    • Allocation (and freeing) is usually a complex operation, even more so in a multitasking environment. It involves searches, hence its performance is not constant. There is also usually memory fragmentation, which hurts both performance and memory overhead.
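The three kinds of allocation can be put side by side in code. A minimal sketch (the names are illustrative, not from the article):

```cpp
#include <cstring>
#include <cstddef>

int g_nCounter = 0;              // global: allocated for the whole program run

int SumFirstN(int n)
{
    int nSum = 0;                // automatic: lives on the stack, freed on return
    for (int i = 1; i <= n; i++)
        nSum += i;
    return nSum;
}

int* MakeBuffer(size_t nCount)   // size unknown at compile time -> heap needed
{
    int* p = new int[nCount];    // dynamic: on-heap allocation...
    memset(p, 0, nCount * sizeof(int));
    return p;                    // ...the caller must delete[] it
}

// Usage:
//   g_nCounter = SumFirstN(10);         // nSum vanished on return; result copied
//   int* pBuf = MakeBuffer(g_nCounter); // heap block, freed in any order we like
//   delete[] pBuf;                      // our responsibility
```

Only the third kind involves the allocator machinery (searches, locking, fragmentation) that the rest of the article works around.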

So, let's draw a preliminary conclusion:

Dynamic variables must be avoided whenever possible. I explicitly insist on this. Unjustified use of dynamic variables can lead to catastrophic performance degradation, and frequent dynamic allocations should be cached.

Sounds crazy? Once, our company used an open-source video codec. It worked OK, but was too heavy. Believe it or not, I managed to make it run three times faster just by optimizing its memory management, leaving the algorithm intact.

Imagine now a situation where you have to allocate blocks of memory of a constant (or at least limited) size, and free them in a random order. This situation is very common; examples include container class implementations (you have to construct nodes), server applications with a per-client state structure, etc. There is an obvious need for on-heap allocation; however, there is a way to eliminate its disadvantages.

Look at the following class:

Interface:

class CAllocator {
 
       PBYTE m_pCache;          // pre-allocated pool of blocks
       ULONG m_nFree;           // number of cached blocks currently free
       ULONG m_nFirstFree;      // ring position of the next free index
       ULONG m_nTotal;          // total number of cached blocks
       
       ULONG m_nAllocationSize; // size of a single block

       // Each node is a block followed by one ULONG slot of the index ring.
       PBYTE NodeAt(ULONG nIndex)
       {
           return m_pCache + nIndex * (m_nAllocationSize + sizeof(ULONG));
       }
       ULONG& IndexAt(ULONG nIndex)
       {
           return *(ULONG*) (NodeAt(nIndex) + m_nAllocationSize);
       }
 
public:
       CAllocator(ULONG nAllocationSize) :
           m_pCache(NULL),
           m_nFree(0),
           m_nFirstFree(0),
           m_nTotal(0),
           m_nAllocationSize(nAllocationSize)
             {}
 
       ~CAllocator()
       {
           Clear();
       }
 
       bool Init(ULONG nAllocationSize, ULONG nCount); // (re)allocate the pool
       bool Init(ULONG nCount) { return Init(m_nAllocationSize, nCount); }
       void Clear();  // release the pool
       void Drop();   // mark every cached block as free
       PVOID Alloc(); // take a block (falls back to plain new on exhaustion)
       void Free(PVOID pPtr);
};

Implementation:

bool CAllocator::Init(ULONG nAllocationSize, ULONG nCount)
{
       Clear();
       m_nAllocationSize = nAllocationSize;
       DWORD dwBlockSize = nAllocationSize + sizeof(ULONG);
 
       // Standard operator new throws on failure rather than returning NULL,
       // so ask for the non-throwing form (requires <new>).
       m_pCache = new (std::nothrow) BYTE[nCount * dwBlockSize];
       if (m_pCache)
       {
           m_nTotal = nCount;
 
           Drop();
           return true;
 
       }
       return false;
}
void CAllocator::Clear()
{
       if (m_pCache)
       {
           delete[] m_pCache;
           m_pCache = NULL;
       }
       m_nTotal = m_nFree = m_nFirstFree = 0;
}
void CAllocator::Drop()
{
       for (m_nFree = 0; m_nFree < m_nTotal; m_nFree++)
           IndexAt(m_nFree) = m_nFree;
}
PVOID CAllocator::Alloc()
{
       // Pool exhausted - fall back to the external allocator.
       if (!m_nFree)
           return new BYTE[m_nAllocationSize];
 
       ULONG nFreeIndex = IndexAt(m_nFirstFree); // the worm's head eats an index
       ++m_nFirstFree %= m_nTotal;               // advance the ring head
       m_nFree--;
 
       return NodeAt(nFreeIndex);
}
 
void CAllocator::Free(PVOID pPtr)
{
       if (m_nFree < m_nTotal)
       {
           // Recover the block's index; a pointer that did not come from
           // the pool yields an out-of-range index and falls through.
           ULONG_PTR nDiff = (ULONG_PTR) pPtr - (ULONG_PTR) m_pCache;
           ULONG_PTR nIndex = nDiff / (m_nAllocationSize + sizeof(ULONG));
           if (nIndex < m_nTotal)
           {
               // Put the index back where the worm's tail is.
               IndexAt((m_nFirstFree + m_nFree) % m_nTotal) = (ULONG) nIndex;
               m_nFree++;
               return;
           }
       }
       // The block came from the fallback path in Alloc(); it was allocated
       // with new BYTE[], so it must be freed with delete[] of the same type.
       delete[] (PBYTE) pPtr;
}

This class is for caching allocations of a constant (or at least limited) size. This is how it can be used:

struct CMyState {
       // some members
};
 
// Construct an allocator object.
CAllocator allForState(sizeof(CMyState));
 
// Now init its storage. Let's say that we want to cache
// up to 300 simultaneous CMyState structures.
allForState.Init(300);
 
// Ask it now for an allocation block.
CMyState* pState = (CMyState*) allForState.Alloc();
 
// Don't forget to free it.
allForState.Free(pState);

How it works

It works like a worm on a circular track. :) There is a ring of allocation blocks, and there is a worm whose length is initially zero. You ask for a block: the worm eats the allocation with its head, its tail stays put, and its length is now 1. You ask again and again; the head advances, the tail doesn't move, the worm grows.

Now, you free something (it doesn't matter in which order). Our worm exhales it from its tail. The tail advances, so the worm shrinks. The allocation you just freed is put where the worm's tail was, so it can be eaten again by the worm's head in the future.

And so on. This method is extremely fast, since it performs no searches at all. If the cached pool is exhausted (the worm's head has caught up with its tail), it simply falls back to an external allocation function, so it is guaranteed to always work (unless the external allocation function fails).
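The worm and its track map onto one index ring and two counters. To make the scheme concrete, here is a minimal portable sketch of the same idea (my own simplified version, not the article's class: it keeps the blocks and the index ring in separate arrays, and returns nullptr on exhaustion instead of falling back to an external allocator):

```cpp
#include <cstddef>

// Fixed pool of SLOTS blocks of BLOCK bytes each; m_ring holds the indexes
// of the free blocks. The "head" hands indexes out (the worm eats), and
// freed indexes are appended at the tail (the worm exhales).
template <size_t BLOCK, size_t SLOTS>
class RingPool {
    unsigned char m_pool[BLOCK * SLOTS]; // the blocks themselves
    size_t m_ring[SLOTS];                // ring of free-block indexes
    size_t m_head = 0;                   // next free index is read here
    size_t m_free = SLOTS;               // how many free indexes remain

public:
    RingPool() {
        for (size_t i = 0; i < SLOTS; i++) // initially every block is free
            m_ring[i] = i;
    }
    void* Alloc() {
        if (!m_free)
            return nullptr;              // exhausted (no fallback in this sketch)
        size_t idx = m_ring[m_head];
        m_head = (m_head + 1) % SLOTS;   // advance the worm's head
        m_free--;
        return m_pool + idx * BLOCK;
    }
    void Free(void* p) {
        size_t idx = ((unsigned char*)p - m_pool) / BLOCK;
        m_ring[(m_head + m_free) % SLOTS] = idx; // append at the worm's tail
        m_free++;
    }
};
```

Both Alloc and Free are a couple of arithmetic operations, i.e. O(1), regardless of the order in which blocks are freed.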

Tips and precautions

  1. The implementation I've given is NOT thread-safe. In other words, you cannot use the same allocator object from different threads simultaneously. You can either add a synchronization object (e.g. a critical section), or just avoid such situations.
  2. Pay special attention not to write past the end of your allocation! A buffer overrun will also corrupt the worm's index ring, so the consequences would be unpleasant. :)
  3. To make the program look more beautiful, you can encapsulate the use of the allocator by the following:
    struct CMyClass {
           // some members
     
           void* operator new (size_t nSize, CAllocator& allObj)
           {
               return allObj.Alloc();
           }
           void operator delete (void* pPtr, CAllocator& allObj)
           {
               allObj.Free(pPtr);
           }
    };

    And then, the syntax would be:

    CMyClass* pObj = new (allObj) CMyClass;
    // some code...
    pObj->~CMyClass();
    allObj.Free(pObj);

    Note that standard C++ has no placement form of the delete expression (a statement like "delete (allObj) pObj;" does not compile); the placement operator delete defined above is invoked automatically only if the constructor throws. So to destroy the object, call the destructor explicitly and then return the block to the allocator, as shown.

    Or you can omit the CAllocator parameter from the new and delete operators by simply using a global allocator variable. But I personally hate using global variables.
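Since standard C++ has no placement form of the delete expression, the destructor call and the Free can be wrapped in a small reusable helper. This sketch is mine, not the article's; the malloc-based stand-in allocator exists only to keep the example self-contained, and in real use you would pass a CAllocator with the same Alloc/Free interface:

```cpp
#include <cstdlib>
#include <new>

// Stand-in with the same Alloc/Free interface as the article's CAllocator.
// (Hypothetical; a real CAllocator would serve cached pool blocks instead.)
struct MallocAllocator {
    void* Alloc()       { return std::malloc(64); }
    void  Free(void* p) { std::free(p); }
};

// Symmetric counterpart of `new (allObj) T`: run the destructor,
// then hand the raw block back to the allocator.
template <class T, class A>
void DestroyAndFree(T* pObj, A& allObj)
{
    if (pObj) {
        pObj->~T();
        allObj.Free(pObj);
    }
}

struct CMyClass {
    int m_nValue = 42;
    void* operator new(size_t, MallocAllocator& a) { return a.Alloc(); }
    // Called automatically only if the constructor throws:
    void  operator delete(void* p, MallocAllocator& a) { a.Free(p); }
};

// Usage:
//   MallocAllocator allObj;
//   CMyClass* pObj = new (allObj) CMyClass;
//   DestroyAndFree(pObj, allObj);
```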

That’s all. Enjoy.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



Written By
Software Developer (Senior)
Israel
My name is Vladislav Gelfer. I was born in Kiev (former Soviet Union); since 1993 I have lived in Israel.
In programming I'm interested mostly in low-level work, OOP design, DSP and multimedia.
Besides programming, I like physics, math and digital photography.

Comments and Discussions

 
In answer to the question "could you please explain?" (valdok, 29-Jun-04):
In fact we have two buffers.
The first is a pre-allocated memory block of size blocks * sizeof(element).
The second is a ring buffer of size blocks * sizeof(index).

In my implementation I united those two buffers into a single memory block sized blocks * (sizeof(element) + sizeof(index)). This is because I hate to allocate memory :), but this is just an implementation detail.

1. Initialization. First of all, we initialize our second buffer (which is the worm's cycle :) ) so that every element in it (which is in fact an index) points to a distinct element in our first buffer. This can be done by assigning those indexes continuous numbers from 0 to blocks - 1 respectively. Then we place our worm on this cycle. It doesn't matter where; we only need to set its size to zero.

2. You require a block. The worm "eats" an index with its head, and the application receives the block pointed to by that index.

3. You free a block. We calculate its index ((pointer - array_start) / sizeof(element)), and this index is "exhaled" by our worm's tail, so that it can be eaten again by its head in the future.

Now, there are situations when our pool is exhausted; this is when the worm's head reaches its tail. In such a situation we allocate memory using the external new function. When such a block is later freed, we calculate its index as usual; because it was not really allocated by our allocator, we get an out-of-range index, and for such blocks we use the external delete function.
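The index recovery in step 3 is plain pointer arithmetic, and it is also how blocks from the fallback path are detected: their computed index falls outside the pool. A small sketch of just that check (the function and its names are mine, for illustration):

```cpp
#include <cstddef>

// Given a pool of `total` elements of `elemSize` bytes each, starting at
// `poolStart`, return the element index of `p`, or `total` as a sentinel
// meaning "foreign block" (it did not come from this pool).
size_t PoolIndexOf(const unsigned char* poolStart, size_t elemSize,
                   size_t total, const void* p)
{
    const unsigned char* pb = (const unsigned char*)p;
    if (pb < poolStart)
        return total;                    // before the pool: foreign block
    size_t idx = (size_t)(pb - poolStart) / elemSize;
    return (idx < total) ? idx : total;  // past the pool: foreign block
}
```

A Free routine only needs this one division to decide between recycling the block into the ring and handing it to the external delete.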