Click here to Skip to main content
Click here to Skip to main content

Sophisticated memory allocation

, 22 May 2004
Rate this:
Please Sign up or sign in to vote.
How to manage heap in an extremely effective way.

Introduction

In this article, I would like to discuss how the performance of a program can be improved dramatically by a bit sophisticated memory allocation management.

Types of memory allocation.

Variables are used to store data. The question is how they are managed during the program execution. Generally speaking, there are 3 types of allocations:

  1. Global (or static) variables and TLS (in multitasking environment). The memory is allocated for the whole duration of the program (or thread in TLS-case) execution. The allocation itself takes place at the startup only. But in this way, it is possible to allocate only a specific set of variables.
  2. Automatic variables, the most common ones. Those are temporary variables declared in functions (or inner scopes), parameters in function calls and etc. The memory is taken from the stack. Taken, not allocated, because the memory for the stack has already been allocated (strictly speaking, it is sometimes just reserved, and than allocated on demand). This is very effective, fast and a simple way of memory allocation. You don't need to free it, the compiler does it automatically by updating the stack pointer upon exit from the scope of such a variable. Note also that in the multitasking environment, each thread of execution has its own stack, so that threads don't share automatic variables.

    In other words, this method should be used when possible. But, there are cases when this is not possible:

    • The allocation size is unknown on the compilation stage.
    • The variable must be allocated and freed in an unpredictable or a compilation stage order.
    • The allocation is too big to fit the thread's stack. Well, this can be fixed by increasing the stack’s maximum stack size, but it is not always ideal, because the memory allocated for the thread’s stack is never free until the thread exits (at least on Win32 systems).
  3. Dynamic variables, so-called on-heap allocation. Usually allocated by a call to an operator or a function. It doesn't have the limitations of the on-stack allocation.

    However, it has the following disadvantages:

    • It's up to you to track the life duration of each of such variables, and deallocate it when necessary (again, by calling a function or an operator).
    • The allocation (and freeing) is usually a complex operation, even more complex in the multitasking environment. It involves searches; hence its performance is not constant. Also, there is usually a memory fragmentation, which affects both performance and the overhead.

So, let's make meanwhile some preliminary conclusion:

Dynamic variables must be avoided whenever it is possible. I explicitly insist on it. Unjustified use of dynamic variables can lead to catastrophic performance degradation. Frequent dynamic allocations should be cached.

Sounds crazy? Once, our company used an open-source video codec. It worked OK, but was too heavy. Believe it or not, I managed to make it work 3 times faster by just optimizing its memory management, leaving the algorithm intact.

Imagine now a situation when you have to allocate blocks of memory of a constant (or at least limited) size, and free them in a random order. This situation is very common; it may include some container class implementation (you have to construct nodes), server application with some state structure per client, and etc. There is obvious need for on-heap allocation; however, there is a way to eliminate its disadvantages.

Look at the following class:

Interface:

class CAllocator {
 
       PBYTE m_pCache;
       ULONG m_nFree;
       ULONG m_nFirstFree;
       ULONG m_nTotal;
       
       ULONG m_nAllocationSize;

       PBYTE NodeAt(ULONG nIndex)
       {
           return m_pCache + nIndex * (m_nAllocationSize + sizeof(ULONG));
       }
       ULONG& IndexAt(ULONG nIndex)
       {
           return *(ULONG*) (NodeAt(nIndex) + m_nAllocationSize);
       }
 
public:
       CAllocator(ULONG nAllocationSize) :
           m_pCache(NULL),
           m_nFree(0),
           m_nFirstFree(0),
           m_nTotal(0),
           m_nAllocationSize(nAllocationSize)
             {}
 
       ~CAllocator()
       {
           Clear();
       }
 
       bool Init(ULONG nAllocationSize, ULONG nCount);
       bool Init(ULONG nCount) { return Init(m_nAllocationSize, nCount); }
       void Clear();
       void Drop();
       PVOID Alloc();
       void Free(PVOID pPtr);
};

Implementation:

bool CAllocator::Init(ULONG nAllocationSize, ULONG nCount)
{
       Clear();
       m_nAllocationSize = nAllocationSize;
       DWORD dwBlockSize = nAllocationSize + sizeof(ULONG);
 
       if (m_pCache = new BYTE[nCount * dwBlockSize])
       {
           m_nTotal = nCount;

           Drop();
           return true;
 
       }
       return false;
}
void CAllocator::Clear()
{
       if (m_pCache)
       {
           delete[] m_pCache;
           m_pCache = NULL;
       }
       m_nTotal = m_nFree = m_nFirstFree = 0;
}
void CAllocator::Drop()
{
       for (m_nFree = 0; m_nFree < m_nTotal; m_nFree++)
           IndexAt(m_nFree) = m_nFree;
}
PVOID CAllocator::Alloc()
{
       if (!m_nFree)
           return new BYTE[m_nAllocationSize];
 
       ULONG nFreeIndex = IndexAt(m_nFirstFree);
       ++m_nFirstFree %= m_nTotal;
       m_nFree--;
 
       return NodeAt(nFreeIndex);
}
 
void CAllocator::Free(PVOID pPtr)
{
       if (m_nFree < m_nTotal)
       {
           ULONG nDiff = ((ULONG) pPtr) - ((ULONG) m_pCache);
           ULONG nIndex = nDiff / (m_nAllocationSize + sizeof(ULONG));
           if (nIndex < m_nTotal)
           {
               IndexAt((m_nFirstFree + m_nFree) % m_nTotal) = nIndex;
               m_nFree++;
               return;
           }
       }
       delete pPtr;
}

This class is for caching allocations of a constant (or at least limited) size. This is how it can be used:

struct CMyState {
       // some members
};
 
// Construct an allocator object.
CAllocator allForState(sizeof(CMyState));
 
// Now init its storage. Let's say that we want to cache
// up to 300 simultaneous CMyState structuers.
allForState.Init(300);
 
// Ask it now for an allocation block.
CMyState* pState = (CMyState*) allForState.Alloc();
 
// Don't forget to free it.
allForState.Free(pState);

How it works

It works like a worm on a cycle race J. There is a cycle race of allocation blocks, and there is a worm, whose length is initially zero. You ask to allocate a block – the worm eats the allocation by its head, its tail is left intact, and its length is now 1. You ask again, and again, the head advances, the tail doesn’t move, the worm grows.

Now, you free something (doesn’t matter in which order). Our worm exhales it by its tail. The tail is advanced, so that the worm shrinks. This allocation that you just freed is put where the worm’s tail was, so that it can be eaten again by the worm’s head in the future.

And so on. This method is extremely fast, since it doesn’t perform any searches at all. If the cached pool is exhausted (the worm’s head has reached its tail) – it simply calls to an extern allocation function, so that it is guaranteed to work always (unless the extern allocation function fails).

Tips and precautions

  1. The implementation I’ve given is NOT thread safe. In other words – you cannot use the same allocator object in different threads simultaneously. You can either put some synchronization objects (critical section), or just avoid such situations.
  2. Pay special attention not to exceed the scope! If you exceed your allocation’s scope – you’ll also confuse our worm, so that consequences would be unpleasant J.
  3. To make the program look more beautiful, you can encapsulate the use of the allocator by the following:
    struct CMyClass {
           // some members
     
           void* operator new (size_t nSize, CAllocator& allObj)
           {
               return allObj.Alloc();
           }
           void operator delete (void* pPtr, CAllocator& allObj)
           {
               return allObj.Free(pPtr);
           }
    };

    And then, the syntax would be:

    CMyClass* pObj = new (allObj) CMyClass;
    // some code...
    delete (allObj) pObj;

    Or you can omit the CAllocator parameter in the new and delete operators, by simply using global allocator variable. But, I personally hate using global variables.

That’s all. Enjoy.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

About the Author

valdok
Software Developer (Senior)
Israel Israel
My name is Vladislav Gelfer, I was born in Kiev (former Soviet Union), since 1993 I live in Israel.
In programming I'm interested mostly in low-level, OOP design, DSP and multimedia.
Besides of the programming I like physics, math, digital photography.

Comments and Discussions

 
GeneralCool Pinmembertheidealist17-Dec-06 17:00 
Questionit is a ring queue? PinsussAnonymous1-Mar-05 21:57 
Questionit is a ring queue? PinsussAnonymous1-Mar-05 21:56 
Questionwhat about _alloca? Pinmembertheodor eicke31-Jan-05 0:10 
Questioncould you please explain? Pinmemberpeterchen29-Jun-04 1:06 
AnswerRe: could you please explain? PinmemberVladislav Gelfer29-Jun-04 9:46 
GeneralVery good PinmemberNikola Mihaylov8-Jun-04 2:24 
QuestionWhy you add this ? Pinmemberftai6-Jun-04 16:41 
AnswerRe: Why you add this ? PinsussAlan88-Jun-04 8:40 
GeneralRe: Why you add this ? PinmemberVladislav Gelfer8-Jun-04 12:38 
GeneralRe: Why you add this ? Pinmemberftai8-Jun-04 13:35 
GeneralCBuffer PinmemberKochise31-May-04 21:27 
Generalsynchronization &amp; dynamic size PinmemberUri Twig31-May-04 0:16 
GeneralFree PinmemberMr Blonde27-May-04 2:30 
GeneralRe: Free PinmemberVladislav Gelfer27-May-04 3:14 
GeneralRe: Free PinmemberMr Blonde27-May-04 21:52 
GeneralAnswers PinmemberVladislav Gelfer24-May-04 23:29 
GeneralStart here... Pinmembercompiler24-May-04 11:05 
GeneralRe: Start here... PinmemberDon Clugston24-May-04 15:40 
GeneralRe: Start here... PinmemberJamesC12331-Dec-04 7:43 
QuestionIs there really an advantage? PinmemberReorX24-May-04 8:24 
Generalhm Pinmemberwb24-May-04 4:54 
QuestionDynamical? Pinmemberarmentage24-May-04 4:27 
AnswerRe: Dynamical? PinmemberMarcello30-Jul-04 10:21 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web03 | 2.8.140721.1 | Last Updated 23 May 2004
Article Copyright 2004 by valdok
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid