Click here to Skip to main content
14,384,304 members

A Custom STL std::allocator Replacement Improves Performance

Rate this:
5.00 (15 votes)
Please Sign up or sign in to vote.
5.00 (15 votes)
14 Apr 2017CPOL
Protect against heap fragmentation faults and improve execution speed with a fixed block alternative to STL std::allocator

Introduction

This is my third and final article concerning fixed block memory allocators here on Code Project. This time, we'll create an alternate C++ Standard Library std::allocator memory manager using the groundwork laid by the first two articles. 

The Standard Template Library (STL) is powerful C++ software library including container and iterator support. The problem with using the library for mission-critical or time-critical projects isn't with STL itself – the library is very robust. Instead, it's with the unrestricted use of the global heap. The standard STL allocator uses the heap extensively during operation. This is a problem on resource constrained embedded systems. Embedded devices can be expected to operate for months or years where a fault caused by a fragmented heap must be prevented. Fortunately, STL provides a means to replace std::allocator with one of our own design. 

Fixed block memory allocators are a common technique to provide dynamic-like operation yet the memory is provided from memory pools where the blocks doled are of a fixed size. Unlike the global heap, which is expected to universally operate on blocks of any size, a fixed block allocator can be tailored for a narrowly defined purpose. Fixed block allocators can also offer consistent execution times whereas the global heap cannot offer such guarantees. 

This article describes an STL-compatible allocator implementation that relies upon a fixed block allocator for dispensing and reclaiming memory. The new allocator prevents faults caused by a fragmented heap and offer consistent allocation/deallocation execution times. 

std::allocator

The STL std::allocator class provides the default memory allocation and deallocation strategy. If you examine code for a container class, such as std::list, you'll see the default std::allocator template argument. In this case, allocator<_Ty> is template class handling allocation duties for _Ty objects.

template<class _Ty, class _Ax = allocator<_Ty> >
class list : public _List_val<_Ty, _Ax>
{
    // ...
}

Since the template parameter _Ax defaults to allocator<_Ty> you can create a list object without manually specifying an allocator. Declaring myList as shown below creates an allocator class for allocating/deallocating int values. 

std::list<int> myList;

The STL containers rely upon dynamic memory for elements and nodes. An element is the size of an inserted object. In this case, sizeof(int) is the memory required to store one list element. A node is an internal structure required to bind the elements together. For std::list, it is a doubly-linked list storing, at a minimum, pointers to the next and previous nodes. 

Of course, the element size varies depending on the objects stored. However, the node size varies too depending on the container used. A std::list node may have a different size than a std::map node. Because of this, a STL-allocator must be able to handle different sized memory block requests. 

An STL-allocator must adhere to specific interface requirements. This isn't an article on the how's and why's of the std::allocator API – there are many online references that explain this better than I. Instead, I'll focus on where to place the memory allocation/deallocation calls within an existing STL-allocator class interface and provide new "x" versions of all common containers to simplify usage. 

xallocator    

Most of the heavy lifting for the new fixed block STL allocator comes from the underlying xallocator as described within my article "Replace malloc/free with a Fast Fixed Block Memory Allocator". As the title states, this module replaces malloc/free with new fixed block xmalloc/xfree versions. 

To the user, these replacement functions operate in the same way as the standard CRT versions except for the fixed block feature. In short, xallocator has two modes of operation: static pools, where all memory is obtained from pre-declared static memory, or heap blocks, where blocks are obtained from the global heap but recycled for later use when freed. See the aforementioned article for implementation details. 

stl_allocator

The class stl_allocator is the fixed block STL-compatible implementation. This class is used as an alternative to std::allocator

template <typename T>
class stl_allocator
{
public:
    typedef size_t size_type;
    typedef ptrdiff_t difference_type;
    typedef T* pointer;
    typedef const T* const_pointer;
    typedef T& reference;
    typedef const T& const_reference;
    typedef T value_type;

    stl_allocator(){}
    ~stl_allocator(){}

    template <class U> struct rebind { typedef stl_allocator<U> other; };
    template <class U> stl_allocator(const stl_allocator<U>&){}

    pointer address(reference x) const {return &x;}
    const_pointer address(const_reference x) const {return &x;}
    size_type max_size() const throw() {return size_t(-1) / sizeof(value_type);}

    pointer allocate(size_type n, stl_allocator<void>::const_pointer hint = 0)
    {
        return static_cast<pointer>(xmalloc(n*sizeof(T)));
    }

    void deallocate(pointer p, size_type n)
    {
        xfree(p);
    }

    void construct(pointer p, const T& val)
    {
        new(static_cast<void*>(p)) T(val);
    }

    void construct(pointer p)
    {
        new(static_cast<void*>(p)) T();
    }

    void destroy(pointer p)
    {
        p->~T();
    }
};

The code is really just a standard std::allocator interface. There are many examples online. The source attached to this article has been used on many different compilers (GCC, Keil, VisualStudio). The thing we're interested in is where to tap into the interface with our own memory manager. The methods of interest are:

  • allocate()
  • deallocate()

allocate() allocates n number of instances of object type T. xmalloc() is used to obtained memory from the fixed block memory pool as opposed to the global heap.

pointer allocate(size_type n, stl_allocator<void>::const_pointer hint = 0)
{
    return static_cast<pointer>(xmalloc(n*sizeof(T)));
}

deallocate() deallocates a memory block previously allocated using allocate(). A simple call to xfree() routes the request to our memory manager. 

void deallocate(pointer p, size_type n)
{
    xfree(p);
}

Really, that's all there is to it once you have an fixed block memory manager. xallocator is designed to handle any size memory request. Therefore, no matter storage size called for by the C++ Standard Library, elements or nodes, xmalloc/xfree processes the memory request.

Of course, stl_allocator is a template class. Notice that the fixed block allocation duties are delegated to non-template functions xmalloc() and xfree(). This makes the instantiated code for each instance as small as possible. 

"x" Containers

The following STL container classes use fixed sized memory blocks that never change in size while the container is being utilized. The number of heap element/node blocks goes up and down, but block sizes are constant for a given container instantiation.

  • std::list
  • std::map
  • std::multipmap
  • std::set
  • std::multiset
  • std::queue

To make using stl_allocator a bit easier, new classes were created for many of the standard container types. Each new container just inherits from the standard library counterpart and is pre-pended with "x". 

  • xlist
  • xmap
  • xmultimap
  • xset
  • xmultiset
  • xqueue

The following code shows the complete xlist implementation. Notice xlist just inherits from std::list, but the key difference is the _Ax template argument defaults to stl_allocator and not std::allocator

#ifndef _XLIST_H
#define _XLIST_H

#include "stl_allocator.h"
#include <list>

template<class _Ty, class _Ax = stl_allocator<_Ty> >
class xlist : public std::list<_Ty, _Ax>
{
};

#endif

Each of the “x” versions of the STL container is used just like the std version except all allocations are handled by stl_allocator. For instance:

#include  "xlist.h"

xlist<xstring> myList;
myList.push_back("hello world");

The following container types use variable sized memory blocks from the heap that expand or contract in size during operation. 

  • std::string
  • std::vector

A corresponding xvector wasn't implemented because vectors require a contiguous memory region whereas the other container types are node based. A fixed block allocator works well with equally sized blocks, but not so well if a vector continually expands to a larger and larger single block. While stl_allocator would work with a vector, it could be misused and cause runtime problems with the fixed block memory manager. 

A std::string also varies its requested memory size during execution, but typically, strings aren't used in an unbounded fashion. Meaning, in most cases you're not trying to store a 100K-byte string, whereas a vector you might actually do that. Therefore, xstring's are available as explained next. 

"x" Strings

New x-version of the standard and wide versions of the string classes are created.

  • xstring
  • xwstring

Here, we just typedef a new x-version using stl_allocator as the default allocator of char and wchar_t types:

#ifndef _XSTRING_H
#define _XSTRING_H

#include "stl_allocator.h"
#include <string>

typedef std::basic_string<char, std::char_traits<char>, stl_allocator<char> > xstring;
typedef std::basic_string<wchar_t, std::char_traits<wchar_t>, stl_allocator<wchar_t> > xwstring;

#endif 

Usage of an xstring is just like any other std::string

#include "xstring.h"

xwstring s = L"This is ";
s += L"my string.";

"x" Streams

The iostreams C++ Standard Library offers powerful and easy-to-use string formatting by way of the stream classes. 

  • std::stringstream
  • std::ostringstream
  • std::wstringstream
  • std::wostringstream

Like the container classes, iostreams makes use of the custom STL allocator instead of the global heap using these new definitions.

  • xstringstream
  • xostringstream
  • xwstringstream
  • xwostringstream

Again, just typedef new x-versions with the default stl_allocator template arguments makes it easy.

#ifndef _XSSTREAM_H
#define _XSSTREAM_H

#include "stl_allocator.h"
#include <sstream>

typedef std::basic_stringstream<char, std::char_traits<char>, 
stl_allocator<char> > xstringstream;
typedef std::basic_ostringstream<char, std::char_traits<char>, 
stl_allocator<char> > xostringstream;

typedef std::basic_stringstream<wchar_t, std::char_traits<wchar_t>, 
stl_allocator<wchar_t> > xwstringstream;
typedef std::basic_ostringstream<wchar_t, std::char_traits<wchar_t>, 
stl_allocator<wchar_t> > xwostringstream;

#endif

Now, using an xstringstream is a snap.

xstringstream myStringStream;
myStringStream << "hello world " << 2016 << ends;

Benchmarking

In my previous allocator articles, simple tests show the fixed block allocator is faster than the global heap. Now let's check the stl_allocator enabled containers to see they compare against std::allocator on a Windows PC. 

All tests were built with Visual Studio 2008 and maximum speed compiler optimizations enabled. The xallocator is running in heap blocks mode where blocks are initially obtained from the heap but recycled during deallocations. The debug heap is also disabled by setting the Debugging > Environment _NO_DEBUG_HEAP=1. The debug heap is considerably slower because of the extra safety checks. 

The benchmark tests are simplistic and stress insertion/removal of 10000 elements. Each test is run three times. The insert/remove is where the STL library relies upon dynamic storage and that's the focus of these tests, not the underlying algorithmic performance. 

The code snippet below is the std::list test. The other container tests are similarly basic. 

list<int> myList;
for (int i=0; i<MAX_BENCHMARK; i++)
    myList.push_back(123);
myList.clear();

The performance difference between std::allocator and stl_allocator running in heap block mode is shown below: 
 

ContainerModeRunBenchmark Time (mS)
std::listGlobal Heap160
std::listGlobal Heap232
std::listGlobal Heap323
xlistHeap Blocks119
xlistHeap Blocks211
xlistHeap Blocks311
std::mapGlobal Heap137
std::mapGlobal Heap234
std::mapGlobal Heap341
xmapHeap Blocks138
xmapHeap Blocks225
xmapHeap Blocks325
std::stringGlobal Heap1171
std::stringGlobal Heap2146
std::stringGlobal Heap395
xstringHeap Blocks157
xstringHeap Blocks236
xstringHeap Blocks340

The test results shows that, for this benchmark, the stl_allocator enabled versions are approximately 2 to 3 times faster than std::allocator. Now this doesn't mean that STL is now magically overall twice as fast. Again, this benchmark is just concentrating on insertion and removal performance. The underlying container algorithmic performance hasn't changed. Therefore, the overall improvement seen will vary depending on how often your application inserts/removes from containers. 

The stl_allocator operates in fixed time if xallocator is used in static pool mode. If xallocator heap blocks mode is used, the execution time is constant once the free list is populated with blocks obtained from the heap. Notice that the xlist benchmark above has an initial execution time of 19mS, yet the subsequent executions are 11mS each. On the first run, xallocator had to obtain fresh blocks from the global heap using operator new. On runs 2 and 3, the blocks already existed within the xallocator free-list so the heap isn't relied upon making successive runs much faster. 

The global heap has unpredictable execution time performance when the heap fragments. Since xallocator only allocates blocks and recycles them for later use the execution time is more predictable and consistent. 

Game developers go to great lengths to solve heap fragmentation and its myriad of associated problems. The Electronic Arts Standard Template Library (EASTL) white paper goes into detail about the problems with STL and specifically the std::allocator. Paul states in the EASTL -- Electronic Arts Standard Template Library document:

Quote:

Among game developers the most fundamental weakness is the std allocator design, and it is this weakness that was the largest contributing factor to the creation of EASTL.

While I don't write code for games, I can see how the lag associated with fragmentation and erratic allocation/deallocation times would effect game play. 

Reference Articles

Benefits

STL is an extremely useful C++ library. Too often through, for medical devices I can't use it due to the possibility of a heap fragmentation memory fault. This leads to rolling your own custom container classes for each project instead of using an existing, well-tested library. 

My goal with stl_allocator was eliminating memory faults; however, the fixed block allocator does offer faster and more consistent execution times whereas the heap does not. The heap implementation and level of fragmentation leads to unpredictable execution times. Depending on your application, this may or may not be an issue.

As shown in this article, simply employing an STL-compatible fixed block allocator opens up the possibility of using C++ Standard Template Library features on a project that otherwise may not have been possible. 

Revisions

  • 3rd April, 2016
    • Initial release
  • 11th April, 2016
    • Fixed bugs in code. New source code attached to article. 
  • 14th April, 2017
    • Fixed minor compiler include bug on Linux systems.
    • Updated attached source code.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

David Lafreniere
United States United States
I've been a professional software engineer for over 20 years. When not writing code, I enjoy spending time with the family, camping and riding motorcycles around Southern California.

Comments and Discussions

 
QuestionUsing my idea of hybrid allocator for xvector might be possible? Pin
ehaerim18-Dec-18 9:21
memberehaerim18-Dec-18 9:21 
AnswerRe: Using my idea of hybrid allocator for xvector might be possible? Pin
David Lafreniere23-Mar-19 5:04
memberDavid Lafreniere23-Mar-19 5:04 
QuestionDo you have any solution for "Creating STL Containers, for example, STL::map, in Shared Memory"? Pin
ehaerim16-Dec-18 19:58
memberehaerim16-Dec-18 19:58 
Hi David,
I am a big fan of your works and we were corresponding a few times.
Recently, I am working on a small class that is to be shared between multiple processes for storing global data. For that, I need to create STL containers in shared memory.
I guess you also had before or will have such needs in the future.

Creating STL Containers in Shared Memory | Dr Dobb's[^]

This article shows how to do that, but the complete source is not available any more.

Could you implement STL containers in shared memory with the help of your allocator implementation?
That will be awesome.

Best regards
HaeRim
AnswerRe: Do you have any solution for "Creating STL Containers, for example, STL::map, in Shared Memory"? Pin
David Lafreniere20-Mar-19 17:46
memberDavid Lafreniere20-Mar-19 17:46 
AnswerRe: Do you have any solution for "Creating STL Containers, for example, STL::map, in Shared Memory"? Pin
David Lafreniere23-Mar-19 5:35
memberDavid Lafreniere23-Mar-19 5:35 
GeneralRe: Do you have any solution for "Creating STL Containers, for example, STL::map, in Shared Memory"? Pin
ehaerim28-Mar-19 2:57
memberehaerim28-Mar-19 2:57 
PraiseLooks great! Here's a GCC Makefile Pin
Louis Thiery9-Oct-17 4:37
memberLouis Thiery9-Oct-17 4:37 
GeneralRe: Looks great! Here's a GCC Makefile Pin
David Lafreniere9-Oct-17 9:05
memberDavid Lafreniere9-Oct-17 9:05 
QuestionLinux or Android? Pin
gregko31469-Sep-17 8:52
membergregko31469-Sep-17 8:52 
AnswerRe: Linux or Android? Pin
David Lafreniere13-Sep-17 2:27
memberDavid Lafreniere13-Sep-17 2:27 
AnswerRe: Linux or Android? Pin
David Lafreniere13-Sep-17 15:44
memberDavid Lafreniere13-Sep-17 15:44 
GeneralRe: Linux or Android? Pin
gregko314614-Sep-17 3:15
membergregko314614-Sep-17 3:15 
QuestionSame speed as std::allocator with std::unordered_set Pin
Ted_Ka6-Aug-17 0:02
memberTed_Ka6-Aug-17 0:02 
AnswerRe: Same speed as std::allocator with std::unordered_set Pin
David Lafreniere6-Aug-17 10:40
memberDavid Lafreniere6-Aug-17 10:40 
GeneralRe: Same speed as std::allocator with std::unordered_set Pin
gregko314613-Sep-17 9:11
membergregko314613-Sep-17 9:11 
GeneralRe: Same speed as std::allocator with std::unordered_set Pin
David Lafreniere13-Sep-17 15:42
memberDavid Lafreniere13-Sep-17 15:42 
Questioncompile error Pin
hedgezzz19-Feb-17 14:06
memberhedgezzz19-Feb-17 14:06 
AnswerRe: compile error Pin
David Lafreniere27-Feb-17 13:07
memberDavid Lafreniere27-Feb-17 13:07 
AnswerRe: compile error Pin
mattytraxx28-Mar-17 15:07
membermattytraxx28-Mar-17 15:07 
GeneralRe: compile error Pin
David Lafreniere14-Apr-17 10:51
memberDavid Lafreniere14-Apr-17 10:51 
QuestionGreat work Pin
Richard MacCutchan11-Apr-16 1:27
protectorRichard MacCutchan11-Apr-16 1:27 
AnswerRe: Great work Pin
David Lafreniere11-Apr-16 6:14
memberDavid Lafreniere11-Apr-16 6:14 
QuestionAWOL Pin
degski4-Apr-16 19:45
memberdegski4-Apr-16 19:45 
AnswerRe: AWOL Pin
David Lafreniere5-Apr-16 2:02
memberDavid Lafreniere5-Apr-16 2:02 
GeneralRe: AWOL Pin
Serkan Onat5-Apr-16 2:14
professionalSerkan Onat5-Apr-16 2:14 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Article
Posted 3 Apr 2016

Tagged as

Stats

34.6K views
658 downloads
25 bookmarked