
A Custom Block Allocator for Speeding Up VC++ STL

30 Oct 2006, CPOL
A block allocator for use with STL containers that greatly improves speed in programs doing massive data insertions and extractions.

Introduction

block_allocator is a custom STL allocator for use with the STL as implemented in Microsoft VC++. Rather than allocating memory node by node, block_allocator allocates memory in fixed-size chunks and hands out portions of these chunks as requested. Speed improvements of around 40% over the default allocator have been obtained in typical tests. The chunk size, set by the user, should be neither too small (reduced speed improvement) nor too large (wasted memory). Experiment to find the size that best fits your application.
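
To make the idea concrete, here is a minimal sketch of chunked allocation in C++11. It is an illustration only, not the code in blockallocator.h: the names chunk_pool, allocate_slot, release_slot and chunk_count are hypothetical, and the real block_allocator additionally has to implement the STL allocator interface.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical illustration of chunked allocation; NOT the article's block_allocator.
// Memory is requested from the heap ChunkSize slots at a time and handed out slot by slot.
template <typename T, std::size_t ChunkSize>
class chunk_pool {
    static_assert(ChunkSize > 0, "a chunk must hold at least one slot");

    // A slot holds either a link to the next free slot or raw storage for one T.
    union slot {
        slot* next;
        alignas(T) unsigned char storage[sizeof(T)];
    };

    std::vector<std::unique_ptr<slot[]> > chunks_;  // owns every chunk
    slot* free_;                                    // head of the free-slot list

    void add_chunk() {
        chunks_.push_back(std::unique_ptr<slot[]>(new slot[ChunkSize]));
        slot* chunk = chunks_.back().get();
        // Thread the new slots onto the free list so the lowest address ends up first.
        for (std::size_t i = ChunkSize; i-- > 0;) {
            chunk[i].next = free_;
            free_ = &chunk[i];
        }
    }

public:
    chunk_pool() : free_(nullptr) {}
    chunk_pool(const chunk_pool&) = delete;
    chunk_pool& operator=(const chunk_pool&) = delete;

    // Returns uninitialized storage for one T; the heap is touched only
    // when the free list is empty.
    void* allocate_slot() {
        if (free_ == nullptr) add_chunk();
        slot* s = free_;
        free_ = s->next;
        return s;
    }

    // Puts a slot back on the free list; chunks are returned to the heap
    // only when the pool itself is destroyed.
    void release_slot(void* p) {
        assert(p != nullptr);
        slot* s = static_cast<slot*>(p);
        s->next = free_;
        free_ = s;
    }

    std::size_t chunk_count() const { return chunks_.size(); }
};
```

With chunk_pool<int, 1024>, the first 1024 calls to allocate_slot() cost a single heap allocation, and only the 1025th call triggers a second one. This is also where the chunk size trade-off comes from: small chunks mean frequent heap calls, while large chunks may leave most of their slots unused.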

block_allocator can substitute for the default allocator in the following containers:

  • list
  • set
  • multiset
  • map
  • multimap

It WON'T work with other containers such as vector or queue. Note, however, that vector and queue already perform allocation in chunks. Using block_allocator is fairly simple, for instance:
// block allocated list of ints with chunks of 1024 elements
std::list<int,block_allocator<int,1024> > l;
Normal containers and block allocated containers can coexist without problems.
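
A plausible reason for the restriction to node-based containers is that they request memory one node at a time, whereas vector asks for a whole contiguous array per request. The sketch below makes this visible with a small tracing allocator written in C++11. The names alloc_stats, tracing_allocator, trace_list and trace_vector are hypothetical and not part of blockallocator.h, and the link to block_allocator's internals is an assumption, not something stated in the article.

```cpp
#include <cstddef>
#include <list>
#include <memory>
#include <vector>

// Hypothetical helper types for illustration; none of these names come from blockallocator.h.
struct alloc_stats {
    std::size_t calls;            // number of allocate() calls seen
    std::size_t largest_request;  // largest element count passed to allocate()
    alloc_stats() : calls(0), largest_request(0) {}
};

// Minimal C++11 allocator that forwards to std::allocator and records request sizes.
template <typename T>
struct tracing_allocator {
    typedef T value_type;
    alloc_stats* stats;

    explicit tracing_allocator(alloc_stats* s) : stats(s) {}

    // Converting constructor: containers rebind the allocator to their node type.
    template <typename U>
    tracing_allocator(const tracing_allocator<U>& other) : stats(other.stats) {}

    T* allocate(std::size_t n) {
        ++stats->calls;
        if (n > stats->largest_request) stats->largest_request = n;
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T* p, std::size_t n) { std::allocator<T>().deallocate(p, n); }
};

template <typename T, typename U>
bool operator==(const tracing_allocator<T>& a, const tracing_allocator<U>& b) {
    return a.stats == b.stats;
}
template <typename T, typename U>
bool operator!=(const tracing_allocator<T>& a, const tracing_allocator<U>& b) {
    return !(a == b);
}

// Inserts `count` ints into a list and reports what the list asked its allocator for.
inline alloc_stats trace_list(int count) {
    alloc_stats stats;
    {
        tracing_allocator<int> alloc(&stats);
        std::list<int, tracing_allocator<int> > l(alloc);
        for (int i = 0; i < count; ++i) l.push_back(i);
    }
    return stats;
}

// Same experiment with a vector that reserves room for `count` ints up front.
inline alloc_stats trace_vector(int count) {
    alloc_stats stats;
    {
        tracing_allocator<int> alloc(&stats);
        std::vector<int, tracing_allocator<int> > v(alloc);
        v.reserve(static_cast<std::size_t>(count));
        for (int i = 0; i < count; ++i) v.push_back(i);
    }
    return stats;
}
```

trace_list(100) should report a largest_request of 1 and at least 100 calls, which is the pattern a chunked allocator can serve from fixed-size slots. trace_vector(100) reports a single request for at least 100 contiguous elements, which a per-node scheme has no natural way to satisfy.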

Compatibility mode with MSVC++ 6.0/7.0

Due to limitations of the standard library shipped with these compilers, the usage described above does not work with them. To work around this, each supported container has an associated block allocated container, derived from it through the use of block_allocator. For each such container you want to use, define its activating macro before including blockallocator.h:

  • list -> block_allocated_list (macro DEFINE_BLOCK_ALLOCATED_LIST)
  • set -> block_allocated_set (macro DEFINE_BLOCK_ALLOCATED_SET)
  • multiset -> block_allocated_multiset (macro DEFINE_BLOCK_ALLOCATED_MULTISET)
  • map -> block_allocated_map (macro DEFINE_BLOCK_ALLOCATED_MAP)
  • multimap -> block_allocated_multimap (macro DEFINE_BLOCK_ALLOCATED_MULTIMAP)

To use block allocation based STL in your application, define the corresponding activating macro, include blockallocator.h and then change your declarations as follows:

  • list<type> -> block_allocated_list<type,chunk_size>
  • set<key> -> block_allocated_set<key,chunk_size>
  • multiset<key> -> block_allocated_multiset<key,chunk_size>
  • map<key,type> -> block_allocated_map<key,type,chunk_size>
  • multimap<key,type> -> block_allocated_multimap<key,type,chunk_size>

where chunk_size is the number of elements per chunk. You can also supply the other optional template parameters (see the MSVC++ STL docs for more info).

The MSVC++ 6.0/7.0 compatibility mode can also be used in MSVC++ 7.1, so you need not modify your block_allocator-related code when porting legacy code to 7.1.

Multithreading issues

Each block allocated container instance uses its own block_allocator, so no multithreading problems should arise as long as your program properly protects its containers against concurrent access (or no two threads ever access the same container instance). This is the same situation as with regular STL classes (remember that operations on containers are not guarded by CRITICAL_SECTIONs or anything similar), so the moral is: if your program was thread-safe without block_allocator, it will remain so with it.
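
As a rough illustration of what "protecting a container" can look like, the following sketch wraps a list behind a lock. It uses the portable C++11 std::mutex instead of the Win32 CRITICAL_SECTION mentioned above, and the class name guarded_list and its member names are hypothetical, not part of blockallocator.h.

```cpp
#include <cstddef>
#include <list>
#include <memory>
#include <mutex>

// Hypothetical wrapper, for illustration only.
// Every operation takes the same lock, so two threads never touch the
// underlying list (or its allocator) at the same time.
template <typename T, typename Alloc = std::allocator<T> >
class guarded_list {
    std::list<T, Alloc> items_;
    mutable std::mutex mutex_;

public:
    void push_back(const T& value) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(value);
    }

    // Copies the first element into `out` and removes it.
    // Returns false (leaving `out` untouched) if the list is empty.
    bool try_pop_front(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop_front();
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_.size();
    }
};
```

Since the allocator is just a template argument, the same wrapper should work unchanged with a block allocated list, for example guarded_list<int, block_allocator<int,1024> >, assuming a compiler on which the direct usage shown in the introduction is supported.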

Version history

  • 29th Feb, 2000 - 1.1
    • Initial release on CodeProject.
  • 22nd Mar, 2001 - 1.2
    • Included definitions for operator== and operator!=. The lack of these caused linking errors when invoking list::swap() and similar methods. The funny thing about it is that no one ever reported this seemingly important bug, so either swap() is not that much used or not that many people use block_allocator!
  • 25th Oct, 2006 - 1.3
    • block_allocator now works with MSVC++ 7.1 and 8.0. Thanks to James May for helping with testing this new version of the code.
  • 30th Oct, 2006 - 1.4
    • Fixed some typedefs incorrectly made private in block_allocated_list, block_allocated_set, etc.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Spain

Comments and Discussions

 
General: Re: blockallocator bug
Joaquín M López Muñoz, 7-Mar-01 21:58

General: blockallocator bug: closer inspection and proposed workaround
Joaquín M López Muñoz, 2-Oct-01 0:12

Question: Is the allocator thread-safe?
suss, 29-Dec-00 5:52

Answer: Re: Is the allocator thread-safe?
Joaquín M López Muñoz, 2-Oct-01 0:20

General: 12% map speed improvement achieved by using this approach
suss, 26-Nov-00 5:18

Question: Where is the speedup?
Jim, 16-May-00 2:12

Answer: Re: Where is the speedup?
Joaquín M López Muñoz, 16-May-00 3:18

Answer: Re: Where is the speedup?
Joaquín M López Muñoz, 8-Mar-01 21:23
At the end of this email you'll find a small test program. It does
random insertions and extractions on two different lists, both with
the default allocator and with block_allocator. I tested this several
times and block_allocator consistently achieved an improvement of
42-43 %. Maybe you can give it a try to determine if there's something
strange with your test platform. Mine is:

Compiler: MSVC++ 5.0 (no sps)
Build: Release, Multithreaded CRT, all switches at their default states
OS: Win95 OSR2
Machine: PMMX, 233 MHz, 160 MB

Anyway, this is not an academic exercise. I developed block_allocator
to use it in an image compression algorithm doing some intense
list work. Introducing block_allocator alone improved
performance by 15-20%, and this was a real program,
not a performance test. I'll be glad to hear about your further
experiences with block_allocator. Perhaps some conclusion can be drawn
from your test environment or the kind of work you use the containers for.

Regards, Joaquín M López Muñoz

// The activating macro must be defined before including blockallocator.h
#define DEFINE_BLOCK_ALLOCATED_LIST
#include "blockallocator.h"

#include <cstdlib>
#include <iostream>
#include <list>
#include <windows.h>

using namespace std;

// Performs random insertions and erasures on two lists and
// returns the elapsed time in milliseconds.
template <class T1,class T2,class X1,class X2>
DWORD test(T1& l1,T2& l2,X1& x1,X2& x2)
{
  srand(0); // same pseudo-random sequence on every run
  DWORD dw=GetTickCount();

  for(int n=0;n<500;++n){
    int m; // declared once so both loops compile under old and new for-scope rules
    for(m=0;m<1000;++m){
      if(rand()<RAND_MAX/2)l1.push_front(x1);
      if(rand()<RAND_MAX/2)l2.push_front(x2);
    }
    // typename is required here by conforming compilers
    for(typename T1::iterator it1=l1.begin();it1!=l1.end(); ){
      if(rand()<RAND_MAX/2)it1=l1.erase(it1);
      else ++it1;
    }
    for(typename T2::iterator it2=l2.begin();it2!=l2.end(); ){
      if(rand()<RAND_MAX/2)it2=l2.erase(it2);
      else ++it2;
    }
    for(m=0;m<500;++m){
      if(rand()<RAND_MAX/2)l2.push_back(x2);
      if(rand()<RAND_MAX/2)l1.push_back(x1);
    }
  }

  l1.clear();
  l2.clear();

  return GetTickCount()-dw;
}

// A slightly larger element type; the operators are declared but never used.
struct node
{
  DWORD x,y,z;

  bool operator <(const node &) const;
  bool operator >(const node &) const;
  bool operator ==(const node &) const;
  bool operator !=(const node &) const;
};

int main(void)
{
  DWORD total_default=0;
  DWORD total_block_allocator=0;
  for(int i=1;i<10;++i){
    {
      cout<<"default allocator: ";

      list<int> l1;
      list<node> l2;
      int x1=0;
      node x2={0,0,0}; // initialized so no indeterminate values are copied
      DWORD dw=test(l1,l2,x1,x2);
      cout<<dw<<endl;

      total_default+=dw;
    }
    {
      cout<<"block allocator: ";
      block_allocated_list<int,1024> l1;
      block_allocated_list<node,1024> l2;
      int x1=0;
      node x2={0,0,0};
      DWORD dw=test(l1,l2,x1,x2);
      cout<<dw<<endl;
      total_block_allocator+=dw;
    }
  }
  cout<<"----------------------------------------"<<endl;
  cout<<"Total time default allocator: "<<total_default<<endl;
  cout<<"Total time block allocator: "<<total_block_allocator<<endl;
  // Subtract as doubles: an unsigned DWORD difference would wrap around
  // if the block allocator run happened to be slower.
  cout<<"Block allocator improvement: "<<
    (int)(100.0*((double)total_default-(double)total_block_allocator)/total_default)<<"%"<<endl;

  return 0;
}

General: Re: Where is the speedup?
suss, 19-Apr-01 14:19

General: Re: Where is the speedup?
suss, 6-Sep-01 6:44

General: Re: Where is the speedup?
suss, 14-Sep-01 9:03
