Click here to Skip to main content
Email Password   helpLost your password?

Introduction

block_allocator is a custom STL allocator for use with STL as implemented in Microsoft VC++. Rather than doing allocations on a per-node basis, block_allocator allocates memory in fixed sized chunks, and delivers portions of these chunks as requested. Typical speed improvements of 40% have been obtained with respect to the default allocator. The size of the chunks, set by the user, should not be too little (reduced speed improvements) nor too large (memory wasted). Experiment and see what sizes fit best to your application.

block_allocator can substitute for the default allocator in the following containers:

and WON'T work with other containers such as vector or queue. Note however that vector and queue already perform allocation in chunks. The usage of block_allocator is fairly simple, for instance:
// block allocated list of ints with chunks of 1024 elements
std::list<int,block_allocator<int,1024> > l;
Normal containers and block allocated containers can coexist without problems.

Compatibility mode with MSVC++ 6.0/7.0

Due to limitations of the standard library provided with these compilers, the mode of usage explained above does not work here. To circumvent this problem one must proceed as follows: For each of the containers supported, there's an associated block allocated container derived from it thru use of block_allocator. You have to define an activating macro for each container to be defined prior to the inclusion of blockallocator.h:

To use block allocation based STL in your application, define the corresponding activating macro, include blockallocator.h and then change your declarations as follows:

where chunk_size is the size of the chunks. You can enter too the other optional template parameters (see MSVC++ STL docs for more info).

The MSVC++ 6.0/7.0 compatibility mode can also be used in MSVC++ 7.1, so you need not modify your block_allocator-related code when porting legacy code to 7.1.

Multithreading issues

Each block allocated container instance uses its own block_allocator, so no multithreading problems should arise as long as your program conveniently protects their containers for concurrent access (or if no two threads access the same container instance). This is the same scenario posed by regular STL classes (remember operations on containers are not guarded by CRITICAL_SECTIONs or anything similar), so the moral of it all is: If your program was multithread safe without block_allocator, it'll continue to be with it.

Version history

You must Sign In to use this message board.
 
 
Per page   
 FirstPrevNext
Generalblock_allocated_list::sort() 2x slower than a standard stl::list()?
Pet123
21:33 26 May '08  
hi Joaquín M López Muñoz :

first thanks for the nice job.

i have tested most of the containers mentioned in your article with its stl counterpart (not including multiset and multimap which i never used).

the facts discovered :

block_allocated_list is about 4x faster than stl::list (un-sorted)
block_allocated_set is about 2x faster than stl::set
block_allocated_map is about 2x faster than stl::map

to test deeper, i sorted the list (both the block_allocated_list and stl::list) then i found that the sort operation for block_allocated_list is about 2x slower than stl::list.

do you have ideas about that? sure i can avoid this in pratice since i have realized the fact but still wonder can this be improved?

thank you.
GeneralRe: block_allocated_list::sort() 2x slower than a standard stl::list()?
Joaquín M López Muñoz
23:45 27 May '08  
Hello,

Can you please read the following thread about sort not behaving properly?


block_allocated_list::sort() does not return or produces incorrect result[^] (bottom of the page)

The reason was a bug in the Dinkumware std lib implementation of list::splice. I guess your problem is different since that was in MSVC++ 6.0 and you most likely are using a more modern compiler, but it doesn't hurt to check.

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
Want a Boost forum in Code Project? Vote here[^]!

GeneralLow Fragmentation Heap
Paul Sanders (AlpineSoft)
9:00 26 Oct '07  
Another thing you might try to speed up memory allocation in general (not just for STL) is to enable Windows' Low Fragmentation Heap; just add the following at the start of your application:
ULONG HeapFragValue = 2;
HeapSetInformation ((void *) _get_heap_handle (), HeapCompatibilityInformation, &HeapFragValue, sizeof (HeapFragValue));
This works only on Windows XP or later and is convered in more detail here:

http://msdn2.microsoft.com/en-us/library/aa366750.aspx[^]

I am ashamed to say that I have not carried out any benchmarks myself, but I have read that it can make quite a difference (see http://xania.org/200512/crt-heap-fragmentation-in-windows[^]).



GeneralFree Heap block 347970 modified at 348394 after it was freed
Nyarlatotep
8:33 19 Dec '06  
I have used your block allocator with maps under a VC6 project inside two cdialogs (which are both childs of a Control bar and where i can switch from one to the other by a tab control)

in the first cdialog there are these definitions:


typedef block_allocated_map<long, long, 20> LONG2LONG;

LONG2LONG m_mapSrcAO;


in the other one:


typedef block_allocated_map<long, long, 20> LONG2LONG;

LONG2LONG m_mapSrcOP;
LONG2LONG m_mapBrkOP;


when the application is closed I receive the user breakpoint in the subject.

Commenting out the "LONG2LONG m_mapSrcAO;" definition in the first dialog solve the problem.
No one of the maps are modified (inserting or deleting entries). Only if I define the m_mapSrcAO member variable, I obtain the error.

GeneralRe: Free Heap block 347970 modified at 348394 after it was freed
Nyarlatotep
8:46 19 Dec '06  
Other clues:

I 've tried to allocate the map in the heap, in the first dialog only:

into the class definition:


typedef block_allocated_map<long, long, 20> LONG2LONG;

LONG2LONG *m_mapSrcAO;


into the class constructor


....
m_mapSrcAO = new LONG2LONG;
....


into the class ondestroy handler

....
delete m_mapSrcAO;
....


and no problems arise.

GeneralRe: Free Heap block 347970 modified at 348394 after it was freed
Nyarlatotep
9:26 19 Dec '06  
if, in the first cdialog, i change the block size to a different value from the block size used in the second cdialog, all goes right !!!

in the first cdialog


typedef block_allocated_map<long, long, 21> LONG2LONG;


in the second cdialog


typedef block_allocated_map<long, long, 20> LONG2LONG;

GeneralRe: Free Heap block 347970 modified at 348394 after it was freed
Joaquín M López Muñoz
3:29 20 Dec '06  
Hello Nyarlatotep,

I think the problem you're experiencing is the same that was discussed here[^]. Basically, this is a bug of the Dinkumware's STL library shipping with MSVC++ 6.0 and can be hopefully solved by having (in your case) a LONG2LONG object defined somewhere as a static global.

Does this fix things?

Best regards,

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
Want a Boost forum in Code Project? Vote here[^]!

GeneralRe: Free Heap block 347970 modified at 348394 after it was freed
Nyarlatotep
4:04 20 Dec '06  
Yes, it fixes !!! Thanks a lot !!! Smile

This is a very weird problem, however !!! Hmmm
GeneralExcellent! and weird effect
peterchen
10:20 14 Nov '06  
First, a very very excellent drop-in replacement!

I'm mainly looking into it to increase memory locality to speed up some data-intense operations.

It works very well in one place (50% faster for a ridiculously oversized data set Smile ), however, in another (for me more interesting) place, I lose performance and wanted to ask if you have an idea what this can be:

The code in question uses a few hundred map[int, {16 byte struct}] instances, each map containing only a few (10..30) items. using block_allocator with small chunk sizes (128) I have about the same performance as std::map. Going to larger chunk sizes, performance radically degrades (half speed at 1024 bytes chunk size)


Speeding this up (instead of slowing it down Smile) could improve responsiveness of the application notably. So I want to dig deeper. Now that might be memory locality, or another effect with the code (which does many things besides accesing that map). I wanted to ask if you have an idea what this can be.

Q:
Do you know a pathological case with your allcoator that may cause this?

Can your code be modified quickly so all maps of a given type use the same block allocator instance. This way I could test the effect is purely memory locality.

I would be grateful for some ideas.



Developers, Developers, Developers, Developers, Developers, Developers, Velopers, Develprs, Developers!
We are a big screwed up dysfunctional psychotic happy family - some more screwed up, others more happy, but everybody's psychotic joint venture definition of CP
Linkify!|Fold With Us!

GeneralRe: Excellent! and weird effect
Joaquín M López Muñoz
11:35 14 Nov '06  
peterchen wrote:
The code in question uses a few hundred map[int, {16 byte struct}] instances, each map containing only a few (10..30) items. using block_allocator with small chunk sizes (128) I have about the same performance as std::map. Going to larger chunk sizes, performance radically degrades (half speed at 1024 bytes chunk size)


If maps sizes top at 30 or so why do you choose a chunk size that high? I'd try setting chunk_size to something closer to 30.

peterchen wrote:
Q:
Do you know a pathological case with your allcoator that may cause this?


No. My hunch about what can be going on is related to the observation above: if you have chunks of 128 elements and each map occupy only around 20% of each associated chunk, you get lots of wasted memory and possibly poor locality. Try lowering the chunk size.

peterchen wrote:
Can your code be modified quickly so all maps of a given type use the same block allocator instance. This way I could test the effect is purely memory locality.


I can't check it right now, but it should be easy to hack this for a try:
  1. In class block_allocator, make the members head and tail static.

  2. Delete all constructors of block_allocator.

  3. Move the head/tail initialization code:
          head.next=reinterpret_cast<chunk *>(&tail);
    tail.previous=reinterpret_cast<chunk *>(&head);
    to some initialization routine of your program (this procedure must be done exactly once).
Please keep me informed about your progress. Hope this helps,

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
Want a Boost forum in Code Project? Vote here[^]!

GeneralRe: Excellent! and weird effect
peterchen
12:49 14 Nov '06  
thanks for your quick reply!

Joaquín M López Muñoz wrote:
I'd try setting chunk_size to something closer to 30.


It's "chunks" not "bytes D'Oh! - sometimes it helps to read documentation!
With chunks that big I'm definitely killing all locality.

I'll try to try the static test tomorrow (gotta get home now Wink )





Developers, Developers, Developers, Developers, Developers, Developers, Velopers, Develprs, Developers!
We are a big screwed up dysfunctional psychotic happy family - some more screwed up, others more happy, but everybody's psychotic joint venture definition of CP
Linkify!|Fold With Us!

GeneralProblem with block_allocated_list::const_iterator in VS2005
Wocki
21:41 26 Oct '06  
I can't use the const_iterator of the block_allocated_list because the compiler says "No access to private typedef....".

If I comment out the const_iterator typedef in block_allocated_list it works.
AnswerRe: Problem with block_allocated_list::const_iterator in VS2005
Joaquín M López Muñoz
22:46 26 Oct '06  
Wocki wrote:
I can't use the const_iterator of the block_allocated_list because the compiler says "No access to private typedef....".

If I comment out the const_iterator typedef in block_allocated_list it works.


Hello Wocki,

You're absolutely right, thanks for reporting the problem. A better fix is to make the offending typedef public, rather than commenting it out. I'll be shortly updating the code to solve this long standing bug (seems like it's been there since the very first version six years ago Sniff)

Anyway, since you're using a modern compiler I suggest you abandon the block_allocated_list stuff, which is really only necessary for MSVC++ 6.0/7.0, and use block_allocator directly as explained in the article.

Best regards,




Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
Want a Boost forum in Code Project? Vote here[^]!

Generalthank you!
wb
20:28 26 Oct '06  
I just tested your code on VS2005 and I get a speedup by 3.8 compared to the normal STL allocator!! OMG
Generalchunk size
slonial
3:34 17 Feb '06  
What should be the optimal value of chunk size when I am creating 6000 instance of map/list with Custom Block allocator.

These 6000 instances are getting created in sequence,if I am keeping
chunk size more than 100, the VM of the process shoots up.

Thanx,
Sumit


Generalwhy do you need to define MSVC_STL_list_node?
Anonymous
10:07 24 Aug '04  
Hi,

I read your lock_allocator on CodeProject website, pretty impressive. however, i feel there is something that dosn't make sense and shouldn't implement like that way, just want to discuss with you.

in your blockallocator.h:

you define

template struct MSVC_STL_list_node
{
void *p1 , *p2;
T t;
};

template
class block_allocated_list:
public std::list , chunk_size>
>
{
...


why do you need to define MSVC_STL_list_node? it should be encapsulated
into the implementation algorithm of microsoft's std::list.

the bellowing is the definitions in MSDN of list:
template <
class Type,
class Allocator=allocator
>
class list

you can see, Allocator=allocator, Microsoft doesn't define an extern
MSVC_STL_list_node explicit. even though, it must be defined and hided in its implementation.

I also checked SGI/HP 's implemenation of STL. same finding.

Thus, I think you don't need to define block_allocated_list,
end user directly use std::list is fine, like:
std::list >


May be i am wrong? if so please correct me.


jesse

hellojesse@gmail.com
GeneralRe: why do you need to define MSVC_STL_list_node?
Joaquín M López Muñoz
12:15 24 Aug '04  
May be i am wrong? if so please correct me.

You are not wrong, this mechanism is a workaround to overcome a serious defficiency in MSVC++ 6.0. Let me explain.

Consider the following definition:
typedef std::list<int,funky_allocator<int> > funky_list;
which is the canonical way to use funky_allocator (or any other allocator): the allocator is instantiated to allocate nodes the size of the elements held (ints in the example). But, if you think of it, an STL list needs some additional internal data to maintain its data structure: normally, two pointers to keep the node linked to the prior and following element. So, internally, funky_list cannot just use the allocator as is, because it needs larger nodes.

With this problem in mind, STL designers invented a mechanism called rebinding:
typedef funky_allocator<int> int_allocator;
typedef int_allocator::rebind<double>::other double_allocator;
The oddly defined double_allocator type is exactly the same as
typedef funky_allocator<double> double_allocator;
I don't know how familiar you are with templates: if you don't get the syntax, do not trouble much about it; in essence, the rebinding mechanism allows an STL list to acquire an allocator instantiated for internal nodes from the user provided allocator instantiated for the type of the elements.

Unfortunately, template support in MSVC++ 6.0 is seriously broken so that the rebinding stuff cannot be implemented. For this reason, the implementors of the MSVC version of STL use a different, non-standard, strategy.

I don't want to dive into more details (unless you're curious), but basically, due to the lack of rebinding in MSVC++ 6.0, block_allocator must be instantiated with a type the exact size of the internal nodes used by an STL list (or set, map, etc.), which kinda breaks the encapsulation of the whole scheme. The classes block_allocated_list and family do this dirty work behind the scenes so that at least you've got a decent user interface. Had MSVC++ 6.0 support for allocator rebinding, instead of block_allocated_list you'd simply write:
typedef std::list<type,block_allocator<type,chunk_size> > my_list;
the way STL designers meant it to be.

Hope I've made myself more or less clears. Best,

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
GeneralRe: why do you need to define MSVC_STL_list_node?
Anonymous
21:47 24 Aug '04  
Joaquín:

   thank you very much for your quick reply and your professional knowledge inside STL.
yes, there is rebinding in SGI's STL implementation as you said. now i understand what rebind is. Smile

   since MSVC++ 6.0 doesn't support allocator rebinding, the src code of microsoft's list explictly passes the size of node(with extra left/right pointer) as template parameter. However. MSDN website seems lying,i.e. not conforming to its src code.   please see this link:

         http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vcstdlib/html/vclrfList_class.asp


   please confirm whether my accuse of MSDN is right. thanks in advance!
  
   P.S.
   I enjoy coding in templates, do you have your own website   to share some topics in templates programming?

jesse
   hellojesse@gmail.com

GeneralRe: why do you need to define MSVC_STL_list_node?
Joaquín M López Muñoz
5:11 25 Aug '04  
The URL you refer to applies to MSVC 7.1 (aka .NET), which is conformant in this aspect. block_allocator is meant to be used in MSVC++ 6.0.

I enjoy coding in templates, do you have your own website to share some topics in templates programming?

No, sorryFrown Thinking of it, that would be an interesting website to have. I don't know of any place devoted to template programing, though this is commonly discussed in newsgroups comp.lang.c++ and comp.lang.c++.moderated.

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
GeneralRe: why do you need to define MSVC_STL_list_node?
Anonymous
10:03 26 Aug '04  
however, i find another nice article and it seems your arguments don't sound right. according to that article, we don't need to define template struct MSVC_STL_list_node by ourself and break the encapsulation.

here is what it said:

======================================================================
.....
VC++ 6.0 does not understand member template classes, so 'template struct rebind { typedef allocator other; };' is useless. Good to know. So, if our allocators have to work with VC++ 6.0 and Dinkumware, we need to implement a member function called '_Charalloc()':

// Begin Dinkumware (VC++ 6.0 SP5):
char *_Charalloc(size_type n) { return static_cast
(mem_.allocate(n)); }
// End Dinkumware
======================================================================
http://www.codeguru.com/Cpp/Cpp/cpp_mfc/stl/article.php/c4079

I also searched my MSDN help that is VC6.0, NOT .NET the std::list class declaration is the same as in .NET.

don't be mad at me, just want the real answer..

regards!

jesse



GeneralRe: why do you need to define MSVC_STL_list_node?
Anonymous
10:05 26 Aug '04  
repost for formating issue.

however, i find another nice article and it seems your arguments don't sound right. according to that article, we don't need to define template struct MSVC_STL_list_node by ourself and break the encapsulation.

here is what it said:

   ======================================================================
   .....
   VC++ 6.0 does not understand member template classes, so 'template <class U> struct rebind { typedef allocator<U> other; };' is useless. Good to know. So, if our allocators have to work with VC++ 6.0 and Dinkumware, we need to implement a member function called '_Charalloc()':

   // Begin Dinkumware (VC++ 6.0 SP5):
   char *_Charalloc(size_type n) { return static_cast<char*>
                           (mem_.allocate(n)); }
   // End Dinkumware
======================================================================
http://www.codeguru.com/Cpp/Cpp/cpp_mfc/stl/article.php/c4079

I also searched my MSDN help that is VC6.0, NOT .NET   the std::list class declaration is the same as in .NET.

   don't be mad at me, just want the real answer..

regards!

jesse

GeneralRe: why do you need to define MSVC_STL_list_node?
Joaquín M López Muñoz
10:48 26 Aug '04  
don't be mad at me, just want the real answer..

Not at all, I love discussing this stuff. Please thrash me till you're satisfied Poke tongue

I'll answer in reverse order:

I also searched my MSDN help that is VC6.0, NOT .NET the std::list class declaration is the same as in .NET.

Yes, the interface of std::list is the same, what changes is the internal implementation with respect to how the allocator is used. Please keep reading.

VC++ 6.0 does not understand member template classes, so 'template struct rebind { typedef allocator other; };' is useless. Good to know. So, if our allocators have to work with VC++ 6.0 and Dinkumware, we need to implement a member function called '_Charalloc()'...

This is also correct and it is the key point of this discussion. Consider the definition
typed std::list<int,std::allocator<int> > my_list;
So, according to what the article says, the implementation in MSVC 6.0 of my_list allocates its nodes by calling
Charalloc(n)
where n is not sizeof(int) but rather the size of an internal node. So, the allocator does not know in advance how large the nodes requested will be (though they'll always be the same size for a given container and element). And the problem is that block_allocator needs to know this size at compile-time so that it can arrange its block structure in an efficient manner. To remedy this, the definition
block_allocated_list<type,chunk_size>
is basically the same (it derives from, actually) as
std::list<type,block_allocator<type,MSVC_STL_list_node<type>,chunk_size> >
and block_allocator uses the type MSVC_STL_list_node<type> only to gain knowledge, at compile-time, of the size of the nodes it'll be requested. That's it.

It's a little cumbersome, so if I haven't made myself completely clear please tell me so.


Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
GeneralRe: why do you need to define MSVC_STL_list_node?
Anonymous
11:51 26 Aug '04  
now everything clear.

1)your solution is to pass the size of List_Node at compiling time;
pro: speed up
con: break the encapsulation, we need to track whether or when MS change the definition of List_Node. even though the possibility is small.


2)the solution in that article on codeguru
pro: hide the encapsulation.
con: slow down a little bit by caculating the size of List_Node.

To be honest, i like 2) a little bit more.
but your article is well written and organized, you help me a lot
in understaning customzied allocator.
really appreciate.

GeneralRe: why do you need to define MSVC_STL_list_node?
Joaquín M López Muñoz
12:32 26 Aug '04  
now everything clear.

Great. Your analysis of the options available is correct, and, to be honest, option 2 looks more attractive to me, too. The article in Codeguru that you mention (I assume it is this[^]) is impressive. But I think there are some problems with its adaptation to _Charalloc:

If you read again the section on _Charalloc again, you'll note that they implement it like this:
  char *_Charalloc(size_type n) { return static_cast<char*>
(mem_.allocate(n)); }
Following the thread to the description of ss_storage, it turns out that, when requested a block which is not the default size (that of the element), this class looks for a row of contiguous cells making up for the space required. In our case, no fragmentation will happen (i.e., single cells in between rows), as the container will always be requesting the same size, but the following two problems still remain:
  • The algorithm is making unnecesary checks to see if there's a contiguos block: given what I said above, the first available cell will be always the beginning of appropriate contiguous row. This extra checks might impact performance.
  • The allocated memory is not entirely used most of the time, as the contiguous blocks will usually be sligthy large than strictly necessary.
It'd be interesting to benchmark this allocator and mine for time and space efficiency (take the challenge?)



Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
GeneralRe: why do you need to define MSVC_STL_list_node?
Anonymous
21:54 26 Aug '04  
yes. i can't agree with you more.

this afternoon, i set up a project to run the src from codeguru projects.
except the fragmentation issue pointed by you, there are also a lot of logic errors there:
1) its allocate() function and check_consecutive() function is a mess.
2) there is a case of dead_lock, if the ss_allocator's initial size is 4 units,we need to allocate 5 units, the ss_allocator will keep grow() until crash.
3) there are also bugs in pool_allocator ..

I am wondering how could they make that mistakens however wrote such an excellent article. maybe their strength is speeching/writing. Smile

in my opinion, the algorithm of your code is much much better and well implemented. i am thinking of combine your code with src from that article.

BTW, now i implemented a prototype that write some data through STL into
shared memory; i face some challenges here, maybe you can shed some light:
1) how could another process load the data from shared memory into STL by take advantage of our own allocator.
2) if the writer side has some update, how could the reader side effectively gets notified and refresh , but NOT reload the whole STL again.

diffinitely, I will run your program and that codeguru program side by side against some test cases, and post results here.

Happy coding.

jesse

hellojesse@gmail.com



Last Updated 31 Oct 2006 | Advertise | Privacy | Terms of Use | Copyright © CodeProject, 1999-2010