Click here to Skip to main content
Click here to Skip to main content

Buffer Pool \ Object Pool

By , 18 Jan 2003
 

Sample Image - Pools.gif

Introduction

Anyone who works with applications that use vast allocation and freeing of dynamic memory during their run knows that this can cause decrease in performance, because this operations needs to be guarded with some kind of a global lock, so different modules can affect one another's performance. Further more, over time the memory is fragmented from all sizes of allocations, so if an application is kind of a resident process on a machine, its performance deteriorates over time.

In this article I will try to give a partial solution for this problem. Why partial? 'Cause I'm not trying to give a better implementation of the Memory Manager above the OS level. What I can optimize is modules that use vast allocation and freeing of buffers of the same size, which from my experience are the most common.

The solution I'm suggesting is a buffer pool that will allocate and free buffers of a predefined size, the complexity of the allocation and free operations is O(1), and each buffer pool object is self synchronized, so it wont affect other modules using another buffer pool. Furthermore the dynamic allocations of the buffer pool are done in segments of memory the size of N buffers, so the fragmentation of the memory over time is reduced significantly. More then that, because each segment is allocated as one block of memory, the paging of memory will be reduced also.

Using the code

First declare a CBufferPool object

CBufferPool BuffPoolObj;

Next create the Buffer pool by calling the Create method.

BuffPoolObj.Create(nBufferSize,  //Buffer Size
  nBuffersPerSegment, //Number of buffers in each segment
  nInitialNumberOfSegments, //Initial number of segments
  nMinNumberOfSegments, //Minimum number of segments during run
  nMaxNumberOfSegments, //Maximum number of segments during run
  nFreeBuffersPercentForSegmentDeletion); 
    //Percentage of all free buffers relative to 
    //size of a segment that allows a segments deletion

The last parameter nFreeBuffersPercentForSegmentDeletion is not so intuitively understood, so I'll try to explain its purpose. In the implementation of the buffer pool, the buffers are given from a segment that is pre-allocated. There is that 'gray zone' where if a buffer is the first allocated buffer in a segment, we allocate a new segment for it, and if in the next operation we free this same buffer, we should free the whole segment and so on. This will cause a decrease in the performance that is worse than if we'll allocate the buffers in the traditional way. To prevent this from happening, I'll free a segment if I'll have a safe number of free buffers from whole segments. This safe distance is determined by this parameter.

Then you can call Allocate method to get the required buffer, and the Free method to return it to the pool.

void* pBuffer = BuffPoolObj.Allocate()
//do something with buffer
BuffPoolObj.Free(pBuffer);

The CBufferPool has a synchronization scheme, so this methods can be called asynchronously. To destroy the buffer pool object call the Destroy method. This method will free all segments, in debug it will check that all buffers where freed.

Working the demo

The demo project demonstrates how the buffer pool can improve performance of a program compared to simple new and delete operators. Furthermore it can be used to check if your system can be improved, and the approximate amount of improvement that can be achieved with a certain configuration.

There are 3 tests that you can perform with the demo project, test the buffer pool and then the global allocation (new and delete) to see the improvement the buffer pool achieves over global allocation, and the interaction test that can illustrate the severe decrease in performance effect that the global allocation suffers from, in comparison to the buffer pool.

The parameters in the demo are divided into three, session parameters that are used to run both the buffer pool and the global allocation, and the two other sections for each of the two. For the parameters that are not intuitively understood, you can understand using the code. The buffer pool code is very well documented, and I tried to keep the demo application code simple.

For those who compile the demo in debug mode, let me just say that the buffer pool is added in debug mode tests to ensure the buffers or segments that are freed will not be used and buffers that are being allocated are not already in use. And these tests (commonly memsets to trash the memory out for it to be unusable) will decrease the performance beyond measurement, so do the tests in release.

Implementation

I will just try to explain the implementation in general, the buffer pool is constructed of a doubly linked list of segments, each segment has a single linked list of free buffers. The link between the buffers uses the buffer itself. Each buffer has apart from its required memory, a header that keeps its index in the segment, so the free operation will be direct with no searching and its complexity O(1). All the memory for the segment and buffers is allocated as one block. The segments list is ordered during the run so if the segment is fully used, it will be inserted to the end of the list, and if it has one or more of its buffers freed, it will move to the head of the list. The allocation operation just pops one free buffer out of the list. When a buffer is free a calculation is made to extract it's segment, a segment is allocated each time there is no free buffer, and freed if we have more then nFreeBuffersPercentForSegmentDeletion multiplied by nBuffersPerSegment amount of free buffers all together in the buffer pool. This is the basic idea for specifics. See the code, it is really well documented.

Object pool

The object pool is an implementation of a pool for object that is a direct descendent of the buffer pool. What is needed in order to use it is to declare an Object pool with a declared type.

CObjectPool<SomeType> SomeTypeObjectPool;

Next create The object pool by calling the Create method.

SomeTypeObjectPool.Create
    ( nObjectsPerSegment, //Number of objects in each segment
     nInitialNumberOfSegments, //Initial number of segments
     nMinNumberOfSegments, //Minimum number of segments during run
     nMaxNumberOfSegments, //Maximum number of segments during run
     nFreeObjectsPercentForSegmentDeletion); 
      //Percentage of all free objects 
      //in relative to the size of a segment that allows a segments deletion

And the use of the pool is as follows:

SomeType* pObject = SomeTypeObjectPool.Allocate()
//do something with object
SomeTypeObjectPool.Free(pObject);

The object pool implementation has very little code, it uses a new and delete "placement" (as the documentations calls this method) that receives the pointer to a buffer pool which is initialized with the object size as the buffer size at the Create method. I think the implementation is very exciting because of this new\delete "placement" method that is rarely used.

A light example of the object pool is shown in the OnInitDialog function in the CDemoDlg.

You can use the CBufferPool \ CObjectPool on any platform. All you need to do, is replace the critical section object with the other platform dependent mutual exclusion lock.

Thanks to George Anescu for the CPreciseTimer class used in the demo project.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

About the Author

Uri Twig
Web Developer
Israel Israel
Member
4 years expirience coding C++ with MFC & STL and coding C for Windows XP/2K internals (Drivers).
 
I Love what I do.

For pastime activities:
Fun & Games

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
GeneralA few bugsmembermtn_133713 May '09 - 13:44 
Here's a few bugs in BufferPool.cpp I uncovered after testing it out a bit:
------------
line 120
if (m_nNumberOfSegments > m_nNumberOfSegmentsHighMark)
should be:
if (m_nNumberOfSegments >= m_nNumberOfSegmentsHighMark)
Once the maximum of segments matches the HighMark, there should not be anymore segments allocated.
 
------------
line 172:
memset(pBuffer, 'E', m_nBufferSize);
line 193:
memset(p, 'E', m_nBufferSize );
Make sure these are not null pointers before doing a memset on the memory. This is the case when the if-statement in line 120 is entered.
 
------------
lines 149, 204, 211, 285, 382
Need to put #ifdef _DEBUG around these blocks since the Signiture member is not compiled in if _DEBUG is not defined.
GeneralMAXIMUM_WAIT_OBJECTSmemberHarly30 Aug '07 - 22:47 
int MessageLoop(HANDLE* lphObjects, int cObjects )
{
ASSERT( cObjects < MAXIMUM_WAIT_OBJECTS ); // harly.he@gmail.com add 2007.08.31
...
}

 
no any way, go any way.

Questionbecause need global lock???memberBamaco220 Jan '04 - 10:04 
Anyone who works with applications that use vast allocation and freeing of dynamic memory during their run knows that this can cause decrease in performance, because this operations needs to be guarded with some kind of a global lock...
 
I disagree right there!
 
There is no need to guard with a global lock. Maybe for your application of for some
class of application. But I would not generalize that
AnswerRe: because need global lock???memberftai7 Jul '04 - 14:10 
I think the golbal lock means the control to public resource. In this case We use the memory which is important to any process
GeneralAnother TakesussFrank32521 Jan '03 - 2:53 
I've used similar methods and classes in some work I've done.
 
The project was a high-speed data server for stock market quotes and trades. With this type of server the information is continuously updated, removed, added to over the course of a day - 100 million or more messages per day.
 
This specific application was designed for multi-processor Win2k systems and used asynchronous thread processing - threads were assigned to different processors and had different priorities.
 
Two tools were important for analyzing and optimizing the program. The first is "Performance Monitor" and the other was NuMega TrueTime (part of the Bounds Checker suite).
 
One of the biggest performance robbers is locking for multithread access - it can cause a multithread application to process a single thread at a time due to contention over the same resouce - the threads are not asynchronous anymore.
 
To overcome this contention, we used seperate lists for the free and data items; we only needed to lock one or the other. This eliminated a lot of contention. It also eliminated the overhead of using a double-link list; a single-link list is twice as fast. Free items should be placed at the head of the free list; it increases the chance that a memory reference will be in the CPU cache.
 
To reduce contention even further, we designed the application level to work with lists of data or free blocks; we locked the manager once to retrieve 50 or 100 blocks.
 
This still left contention when we were accessing the data... to make sure it wasn't freed from underneath us by another thread.
 
We eliminated this by putting freed data on another list; the items on this list were "expired" after a reasonable time period that ensured there were no references to the item and then it was placed on the free list.
 
One of the interesting things we found was:
 
   Our custom management was fastest.
   MFC/ATL templates (CMap, CArray, etc) were 3-5x slower
   STL were 2-3x slower than the MFC/AFX templates
 
Most applications don't need this type of optimization - after all, it's only what appears to the user that is important.

GeneralRe: Another TakememberTim Smith21 Jan '03 - 5:59 
One of the biggest performance robbers is locking for multithread access
 
Sing it brother. Smile | :)
 
Any time you can use different heaps for your different threads, you will improve performance.
 
Tim Smith
 
I'm going to patent thought. I have yet to see any prior art.
GeneralRe: Another TakesussFrank32521 Jan '03 - 7:03 
Actually my initial inclination was to use seperate heaps too; ie, HeapCreate, HeapAlloc, etc. One of the advantages of seperate heaps is protection from corruption. But the class for using virtual memory was already in place.
 
So I did some reading, and then I did some testing. It is much faster to do a VirtualAlloc and then partition it into blocks than it was to use HeapAlloc/HeapFree. And for our application, we needed the ability to have large memory blocks that were locked into memory.
 
We did use seperate allocations for each type of object, and seperate manager objects for each type.
 
I would guess that pre-allocated memory from a local heap would be just as fast as pre-allocated memory using the virtual functions, though. But the heaps are serialized by default so if you're going to do your own management you might want to turn this off.
 

GeneralRe: Another TakememberTim Smith21 Jan '03 - 7:59 
Heap is a generic term. I wasn't specifically referencing HeapXXX in Win32.
 
Tim Smith
 
I'm going to patent thought. I have yet to see any prior art.
General??? I thought....memberChopper20 Jan '03 - 5:01 
If i'm not mistaken, Windows 2000/XP defragment memory automatically just it reaches critical level, because this is the right level of effective memory management.
 
Regards,
Vitaly

 
Ultimate all-in-one XP-Style UI multiplatform solution: Tooltips, XP Menus, Hyperlinks, Drawing Graphics and formatted documents, plus powerful binary resource reuse. All Free 100%, visit www.tooltips.net now. The most powerful integrate UI library for developers in VC++, C#, Java, VB, Delphi, Borland C++, Borland C++ Builder, as well as any COM-compatible platform; Unified approach, cross-language OS, full .NET integration.

GeneralRe: ??? I thought....memberTim Smith20 Jan '03 - 5:25 
There is no way you can defragment an allocated virtual address space without moving data around in memory.
 
Tim Smith
 
I'm going to patent thought. I have yet to see any prior art.
GeneralRe: ??? I thought....memberChopper20 Jan '03 - 5:32 
But who needs to defragment virtual memory? Only the actual memory has the value for applications. Besides, if you claim being able to defragment virtual memory without reading it into memory you would have to modify the Windows Virtual Memory File directly. I think only Windows can access access its virtual memory directly, or am i wrong here?
 
Regards,
Vitaly

 
Ultimate all-in-one XP-Style UI multiplatform solution: Tooltips, XP Menus, Hyperlinks, Drawing Graphics and formatted documents, plus powerful binary resource reuse. All Free 100%, visit www.tooltips.net now. The most powerful integrate UI library for developers in VC++, C#, Java, VB, Delphi, Borland C++, Borland C++ Builder, as well as any COM-compatible platform; Unified approach, cross-language OS, full .NET integration.

GeneralRe: ??? I thought....memberUri Twig20 Jan '03 - 5:59 
There is NO defragmentation involved in this implementation of the buffer pool, what I claim is that the use of the buffer pool will reduse the fragmentation and reduce the affect that fragmentation caused by other modules in the system on the module that uses the buffer pool.

GeneralRe: ??? I thought....memberChopper20 Jan '03 - 6:01 
ok, we'll see...Smile | :)

 
Ultimate all-in-one XP-Style UI multiplatform solution: Tooltips, XP Menus, Hyperlinks, Drawing Graphics and formatted documents, plus powerful binary resource reuse. All Free 100%, visit www.tooltips.net now. The most powerful integrate UI library for developers in VC++, C#, Java, VB, Delphi, Borland C++, Borland C++ Builder, as well as any COM-compatible platform; Unified approach, cross-language OS, full .NET integration.

GeneralRe: ??? I thought....sussFrank32521 Jan '03 - 7:07 
BTW, I've used the "HeapCompact" api in some GUI applications that ran for long periods of time. Every xth update, I'd call the API on the global heap.

GeneralRe: ??? I thought....memberftai27 Jun '04 - 15:25 
Yes. I also think so . In this case. when using free function, It only sign the specific Index to be unuse. It is not similar to "delete/free". The address is not changed when we use next time . So it reduce the framentation. This only be buffer alloc. not to segment.
GeneralRe: ??? I thought....sussHaggai David20 Jan '03 - 22:45 
As said before, the OS cannot defragment the memory. In my experience an implementation that does not allocate all its memory in bunches, but drop by drop will not perform better. Actually, we use a similar implementation in our product - and I know that others do so as well. I think this generic implementation will do good to your application, and it is easy to use.
 
One more thing that should be mentioned, the overhead of this implementation in terms of memory usage. When using this implementation (which is efficient in my opinion) you get an overhead of one integer for each allocation and some more for each segment. Therefore, one should consider another implementation if your buffer size is small (e.g 1-8 bytes) - in all other cases, this is what you should want to enter to your project.
 

 
Haggai.
GeneralDoug Lea allocatormemberTim Smith19 Jan '03 - 8:28 
http://gee.cs.oswego.edu/dl/[^]
 
When I modified the code to use the Lea allocator instead of VC's allocator, the gap closed to just 7-9% slower than the pooled method. With many applications, using the Lea allocator will allow you to avoid taking the plunge of a pooled allocator. You trade a little bit of performance for a generic allocator.
 
Tim Smith
 
I'm going to patent thought. I have yet to see any prior art.
GeneralRe: Doug Lea allocatormemberUri Twig20 Jan '03 - 6:02 
As I said in the article I did not give a generic allocator solution as the Lea allocator does, I just gave a solution to a specific kind of allocation which is the "predefined size" one.
and I think the Demo project is not the proper benchmark to compare allocators it is just a tool I presented to give a general idea of the optimazation you can gain using the pools, for instance it wont give you a mesurment of the amount of optimazation you gain over time, or the real friction reduction between modules. I think you should give the code a second glance.
thanks for the link it is very absorbing.
GeneralRe: Doug Lea allocatormemberTim Smith20 Jan '03 - 6:22 
Oh no. Don't get me wrong. Buffer pool and specialized allocators have their place. Under certain conditions, they can improve performance and help avoid fragmentation of memory. After all, what you posted did outperform Lea.
 
The only reason I brought up the Lea allocator is that I was under the impression that custom allocators were more beneficial than they really are.
 
http://pag.lcs.mit.edu/reading-group/berger01reconsidering.pdf[^]
 
This paper takes a second look at the common wisdom (which I agreed with) that custom allocators are a powerful tool to squeak out a bit more performance. It turns out that, excluding region based allocators (which is what your code is), the Lea allocator performs just as good or better in the real world than most custom allocators.
 
Tim Smith
 
I'm going to patent thought. I have yet to see any prior art.

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web02 | 2.6.130523.1 | Last Updated 19 Jan 2003
Article Copyright 2003 by Uri Twig
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid