Click here to Skip to main content
Email Password   helpLost your password?

Introduction

The Bip-Buffer is like a circular buffer, but slightly different. Instead of keeping one head and tail pointer to the data in the buffer, it maintains two revolving regions, allowing for fast data access without having to worry about wrapping at the end of the buffer. Buffer allocations are always maintained as contiguous blocks, allowing the buffer to be used in a highly efficient manner with API calls, and also reducing the amount of copying which needs to be performed to put data into the buffer. Finally, a two-phase allocation system allows the user to pessimistically reserve an area of buffer space, and then trim back the buffer to commit to only the space which was used..

Let's cover a little history first. If you don't already know why a circular buffer can be implemented really efficiently in hardware, or why that makes them the buffer of choice in most electronics, here's why.

Back in Days of Old...

Once upon a time, computers were much simpler. They didn't have 64 bit data buses. Heck, they didn't even have real 16 bit registers - although you could occasionally convince a pair of them to sub in for that purpose. These were simpler times, where Real Men programmed in assembly language, and laughed at anyone who didn't know how to use the carry flag for all kinds of nefarious purposes.

With simpler times came elegant hacks to eke the most power out of every instruction cycle available. Take, for example, a simple terminal communications program. Newer RS232 serial controllers had things like automatic handling of RTS and CTS signal lines to control the flow of data - but this came at a cost. Namely, the connection would be stopping and starting all the time, instead of streaming along. So in between the controller card and the system, would often be found a FIFO. This simple circular buffer was often no more than a couple of bytes long, but it meant that the system could run smoothly along without polling to see if data had arrived, or being hammered by constant interrupts from the serial controller.

Most FIFOs started out on-chip, but people also added their own in their code - the idea being that if you had some really gnarly dancing that you had to do on the incoming data, you may as well batch it all up into one lump and do it infrequently... giving spare time to the system to do other things. Like scroll the console, or decode GIFs.

As I said, a FIFO is a very simple circular buffer. Most are implemented very simply as well; they're typically 2n bytes in size, which allows the pointers to simply overflow to get back around to the other end of the buffer. The FIFO logic can tell if the FIFO is empty because the head and tail values are the same, and it's full if the head is one greater than the tail.

Implementing these in software was easy on the old 8 bit systems. Take a 16 bit register pair. Decide on a location in memory (a multiple of 256) to store the FIFO data in. Then, after setting the register to the start of the buffer, don't touch the high register - just increment the low register. This gives you a 256-byte long buffer which you can walk through in one (in the case of the Zilog Z80, 4 cycles - the smallest execution unit available on that system) instruction. You can never go out of the bounds of your buffer, because the low register acts as an index with a value from 0 to 255. When you hit what would have been index 256, the register overflows and clocks back over to zero.

The Modern Day

Unfortunately, there is no solution quite as elegant available to Windows programmers today as that simple old 8-bit solution. Sure, you can dive down into assembly language (provided you can work out how the compiler maps registers to values... something I've never seen a good enough explanation of to get my head around), but most people don't have time for assembly language any more. And besides, we're dealing with 32 bit registers now - incrementing just one low-order byte from inside that register isn't really all that kosher any more. It can lead to cache flushing, pipeline stalling, printer fires, rains of frog, etc.

If you can't just clock the low-order register to walk through the buffer, you have to start worrying about things like checking to see how much buffer you have filled before the end, making sure that you remember to copy the rest of the data from the start of the buffer, and all kinds of other bookkeeping headaches.

My first attempt at implementing something like this relied on the vague hope that the virtual memory system could be tricked into setting things up in such a way that you could set up a mirror of a section of memory right next to the original. The idea being that you could still use the rotating allocation of data; a copy operation could go at full speed without any checking to see if you'd walked off the end of the buffer - because as far as your process's address space is concerned, the end of your buffer is also the beginning of your buffer.

Now, this mirroring technique may actually work. Due to some restrictions, I decided not to implement it myself (yet - I'm sure I'll find a use for it some day). The idea behind it is that first one reserves two areas of virtual memory, side by side. One then maps the same temporary file into both virtual memory sections. Voila! Instant mirroring, and a nice large buffered expanse one can copy data from willy-nilly.

Unfortunately, while it should (again, I've not tried it) indeed work, there is another problem - namely, that files can only be mapped on 64kb boundaries (possibly larger on larger memory systems). This means that your buffer has to be a minimum of 64kb in size, and will take up 128kb of your virtual address space. Depending on your application, this may be a valid technique. However, I don't see writing a server application with 1000's of sockets being a valid prospect here.

So what to do? If mirroring won't work, how close can we get to using a circular buffer in our code? Heck, even if we can get close, why would we want to?

The Advantages of the Circular Buffer

There are a number of key advantages to using a circular buffer for the temporary storage of data.

When one puts data into a block of memory, one also has to take it out again to make use of it. (Or one can use it in place). It is useful to be able to make use of the data in the buffer while more data is being appended to the buffer. However, as one frees up space at the head of the buffer, that space is no longer usable, unless one copies all of the data in the buffer which has not yet been used to the beginning of the buffer. This frees up space at the end of the buffer, allowing more data to be appended.

There are a couple of ways around this; one can simply copy the data (which is a reasonably expensive proposition), or one can extend the buffer to allow more data to be appended (a massively expensive process).

With a circular buffer, the free space in the buffer is always available to have data appended into it; the data is copied, the pointer adjusted, and that's that. No copying, no reallocation, no worries. The buffer is allocated once, and then remains useful for its entire life.

A Fly In The Ointment

One could simply implement a circular buffer by allocating a chunk of memory, and maintaining pointers. When one walked off the end of the buffer, the pointer would be adjusted - and this operation would be reflected in every operation that is performed, whether copying data into the buffer or removing it. Length calculations are slightly more complicated than normal, but not overly so - simple inline functions take care of that problem with ease, sweeping it under the rug.

Unfortunately, most API calls don't believe in circular buffers. You have to pass them a single contiguous block of memory which they can access, and there is no way for you to modify their write behavior to adjust pointers when they cross the end of the buffer. So what to do? Well, this is where the Bip-Buffer comes in.

Enter The Bip-Buffer

If one cannot pass a circular buffer into an API function, then one needs an alternative that will work - preferably with many of the same advantages as the circular buffer. It is possible to build a circular buffer which is split into two regions - or which is bi-partite (and that's how you get the Bip in a Bip-Buffer). Each of the two regions move through the buffer, starting at the left and ending up at the right hand side. When one runs out of space for appending data, if there is only one region, a new one is created at the beginning (if possible). The diagram below shows how it works in more detail.

[Figure: Data as it moves through the Bip Buffer]

The buffer starts out empty, with no regions present (figure 1). (eg. immediately after calling AllocatedBuffer)

Then, when data is first put into the buffer, a single region (the 'A' region) is created (figure 2). (Say, by calling Reserve followed by Commit)

Data is added to the region, extending it to the right (figure 3).

For the first time now, we remove data from the buffer (figure 4). (see the DecommitBlock call described below)

This continues until the region reaches the end of the buffer (figure 4). Once more free space is available to the left of region A than to the right of it, a second region (comically named "region B") is created in that space. The choice to create a new region when more space is available on the left is made to maximize the potential free space available for use in the buffer. The upshot of all this leaves us with something which looks rather like figure 5.

If we now use up more of the buffer space, we end up with figure 6, with new space only being allocated from the end of region B. If we eventually allocate enough data to use up all of the free space between regions A and B (figure 7), we no longer have any usable space in the buffer, and no more reservations can be performed until we read some data out of it.

If we then read more data out of the buffer (say the entire remaining contents of region A), we exhaust it entirely. At the point, as region A is completely empty, we no longer need to track two separate regions, and all of region B's internal data is copied over region A's internal data, and region B is entirely cleared. (figure 8)

If we read a little more data out of the buffer, we now end up with something a lot like figure 4, and the cycle continues.

Characteristics of the Bip-Buffer

The upshot of all of this is that on average, the buffer always has the maximal amount of free space available to be used, while not requiring any data copying or reallocation to free up space at the end of the buffer.

The biggest difference from an implementation standpoint between a regular circular buffer and the Bip Buffer is the fact that it only returns contiguous blocks. With a circular buffer, you need to worry about wrapping at the end of the buffer area - which is why for example if you look at Larry Antram's Fast Ring Buffer implementation, you'll see that you pass data into the buffer as a pointer and a length, the data from which is then copied byte by byte into the buffer to take into account the wrapping at the edges.

Another possibility which was brought up in the bulletin board (and the person who brought it up shall remain nameless, if just because they... erm... are nameless) was that of just splitting the calls across wraps. Well, this is one way of working around the wrapping problem, but it has the unfortunate side-effect that as your buffer fills, the amount of free space which you pass out to any calls always decreases to 1 byte at the minimum - even if you've got another 128kb of free space at the beginning of your buffer, at the end of it you're still going to have to deal with ever shrinking block sizes. The Bip-Buffer neatly sidesteps this issue by just leaving that space alone if the amount you request is larger than the remaining space at the end of the buffer. When writing networking code, this is very useful; you always want to try to receive as much data as possible, but you never can guarantee how much you're going to get. (For most optimal results, I'd recommend allocating a buffer which is some multiple of your MTU size).

Yes, you are going to lose some of what would have been free space at the end of the buffer. It's a small price to pay for playing nicely with the API.

Use of this buffer does require that one checks twice to see if the buffer has been emptied; as one has to deal with the possibility that there are two regions currently in use. However, the flexibility and performance gains outweigh this minor inconvenience.

The BipBuffer class (full source code provided in the link) has the following signature:

class BipBuffer
{
private:
    BYTE* pBuffer;
    int ixa, sza, ixb, szb, buflen, ixResrv, szResrv;

public:
    BipBuffer();

The constructor initializes the internal variables for tracking regions, and memory pointers to null; it does not allocate any memory for the buffer, in case one needs to use the class in an environment where exception handling cannot be used.

~BipBuffer();

The destructor simply frees any memory which has been allocated to the buffer.

bool AllocateBuffer(int buffersize = 4096);

AllocateBuffer allocates a buffer from virtual memory. The size of the buffer is rounded up to the nearest full page size. The function returns true if successful, or false if the buffer cannot be allocated.

void FreeBuffer();

FreeBuffer frees any memory allocated to the buffer by the call to AllocateBuffer, and releases any regions allocated within the Bip-Buffer.

bool IsInitialized() const;

IsInitialized returns true if the buffer has had memory allocated to it (by calling AllocateBuffer), or false if there is no memory allocated to the buffer.

int GetBufferSize() const;

GetBufferSize returns the total size (in bytes) of the buffer. This may be greater than the value passed into AllocateBuffer, if that value was not a multiple of the system's page size.

void Clear();

Clear ... well... clears the buffer. It does not free any memory allocated to the buffer; it merely resets the region pointers back to null, making the full buffer usable for new data again.

[The Effect of Reserve and Commit]

BYTE* Reserve(int size, OUT int& reserved);

Now to the nitty-gritty. Allocating data in the Bip-Buffer is a two-phase operation. First an area is reserved by calling the Reserve function; then, that area is Committed by calling the Commit function. This allows one to, say, reserve memory for an IO call, and when that IO call fails, pretend it never happened. Or alternatively, in a call to an overlapped WSARecv() function, it allows one to advertise how much memory is available to the network stack to use for incoming data, and then adjust the amount of space used based on how much data was actually read in (which may be less than the requested amount).

To use Reserve, pass in the size of block requested. The function will return the size of the largest free block available which is less than or equal to size in length in the reserved parameter you passed in. It will also return a BYTE* pointer to the area of the buffer which you have reserved.

In the case where the buffer has no space available, Reserve will return a NULL pointer, and reserved will be set to zero.

Note: you cannot nest calls to Reserve and Commit; after calling Reserve you must call Commit before calling Reserve again.

void Commit(int size);

Here's the other half of the allocation. Commit takes a size parameter, which is the number of bytes (starting at the BYTE* you were passed back from Reserve) which you have actually used and want to keep in the buffer. If you pass in zero for this size, the reservation will be completely released, as if you had never reserved any space at all. Alternatively, in a debug build, if one passes in a value greater than the original reservation, an assert will fire. (In a release build, the original reservation size will be used, and no one will be any the wiser). Committing data to the buffer makes it available for routines which take data back out of the buffer.

The diagram above shows how Reserve and Commit work. When you call Reserve, it will return a pointer to the beginning of the gray area above (fig. 1). Say you then only use as much of that buffer as the blue section (fig 2). It'd be a shame to leave this area allocated and going to waste, so you can call Commit with only as much data as you used, which gives you fig. 3 - namely, the committed space extends to fill just the part you needed, leaving the rest free.

int GetReservationSize() const;

If at any time you need to find out if you have a pending reservation, or need to find out that reservation's size, you can call GetReservationSize to find the amount reserved. No reservation? You'll get a zero back.

BYTE* GetContiguousBlock(OUT int& size);

Well, after all this work to put stuff into the buffer, we'd better have a way of getting it out again.

First of all, what if you need to work out how much data (total) is available to be read from the buffer?

int GetCommittedSize() const;

One method is to call GetCommittedSize, which will return the total length of data in the buffer - that's the total size of both regions combined. I would not recommend relying on this number, because it's very easy to forget that you have two regions in the Bip-Buffer if you do. And that would be a bad thing (as several weeks of painful debugging experience has proved to me). As an alternative, you can call:

BYTE* GetContiguousBlock(OUT int& size);

... which will return a BYTE* pointer to the first (as in FIFO, not left-most) contiguous region of committed data in the buffer. The size parameter is also updated with the length of the block. If no data is available, the function returns NULL (and the size parameter is set to zero).

In order to fully empty the buffer, you may wish to loop around, calling GetContiguousBlock until it returns NULL. If you're feeling miserly, you can call it only twice. However, I'd recommend the former; it means you can forget that there's two regions, and just remember that there's more than one.

void DecommitBlock(int size);

So what do you do after you've consumed data from the buffer? Well, in keeping with the spirit of the aforementioned Reserve and Commit calls, you then call DecommitBlock to release data from it. Data is released in FIFO order, from the first contiguous block only - so if you're going to call DecommitBlock, you should do it pretty shortly after calling GetContiguousBlock. If you pass in a size of greater than the length of the contiguous block, then the entire block is released - but none of the other block (if present) is released at all. This is a deliberate design choice to remind you that there is more than one block and you should act accordingly. (If you really need to be able to discard data from blocks you've not read yet, it's not too difficult to copy the DecommitBlock function and modify it so that it operates on both blocks; just unwrap the if statement, and adjust the size parameter after the first clause. Implementation of this is left as the dreaded youknowwhat).

And that's the Bip-Buffer implementation done. A short example of how to use it is provided below.

#include "BipBuffer.h"


BipBuffer buffer;
SOCKET s;
bool read_EOF;

bool StartUp
{
    // Allocate a buffer 8192 bytes in length

    if (!buffer.AllocateBuffer(8192)) return false;
    readEOF = false;

    s = socket(...

    ... do something else ...
}

void Foo()
{
    _ASSERTE(buffer.IsValid());

    // Reserve as much space as possible in the buffer:

    int space;
    BYTE* pData = buffer.Reserve(GetBufferSize(), space);

    // We now have *space* amount of room to play with.


    if (pData == NULL) return;

    // Obviously we've not emptied the buffer recently

    // because there isn't any room in it if we return.


    // Let's use the buffer!

    int recvcount = recv(s, (char*)pData, space, 0);

    if (recvcount == SOCKET_ERROR) return;
    // heh... that's some kind of error handling...


    // We now have data in the buffer (or, if the

    // connection was gracefully closed, we don't have any)


    buffer.Commit(recvcount);

    if (recvcount == 0) read_EOF = true;

}

void Bar()
{
    _ASSERTE(buffer.IsValid());

    // Let's empty the buffer.


    int allocated;
    BYTE* pData;

    while (pData = buffer.GetContiguousBlock(allocated)
           != NULL)
    {
        // Let's do something with the data.


        fwrite(pData, allocated, 1, outputfile);

        // (again, lousy error handling)


        buffer.DecommitBlock(allocated);
    }
}

Adieu

And so we reach the end. I hope that you find this code and way of managing data useful; I've certainly found that it comes in very handy for writing networking code. If you do find it useful, or use it in any of your code, all that I ask in return is that you drop me an email and let me know how the code is being used (what kind of project, what company, etc). Be vague if NDAs would get in the way - it's nice to know that it's out there, alive, and doing cool things.

Version History

You must Sign In to use this message board.
 
 
Per page   
 FirstPrevNext
Generalan alternative approach
rharding64
7:09 11 Jul '08  
i am designing a small data acquisition system that consists of the following components;
1. PIC16F877A microcontroller running at 20MHZ,8 bit a/d converter
2. FRAM(ferromagnetic RAM) via I2C (inter-integrated circuit protocol (like SPI, but simpler))
3. FTDI FT2232 USB chip -
4. C# based control application, p/invoke over the unmanaged drivers to c#(dllImport)

data input:
analog voltage 0 to 2.5V at sample rate of 10us for a time period of one hour.

the pc is not an RTOS, so microcontroller with on-board 8bit a/d converter is best suited for this purpose, code written in C.

FRAM devices are primarily intended for use in data collection devices, because they can write data at BUS SPEED!

since data is transferred in blocks over the USB connection, it is best to buffer up several blocks of 512 bytes on-board the microcontroller FRAM device to avoid interruptions in the data stream.

since in .NET the limitation for elapsed event handlers is around 1 to 10ms, it can take in blocks of 512 bytes from the microcontroller. actually, at a 10us sampling rate, the time to capture a 512 byte block will be over 5.1ms.

i just wanted to put in my 2 cents, 8 bit machines are STILL being used(heck, your car has a CAN network with a bunch of microcontrollers).

have a great day
Big Grin
Generalnice work... [modified]
gsky
10:12 29 Aug '06  
i would like to say nice work on your api, simple and elegant, much better than what i was trying to do. I didn't use you buffer per say, but used your elegant api to set up a true circular buffer with the blocks on the ends, and your request->commit->get->decommit api was simply elegant. keep up the good work, by the way i didn't use a bip buffer as i too need the buffer to grow dynamically. nice job!! people like you are the reason i can code at all, keep sharing. if i had too rely of what my "edumacation" learned me, it wouldn't be able to do anything;P. much appreciatedBig Grin .

gsky


-- modified at 15:22 Tuesday 29th August, 2006
GeneralHow about Bip Buffer extension?
Quynh Nguyen
7:56 1 Aug '06  
Your library is very useful especially for network programming.

However it's better if your class has capacity of automatically memory expanding. In other words, end-user will see that your buffer is not "limited" in size.
GeneralRe: How about Bip Buffer extension?
Simon Cooke
14:49 1 Aug '06  
I'm not sure I agree here... the idea behind the code is that your buffer doesn't have to expand; it's meant to be fixed in size and use that space efficently, rather than growing to fit any and all data that's thrown at it. Otherwise you could simply use a linked list of fixed size buffers.
GeneralRe: How about Bip Buffer extension?
Quynh Nguyen
4:54 2 Aug '06  
Simon Cooke wrote:
...rather than growing to fit any and all data that's thrown at it. Otherwise you could simply use a linked list of fixed size buffers.

I agree with you and that was also my thought when writing previous message.

Let me explain a little on the fact: Currently, I'm searching for a memory management library that should be high efficient in "first allocated, first freed" situation to be used in an IOCP socket library. There are 3 important things need to be achieved.

1) Very efficient for "first allocated, first freed" situation
2) Memory address must be fixed after allocated (can not be moved around) while an IO operation is still pending in background.

--> Bip Buffer satisfies the two above conditions.

3) The buffer can be expanded automatically

--> linked list of Bip Buffer may be the answer. Please share with me if you think that there should be another solution for this problem. Thank you.
GeneralBug in example?
RichardC
16:50 9 Jan '05  
Hey there
Nice article but just wondering if there's a bug in the example code:

...
if (recvcount == SOCKET_ERROR) return;
...

Shouldn't that be:

if (recvcount == SOCKET_ERROR)
{
      buffer.Commit(0);
      return;
}

otherwise your reserved space remains reserved, right?

Cheers

- Richard
GeneralFlaw in FreeBuffer()
Soenke Schau
10:01 19 Oct '04  
Hello,
in FreeBuffer you coded
::VirtualFree(pBuffer, buflen, MEM_DECOMMIT);
but it will eat up all your virtual memory if used frequently as MEM_DECOMMIT leaves the memory region previously used as still reserved memory in your working set.
The line should read
::VirtualFree(pBuffer, 0, MEM_RELEASE);

Thanks for the nice piece of code.
Best regards
Soenke
GeneralRe: Flaw in FreeBuffer()
Simon Cooke
11:07 19 Oct '04  
Oooh - good catch, thanks!

I probably didn't hit this because I typically only allocate one or two buffers for the life of the app.

I'll get this fixed in an update asap.

Thanks again!
Simon
GeneralNice, very useful.
Max LI
14:12 20 Sep '04  
good idea, tidy implementation.
GeneralImages are broken on Mozilla
Ernesto Perales Soto
4:49 13 May '03  
If you use Mozilla to read your article, images are not shown. I think this happens because the SRC of the images contains backslashes...

Nice article, btw!

eperales
GeneralSome erros in example
DetlevH
0:55 11 Mar '03  
Hi,

there is no member IsValid(). I think, you mean IsInitialized() instead.

Made some changes for my system (No stuff like VirtualAlloc...), and it works very nice.



Best regards,
Detlev
GeneralNice article
Anthony_Yio
15:50 23 Feb '03  
Simple idea for a broad range usage.
GeneralVery nice springboard.
WREY
16:04 13 Jan '03  
This article has laid the foundation and created for readers, the impetus for more creative research on the subject. In that regard, I believe I am one of its customers who will take up the challenge.

Having done something like this a few years ago, it has rekindled the creative juice to explore ways for taking this concept one notch higher, though that will come ONLY after many long hours in the lab. Recalling some of the pitfalls and errors which slowed my efforts (and ultimately sidetracked it), after reading this article, I feel rejuvenated to step back into the ring with this killer! Wink

Excellent piece!!

Cool

William
GeneralHow is this different?
Anonymous
21:46 7 Jan '03  
Very clever solution, but how is this different from using a circular buffer and splitting any calls that would need to wrap around?

ie, you have your buffer with space for some number of elements:
|--------|
If tail>head then you have one contiguous block, starting at H and going to T:
|--H##T--|
If head>tail then you have two contiguous blocks, one starting at H and ending at the end of the buffer, and a second starting at the beginning of the buffer and ending at T:
h#T----H#t
The only difference I can see is that you've let the end and start pointers move to someplace other than the ends, but I don't understand how that would be an improvement...
GeneralRe: How is this different?
Simon Cooke
22:56 7 Jan '03  
The biggest difference is that when you approach the end of the buffer, you're not parceled out small chunks of free space.

For example, using the circular buffer you have shown above, if you asked for "as big a buffer as is available", and you're in this situation:


|--------------[####]----|

You'll only get back a buffer which is 4 blocks in length.

With my method, you get back the bigger area, which is 14 blocks in length.

As you say though, it is indeed exactly the same (other than that minor point, which improves potential throughput) as splitting any calls which would need to wrap around. On the plus side, it's also neatly wrapped in a nice, prepackaged, low-impact form. It also makes adjustment of the size of buffer you need after making the initial reservation much easier; if you think you might get 100 bytes, but only get 32, you can recover without having to do some serious messing around.

Not to mention the fact that I don't believe I've seen any circular buffers which allow one to easily pass data into it other than one byte at a time (or the equivalent byte-by-byte copy operation handled within the implementation); the contiguous nature of the returned buffer regions works exceptionally well for use with standard API functions which expect a standard, unwrapped buffer. If nothing else, you gain cache coherency, the ability to use DMA on the write to the buffer, better optimized code (no need for checks within the inner loop other than a counter), better use of available space (you can always get the biggest block available in the buffer), and on top of that, it's all done for you without any heartache.

Simon
GeneralRe: How is this different?
Andrew Phillips
17:25 11 May '03  
> ... but how is this different from using a circular buffer and splitting any calls that would need to wrap around?

What if you don't want to split the calls?

I did something similar to this where I was receiving packets from 1 to 1024 bytes in size. Since the order the packets were received need to be preserved and (for efficiency) I wanted to access the packets directly from the buffer (and pass to other functions that expected a continuous block) this sort of algorithm was the only solution. If I had used a simple circular buffer then it would have required the copying of the 2 parts of a packet if it wrapped around at the end of the buffer.

Using a buffer of 4K or greater I could be sure that all packets were stored in contiguous memory in the buffer, and no copying was required.

The algorithm was useful in that specific circumstance but I am not sure about how useful it would be in other situations, such as streams with no natural chunks or where the input chunks do not match the output chunks.

For example (as in another application I wrote), if reading data with internal structures as 8K blocks from tape. Here the output chunk almost always overlaps the input chunks and the above algorithm would be useless for allowing access to contiguous chunks. For this Simon's idea of contiguous memory mapped files mapping to the same physical memory sounds very interesting. Here reading would work as with a normal circular buffer (since tape drives usually do DMA to physical memory), but you could then access structures that span the end of the buffer as one chunk.


Andrew Phillips
andrew @ expertcomsoft.com
Generalyo yo...
lauren
13:58 7 Jan '03  
nice article
nice website too
am gonna be in seattle in march -ish
fancy a beer?
Smile


"traffic lights are for people who can't make their own decisions"
biz stuff   about me
GeneralNice.
Nitron
3:23 7 Jan '03  
Well presented, Simon. Now I just have to start a new project where i can use it... Suspicious



- Nitron
"Those that say a task is impossible shouldn't interrupt the ones who are doing it." - Chinese Proverb


GeneralUse?
peterchen
23:54 6 Jan '03  
I'm just trying to grasp it:
As I understand, the second partition is to get better "throughput" of the buffer, ok.

Wouldn't this be valuable mostly in an asynchronous writer/reader scenario, where the buffer allocations stall if no memory is available? Your class seems to lack any synchronization measures, but I can't see a use of it without it. or am I missing something?

Also: in AllocateBuffer, you use:
// Calculate nearest page size
buffersize = ((buffersize + ((si.dwPageSize >> 1) - 1)) / si.dwPageSize) * si.dwPageSize;
This would round down e.g. a size of 5000 bytes to 4096, right?
IMO this is dangerous, as I never would expect this behavior from an allocation function (expected: allocate at least as much bytes as requested). And I guess I'm not alone Wink

Do you have some statistics how much the throughput is improved? The two-partition idea is cool, but as it defeats the simplicity of a ring buffer, I would like some reasoning. (Also, your implementation is no 1:1 replacement, i.e. the user caller needs to be aware that it's a bip)

Also, it would be nice if you present the main idea (ring buffer with two partitions) and it's advantage (faster throughput?) earlier in the article, e.g. in the subtitle ("a two-partition ringbuffer for faster throughput" or in a short "Abstract" paragraph)

Anyway, cool idea, clean implementation, nice presentation, 4/5

(And I have to try out the "Map two views of the same file to adjancted memory areas" tonight - it's too good to let it slip through...)


Those who not hear the music think the dancers are mad.  [sighist] [Agile]
GeneralRe: Use?
Simon Cooke
0:31 7 Jan '03  
The AllocateBuffer code... what can I say, other than... erm... oops!

Thanks for catching that one.

The corrected code (until I can update it tomorrow) is:

buffersize  = ((buffersize + si.dwPageSize - 1) / si.dwPageSize) * si.dwPageSize;

I'm actually using the buffer in an asynchronous reader/writer right now. The thing is, I keep everything in one thread, where each time I roll through, I basically do:
  1. Allocate space for read
  2. Do overlapped read.
  3. While waiting for completion, attempt to deserialize packets from the buffer

... as it's all in the same thread, no synchronization is necessary. Typically, if I needed to synchronize, I'd do something like this:

class Synchronizable
{
CRITICAL_SECTION cs;
public:
Synchronizable()
{
::InitializeCriticalSection(&cs);
}
~Synchronizable()
{
::DeleteCriticalSection(&cs);
}

void Lock()
{
::EnterCriticalSection(&cs);
}

void Unlock()
{
::LeaveCriticalSection(&cs);
}
};

class Synchronize
{
Synchronizable& cs_ref;

public:
Synchronize(Synchronizable& cs) :
cs_ref(cs)
{
cs_ref.Lock();
}

~Synchronize()
{
cs_ref.Unlock();
}
};

class SyncBipBuffer : public BipBuffer, public Synchronizable
{
}

SyncBipBuffer buffer;

void Foo() {
{
Synchronize to(buffer);
// Do something
}

// potentially do something else...
}

It does require a little more effort on the developer's part, but it also lets you place the locks where you need them, instead of having critical section synchronization at every single level in your code.

I don't have any hard stats as to the increase in performance, but if you're not continually draining the buffer, you could expect to reach a state ~ 50% of the time where less than half of the free space of the buffer is available to you. I'll see if I can mock something up for version two of the article Smile

Thanks for your other hints as well - as soon as I get the chance, I'll see if I can shoehorn them in Smile

Simon
GeneralRe: Use?
peterchen
10:39 7 Jan '03  
Simon Cooke wrote: Allocate space for read,Do overlapped read.While waiting for completion, attempt to deserialize packets from the buffer
yep, ok, there it makes sense (provided the speed data arrives at varies), and you don't need extra synchronization.




Those who not hear the music think the dancers are mad.  [sighist] [Agile]
GeneralRe: Use?
Phil Speller
0:40 7 Jan '03  
peterchen wrote: I'm just trying to grasp it:
As I understand, the second partition is to get better "throughput" of the buffer, ok.

Nope, re-read the article - it's about supplying contigious memory regions to an API fn.




Phil
GeneralRe: Use?
peterchen
10:30 7 Jan '03  
But if reader and writer are not decoupled (i.e. one might outrun the other), why not use asimple buffer?


Those who not hear the music think the dancers are mad.  [sighist] [Agile]
GeneralVery useful
Phil Speller
23:38 6 Jan '03  
Elegant solution to a common problem. I always end up implementing a memory pool that uses a linked-list to manage allocation within it - pretty standard method similar to implementations of malloc or new. This has definate performance benefits.



Phil
GeneralRe: Very useful
Simon Cooke
0:44 7 Jan '03  
Thanks Phil Smile


Last Updated 10 May 2003 | Advertise | Privacy | Terms of Use | Copyright © CodeProject, 1999-2010