![]() |
Languages »
C / C++ Language »
General
Advanced
Heap Manager for Allocating Memory from a Shared Memory SegmentBy ChulliyanHeap Manager for Allocating Memory from a Shared Memory Segment |
C++, Windows, Visual Studio, Dev
|
||||||||
|
Advanced Search |
|
|
|
||||||||||||||||
Have you ever wanted to �malloc� and �free� memory from a shared memory segment across process boundaries? Have you ever wanted to roll your own memory manager? There are situations where two processes need to allocate memory from the shared memory segment in an efficient manner. This article describes a fast and efficient algorithm to allocate and free memory from any pre-created memory block including a shared memory segment.
There are several techniques to create a heap manager using pre-allocated block of memory. This article describes two algorithms and provides source code for the second algorithm.
This technique divides the memory blocks into fixed size blocks and pushes the blocks into a stack. Memory allocation (�malloc�) is accomplished by �popping� a block out of the stack and de-allocation pushes the freed memory block into stack.
For example; if an application allocates three classes of objects from the heap, first class of objects has size less than 32 bytes, the second class of objects has size less than 128 bytes and third kind of objects have size less than 512 bytes. In such a scenario, the algorithm will first divide the pre-allocated memory block into three contiguous blocks, and then sub-divide the blocks into a list of fixed size sub-blocks (lists of blocks with 32 bytes, 128 bytes and 512 bytes in sizes respectively). Sub-blocks are then pushed into three separate stacks. The stacks need not be a separate data structure; you can always build the stack in place while sub-dividing the memory blocks. When a memory request for 32 bytes or less is made, the heap manager pops a block out of the first stack and returns it to the caller. When a memory request for 128 bytes or less is made, the heap manager pops a block out of the second stack and returns it to the caller and so on. Freeing a memory block is nothing but pushing the block back into the correct stack.
Advantages of this algorithm are listed below.
� Memory allocation and de-allocation are extremely fast
� Heap manager can be fine tuned by the developer by carefully selecting the heap block size
� Very little memory is wasted for heap header information
� Thread level heap management can be implemented by creating heaps per thread basis. Thread level heap can be stored in the Thread Local Storage and accessed without having a global lock
Disadvantages of this algorithm are listed below.
� Heap initialization can be a little slower as all the heap partitioning must be done in advance
� If the stacks are built in place, the �RESERVED but not COMMITTED� feature of the virtual memory cannot be used as all the memory must be accessible to build the in-place stack
� Developer must choose the heap partitioning size carefully to avoid wastage of memory
� Memory will be wasted if a large number of requests fall in the lower boundary of the partition size
� Large blocks of memory (larger than the largest partition size) cannot be allocated
This algorithm behaves more like the �malloc� and �free�. It uses a pair of binary trees to build a heap in the pre-allocated block of memory. Allocation is accomplished by �splaying� the trees to get the best matching tree node and returning the memory represented by the tree node. De-allocation is done by inserting the freed memory block into the trees. Each tree node has a KEY and the value of the key is used to splay the tree to locate the best matching node.
Splay tree is a self adjusting binary tree with nodes having unique keys. Splaying the tree involves rearranging the tree in such a way that the root node of the tree is the node with the right key value. If a node with the right key value cannot be located in the tree, then the root node will be a node with the closest key value. An example is shown below.
If you splay the tree with a key of �3�, the resulting tree will look like as follows.
When the tree is splayed with a key of �5�, the resulting tree will look like as follows.
Length tree balances using the size of the free memory block as the key value. Each tree node represents a block of �free� memory and the node�s key stores the size of the block. Since there can be more than one free memory blocks with the same size, each node in this tree is a linked list containing other nodes.
The pointer tree balances using the memory address as the key value. Each tree node represents a block of �free� memory and the node�s key stores the address of the block. Since each free block of memory will have a unique address, this tree is a proper binary tree. The address key is the offset of the memory address with respect to the beginning of the memory segment.
The heap is initialized by creating both LENGTH_TREE and POINTER_TREE in the heap. Each tree will contain exactly one node representing the entire memory block.
The parameter of the �malloc� is the size of the requested memory block. Therefore, we need to splay the LENGTH_TREE using the request length to get the best matching tree node. The allocation algorithm is described below.
� Splay the LENGTH_TREE to get best matching free memory block (or tree node)
� Remove the node from the LENGTH_TREE and remove the corresponding address node from the POINTER_TREE
� If the memory block is larger than the requested size, split the memory block and insert the remainder into the LENGTH_TREE and POINTER_TREE
� Return the pointer of the block (minus the heap block header)
Pseudo code:
ALLOC (int len)
Begin
TREE_NODE node = SPAY_TREE (LENGTH_TREE, len);
// Remove the node from the trees
//
REMOVE_NODE (LENGTH_TREE, node);
REMOVE_NODE (POINTER_TREE, node);
//
//
If (node.KEY > len)
Begin
TREE_NODE rem = SPLIT_BLOCK (node);
INSERT_NODE (LENGTH_NODE, rem);
INSERT_NODE (POINTER_NODE, rem);
End
Return ADDRESS_OFF (node);
End
The de-allocation is accomplished by inserting the freed block back into the trees. Simply inserting the freed block into the tree will cause the memory to fragment resulting in performance slowdown. The best way to avoid memory fragmentation is to merge the freed block with the successor memory block as well as with the predecessor memory block and thus creating a contiguous block of memory.
The parameter of the �free� is the pointer of the memory block to be freed. This pointer�s header contains the length of the memory block. The successor node�s address can be computed by adding the length of the block to the address of the block. Search the POINTER_TREE to see if the successor block is in the FREE list. If the successor block is in the FREE list, merge the current block with it. If the predecessor block is in the free list, merge the current block with the predecessor,
� Get the successor memory block (CURRENT_BLOCK + length of the block)
� Splay the POINTER_TREE to check if the successor block is in the free list or not
� If the successor is in the free list, merge the current node with it
� Check if the left node of the successor node is actually the predecessor node of the current node
� If a predecessor not is found, merge the current node with it
� Insert the merged node into the POINTER_TREE
� Insert the merged node into the LENGTH_TREE
Pseudo code:
FREE (void* ptr)
Begin
TREE_NODE node = ptr � sizeof (BLOCK_SIZE);
TREE_NODE next = node + SIZE_OF (node);
TREE_NODE tmp = SPAY_TREE (POINTER_TREE, next);
// Try to merge with the successor and predecessor
//
If (tmp != 0)
Begin
IS_PREDECESSOR (tmp.left, node)
MERGE_PREDECESSOR (node, tmp.left);
MERGE_SUCCESSOR (node, next);
End
INSERT_NODE (POINTER_TREE, node);
INSERT_NODE (LENGTH_TREE, node);
End
Shared memory is shared between multiple processes and one of the issues with shared memory segment (In any OS, Windows, Linux, OSX etc) is that the memory segment may not be mapped into the same address location in all the processes. This behavior mandates that all the data structures stored in the shared memory should use the pointer offsets rather than the actual pointers. That means the LENGTH_TREE and POINTER_TREE must not store any pointers in their nodes. The LEFT and RIGHT attributes of the tree nodes are the memory offsets of the left and right tree nodes.
The algorithm supports the �RESERVED but not COMMITTED� feature of Windows virtual memory. This feature is very useful to fine tune the memory usage of the application.
There are two test applications, one written in MFC and the other is a plain console based application. Test application has two modes. In the first mode, it measures the performance of the algorithm with respect to �malloc�. In the second mode the test application launches three more instances of it and allocates and frees memory from the shared heap.
Performance testing can be done by invoking the program with the following command line parameters.
� �/t�: the program measures the performance numbers by creating a local heap. The local heap does not use any lock
� �/tc�: the program measures the performance numbers by creating a local heap and uses a CRITICAL_SECTION object (or pthread_mutex on Linux) as the heap locking mechanism
� �/tm�: the program creates a shared memory heap and uses MUTEX object (or file locker on Linux) as the locking mechanism
In this mode, the test program creates four stacks in the shared memory. Later it will launch three more instances of itself. Each instance will create objects in the shared memory and push those objects into one of the shared stack. Each instance will pop objects out of the other shared stacks and will display the data on the screen and delete the objects. This mode demonstrates how a process can free memory allocated by other processes.
Performance is measured using the local heap and the memory is allocated and freed 50,000 times in random fashion. The same logic is used to measure the performance of �malloc� and �free� as well. Performance measurement is done on Windows and Linux operating systems. �malloc� C-API is directly mapped to �HeapAlloc� API in Windows. In Linux, the �malloc� and �free� are well optimized code. Therefore, the debug version of the test program will not perform as well as �malloc�.
The algorithm performs extremely well when compared to �malloc� in a prolonged usage scenario. It performs much better than �malloc� if you allocate and free large amount of memory frequently. On the other hand, if the application allocates and frees only a small amount of memory (say, less than 4MB) and less frequently, then �malloc� performs better.
The source code tree is arranged as follows.
| You must Sign In to use this message board. | |||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||
General
News
Question
Answer
Joke
Rant
Admin
|
PermaLink |
Privacy |
Terms of Use
Last Updated: 20 Jun 2006 Editor: |
Copyright 2006 by Chulliyan Everything else Copyright © CodeProject, 1999-2009 Web12 | Advertise on the Code Project |