Click here to Skip to main content
6,305,776 members and growing! (16,998 online)
Email Password   helpLost your password?
Languages » C / C++ Language » General     Advanced

Heap Manager for Allocating Memory from a Shared Memory Segment

By Chulliyan

Heap Manager for Allocating Memory from a Shared Memory Segment
C++, Windows, Visual Studio, Dev
Posted:20 Jun 2006
Views:23,882
Bookmarked:30 times
Unedited contribution
Announcements
Loading...
 
Search    
Advanced Search
printPrint   Broken Article?Report       add Share
  Discuss Discuss   Recommend Article Email
9 votes for this article.
Popularity: 4.04 Rating: 4.23 out of 5
1 vote, 11.1%
1

2

3
1 vote, 11.1%
4
7 votes, 77.8%
5

Heap Manager for Allocating Memory from a Shared Memory Segment

 

Introduction

Have you ever wanted to �malloc� and �free� memory from a shared memory segment across process boundaries? Have you ever wanted to roll your own memory manager? There are situations where two processes need to allocate memory from the shared memory segment in an efficient manner. This article describes a fast and efficient algorithm to allocate and free memory from any pre-created memory block including a shared memory segment.

 

There are several techniques to create a heap manager using pre-allocated block of memory. This article describes two algorithms and provides source code for the second algorithm.

Fixed Size Partitioning Algorithm

This technique divides the memory blocks into fixed size blocks and pushes the blocks into a stack. Memory allocation (�malloc�) is accomplished by �popping� a block out of the stack and de-allocation pushes the freed memory block into stack.

For example; if an application allocates three classes of objects from the heap, first class of objects has size less than 32 bytes, the second class of objects has size less than 128 bytes and third kind of objects have size less than 512 bytes. In such a scenario, the algorithm will first divide the pre-allocated memory block into three contiguous blocks, and then sub-divide the blocks into a list of fixed size sub-blocks (lists of blocks with 32 bytes, 128 bytes and 512 bytes in sizes respectively). Sub-blocks are then pushed into three separate stacks. The stacks need not be a separate data structure; you can always build the stack in place while sub-dividing the memory blocks. When a memory request for 32 bytes or less is made, the heap manager pops a block out of the first stack and returns it to the caller. When a memory request for 128 bytes or less is made, the heap manager pops a block out of the second stack and returns it to the caller and so on. Freeing a memory block is nothing but pushing the block back into the correct stack.

 

Advantages of this algorithm are listed below.

       Memory allocation and de-allocation are extremely fast

       Heap manager can be fine tuned by the developer by carefully selecting the heap block size

       Very little memory is wasted for heap header information

       Thread level heap management can be implemented by creating heaps per thread basis. Thread level heap can be stored in the Thread Local Storage and accessed without having a global lock

 

Disadvantages of this algorithm are listed below.

       Heap initialization can be a little slower as all the heap partitioning must be done in advance

       If the stacks are built in place, the �RESERVED but not COMMITTED� feature of the virtual memory cannot be used as all the memory must be accessible to build the in-place stack

       Developer must choose the heap partitioning size carefully to avoid wastage of memory

       Memory will be wasted if a large number of requests fall in the lower boundary of the partition size

       Large blocks of memory (larger than the largest partition size) cannot be allocated

 

Dynamic Heap Allocation Algorithm

This algorithm behaves more like the �malloc� and �free�. It uses a pair of binary trees to build a heap in the pre-allocated block of memory. Allocation is accomplished by �splaying� the trees to get the best matching tree node and returning the memory represented by the tree node. De-allocation is done by inserting the freed memory block into the trees. Each tree node has a KEY and the value of the key is used to splay the tree to locate the best matching node.

          Splay Tree

Splay tree is a self adjusting binary tree with nodes having unique keys. Splaying the tree involves rearranging the tree in such a way that the root node of the tree is the node with the right key value. If a node with the right key value cannot be located in the tree, then the root node will be a node with the closest key value. An example is shown below.

 

Tree1 

            If you splay the tree with a key of �3�, the resulting tree will look like as follows.

 

Tree2 

            When the tree is splayed with a key of �5�, the resulting tree will look like as follows.

Tree3 

          Length Tree (LENGTH_TREE)

Length tree balances using the size of the free memory block as the key value. Each tree node represents a block of �free� memory and the node�s key stores the size of the block. Since there can be more than one free memory blocks with the same size, each node in this tree is a linked list containing other nodes.

Pointer Tree (POINTER_TREE)

The pointer tree balances using the memory address as the key value. Each tree node represents a block of �free� memory and the node�s key stores the address of the block. Since each free block of memory will have a unique address, this tree is a proper binary tree. The address key is the offset of the memory address with respect to the beginning of the memory segment.

 

          Initialization

The heap is initialized by creating both LENGTH_TREE and POINTER_TREE in the heap. Each tree will contain exactly one node representing the entire memory block.

Allocation

The parameter of the �malloc� is the size of the requested memory block. Therefore, we need to splay the LENGTH_TREE using the request length to get the best matching tree node. The allocation algorithm is described below.

 

       Splay the LENGTH_TREE to get best matching free memory block (or tree node)

       Remove the node from the LENGTH_TREE and remove the corresponding address node from the POINTER_TREE

       If the memory block is larger than the requested size, split the memory block and insert the remainder into the LENGTH_TREE and POINTER_TREE

       Return the pointer of the block (minus the heap block header)

 

Pseudo code:

 

ALLOC (int len)

Begin

TREE_NODE node = SPAY_TREE (LENGTH_TREE, len);

 

// Remove the node from the trees

//

REMOVE_NODE (LENGTH_TREE, node);

REMOVE_NODE (POINTER_TREE, node);

 

// Split the block if the requested length is less than block size

//

                                    If (node.KEY > len)

                                    Begin

                                                TREE_NODE rem = SPLIT_BLOCK (node);

                                               

INSERT_NODE (LENGTH_NODE, rem);

INSERT_NODE (POINTER_NODE, rem);

End

 

Return ADDRESS_OFF (node);

                        End

 

De-allocation

The de-allocation is accomplished by inserting the freed block back into the trees. Simply inserting the freed block into the tree will cause the memory to fragment resulting in performance slowdown. The best way to avoid memory fragmentation is to merge the freed block with the successor memory block as well as with the predecessor memory block and thus creating a contiguous block of memory.

The parameter of the �free� is the pointer of the memory block to be freed. This pointer�s header contains the length of the memory block. The successor node�s address can be computed by adding the length of the block to the address of the block. Search the POINTER_TREE to see if the successor block is in the FREE list. If the successor block is in the FREE list, merge the current block with it. If the predecessor block is in the free list, merge the current block with the predecessor,

 

       Get the successor memory block (CURRENT_BLOCK + length of the block)

       Splay the POINTER_TREE to check if the successor block is in the free list or not

       If the successor is in the free list, merge the current node with it

       Check if the left node of the successor node is actually the predecessor node of the current node

       If a predecessor not is found, merge the current node with it

       Insert the merged node into the POINTER_TREE

       Insert the merged node into the LENGTH_TREE

 

Pseudo code:

 

FREE (void* ptr)

Begin

      TREE_NODE node = ptr � sizeof (BLOCK_SIZE);

      TREE_NODE next = node + SIZE_OF (node);

TREE_NODE tmp = SPAY_TREE (POINTER_TREE, next);

 

// Try to merge with the successor and predecessor

//

If (tmp != 0)

Begin

            IS_PREDECESSOR (tmp.left, node)

MERGE_PREDECESSOR (node, tmp.left);

MERGE_SUCCESSOR (node, next);

End

 

INSERT_NODE (POINTER_TREE, node);

INSERT_NODE (LENGTH_TREE, node);

                        End

 

Shared Memory

Shared memory is shared between multiple processes and one of the issues with shared memory segment (In any OS, Windows, Linux, OSX etc) is that the memory segment may not be mapped into the same address location in all the processes. This behavior mandates that all the data structures stored in the shared memory should use the pointer offsets rather than the actual pointers. That means the LENGTH_TREE and POINTER_TREE must not store any pointers in their nodes. The LEFT and RIGHT attributes of the tree nodes are the memory offsets of the left and right tree nodes.

The algorithm supports the �RESERVED but not COMMITTED� feature of Windows virtual memory. This feature is very useful to fine tune the memory usage of the application.

Test Application

There are two test applications, one written in MFC and the other is a plain console based application. Test application has two modes. In the first mode, it measures the performance of the algorithm with respect to �malloc�. In the second mode the test application launches three more instances of it and allocates and frees memory from the shared heap.

 

Performance Testing Mode

Performance testing can be done by invoking the program with the following command line parameters.

        �/t�: the program measures the performance numbers by creating a local heap. The local heap does not use any lock

       �/tc�: the program measures the performance numbers by creating a local heap and uses a CRITICAL_SECTION object (or pthread_mutex on Linux) as the heap locking mechanism

       �/tm�: the program creates a shared memory heap and uses MUTEX object (or file locker on Linux) as the locking mechanism

Inter-process Allocation and De-Allocation Mode

In this mode, the test program creates four stacks in the shared memory. Later it will launch three more instances of itself. Each instance will create objects in the shared memory and push those objects into one of the shared stack. Each instance will pop objects out of the other shared stacks and will display the data on the screen and delete the objects. This mode demonstrates how a process can free memory allocated by other processes.

 

Performance

            Performance is measured using the local heap and the memory is allocated and freed 50,000 times in random fashion. The same logic is used to measure the performance of �malloc� and �free� as well. Performance measurement is done on Windows and Linux operating systems. �malloc� C-API is directly mapped to �HeapAlloc� API in Windows. In Linux, the �malloc� and �free� are well optimized code. Therefore, the debug version of the test program will not perform as well as �malloc�.

 

The algorithm performs extremely well when compared to �malloc� in a prolonged usage scenario. It performs much better than �malloc� if you allocate and free large amount of memory frequently. On the other hand, if the application allocates and frees only a small amount of memory (say, less than 4MB) and less frequently, then �malloc� performs better.

Source Code

The source code tree is arranged as follows.

 

  • Directory �./common� contains the algorithm implementation and support classes
  • Directory �./heap� contains the test program
  • Heap.vcproj is the VS 2005 project file for building the GUI test application
  • Heap_c.vcproj is the VS 2005 project file for building the console test application
  • GUI binary is located in �./heap/Release� directory
  • Console binary is located in ./heap/Release_c� directory

 

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

About the Author

Chulliyan


Member
I have been writing code for a living for last three hundred years! I have written code in almost all the languages- C/C++, JAVA, C#, VB, Pascal, Delphi, JScript and so on.
Occupation: Web Developer
Location: United States United States

Other popular C / C++ Language articles:

Article Top
You must Sign In to use this message board.
FAQ FAQ 
 
Noise Tolerance  Layout  Per page   
 Msgs 1 to 4 of 4 (Total in Forum: 4) (Refresh)FirstPrevNext
GeneralFree()'s memory accounting flaw Pinmembermicbarton4:32 8 Jan '07  
GeneralGreat article! PinmemberQuynh Nguyen9:10 27 Jun '06  
GeneralRe: Great article! Pinmemberdude1239:40 27 Jun '06  
GeneralRe: Great article! PinmemberQuynh Nguyen9:48 27 Jun '06  

General General    News News    Question Question    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

PermaLink | Privacy | Terms of Use
Last Updated: 20 Jun 2006
Editor:
Copyright 2006 by Chulliyan
Everything else Copyright © CodeProject, 1999-2009
Web12 | Advertise on the Code Project