Click here to Skip to main content
Click here to Skip to main content

Heap Manager for Allocating Memory from a Shared Memory Segment

, 20 Jun 2006
Rate this:
Please Sign up or sign in to vote.
A heap manager for allocating memory from a shared memory segment.

Introduction

Have you ever wanted to malloc and free memory from a shared memory segment across process boundaries? Have you ever wanted to roll your own memory manager? There are situations where two processes need to allocate memory from the shared memory segment in an efficient manner? This article describes a fast and efficient algorithm to allocate and free memory from any pre-created memory block, including a shared memory segment.

There are several techniques to create a heap manager using a pre-allocated block of memory. This article describes two algorithms, and provides source code for the second algorithm.

Fixed Size Partitioning Algorithm

This technique divides the memory blocks into fixed size blocks, and pushes the blocks into a stack. Memory allocation (malloc) is accomplished by popping a block out of the stack, and de-allocation pushes the freed memory block into the stack.

For example: if an application allocates three classes of objects from the heap, and the first class of objects has size less than 32 bytes, the second class of objects has size less than 128 bytes, and the third kind of objects have size less than 512 bytes. In such a scenario, the algorithm will first divide the pre-allocated memory block into three contiguous blocks, and then sub-divide the blocks into a list of fixed size sub-blocks (lists of blocks with 32 bytes, 128 bytes, and 512 bytes in sizes, respectively). Sub-blocks are then pushed into three separate stacks. The stacks need not be a separate data structure; you can always build the stack in place while sub-dividing the memory blocks. When a memory request for 32 bytes or less is made, the heap manager pops a block out of the first stack and returns it to the caller. When a memory request for 128 bytes or less is made, the heap manager pops a block out of the second stack and returns it to the caller, and so on. Freeing a memory block is nothing but pushing the block back into the correct stack.

Advantages of this algorithm are listed below:

  • Memory allocation and de-allocation are extremely fast.
  • Heap manager can be fine tuned by the developer by carefully selecting the heap block size.
  • Very little memory is wasted for heap header information.
  • Thread level heap management can be implemented by creating heaps per thread basis. A thread level heap can be stored in the Thread Local Storage, and accessed without having a global lock.

Disadvantages of this algorithm are listed below:

  • Heap initialization can be a little slower as all the heap partitioning must be done in advance.
  • If the stacks are built in place, the 'RESERVED but not COMMITTED' feature of virtual memory cannot be used as all the memory must be accessible to build the in-place stack.
  • Developer must choose the heap partitioning size carefully to avoid wastage of memory.
  • Memory will be wasted if a large number of requests fall in the lower boundary of the partition size.
  • Large blocks of memory (larger than the largest partition size) cannot be allocated.

Dynamic Heap Allocation Algorithm

This algorithm behaves more like malloc and free. It uses a pair of binary trees to build a heap in the pre-allocated block of memory. Allocation is accomplished by splaying the trees to get the best matching tree node and returning the memory represented by the tree node. De-allocation is done by inserting the freed memory block into the trees. Each tree node has a key, and the value of the key is used to splay the tree to locate the best matching node.

Splay Tree

Splay tree is a self adjusting binary tree with nodes having unique keys. Splaying the tree involves rearranging the tree in such a way that the root node of the tree is the node with the right key value. If a node with the right key value cannot be located in the tree, then the root node will be a node with the closest key value. An example is shown below.

Tree1

If you splay the tree with a key of 3, the resulting tree will look like:

Tree2

When the tree is splayed with a key of 5, the resulting tree will look like:

Tree3

Length Tree (LENGTH_TREE)

Length tree balances using the size of the free memory block as the key value. Each tree node represents a block of 'free' memory and the node's key stores the size of the block. Since there can be more than one free memory blocks with the same size, each node in this tree is a linked list containing other nodes.

Pointer Tree (POINTER_TREE)

The pointer tree balances using the memory address as the key value. Each tree node represents a block of 'free' memory and the node's key stores the address of the block. Since each free block of memory will have a unique address, this tree is a proper binary tree. The address key is the offset of the memory address with respect to the beginning of the memory segment.

Initialization

The heap is initialized by creating both LENGTH_TREE and POINTER_TREE in the heap. Each tree will contain exactly one node representing the entire memory block.

Allocation

The parameter for the malloc is the size of the requested memory block. Therefore, we need to splay the LENGTH_TREE using the request length to get the best matching tree node. The allocation algorithm is described below:

  • Splay the LENGTH_TREE to get the best matching free memory block (or tree node)
  • Remove the node from the LENGTH_TREE and remove the corresponding address node from the POINTER_TREE
  • If the memory block is larger than the requested size, split the memory block and insert the remainder into the LENGTH_TREE and POINTER_TREE
  • Return the pointer of the block (minus the heap block header)

Pseudo code:

ALLOC (int len)
Begin
    TREE_NODE node = SPAY_TREE (LENGTH_TREE, len);
 
    // Remove the node from the trees
    //
    REMOVE_NODE (LENGTH_TREE, node);
    REMOVE_NODE (POINTER_TREE, node);
 
    // Split the block if the requested length is less
    // than block size
    If (node.KEY > len)
    Begin
        TREE_NODE rem = SPLIT_BLOCK (node);
        INSERT_NODE (LENGTH_NODE, rem);
        INSERT_NODE (POINTER_NODE, rem);
    End
 
    Return ADDRESS_OFF (node);
End

De-allocation

The de-allocation is accomplished by inserting the freed block back into the trees. Simply inserting the freed block into the tree will cause the memory to fragment, resulting in performance slowdown. The best way to avoid memory fragmentation is to merge the freed block with the successor memory block as well as with the predecessor memory block, and thus creating a contiguous block of memory.

The parameter for free is the pointer of the memory block to be freed. This pointer's header contains the length of the memory block. The successor node's address can be computed by adding the length of the block to the address of the block. Search the POINTER_TREE to see if the successor block is in the free list. If the successor block is in the free list, merge the current block with it. If the predecessor block is in the free list, merge the current block with the predecessor:

  • Get the successor memory block (CURRENT_BLOCK + the length of the block)
  • Splay the POINTER_TREE to check if the successor block is in the free list or not
  • If the successor is in the free list, merge the current node with it
  • Check if the left node of the successor node is actually the predecessor node of the current node
  • If a predecessor is not found, merge the current node with it
  • Insert the merged node into the POINTER_TREE
  • Insert the merged node into the LENGTH_TREE

Pseudo code:

FREE (void* ptr)
Begin
    TREE_NODE node = ptr – sizeof (BLOCK_SIZE);
    TREE_NODE next = node + SIZE_OF (node);
    TREE_NODE tmp = SPAY_TREE (POINTER_TREE, next);
 
    // Try to merge with the successor and predecessor
    //
    If (tmp != 0)
    Begin
        IS_PREDECESSOR (tmp.left, node)
        MERGE_PREDECESSOR (node, tmp.left);
        MERGE_SUCCESSOR (node, next);
    End

    INSERT_NODE (POINTER_TREE, node);
    INSERT_NODE (LENGTH_TREE, node);
End

Shared Memory

Shared memory is shared between multiple processes, and one of the issues with a shared memory segment (in any OS, Windows, Linux, OSX etc.) is that the memory segment may not be mapped into the same address location in all the processes. This behavior mandates that all the data structures stored in the shared memory should use the pointer offsets rather than the actual pointers. That means the LENGTH_TREE and POINTER_TREE must not store any pointers in their nodes. The LEFT and RIGHT attributes of the tree nodes are the memory offsets of the left and right tree nodes.

The algorithm supports the 'RESERVED but not COMMITTED' feature of Windows virtual memory. This feature is very useful to fine tune the memory usage of the application.

Test Application

There are two test applications: one written in MFC and the other is a plain console based application. The test application has two modes. In the first mode, it measures the performance of the algorithm with respect to malloc. In the second mode, the test application launches three more instances of it and allocates and frees memory from the shared heap.

Performance Testing Mode

Performance testing can be done by invoking the program with the following command line parameters:

  • /t: The program measures the performance numbers by creating a local heap. The local heap does not use any lock.
  • /tc: The program measures the performance numbers by creating a local heap and uses a CRITICAL_SECTION object (or pthread_mutex on Linux) as the heap locking mechanism.
  • /tm: The program creates a shared memory heap and uses a MUTEX object (or file locker on Linux) as the locking mechanism.

Inter-process Allocation and De-Allocation Mode

In this mode, the test program creates four stacks in the shared memory. Later, it will launch three more instances of itself. Each instance will create objects in the shared memory and push those objects into one of the shared stack. Each instance will pop objects out of the other shared stacks, and will display the data on the screen and delete the objects. This mode demonstrates how a process can free memory allocated by other processes.

Performance

Performance is measured using the local heap, and the memory is allocated and freed 50,000 times in random fashion. The same logic is used to measure the performance of malloc and free as well. Performance measurement is done in the Windows and Linux Operating Systems. The malloc C-API is directly mapped to the HeapAlloc API in Windows. In Linux, malloc and free are well optimized code. Therefore, the debug version of the test program will not perform as well as malloc.

The algorithm performs extremely well when compared to malloc in a prolonged usage scenario. It performs much better than malloc if you allocate and free large amounts of memory frequently. On the other hand, if the application allocates and frees only a small amount of memory (say, less than 4MB) and less frequently, then malloc performs better.

Source Code

The source code tree is arranged as follows:

  • /common folder: Contains the algorithm implementation and support classes
  • /heap folder: Contains the test program
  • Heap.vcproj is the VS2005 project file for building the GUI test application
  • Heap_c.vcproj is the VS2005 project file for building the console test application
  • The GUI binary is located in the /heap/Release directory
  • The console binary is located in the /heap/Release_c directory

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Chulliyan
Web Developer
United States United States
I have been writing code for a living for last three hundred years! I have written code in almost all the languages- C/C++, JAVA, C#, VB, Pascal, Delphi, JScript and so on.

Comments and Discussions

 
GeneralI do need a lib such as this, thanks Pinmemberoneye19-Aug-09 16:26 
GeneralFree()'s memory accounting flaw Pinmembermicbarton8-Jan-07 3:32 
GeneralRe: Free()'s memory accounting flaw Pinmemberssl1216-Sep-12 12:02 
GeneralGreat article! PinmemberQuynh Nguyen27-Jun-06 8:10 
GeneralRe: Great article! Pinmemberdude12327-Jun-06 8:40 
GeneralRe: Great article! PinmemberQuynh Nguyen27-Jun-06 8:48 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web02 | 2.8.140827.1 | Last Updated 20 Jun 2006
Article Copyright 2006 by Chulliyan
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid