13,002,355 members (77,948 online)
alternative version

#### Stats

107.1K views
124 bookmarked
Posted 23 Jul 2010

# B-Tree Sorted Dictionary

, 27 Aug 2012
 Rate this:
In-memory B-Tree sorted dictionary

## Introduction

B-Tree is a balanced m-way tree. This discussion from Wiki is a good material to introduce one to the characteristics and node layout of a B-Tree data structure: http://en.wikipedia.org/wiki/B-tree.

This article discusses an in-memory B-Tree implementation. Although B-Tree is typically used in file/database system indexing (see Open Source B-Tree On Disk @: http://sop.codeplex.com), there is a significant value in implementing a B-Tree based collection or dictionary in the .NET framework or in any language/platform for that matter.

Typical in-memory sorted dictionary data structures today are based on the Binary Tree algorithm, which is not to be confused with B-Tree. Each node of a Binary Tree can contain a single item, whereas a B-Tree can contain a user defined number of items per node, and its nodes are kept balanced. This is a very important differentiation. Being that each node has a single item, storing a large number of items in a Binary Tree will generate a tall and narrow node graph with numerous nodes.

In a corresponding B-Tree implementation, the graph will tend to be shorter and wider with a lot fewer nodes. This is because a node in a B-Tree is typically configured to have numerous items; e.g., a B-Tree dictionary with 12 items will require a single node to contain the items, and a Binary Tree will require 12 nodes which can be scattered around in the heap (memory). Increasing our item sampling to thousands, hundreds of thousands, or millions, we're talking about a significant differentiation and significant optimization that a corresponding B-Tree based sorted dictionary can provide. A million single item node of a Binary Tree vs. around eighty three thousand of a B-Tree, for a 12 item node setup, and even far less if the user specifies more items per node than mentioned.

With this characterization, it is easy to imagine that a Binary Tree will tend to use more memory, and will tend to generate more memory fragmentation than a respective dictionary utilizing the B-Tree algorithm. And in production environments, we can't afford to have a system that is prone to memory fragmentation as in time, the application will degrade in performance and can cause out of memory conditions due to lack of contiguous allocable space.

## Implementation

In this article and associated code implementation, I've selected C# and the .NET framework as the language and platform of implementation. Since B-Tree is a well established algorithm, I will focus my discussion on the high level design/implementation issues including the interface contract. The complete source code is available for download.

### IBTreeDictionary: Extending the Generics IDictionary

Generics is a great feature for enforcing coding and compile time data type checks. Thus, it is a great feature to support it in this implementation of B-Tree based Sorted Dictionary (note: non-Generics implementation is also supported). The Sorted Dictionary implements the Generics `IDictionary` interface in order to provide the capability mentioned.

In order to expose B-Tree specific features such as duplicate Key is allowed and thus requiring an explicit Item Iterator to give fine grained control in item traversal, I am extending the `IDictionary` interface to add methods for the mentioned feature, giving birth to the `IBTreeDictionary` interface. Here is how the said interface looks like:

```public interface IBaseCollection<T> : ICloneable
{
bool MoveNext();
bool MovePrevious();
bool MoveFirst();
bool MoveLast();
bool Search(T Value);
}

public interface IBTreeDictionary<TKey, TValue> :
System.Collections.Generic.IDictionary<TKey,
TValue>, IBaseCollection<TKey>
{
Sop.Collections.BTree.SortOrderType SortOrder{ get;set; }
BTreeItem<TKey, TValue> CurrentEntry{ get; }
TKey CurrentKey{ get; }
TValue CurrentValue{ get; set; }
void Remove();
}```

In order to consume items even if they have duplicate keys, the dictionary requires a feature to be able to set an item pointer to a desired position, then traverse a range while consuming the values of the items in the range. This is why the `Movexxx` methods and the `CurrentKey`/`CurrentValue` item pointers are required.

Also, B-Tree provides a feature to allow a user to traverse items either in ascending or descending sort order, thus the `SortOrder` attribute was provided.

## Management and Search Functions

Following are selected B-Tree management and search functions to illustrate some behavior which characterizes and gives the user the idea of this flavor implementation of the very useful B-Tree algorithm:

Adding an item to a B-Tree requires determination of the appropriate node where to add the item. This is required because items of the tree are maintained sorted as they get inserted or removed. Traversing the tree in order to find the node can be done recursively or via a looping construct such as a `while` loop. The latter is preferred than recursion as in this situation, it removes the requirement of stack usage, i.e., when done, there is no implicit stack (pop) operations as the innermost function unwinds/returns to the main caller function.

This block illustrates a `while` loop that is used to traverse the tree to find the target node. It uses `BinarySearch` for each node of the tree to further optimize the search per node. This is possible since each item in the node is sorted and thus, `BinarySearch` allows an optimal searching and minimal comparison of keys.

```TreeNode CurrentNode = this;
int Index = 0;
while (true)
{
Index = 0;
byte NoOfOccupiedSlots = CurrentNode.count;
if (NoOfOccupiedSlots > 1)
{
if (ParentBTree.SlotsComparer != null)
Index = Array.BinarySearch(CurrentNode.Slots, 0,
NoOfOccupiedSlots, Item, ParentBTree.SlotsComparer);
else
Index = Array.BinarySearch(CurrentNode.Slots, 0,
NoOfOccupiedSlots, Item);
if (Index < 0)
Index = ~Index;
}
else if (NoOfOccupiedSlots == 1)
{
if (ParentBTree.Comparer != null)
{
if (ParentBTree.Comparer.Compare(CurrentNode.Slots[0].Key, Item.Key) < 0)
Index = 1;
}
else
{
try
{
if (System.Collections.Generic.Comparer<TKey>.Default.Compare(
CurrentNode.Slots[0].Key, Item.Key) < 0)
Index = 1;
}
catch (Exception)
{
if (CurrentNode.Slots[0].Key.ToString().CompareTo(Item.Key.ToString()) < 0)
Index = 1;
}
}
}
if (CurrentNode.Children != null)
// if not an outermost node let next lower level node do the 'Add'.
CurrentNode = CurrentNode.Children[Index];
else
break;
}```

At this point, the correct node to add the item to is found. Now, invoke the actual `Add` function:

```CurrentNode.Add(ParentBTree, Item, Index);
CurrentNode = null;```

Here is the `Add` function that does the actual item insert to the found node. It maintains the items of the node to be sorted, and manages the breakup of the node when it is full, and other B-Tree maintenance actions required such as a broke up node's parent promotion, etc...:

```void Add(BTreeAlgorithm<TKey, TValue> ParentBTree,
BTreeItem<TKey, TValue> Item, int Index)
{
byte NoOfOccupiedSlots = count;
// Add. check if node is not yet full..
if (NoOfOccupiedSlots < ParentBTree.SlotLength)
{
ShiftSlots(Slots, (byte)Index, (byte)NoOfOccupiedSlots);
Slots[Index] = Item;
count++;
return;
}
else
{
Slots.CopyTo(ParentBTree.TempSlots, 0);
byte SlotsHalf = (byte)(ParentBTree.SlotLength >> 1);
if (Parent != null)
{
bool bIsUnBalanced = false;
int iIsThereVacantSlot = 0;
if (IsThereVacantSlotInLeft(ParentBTree, ref bIsUnBalanced))
iIsThereVacantSlot = 1;
else if (IsThereVacantSlotInRight(ParentBTree, ref bIsUnBalanced))
iIsThereVacantSlot = 2;
if (iIsThereVacantSlot > 0)
{
// copy temp buffer contents to the actual slots.
byte b = (byte)(iIsThereVacantSlot == 1 ? 0 : 1);
CopyArrayElements(ParentBTree.TempSlots, b,
Slots, 0, ParentBTree.SlotLength);
if (iIsThereVacantSlot == 1)
// Vacant in left, "skud over"
// the leftmost node's item to parent and left.
DistributeToLeft(ParentBTree,
ParentBTree.TempSlots[ParentBTree.SlotLength]);
else if (iIsThereVacantSlot == 2)
// Vacant in right, move the rightmost
// node item into the vacant slot in right.
DistributeToRight(ParentBTree, ParentBTree.TempSlots[0]);
return;
}
else if (bIsUnBalanced)
{ // if this branch is unbalanced..
// _BreakNode
// Description :
// -copy the left half of the slots
// -copy the right half of the slots
// -zero out the current slot.
// -copy the middle slot
// -allocate memory for children node *s
// -assign the new children nodes.
TreeNode LeftNode;
TreeNode RightNode;
try
{
RightNode = ParentBTree.GetRecycleNode(this);
LeftNode = ParentBTree.GetRecycleNode(this);
CopyArrayElements(ParentBTree.TempSlots, 0,
LeftNode.Slots, 0, SlotsHalf);
LeftNode.count = SlotsHalf;
CopyArrayElements(ParentBTree.TempSlots,
(ushort)(SlotsHalf + 1), RightNode.Slots, 0, SlotsHalf);
RightNode.count = SlotsHalf;
ResetArray(Slots, null);
count = 1;
Slots[0] = ParentBTree.TempSlots[SlotsHalf];
Children = new TreeNode[ParentBTree.SlotLength + 1];
ResetArray(Children, null);
Children[(int)Sop.Collections.BTree.ChildNodes.LeftChild] = LeftNode;
Children[(int)Sop.Collections.BTree.ChildNodes.RightChild] = RightNode;
//SUCCESSFUL!
return;
}
catch (Exception)
{
Children = null;
LeftNode = null;
RightNode = null;
throw;
}
}
else
{
// All slots are occupied in this and other siblings' nodes..
TreeNode RightNode;
try
{
// prepare this and the right node sibling
// and promote the temporary parent node(pTempSlot).
RightNode = ParentBTree.GetRecycleNode(Parent);
// zero out the current slot.
ResetArray(Slots, null);
// copy the left half of the slots to left sibling
CopyArrayElements(ParentBTree.TempSlots, 0, Slots, 0, SlotsHalf);
count = SlotsHalf;
// copy the right half of the slots to right sibling
CopyArrayElements(ParentBTree.TempSlots,
(ushort)(SlotsHalf + 1), RightNode.Slots, 0, SlotsHalf);
RightNode.count = SlotsHalf;
// copy the middle slot to temp parent slot.
ParentBTree.TempParent = ParentBTree.TempSlots[SlotsHalf];
// assign the new children nodes.
ParentBTree.TempParentChildren[
(int)Sop.Collections.BTree.ChildNodes.LeftChild] = this;
ParentBTree.TempParentChildren[
(int)Sop.Collections.BTree.ChildNodes.RightChild] = RightNode;
ParentBTree.PromoteParent = (TreeNode)Parent;
ParentBTree.PromoteIndexOfNode = GetIndexOfNode(ParentBTree);
//TreeNode o = (TreeNode)Parent;
//o.Promote(ParentBTree, GetIndexOfNode(ParentBTree));
//SUCCESSFUL!
return;
}
catch (Exception)
{
RightNode = null;
throw;
}
}
}
else
{
// _BreakNode
// Description :
// -copy the left half of the temp slots
// -copy the right half of the temp slots
// -zero out the current slot.
// -copy the middle of temp slot to 1st elem of current slot
// -allocate memory for children node *s
// -assign the new children nodes.
TreeNode LeftNode;
TreeNode RightNode;
try
{
RightNode = ParentBTree.GetRecycleNode(this);
LeftNode = ParentBTree.GetRecycleNode(this);
CopyArrayElements(ParentBTree.TempSlots, 0,
LeftNode.Slots, 0, SlotsHalf);
LeftNode.count = SlotsHalf;
CopyArrayElements(ParentBTree.TempSlots,
(ushort)(SlotsHalf + 1), RightNode.Slots, 0, SlotsHalf);
RightNode.count = SlotsHalf;
ResetArray(Slots, null);
Slots[0] = ParentBTree.TempSlots[SlotsHalf];
count = 1;
Children = new TreeNode[ParentBTree.SlotLength + 1];
Children[(int)Sop.Collections.BTree.ChildNodes.LeftChild] = LeftNode;
Children[(int)Sop.Collections.BTree.ChildNodes.RightChild] = RightNode;
return; // successful
}
catch (Exception)
{
LeftNode = null;
RightNode = null;
RightNode = null;
throw;
}
}
}
}```

### Remove Method

There are two versions of `Remove` in a B-Tree; one that will remove an item given a Key, and the other that will remove the currently selected item of the tree. The former will cause the B-Tree to search for the item, given a key, and position the item pointer to the item with the matching key in the tree. The same node determination listed in the Add block is used to find the node and item to delete. After the item pointer is positioned at the right one, the item is actually deleted from the tree. Here is the code that illustrates item deletion.

When the item for deletion is not in the outermost node, the tree moves the next item from the outermost node to overwrite the currently selected item's slot, causing removal of the said item from the tree.

```internal protected bool Remove(BTreeAlgorithm<TKey, TValue> ParentBTree)
{
if (Children != null)
{
byte byIndex = ParentBTree.CurrentItem.NodeItemIndex;
MoveNext(ParentBTree);
Slots[byIndex] =
ParentBTree.CurrentItem.Node.Slots[ParentBTree.CurrentItem.NodeItemIndex];
}
return true;
}```

Then it takes care of managing the outermost node either to set to `null` the slot vacated by the moved item, or to maintain the balance of the tree by pulling an item from either the left or right sibling node.

```internal void FixTheVacatedSlot(BTreeAlgorithm<TKey, TValue> ParentBTree)
{
sbyte c;
c = (sbyte)count;
if (c > 1) // if there are more than 1 items in slot then..
{ //***** We don't fix the children since there are no children at this scenario.
if (ParentBTree.CurrentItem.NodeItemIndex < c - 1)
MoveArrayElements(Slots,
(ushort)(ParentBTree.CurrentItem.NodeItemIndex + 1),
ParentBTree.CurrentItem.NodeItemIndex,
(ushort)(c - 1 - ParentBTree.CurrentItem.NodeItemIndex));
count--;
Slots[count] = null; // nullify the last slot.
}
else
{ // only 1 item in slot
if (Parent != null)
{
byte ucIndex;
// if there is a pullable item from sibling nodes.
if (SearchForPullableItem(ParentBTree, out ucIndex))
{
if (ucIndex < GetIndexOfNode(ParentBTree))
PullFromLeft(ParentBTree); // pull an item from left
else
PullFromRight(ParentBTree); // pull an item from right
}
else
{ // Parent has only 2 children nodes..
if (Parent.Children[0] == this)
{ // this is left node
TreeNode RightSibling = GetRightSibling(ParentBTree);
Parent.Slots[1] = RightSibling.Slots[0];
Parent.count = 2;
RightSibling = null;
}
else
{ // this is right node
Parent.Slots[1] = Parent.Slots[0];
TreeNode LeftSibling = GetLeftSibling(ParentBTree);
Parent.Slots[0] = LeftSibling.Slots[0];
Parent.count = 2;
LeftSibling = null;
}
// nullify Parent's children will cause this
// tree node instance to be garbage collected
// as this is child of parent!
Parent.Children[0] = null;
Parent.Children[1] = null;
Parent.Children = null;
Clear();
}
}
else
{ // only 1 item in root node !
Slots[0] = null; // just nullIFY the slot.
count = 0;
// Point the current item pointer to end of tree
}
}
}```

### Search Method

The `Search` method contains code that has the same logic as the first portion of the `Add` method listed above. It traverses the tree from the root down the children nodes, until it finds the target node and item using an iterative approach (`while` loop). But instead of adding an item, when it is found, `Search` merely positions the current item pointer to the found node and item in the node.

Having an item pointer combined with the `Search` and `Movexxx` methods is a powerful feature. E.g., doing a range query is very easy in this combination of methods. Position the item pointer to the (start) item you'd like to query, then use either the `MoveNext` or `MovePrevious` method to sequentially traverse the nearby items, until you'd read all the items in range that you are interested about.

## Using the Code

Download the source code zip file, uncompress it to your desired target folder, and open the B-Tree.sln solution inside VS 2008. Browse through the project `SampleUsage `source code in order to see how to use the `BTreeSortedDictionary`, run it, experiment to use other methods available, and reuse your experience using the Generic `SortedDictionary`. Enjoy!

## Comparison Results

```1 Million Inserts on .Net SortedDictionary: 12 sec 308 MB peak 286 MB end
1 Million Inserts on BTreeDictionary:       9 sec 300 MB peak 277 MB end
1 Million Search on .Net SortedDictionary:  7 sec 309 MB peak 286 MB end
1 Million Search on BTreeDictionary:        7 sec 309 MB peak 277 MB end```

Conclusion: This B-Tree sorted dictionary outperforms the .NET Framework's `SortedDictionary` in both memory utilization and speed of operations. Inserting 1 million items, this BTree outperforms .NET's by 3 seconds, and utilized ~9MB of less memory.

In searching, both are of the same speed, but then again, in memory utilization, BTree dictionary outperforms .NET's `SortedDictionary` by using 9MB of less memory. Multiplying the load 10X and using both in production environment, that is the true test, and I anticipate you will be very satisfied with this BTree dictionary due to architectural reasons I discussed at the top of this article.

Here is the stress program used:

```static void Main(string[] args)
{
SortedDictionary<string, string> l1 =
new SortedDictionary<string, string>();
StressSearch(l1, 1000000);
//** uncomment to run the BTreeDictionary test
//   and comment out the above block to turn off SortedDictionary test
//BTreeDictionary<string, string> l2 =
//            new BTreeDictionary<string, string>();
//StressSearch(l2, 1000000);
return;
}

static void StressAdd(IDictionary<string, string> l, int Max)
{
for (int i = 0; i < Max; i++)
{
l.Add(string.Format("{0} key kds vidsbnvibsnv siuv " +
"siuvsiubvisubviusvifsudjcnjdnc", i),
"value kds vidsbnvibsnv siuv " +
"siuvsiubvisubviusvifsudjcnjdnccscs");
}
}

static void StressSearch(IDictionary<string, string> l, int Max)
{
Console.WriteLine("Search: {0}", DateTime.Now);
for (int i = 0; i < Max; i++)
{
string v = l[string.Format("{0} key kds vidsbnvibsnv siuv " +
"siuvsiubvisubviusvifsudjcnjdnc", i)];
}
Console.WriteLine("Search: {0}", DateTime.Now);
}```

## Points of Interest

I really learnt a lot and enjoyed authoring this B-Tree implementation, which I did the C++ version back in the mid-90s; in fact, I enjoyed it so much, it propelled me to work on the on-disk version that is available now as Open Sourced software in CodePlex (http://sop.codeplex.com). Pls. feel free to join, participate and use the new "on disk" version.

## History

• 7/23/2010 - Initial submission
• 7/24/2010 - Added performance comparisons with .NET Framework's `SortedDictionary`
• 8/14/2012 - Modified "points of interest" section to mention just Open Sourced B-Tree Gold (v4.5) product

## Share

 United States
Microsoft Alumnus (Software Design Engineer)
I'm just fresh off from a software eng'g gig & is looking forward to my next.
Pls. do drop me a line @gerardorecinto@yahoo.com if interested or having any question/feedback on these Open Source projects.

I'm excited to help/volunteer my services.
Have a great day!

## You may also be interested in...

 Pro

 Re: Duplicates Looping and Deleting Gerardo Recinto28-Feb-14 8:42 Gerardo Recinto 28-Feb-14 8:42
 Write to file Shargon_8516-Jul-13 2:58 Shargon_85 16-Jul-13 2:58
 Re: Write to file Gerardo Recinto25-Feb-14 21:21 Gerardo Recinto 25-Feb-14 21:21
 Re: Introduction paragraph correction Gerardo Recinto20-May-13 18:50 Gerardo Recinto 20-May-13 18:50
 Re: Introduction paragraph correction Gerardo Recinto28-Feb-14 11:28 Gerardo Recinto 28-Feb-14 11:28
 Good learning material enjaychang17-Jan-13 2:05 enjaychang 17-Jan-13 2:05
 Re: Good learning material Gerardo Recinto18-Feb-13 12:23 Gerardo Recinto 18-Feb-13 12:23
 Re: Good learning material enjaychang6-Mar-13 3:53 enjaychang 6-Mar-13 3:53
 Memory usage Lcng9-Oct-12 4:52 Lcng 9-Oct-12 4:52
 Re: Memory usage Gerardo Recinto9-Oct-12 20:12 Gerardo Recinto 9-Oct-12 20:12
 B tree inserion of records in C language Member 94559374-Oct-12 3:13 Member 9455937 4-Oct-12 3:13
 Re: B tree inserion of records in C language Gerardo Recinto5-Oct-12 17:28 Gerardo Recinto 5-Oct-12 17:28
 Re: My vote of 5 Gerardo Recinto2-Sep-12 18:23 Gerardo Recinto 2-Sep-12 18:23
 Nice work! Filipe Cantarelli16-Aug-12 8:08 Filipe Cantarelli 16-Aug-12 8:08
 Re: Nice work! Gerardo Recinto16-Aug-12 9:53 Gerardo Recinto 16-Aug-12 9:53
 My vote of 5 Abdul Quader Mamun15-Aug-12 17:55 Abdul Quader Mamun 15-Aug-12 17:55
 Re: My vote of 5 Abdul Quader Mamun15-Aug-12 17:56 Abdul Quader Mamun 15-Aug-12 17:56
 Re: My vote of 5 Gerardo Recinto20-Aug-12 21:23 Gerardo Recinto 20-Aug-12 21:23
 what's the B-Tree like when new word added? murderer_pro2-Nov-11 16:12 murderer_pro 2-Nov-11 16:12
 I don't like your "stress test" [modified] carga16-Jul-11 12:54 carga 16-Jul-11 12:54
 Re: I don't like your "stress test" Gerardo Recinto26-Jul-11 7:07 Gerardo Recinto 26-Jul-11 7:07
 How display the structure of the tree vinycc116-Jun-11 9:43 vinycc1 16-Jun-11 9:43
 Last Visit: 31-Dec-99 18:00     Last Update: 26-Jun-17 18:30 Refresh 123 Next »