Click here to Skip to main content
Click here to Skip to main content

Fast Binary Tree Operations

By , 22 Jan 2005
 

Sample Image - BinaryTree.gif

  1. Introduction
  2. Background
  3. Code Description
  4. Class Usage
  5. Tree Load Balancing
  6. Sample Demo
  7. Source code files
  8. References
  9. Thanks to...

Introduction

In this article, I will show you how to use the Binary Search Tree to store data. And I am using templates for keys and data values to simplify usage of any data types. If you have good experience about the theory behind the binary search trees, you can skip next sections of the background.

Background

[1] Binary search trees are data structures which support many dynamic-set operations including search, insert, delete, minimum, maximum, predecessor, and successor. Basic operations on binary search tree take time proportional to the height of the tree. For a complete binary tree with n nodes, such operations take O(log n) time in the worst-case, as the height of randomly built binary search tree is proportional to log n. In some cases operations take O(n) time in case of sorted input, so the tree will be like a sorted list, so I used the Red-Black tree balancing technique, as described in the Tree Load Balancing section

A binary search tree is organized in a binary tree as shown in figure 1. Such a tree can be represented by a linked data structure in which each node is an object. In addition to the key field, each node contains fields left, right, and parent that point to the nodes corresponding to its left child, its right child, and its parent, respectively. If a child or the parent is missing, the appropriate field field contains a value NIL.

Binary Tree

I will not dig in all Binary Tree functions here, I will just refer to the search function, and you can check reference [1] for more details, and I will talk about some of them in the code description, as many of them is so sophisticated like the Delete function.

Searching

The most common operation performed in binary search tree is searching for a key stored in the tree, which can be done in two ways:

  1. Recursion:

    Node Search(Node, key)
    {
        if Node = NIL or key = Node.key
            return Node
        if key < Node.key
            then return Search(Node.Left, key)
            else return Search(Node.Right, key)
    }
  2. While loop:
    Node Search(key)
    {
        Node = root
        while Node != NIL and key != Node.key 
            do if key < Node.key
                then Node = Node.Left
                else Node = Node.Right        
        return Node
    }

The two ways begin search at the root and trace a path downward in the tree till they find the key or return NIL (not found case). First way code is more simple, but second way optimizes stack usage, which I am preferring in all my cases to deal with huge data without affecting the stack and the performance.

Code Description

All Tree functions encapsulated in a template class CBinaryTree and CBinaryTreeNode.

GetCount returns tree nodes count with repetition (repeated key will assigned to one node).
RemoveAll removes all tree nodes.
Insert inserts new key in the tree.
Search searches for key in the tree.
Min gets minimum node key under the input node.
Max gets maximum node key under the input node.
Successor gets node successor (node which comes next in the over all tree sorting).
Predecessor gets node predecessor (node which comes previous in the over all tree sorting).
Delete deletes node from tree and adjusts its childs' nodes.
Save saves all tree nodes' order in a vector of integers.

In all my code here, I am avoiding recursion functions to avoid stack overflow, as I am dealing with huge data, so you will find all my code using while loops, sometimes code becomes complicated like the RemoveAll function:

RemoveAll function

This function removes all tree nodes, it does that by order to each node to delete its left child then its right child, then delete itself. This can be done with two ways:

  1. While loop:
    // remove all tree nodes
    void RemoveAll()
    {
        TREENODE *node = Root, *pTemp;
        while(node != Nil)
        {
            // check for left child
            if(node->Left != Nil)
                node = node->Left;
            // check for right child
            else    if(node->Right != Nil)
                node = node->Right;
            else    // node has no childs
            {    // save node pointer
                pTemp = node;
                // set node pointer at its parent to NULL
                if(node->Parent != Nil)
                    node->Parent->Childs[node != node->Parent->Left] = Nil;
                // update pointer node to its parent
                node = node->Parent;
                // delete the saved node
                delete pTemp;
            }
        }
        Count = Serial = 0;
        Root = Nil;
        Modified = false;
    }

    As you can see, there is some complication at this section (node has no children).

  2. Recursion way:

    It simply puts delete left and right nodes at each node destructor and call delete for tree root.

    ~CBinaryTreeNode()
    {
        if(Childs[0])
            delete Childs[0];
        if(Childs[1])
            delete Childs[1];
    }
    // ...
    delete Tree.Root;

    All the nodes will be deleted automatically through stack usage, but if you read the comments in the first way code, you will find it easy to understand it, and avoid stack usage.

Min function

The Minimum of a node x can be found by following left child from the node x until a Nil is encountered, as in the following code:

// return minimum key in the tree
TREENODE* Min(TREENODE* node) const
{    
    // iterate in the left branch
    while(node != Nil && node->Left != Nil)
        node = node->Left;
    return node;
}

Successor function

The successor of a node x is the node with the smallest key greater than key[x]. For example, if you add keys (C I D H B F E G A J K) in sequence and will be as shown in the following figure:

Binary Tree

Each arrow represents the direction from each node to its successor. The code in the Successorfunction takes two paths:

  1. If the node has right child then return the left most node in the right sub tree.
  2. Else, it goes up from node until we find a node that is to the left of its parent.
TREENODE* Successor(TREENODE* node) const
{
       // return the left most node in the right sub tree
    if(node->Right != Nil)
        return Min(node->Right);
    // go up from node until we find a node that is the left of its parent
    TREENODE* Parent = node->Parent;
    while(Parent != Nil && node == Parent->Right)
    {
        node = Parent;
        Parent = node->Parent;
    }
    return Parent;
}

You can use the Successor function with the Min function to iterate tree nodes in ascending order.

Delete function

Delete function has three cases, I found it too complicated, and I hope I can describe it, The three cases are:

  1. The node has no child, so we just remove it. As in you can see in figure 2, node K can be deleted by just resetting the pointer from node J.
  2. The node has one child, so we need to splice it as node H in figure 2, we should make node D point to node F as a right child, and set node F parent to point to node D
  3. The node has two children, so we choose its successor to take its position, and splice its successor (join successor parent and child, like the previous case) . For example, if we delete node C at figure 2, we get node C successor (D), so we will splice node D so that node I and H will be connected directly and node C will take node C place as in Figure 3.

    Binary Tree

Code in the Delete is optimized to handle the three cases altogether, so I find it hard to describe it here, but you can try to apply the code in each case and read comments carefully, and you will find it working well.

void Delete(TREENODE* node)
{    
    TREENODE *pSplice = (node->Left == Nil || 
              node->Right == Nil)?node:Successor(node);
    TREENODE *pChild = pSplice->Childs[pSplice->Left == Nil];
    // connect child to spliced node parent
    if(pChild != Nil)
        pChild->Parent = pSplice->Parent;
    // connect spliced node parent to child
    if(pSplice->Parent == Nil)
        Root = pChild;
    else
        pSplice->Parent->Childs[pSplice !=
                   pSplice->Parent->Childs[0]] = pChild;
    // put spliced node in place of node (if required)
    if(pSplice != node)
    {    // copy spliced node
        *node = *pSplice;
        // delete the spliced node
        delete pSplice;
    }
    else
        // delete the node
        delete node;
    Count--;
}

Iterate Tree Nodes

You can call Min(Tree.Root) and then Successor in a loop to retrieve all nodes in order (ascending):

TREENODE* node = Min(Root);
while(node)
{ 
    // use node here then get next one
    node = Successor(node);
}

And in the same way you can call Max(Tree.Root) and then Predecessor in a loop to retrieve all nodes in a reversed order (descending):

TREENODE* node = Max(Root);
while(node)
{ 
    // use node here then get next one
    node = Predecessor(node);
}

Class Usage

CBinaryTree usage is very simple, we just need to decide the KEY and DATA data types, then you can define the class, but you must choose a KEY that supports the compare function as it is used in the Insert and Search functions. For example:

CBiaryTree<string, LPCSTR, int, int> tree;
tree.NoRepeat = true;
for(...)
{
    // ... fill str in some way
    tree.Insert(str);
}

class String of the STL supports the compare function, but in some cases you have to add it yourself, as in case of integers' sort:

class CInt
{
public:
    int m_nKey;
    int compare(int nKey) 
    { return nKey == m_nKey ? 0 : (nKey >= m_nKey ? -1 : 1); }
    void operator=(int nKey) { m_nKey = nKey; }
    bool operator==(int nKey) { return m_nKey == nKey; } 
}; 

CBiaryTree<CInt, int, int, int> tree;
tree.NoRepeat = true;
for(...)
{
    // ... fill n in some way
    tree.Insert(n);
}

Tree Load Balancing

Tree balancing can be achieved using many techniques. In my class here, I am using Red-Black Tree functions for the insertion in the tree. Red-Black Tree simply depends on keep tree height short as possible, as the search and the insertion operations time depend on the tree height. As in the following figure, if the sequence (C, A, B) is added to the tree, the height will be like case 1, but if we have an operation to change to the valid tree in case 3, it will be good as the tree height reduced.

Red Black Tree balancing

So, I have included the functions LeftRotate, RightRotate, and RBInsert to can balance the tree functions.

Sample Demo

The attached source zip file with article contains a sample that parse any folder in a recursive function and parse all its files with the extension specified in the extensions editor (any text files format), and it adds all files tokens to a binary tree. Then you can navigate all tree tokens through the tree control. You can test it in VC6 or 7.1, or just run the attached exe, and don't hesitate to mail me for any help.

Source code files

  1. BinaryTree.h: Binary Tree code.
  2. RBTree.h: Red-Black Tree code.

References

[1] Thomas H.Cormen, Charles E.Leiserson, Ronald L.Rivest, Algorithms 1990

Thanks to...

I awe a lot to my colleagues for helping me in implementing and testing this code. (JAK)

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

About the Author

Hatem Mostafa
Other
Egypt Egypt
Member

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
QuestionFull Text SearchmemberSunil Kumawat11 Jun '12 - 1:50 
I have enable FILEStream and applying search on binary data field but counld not find out.
Actually I am storing lots of pdf into my database in a VARBINARY(max) FILESTREAM format and applying search onto that column with the help of full text search.
QuestionLicense?memberVu H. Pham25 Oct '10 - 4:20 
Great work, Hatem. I'm wondering if I'm able to use your implementation in a commercial application?
 
Thanks.
AnswerRe: License?memberVu H. Pham25 Oct '10 - 4:21 
Sorry, I have just read this message. Thx alot.
GeneralKey updatememberMKirkove29 Jun '10 - 22:47 
First, thanks a lot for your code. It helps me a lot.
I would like to ask you this question: "Have you planned the update of the nodes' key (in your example, a changing a 'C' to 'O' for example)?"
For updating a key, I delete the corresponding node in the tree and I insert a new one with the new key value. Is there a faster means to do this?
Questionwhere is code?memberefvallejocando20 Apr '10 - 7:11 
the link is corrupt
QuestionHow to defined Root?memberHoang Nam S2 Jul '08 - 18:16 
I'm haved structure a class,with component follow:
-class name.
-fields: name:Dungnguyen,DOB:02/06/2005,....and a lot of other properties .
I want to defined this structure and stored it with headerTable similar Root in BTree.Everyone can help to me!Thanks Everyone.
Questionurgent, need help right awaymemberricky sam1 May '08 - 10:49 
hi,
i need help on my lab, can someone please send me the code for the following program and it needs to be done in c. i'm really frustated beacuse of it and i can't figure it out.
 
Definition: Design, develop and implement a software system that will allow a user to create and manipulate outlines. An outline is a set of statements that are hierarchically related; i.e. are related in a parent/child sense. Develop an Abstract Data Type that will allow the definition and manipulation of an outline; at a minimum the following outline operations should be provided by the ADT.
 

An unlimited size outline
An unlimited number of outline elements at any level
The ability to add and delete an outline element at any point in the outline
The ability to move any subtree of elements to any be a child of any node
Siblings of any parent will be ordered in some manner the aim being correct ordering of the outline when printed.
The ability to print the outline contained in any subtree
When printing the outline the elements should be indented and identified with a numbering scheme (like in the example). that could be used for manipulating the outline
The user interface to the program should provide a rudimentty meny system that allows the user to:
 
insert an outline element
delete an outline element (this implies that if the element has children, the user must be asked what to do with the children)
move an outline element (and its children) to somewhere else in the outline
display the outline
save the outline to a file
load a previously created outline from a file
 

 

 
AAAA ex 1 AAAA (main topic)
 
/ | \ 1.1 BBBB (first child of main topic)
 
BBBB EEEE FFFF 1.1.1 CCCC (FC of FC of main topic)
 
| 1.1.2 DDDD (2C of FC of main topic)
 
+-----+ 1.2 EEEE (2C of main topic)
 
| | 1.3 FFFF (3C of main topic)
 

 
CCCC DDDD
 

 

 
ex. make 1.1 child of 1.2 would produce:
 
AAAA 1 AAAA (main topic)
 
/ \ 1.1 EEEE (1C of main topic)
 
EEEE FFFFF 1.1.1 BBBB (1C of 1C of main topic)
 
\ 1.1.1.1 CCCC (1C of 1c of 1C of main topic)
 
BBBB 1.1.1.2 DDDD (2C of 1c of 1C of main topic)
 
/ \ 1.2 FFFF (2C of main topic)
 
CCCC DDDD
 

 

(notice the renumbering caused by the move)
 
(also notice that the subtree whose head is BBBB has been moved intact to be a child of EEEE)
GeneralInteresting article..memberFresric Bouemont24 Feb '08 - 22:52 
5 worhty article, but I have a question, some of your routines use recursion, how does that affect the speed/running time of the routines in general?
Questionred black tree code in c++memberjalalp29 May '07 - 9:54 
I need the insert code for red black tree in c++
thanks for ur cooperation
 
Lalo
GeneralThanksmemberMr_Noob18 Apr '07 - 12:59 
< thanks this is nice >
GeneralDelete function for RB treememberhansvz3 Jan '07 - 12:24 
Dear Hatem,
 
The Delete function for the regular binary search tree checks explicitly for the case that the node that you are deleting has no children.
 
void Delete(TREENODE* node)
{
...
// connect child to spliced node parent
if(pChild != Nil)
pChild->Parent = pSplice->Parent;
 
For the RB tree you overload the Delete function (because you need to take care of RB violations). Why is there no check on existence of children? It seems to me that the following code (from RBTree.h)
 
void Delete(TREENODE* node)
{
...
// connect child to spliced node parent
pChild->Parent = pSpliced->Parent;
...
 
should crash if pChild is a null pointer?
 
(I must confess that I did not try to compile and run...)
 
Thanks for taking the time to answer,
 
Hans van Zwol
GeneralRe: Delete function for RB treememberHatem Mostafa3 Jan '07 - 18:05 
I think it works well as at this case - "node that you are deleting has no children" - the node has two children equal to Nil, and if u check the code u can find that:
Nil = NewNode();
That means Nil is not equal to NULL in RBTree so Nil is a normal leaf node. So, the code is right. U can test the demo version and try to delete leaf node.
 
Many thanks
GeneralRe: Delete function for RB treememberhansvz4 Jan '07 - 8:24 
Sorry, I missed the diffence between the regular and the RB tree in that respect. Thank you for your time.
 
Hans
QuestionDelete function question.membermattdawg2 Dec '06 - 8:31 
If node H has 2 children how do you handle it? Maybe I am just not seeing it.
AnswerRe: Delete function question.memberHatem Mostafa2 Dec '06 - 17:56 
read the section:
http://www.codeproject.com/cpp/BinaryTree.asp?msg=1786850#Delete[^]
Point 3
 
Thanks
Hatem
QuestionPublic domain?memberjswoofer23 Oct '06 - 3:54 
Is this code public domain? What type of license does it have if it is not?

AnswerRe: Public domain?memberHatem Mostafa24 Oct '06 - 8:57 
The code of this article is public and can be used in all fields including commercial use without any license, just give me a credit in your code and the about of your final product.
 
Thanks
Hatem

GeneralI need a code for Data structure project in c++memberursimran13 Dec '05 - 23:42 
I need code like
Banking simulation
Tree algorithm's
GeneralAnother Question...memberHumanOsc7 Jun '05 - 4:23 
Hello...
 
Another Question...
 
Are you interrested to help me with a smart database project...
 
I believe your skills in programming and algorithm are very useful to optimize it...
 
Best regards... Smile | :)

GeneralQuestion...memberHumanOsc1 Jun '05 - 2:40 
Hi...
 
I'm absolutly fascinated about your well written articels... So i have a little Question about Tree Load Balancing in front of the the ATL Map...
 
Which Tree Load Balancing use the ATL Map ???
 
Can you explain in which cases it's better to use them ???
 
Or is your solution generally the better choice ???
 
Best regards...
GeneralRe: Question...memberHatemMostafa1 Jun '05 - 3:07 
Many thanks,
 
I think u mean SLT map, if that,
I think it is too slow after inserting huge number of keys, after 10,000 keys, its performance goes down, I don't know its internal work, I think it works with hashing, which may face some troubles with collesions. Any way I am not sure as I have tested it with LZW compression and its performance was so so bad. And I am using mine with a search engine in the indexing process and it works well and fast with huge number of terms (more than 4 millions term).
 
Thanks
Hatem
GeneralRe: Question...memberHumanOsc1 Jun '05 - 3:45 
Hi again...
 
At first... Thanks for your very fast reply... Big Grin | :-D
 
I must excuse me for a little mistake in my question Sniff | :^) ... I correctly mean the CAtlMap Template Class which is declared in the atlcoll.h...
 
But also Thanks for your reply and comments over the STL Map Smile | :) ...
 
CAtlMap haves a better performance as the STL version, but i'm not sure in front of your solution D'Oh! | :doh: ...
 
Best regards...
 


 

GeneralRe: Question...memberHatemMostafa1 Jun '05 - 5:27 
Hi,
 
I am sorry, I just heard about the ATL Map, but I got its url:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccore/html/vcoriATLCollectionClasses.asp[^]
 
It is do nice library, sure I'll read it and get more ideas, many thanks for directing me to this library, and so sorry for misunderstanding u.
 
I have read a litle lines in the url, it is so good.
 
Thanks
GeneralRe: Question...memberHumanOsc1 Jun '05 - 8:28 
Hello...
 
ATL is the first choice for simple and fast code...
STL is very complex and safe and you can handle every (im)possible szenario with
exceptions...
 
I believe that's the Reason why the most programmers firstly programm with the STL... But in my opinion the ATL is something easier to use...
 
But also big Thanx... For your articel it's very helpful to understand this algorithms...
 

Best Regards...
GeneralRe: Question...memberHumanOsc1 Jun '05 - 9:41 
Hello...
 
Another Question...
 
Are you interrested to help me with a smart database project...
 
I believe your skills in programming and algorithm are very useful to optimize
 
this existing project...
 

Best regards...
 

 


General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web02 | 2.6.130516.1 | Last Updated 23 Jan 2005
Article Copyright 2004 by Hatem Mostafa
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid