## Introduction

In this article, I have given an introduction of binary trees and hierarchical data structures. In the sample project, I have compared binary trees with qsort. The binary tree is defined with a C++ template. It can be used from any environment supporting C++, and for any data type supporting data comparison operators, `<`

and `>`

. The description is easy to follow. To use the template, you need to include *BTree.h* in your C++ project. For balancing and optimization of data insertion, I have used a simple reordering method instead of Red-Black and AVL trees. The advantage is that data insertion and tree-creation require slightly less time. Even if not perfectly balanced, this method ensures that the data search requires `<code>log2`

comparing operations.

My major interests are multi-dimensional data structures so I mostly utilize algorithms which can be easily generalized for N-dimensions.

## Background

Binary trees are hierarchical data structures which allow insertion and a fast, nearest-neighbours search in one-dimensional data. It can be used instead of qsort and binary search to quickly find the closest points in a data array. The binary tree has the advantage of having a simple structure that allows generalization for more than one dimension - the so-called `KD Tree`

. Therefore, it is good to understand how it works and how it performs data searches. I will keep the explanation clear and easy to follow.

First let's define a `tree`

. A tree is a data-structure composed from a set of nodes. Each node has links to a number of other nodes called `children`

and `parents`

. The difference between them is that children are chosen by some `criterion`

while the parent links merely keep the history of how we have reached a given node.

The binary tree has two children - `<code>Left`

and `<code>Right`

and one `parent`

. The parent link is necessary for nearest-neighbours searching and for optimization of the tree.

Each node of the tree contains some type of data with value `X`

. The left child node must have a smaller `X`

value than the corresponding right value - `<code>X > Left->X`

and `<code>Right->X > X`

. This is the basic criterion of the binary tree. The tree also has a `<code>Root`

. The root is the only node which does not have a `Parent`

. When a new node with value `X`

needs to be added to the tree, we always start from `Root`

and go down in the tree until we reach an empty location. Each time when we pass through a node, we decide to go left or right by comparing the value `X`

with the values of the node - if `X`

is smaller, we go left, if it is bigger, we go right.

In the above example, the new value is `100`

. To find the location of this value, follow the steps within the tree. Starting from the root we check if `100`

is smaller or bigger then `127`

. In this case it is smaller so we go left in the node with value `111`

. Now, `100`

is smaller than `111`

so we go left to node `60`

. This time `100`

is bigger, but `60`

does not have a `Right`

child, it is an empty location, so we attach `100`

here.

Now let's see how the binary tree can be used for something useful. Let's see how the limiting range and the nearest neighbour can be found. Let us choose a new value, `125`

, for example. Between the two existing tree values, we need to determine which is `125`

. Again, we start searching the `Parent`

node of `125`

and after few steps, we reach node `115`

:

It is clear now that since `125`

is the `Right`

child of `115`

, `125`

is the closest from the leftmost neighbour. But the search for the right value is not so simple. In this case, the `Root`

of the tree is `127`

. The rule for binary trees is that the limiting value is the first `Parent`

node with a value bigger than `125`

. So we have to go up in the tree until we find a `Parent`

with a value bigger than `125`

. It is possible in some cases that the node has only one limiting neighbour. The code for binary tree declaration, data insertion and nearest neighbour search is given below.

## Code

The declaration and implementation of binary tree is in *BTree.h*. The class `BTNode`

contains `insert`

and `remove`

functions. The template type is `X`

, with two pointers to the `left`

and `right`

children, and one to the `parent`

. I have made only two small extensions over the standard binary tree, and added one member property - `ParentOrientation`

defined as `bool`

, and one member function - `moveup()`

used to optimize insertion.

template <class>
class BTNode
{
public:
BTNode();
BTNode(Xtype x);
bool Delete_Tree(BTNode<xtype>* Root);
BTNode(BTNode<xtype>* k1, BTNode<xtype>* k2);
BTNode<xtype> *Left, *Right, *Parent;
BTNode<xtype>* place(BTNode<xtype>* T);
BTNode<xtype>* insert(Xtype x,bool&root_changed,bool optimize=true);
bool insert(BTNode<xtype>* node,bool&root_changed,bool optimize=true);
void remove();
int count_childs();
bool moveup();
Xtype x;
bool ParentOrientation ;
};

The `ParentOrientation`

member is used for optimization of insertion and nearest-neighbour search. It identifies what the orientation of a child is with respect to its `parent`

; if it is `0`

, the child is from the left of the parent, if it is `1`

it is on the right. Of course, we can determine this just by comparing the values of the child and the parent, but since the comparing operators `<`

and `>`

can be slow, I have used this flag to decrease their use. The member function `moveup()`

tries to move the node up in the tree. It improves the balance of the tree and optimizes the `insert`

operation. The `Insert`

then looks like this:

template <class>
bool
BTNode<xtype>::insert(BTNode<xtype>* node,bool&root_changed, bool optimize)
{
BTNode<xtype>* pNext = this;
BTNode<xtype>* pFather;
while(pNext)
{
pFather = pNext;
if(node->x < pNext->x)
pNext = pNext->Left;
else if(node->x > pNext->x)
pNext = pNext->Right;
else
return false;
}
if(!pNext)
{
node->Parent = pFather;
if(node->x < pFather->x)
{
pFather->Left = node;
node->ParentOrientation = 0 ;
}
else
{
pFather->Right = node;
node->ParentOrientation = 1;
}
if(optimize)
root_changed = node->moveup();
else
root_changed = false ;
return true;
}
return false;
}

Nearest-neighbour search is explained in the previous section. Below is the code:

template <class>
Xtype FindNearest(BTNode<xtype>* Root, Xtype x, Xtype* l, Xtype* r)
{
Xtype left,right;
BTNode<xtype>* pNext = Root;
BTNode<xtype>* pFather;
bool ParentOrientation ;
while(pNext)
{
pFather = pNext;
if(x<pnext->x)
{
pNext = pNext->Left;
ParentOrientation = false;
}
else if(x>pNext->x)
{
pNext = pNext->Right;
ParentOrientation = true;
}
else
{
*l = pNext->x;
*r = pNext->x;
return pNext->x;
}
}
if(!ParentOrientation)
{
right = pFather->x;
ParentOrientation = pFather->ParentOrientation ;
pFather = pFather->Parent;
while(pFather && !ParentOrientation)
{
ParentOrientation = pFather->ParentOrientation ;
pFather = pFather->Parent;
}
if(!pFather)
{
*l = INVALID_VALUE;
*r = right;
return right;
}
else
{
*l = pFather->x;
*r = right;
return (x-pFather->x < right-x ? pFather->x:right);
}
}
else
{
left = pFather->x;
ParentOrientation = pFather->ParentOrientation ;
pFather = pFather->Parent;
while(pFather && ParentOrientation)
{
ParentOrientation = pFather->ParentOrientation ;
pFather = pFather->Parent;
}
if(!pFather)
{
*l = left;
*r = INVALID_VALUE;
return left;
}
else
{
*l = left;
*r = pFather->x;
return (x-left < pFather->x-x ? left:pFather->x);;
}
}
return INVALID_VALUE;
}

As was mentioned above, optimization of the tree is realized with `moveup()`

routine. Why is optimization needed? Well, if the values contained in the tree come in random (not ordered) sequence, the tree will be naturally balanced. However, there is no guarantee that values are not going to come in some non-random pattern. For example, if it is something like `2,5,8,11`

..., then the binary tree will become a linked list and the search of nearest neighbours will become sequential search, which is ineffective. There are a lot of balancing methods - Red-Black,AVL... I prefer my own local tree optimization which removes the worst cases. The illustration below shows how it works:

Here `N`

is the node to be inserted, `P`

is its parent, and `G`

is its grand-parent. If it happens like in both cases shown above on the left, or it mirror cases, it is possible to move `P`

or `N`

up, and `G`

down in the tree. The three nodes - `N`

, `P`

and `G`

are reordered as shown on the right. In this way, the height of the tree is decreased, the worst cases are corrected and the balance of the tree is improved. From the test I performed, I discovered that the `insert`

operation and tree creation are less than 10% slower. If you increase the average height of randomly chosen data, you find that performance is still 10% slower, but this percentage rises with data size and ensures that the average height is always log_{2}N, where N is the data size. This means that the average path which we will pass while looking for a value in a large array (100 000)will be about 20, because 2 of power 20 makes 100 000. The project source contains the application performing these tests. If you want to check the average tree height for different points, change the numbers for `POINTS_NUM`

and change the `generate_random_point()`

routine if it is too big.

## License

This article has no explicit license attached to it, but may contain usage terms in the article text or the download files themselves. If in doubt, please contact the author via the discussion board below.

A list of licenses authors might use can be found here.