## Introduction

In this article I have given an introduction of binary trees and hierarchical data structures. In the sample project I have compared binary trees with qsort. The binary tree is defined with a C++ template. It can be used from any environment supporting C++, and for any data type supporting data comparison operators, < and >. The description is easy to follow. To use the template you need to include BTree.h in your C++ project. For balancing and optimization of data insertion I have used a simple reordering method instead of Red-Black and AVL trees. The advantage is that data insertion and tree-creation require slightly less time. Even if not perfectly balanced, this method ensures that the data search requires **log2** comparing operations.

My major interests are multi-dimensional data structures so I mostly utilize algorithms which can be easily generalized for N-dimensions.

## Background

Binary trees are hierarchical data structures which allow insertion and a fast, nearest-neighbours search in one-dimensional data. It can be used instead of qsort and binary search to quickly find the closest points in a data array. The binary tree has the advantage of having a simple structure that allows generalization for more then one dimension - the so-called **KD Tree**. Therefore, it is good to understand how it works and how it performs data searches. I will keep the explanation clear and easy to follow.

First let's define a **tree**. A tree is a data-structure composed from a set of nodes. Each node has links to a number of other nodes called **children** and **parents**. The difference between them is that children are chosen by some **criterion** while the parent links merely keep the history of how we have reached a given node.

The binary tree has two childen - **Left** and **Right** and one parent. The parent link is necessary for nearest-neighbours searching and for optimization of the tree.

Each node of the tree contains some type of data with value X. The left child node must have a smaller X value than the corresponding right value - **X > Left->X** and **Right->X > X**. This is the basic criterion of the binary tree. The tree also has a **Root**. The root is the only node which does not have a Parent. When a new node with value X needs to be added to the tree, we always start from Root and go down in the tree until we reach an empty location. Each time when we pass through a node we decide to go left or right by comparing the value X with the values of the node - if X is smaller, we go left, if it is bigger we go right.

In the above example the new value is 100. To find the location of this value, follow the steps within the tree. Starting from the root we check if 100 is smaller or bigger then 127. In this case it is smaller so we go left in the node with value 111. Now, 100 is smaller than 111 so we go left to node 60. This time 100 is bigger, but 60 do not have a Right child, it is an empty location, so we attach 100 here.

Now let's see how the binary tree can be used for something useful. Let's see how the limiting range and the nearest neighbour can be found. Let us choose a new value, 125, for example. Between the two existing tree values, we need to determine which is 125. Again we start searching the Parent node of 125 and after few steps we reach node 115:

It is clear now that since 125 is the Right child of 115, 125 is the closest from the leftmost neighbour. But the search for the right value is not so simple. In this case the Root of the tree is 127. The rule for binary trees is that the limiting value is the first Parent node with a value bigger than 125. So we have to go up in the tree until we find a Parent with a value bigger than 125. It is possible in some cases that the node has only one limiting neighbour. The code for binary tree declaration, data insertion and nearest neighbour search is given below.

## Code

The declaration and implementation of binary tree is in BTree.h. The class BTNode contains insert and remove functions. The template type is X, with two pointers to the left and right children, and one to the parent. I have made only two small extensions over the standard binary tree, and added one member property - `ParentOrientation`

defined as bool, and one member function - `moveup()`

used to optimize insertion.

template <class>
class BTNode
{
public:
BTNode();
BTNode(Xtype x);
bool Delete_Tree(BTNode<xtype>* Root);
BTNode(BTNode<xtype>* k1, BTNode<xtype>* k2);
BTNode<xtype> *Left, *Right, *Parent;
BTNode<xtype>* place(BTNode<xtype>* T);
BTNode<xtype>* insert(Xtype x,bool&root_changed,bool optimize=true);
bool insert(BTNode<xtype>* node,bool&root_changed,bool optimize=true);
void remove();
int count_childs();
bool moveup();
Xtype x;
bool ParentOrientation ; };

The `ParentOrientation`

member is used for optimization of insertion and nearest-neighbour search. It identifies what the orientation of a child is with respect to its parent; if it is 0 the child is from the left of the parent, if it is 1 it is on the right. Of course we can determine this just by comparing the values of the child and the parent but since the comparing operators < and > can be slow, I have used this flag to decrease their use. The member function `moveup()`

tries to move the node up in the tree. It improves the balance of the tree and optimizes the insert operation. The Insert then looks like this:

template <class>
bool
BTNode<xtype>::insert(BTNode<xtype>* node,bool&root_changed, bool optimize)
{
BTNode<xtype>* pNext = this;
BTNode<xtype>* pFather;
while(pNext) {
pFather = pNext;
if(node->x < pNext->x)
pNext = pNext->Left;
else if(node->x > pNext->x)
pNext = pNext->Right;
else
return false;
}
if(!pNext) {
node->Parent = pFather;
if(node->x < pFather->x)
{
pFather->Left = node;
node->ParentOrientation = 0 ;
}
else
{
pFather->Right = node;
node->ParentOrientation = 1;
}
if(optimize)
root_changed = node->moveup();
else
root_changed = false ;
return true;
}
return false;
}

Nearest-neighbour search is explained in the previous section. Below is the code:

template <class>
Xtype FindNearest(BTNode<xtype>* Root, Xtype x, Xtype* l, Xtype* r)
{
Xtype left,right;
BTNode<xtype>* pNext = Root;
BTNode<xtype>* pFather;
bool ParentOrientation ;
while(pNext)
{
pFather = pNext;
if(x<pnext->x)
{
pNext = pNext->Left;
ParentOrientation = false;
}
else if(x>pNext->x)
{
pNext = pNext->Right;
ParentOrientation = true;
}
else
{
*l = pNext->x;
*r = pNext->x;
return pNext->x;
}
}
if(!ParentOrientation) {
right = pFather->x;
ParentOrientation = pFather->ParentOrientation ;
pFather = pFather->Parent;
while(pFather && !ParentOrientation)
{
ParentOrientation = pFather->ParentOrientation ;
pFather = pFather->Parent;
}
if(!pFather)
{
*l = INVALID_VALUE; *r = right;
return right;
}
else
{
*l = pFather->x;
*r = right;
return (x-pFather->x < right-x ? pFather->x:right);
}
}
else {
left = pFather->x;
ParentOrientation = pFather->ParentOrientation ;
pFather = pFather->Parent;
while(pFather && ParentOrientation)
{
ParentOrientation = pFather->ParentOrientation ;
pFather = pFather->Parent;
}
if(!pFather)
{
*l = left;
*r = INVALID_VALUE; return left;
}
else
{
*l = left;
*r = pFather->x;
return (x-left < pFather->x-x ? left:pFather->x);;
}
}
return INVALID_VALUE; }

As was mentioned above, optimization of the tree is realized with `moveup()`

routine. Why is optimization needed? Well, if the values contained in the tree come in random (not ordered) sequence the tree will be naturally balanced. However there is no guarantee that values are not going to come in some non-random pattern. For example, if it is something like 2,5,8,11..., then the binary tree will become a linked list and the search of nearest neighbours will become sequential search, which is ineffective. There are a lot of balancing methods - Red-Black,AVL... I prefer my own local tree optimization which removes the worst cases. The illustration below shows how it works:

Here **N** is the node to be inserted, **P** is it's parent, and **G** is it's grand-parent. If it happens like in both cases shown above on the left, or it mirror cases, it is possible to move **P** or **N** up, and **G** down in the tree. The three nodes - **N**, **P** and **G** are reordered as shown on the right. In this way the height of the tree is decreased, the worst cases are corrected and the balance of the tree is improved. From the test I performed, I discovered that the insert operation and tree creation are less than 10% slower. If you increase the average height of randomly choosen data, you find that performance is still 10% slower, but this percentage rises with data size and ensures that the average height is always log_{2}N, where N is the data size. This means that the average path which we will pass while looking for a value in a large array (100 000)will be about 20, because 2 of power 20 makes 100 000. The project source contains the application performing these tests. If you want to check the average tree height for different points, change the numbers for `POINTS_NUM`

and change the `generate_random_point()`

routine if it is too big.