Introduction
In Computer Science, a binary tree is a hierarchical structure of nodes, each node referencing at most to two child nodes. Every binary tree has a root from which the first two child nodes originate. If a node has no children, then such nodes are usually termed leaves, and mark the extent of the tree structure.
A particular kind of binary tree, called the binary search tree, is very useful for storing data for rapid access, storage, and deletion. Data in a binary search tree are stored in tree nodes, and must have associated with them an ordinal value or key; these keys are used to structure the tree such that the value of a left child node is less than that of the parent node, and the value of a right child node is greater than that of the parent node. Sometimes, the key and datum are one and the same. Typical key values include simple integers or strings, the actual data for the key will depend on the application. In this article, I describe a binary search tree that stores string/double pairs. That is, the key is the string value and the data associated with the key is a double value. Developers can search the tree using string values.

Background
There are a number of basic operations one can apply to a binary search tree, the most obvious include, insertion, searching, and deletion.
To insert a new node into a tree, the following method can be used. We first start at the root of the tree, and compare the ordinal value of the root to the ordinal value of the node to be inserted. If the ordinal values are identical, then we have a duplicate and we return to the caller indicating so. If the ordinal value is less than the root, then we follow the left branch of the root, else we follow the right branch. We now start the comparison again but at the branch we took, comparing the ordinal value of the child with the node to be inserted. Traversal of the tree continues in this manner until we reach a left or right node which is empty and we can go no further. At this point, we insert the new node into this empty location. Note that new nodes are always inserted as leaves into the tree, and strictly speaking, nodes are thus appended rather than inserted.
Searching a binary search tree is almost identical to inserting a new node except that we stop the traversal when we find the node we're looking for (during an insertion, this would indicate a duplicate node in the tree). If the node is not located, then we report this to the caller.
Both insertion and searching are naturally recursive and are, arguably, easier to understand when considered in terms of their unit operation. A basic recursive search algorithm will look like:
node search (node, key) {
if node is null then return null;
if node.key = key then
return node
if key < node then
return search (node.left, key);
else
return search (node.right, key);
In the source code provided with this article, insertion is implemented recursively, while searching uses an iterative approach.
Deletion is a little bit more complicated but boils down to three rules. The three rules refer to deleting nodes without any children, nodes with one child, and nodes with two children. If a node has no children, then the node is simply deleted. If the node has one child, then the node is deleted and the child node is brought forward to link to the parent. The complication occurs when a node has two children. However, even here, the rules are straightforward when stated. To delete a node with two children, the next ordinal node (called the successive node) on the right branch is used to replaced the deleted node. The successive node is then deleted. The successive node will always be the left most node on the right branch (likewise, the predecessor node will be the right most node on the left branch). The figure below illustrates the deletion rules.

A common alternative to using binary search tree is to use Hash tables. Hash tables have better search and insertion performance metrics. In theory, the time it takes to insert or search for an item in a Hash table is independent of the number of data items stored. In contrast, a binary search tree scales with log (N) where N is the number of data items (still far better than a linear search). The .NET libraries contain explicit support for Hash tables.
Balanced Trees
The time taken to insert or search for a specific item in a tree will be affected by a tree's depth. Deep trees take longer to search, and the insertion order into a tree can affect a tree's shape. A random insertion order will generally produce a more bushy and hence shallower tree compared to an ordered insert. Bushy trees are often called balanced trees, and although not implemented here, balancing a tree is a highly desirable feature for a binary search tree implementation. Certain algorithms such as the red-black tree will auto-balance as the tree is constructed (see Red/Black tree animation). The figure below shows three trees generated by three identical data sets but inserted in a different order. The first is the most balanced and hence the most shallow of the three examples.
Implementing the search and insertion methods using a recursive approach has the potential to yield poor performance, particularly when the trees are unbalanced.
Using the Code
Using the source code provided with this article is very easy. The following code illustrates the instantiation of a new binary tree, the insertion of data into the tree, and subsequent retrieval. The method insert() is used to insert new data, and the method findSymbol() is used to locate and retrieve data. If findSymbol() fails to locate the data item, it returns null. When successful, findSymbol() returns a TTreeNode object which has two properties, name and value. The following code illustrates how to use the binary search tree. The class name of the binary tree is TBinarySTree, and the individual nodes have class type TTreeNode.
bt = new TBinarySTree();
bt.insert ("Canis Minoris", 5.37);
bt.insert ("Gamma Cancri", 4.66);
bt.insert ("Phi Centauri", 3.83);
bt.insert ("Kappa Tauri", 4.21);
TTreeNode symbol = bt.findSymbol ("Phi Centauri");
if (symbol != null)
Console.WriteLine ("Star {1} has magnitude = {0}", symbol.name, symbol.value);
Other methods of interest include:
bt.clear();
bt.count();
count returns the number of nodes in the tree.
bt.delete (key);
delete will delete the node with the given key. If the method fails to locate the node, the method throws a simple exception.
The source code is licensed under the BSD license. The source should compile on C# 2.0.
To use the source code, unpack the source, load the binary tree solution (binaryTree.sln) and compile the BinaryTree project. In your own project, include as a reference to BinaryTree.dll. This version was written using Visual Studio 2003.
Points of Interest
The following graphs compare the performance of the Binary Tree search with .NET's built-in Hashtable class. The implementations follow the expected scaling laws as the number of stored data items increase. The X axis indicates the number of data items stored, ranging from 1000 to 1,000,000 items (21 intervals in all). The Y axis indicates the average time required to retrieve one data item (averaged over 20,000 retrieval attempts). The Hashtable follows roughly O(1), that is the time taken to retrieve data is independent of the number of data items stored. In contrast, the binary search tree scales roughly O(log (N)). However, this is far better than a linear search which would scale as O(N); that is, doubling the number of stored items doubles the average time taken to retrieve a single data item. The graph on the left shows the data plotted on log axes.
Times were computed using the QueryPerformanceCounter() method. The code for timing was derived from Tobi+C#=T#.
Other Possibilities
One should consider the implementation outlined here as the minimum practical implementation. The project where this implementation originated did not require any further sophistication. However, there are a number of areas where it could be significantly improved. In particular, two areas warrant further work:
- The current implementation is specific to storing name/value pairs. Ideally, one would prefer a more generic implementation where a developer could employ their own object type.
- The implementation may suffer performance degradation when subjected to large data sets if the trees become significantly unbalanced. Ideally one could implement the Red/Black variant to avoid this issue (see reference 1. at the end of the article for details).
Other minor changes include using a property in place of the public method count, adding further utility methods, and changing the naming convention on the classes and methods to make them more consistent with .NET.
History
- Fixed a small error in the third tree, Figure 3 (missing C node).
- There is an older article on CodeProject which discusses Red-Black trees in C#, something I should have spotted earlier (Red-Black Trees in C#).
References
There appears to be very little material on Binary Search Trees using .NET 1.1; the following, particularly the first link, provide material related to .NET 2.0.
- Examination of Binary Search Trees using C# 2.0.
- C5 is a .NET 2.0 library of generic collection classes for C#.
- A collection of data structures including binary search trees.
|
|
 |
 | Question Regarding Count Function Accuracy NuNn | 12:13 17 Dec '09 |
|
 |
I have a few questions regarding the accuracy of the Count function in a couple of scenarios. The first being when a new node is called with insert. According to your insert code block (NOTE: variable names have changed for my purposes):
public Node InsertNode(string NAME, double VALUE) { Node obj_Node = new Node(NAME, VALUE); try
{ if (obj_RootNode == null) obj_RootNode = obj_Node; else
....
wouldn't you want to also add this node to the array list here so it has a count of one? Also, in your delete function although you set the value of the node to null, I believe the ArrayList still counts null values when you call the count function. Otherwise, nice piece of code!
|
|
|
|
 |
 | SortedDictionary br1annn | 16:12 16 Nov '09 |
|
 |
Herbert, good job. I'm just wonderig if SortedDictionary obviates the need to code your own binary tree class? I know you posted this many years ago.
|
|
|
|
 |
|
 |
You might be right. The reason why I wrote it was because I had some legacy code in Delphi that used such a function and I just translated it over to C#. If I didn't it again perhaps I would do something different.
|
|
|
|
 |
 | results of measurment Member 4162639 | 22:37 2 Sep '08 |
|
 |
Hi,
My question is little funny. I had added implementation of red-black tree in this application, for comparing results. I got results for searching, e.g. number 10 graded -5. I am little confused with this number, it is hard to get string from tree whose structure has 5 000 000 string in this short time. Can you help me to explain this?
|
|
|
|
 |
|
 |
Sorry for not getting back to you sooner. I'm not sure if I know what you mean by -5?
|
|
|
|
 |
 | Where is this algorithm practically used Alberto Bencivenni | 22:02 26 Aug '08 |
|
 |
Nice article thanks.
Can you make me some real world example on where this algorith is practically used? It is very interesting but don't know where to apply it. You could even add them in the beginnign of the article.
Thanks,
|
|
|
|
 |
|
 |
The algorithm is useful anywhere you need to store large amounts of data and need to access that data quickly. For example you might have a million user names plus their details and you want to quickly find a particular person in the list. Rather than search the list one item at a time you could use a binary tree to very rapidly locate the name.
|
|
|
|
 |
|
 |
Hi Herbert,
Thanks for your answer but I don't see the binary nature of the problem even in this example. Don't misunderstand me, I really can't it is probably a limit of mines. I don't see ho to split a million username in parent and two childs an so on.
Maybe there are some articles on the net that can give me an introduction, ifyou know some, please point me there.
Thanks,
|
|
|
|
 |
|
 |
In order to store data in a binary tree you must first convert the data into some kind of numerical form. For example if you have a million unique user names, you can convert each name in to a number, this could be done by converting the letters in the names to the ascii numbers. Once in numeric form you can then decide whether a given name goes to the left or right side of a tree node depending on its value.
To make the problem simpler, let's say you only have five numbers, 8, 3, 10, 1, 6 and you want to store these in a binary tree. To start with the binary tree is empty. The first number is eight, and with this we create a node that represents the first node in the tree. The next number is a 3, this number is less than the first node, therefore we create a child node to the left of the parent node. The next number is a 10, this number is greater than the first node, therefore we create a child node to the right of the parent node. The next number is 1, since this is less than the first node, we go to the left, but the left node already has a child so we compare the number, 1 to the left child node. Since 1 is less than 3, we create a left node on the child to store the 1 and so on. The best way to learn this is to write out the process on paper and to build a picture of the tree yourself.
Check the section marked Background, this gives a discussion on how to insert information into the tree. Getting information out of the tree is the same but the reverse process.
For reference you could also consult Wikipedia at: http://en.wikipedia.org/wiki/Binary_search_tree[^]
For example this is the text from wikipedia that describes insertion:
"Insertion begins as a search would begin; if the root is not equal to the value, we search the left or right subtrees as before. Eventually, we will reach an external node and add the value as its right or left child, depending on the node's value. In other words, we examine the root and recursively insert the new node to the left subtree if the new value is less than the root, or the right subtree if the new value is greater than or equal to the root."
|
|
|
|
 |
|
 |
Thanks again Herbert,
Now it's clearer. Suppose you have a Point p1 = new Point(3, 4) and a list with 1,000,000 points: you want to know if there is a point in the list equal to p1. Can we use this algorithm to avoid looping the all array elements and therefore saving time?
Thanks,
Alberto
|
|
|
|
 |
 | Excellent newspicy | 7:27 20 Aug '08 |
|
 |
Excellent job Herbert!
I have a problem that no seems to one could help with, I use the Chart Framework, ZedGraph (i also here i code project) and i try to handle a huge amount of data, lets say > 10 000 000 points!! Yes its huge, and now i have serious problem.
I have wondered if i should use some kind of hashtable for it, and store the index in a binary tree, but im not sure. I dont now how it could be used in this case, i have heard about it somewhere. Have you som idea to implement this in the case to to store the data? Im not so good in hashtable, and have short time to think of what strategy i should work on.
Greate job anyway, im gona read this, hope im on right track somehow.
/Betz
|
|
|
|
 |
 | Re: Excellent Herbert Sauro | 8:47 20 Aug '08 |
|
 |
Binary trees can be difficult to work with if you're not used to them. I would recommend using a hashtable. Since C# has hashtable support that would be the easiest thing to do. Just google hashtable and C#.
|
|
|
|
 |
|
 |
It feels, something says that binary search should work good, if you already have an hashtable and then use binary tree for store the index key. Perhaps that would be a good combination?
/Betz
|
|
|
|
 |
 | C# vs Java Guido_d | 3:49 19 Aug '08 |
|
 |
Hi,
Very nice and elaborate article, and especially because of that I hate to be a whiner, but you have 'sinned' by applying Java coding standards to C# ... pascal vs camel, and properties vs accessor methods see http://msdn.microsoft.com/en-us/library/xzf533w0(VS.71).aspx for the full specs.. Other than that, again, extremely well set up article.
-- edit oops I read your footnote about naming conventions. See, I'm just a whiner.
|
|
|
|
 |
|
 |
You're right I have, and it's because I find the camel still easier on my eyes (I'm not a Java programmer). Everyone is free however to change the style to suite their needs. I understand however that consistency is good but I think in this case MS chose Pascal style simply because they couldn't be seen to be even more like Java even though I believe camel style to be easier in terms of readability. I've been a long time Delphi programmer and I've never quite liked capitalizating variables, seems to makes the variables too loud to me, but this is a personal preference.
I notice from the guidelines that it seems to imply that pascal as well as camel is ok to use, or at least they mention both which seems to suggest that both as ok to use?
As for methods vs. properties I am not sure what the issue is. Delphi has had properties in object pascal since day one and as Delphi programmers we tended to use them as gate keepers if special processing was required or to enforce read only access. I think in the code I posted there were no such situations, though I am happy to stand corrected.
Thanks for the comments.
|
|
|
|
 |
 | Left? Right? Member 4101640 | 20:10 13 Aug '08 |
|
 |
Hi, Your 2nd paragraph reads:
If the ordinal value is less than the root, then we follow the right branch of the root, else we follow the left branch.
Seems to be a typo 'cause I think it should be the left branch if the ordinal value is less than the root...right?
Your pseudo code says right:
if key < node then return search (node.left, key); else return search (node.right, key);
-richard
|
|
|
|
 |
|
 |
You're right, thanks for spotting the error, I'll try and upload an updated version.
-H
|
|
|
|
 |
 | Mistake {demon] | 21:28 28 May '07 |
|
 |
The figure 3 under the balanced trees is wrong. The node "C" is not there but present in the sequence
demon
|
|
|
|
 |
|
 |
Thanks for spotting it, I'll fix it.
H
|
|
|
|
 |
 | Good job! Bashir Magomedov | 21:15 28 May '07 |
|
 |
Hi Herbert! You did good job. Yourt article is really good . I remember the time (several yars ago), when I was coding such a binary-trees library on Delphi. There were no any sources except huge book named "Algorithms". And of couse I coded that library with a lot of bugs, but it is still working .
The search on binary sequences via binary trees is a really fastest algorithm I've ever seen. And it can deal with highly distorted input patterns, with level of noise up to 70% (depending on loading coeff).
http://www.iont.ru/projects/rfbr/90308/ttrt.php[^] (article is in Russian, but there are some graphs, you can take a look)
Thank you for .NET version of this alghorithm!
|
|
|
|
 |
|
 |
Thanks for the comments, Delphi is a very good language, it is a pity it isn't cross platform, particularly the GUI side which is very mature. Given the switch of Apple to Intel chips, Borland (CodeGear) could easily port at least the compiler to Linux and Mac. The lack of portability is one reason for doing more more work on the .NET side especially because Mono provides some degree of portability.
H
|
|
|
|
 |
|
|
Last Updated 18 Aug 2008 |
Advertise |
Privacy |
Terms of Use |
Copyright ©
CodeProject, 1999-2010