## Introduction

Red-Black trees are ordered binary trees with one extra attribute in each node: the color, which is either red or black. Like the Treap, and the AVL Tree, a Red-Black tree is a self-balancing tree that automatically keeps the tree’s height as short as possible. Since search times on trees are dependent on the tree’s height (the higher the tree, the more nodes to examine), keeping the height as short as possible increases performance of the tree.

Red black trees were introduced by Rudolf Bayer as “Symmetric Bimary B-Trees” in his 1972 paper, Symmetric Binary B-Trees: Data Structure and Maintenance Algorithms, published in *Acta Informatica*, Volume 1, pages 290-306. Later, Leonidas J.Guibas and Robert Sedgewick added the red and black property and gave the tree its name (see: Guibas, L. and Sedgewick, R. "A Dichromatic Framework for Balanced Trees" In Proc. 19^{th} IEEE Symp. Foundations of Computer Science, pp. 8-21, 1978).

Apparently, Java’s *TreeMap* class is implemented as a Red-Black tree as well as IBM's old ISAM (Indexed Sequential Access Method ISAM) and SoftCraft's *Btrieve*.

This article provides a Red-BlackTree implementation in the C# language.

## Background

*Ordered binary trees* are popular and fundamental data structures that store data in linked *nodes*. Each node has, at most, 2 child nodes linked to itself. Some nodes may not have any child nodes, others may have one child node, but no node will have more than two child nodes. A node having at least one child node is referred to as a *parent* node.

Ultimately, all nodes of a tree are child nodes of the *root* node. The root node is the top node of the entire tree. Every child node contains a value, or a key, that determines its position in the tree relative to its parent. Since the root node is the top parent, all nodes are organized relative to the root node in *branches*. Child nodes on the left side of the root have keys that are less than the parent’s key, and child nodes on the right have keys that are greater than the root. This property is extended to every node of the tree.

Because each node is linked (or points) to the next node (unless it is a *leaf*), the tree can be walked (or *traversed*) to produce an ordered list of keys. Binary trees combine the functionality of ordered arrays and linked lists.

#### Problem: Unbalanced Trees

Ordered Binary trees are not without problems. If items are added to the tree in sequential (ascending or descending) order, the result is a vertical tree.

This results in the worst case searching time. Essentially, each item adds to the height of the tree which increases the time to retrieve any given node. If the tree contains 10 nodes, it will take 10 comparisons (beginning at the root) to reach the 10^{th} node. Thus an ordered binary tree's worst case searching time is O(n) or linear time.

However, if items are inserted randomly, the height of the tree is shortened as nodes are spread horizontally.

Therefore, trees created from random items have better look-up times than trees created from ordered items. More formally, the time it takes to search an ordered binary tree depends on its *topology*. The greater the *breadth*, the faster the performance. Trees are said to be perfectly *balanced* when all their leaf nodes are at the same level. So, the closer the tree is to being perfectly balanced, the faster it will perform.

#### Self-Balancing Trees

In many applications, if not most, there isn’t a convenient way to randomize the input prior to inserting it into an ordered tree. Fortunately, this isn’t necessary. Self-balancing trees reorder their nodes after insertions and deletions to keep the tree balanced. By reordering the nodes, self- balancing trees give the effect of random input.

Rebalancing is accomplished by *rotating* nodes left or right. This won’t destroy their key order. In other words, the tree is restructured but the child nodes maintain their key order relative to their parents.

To rotate right, push node X down and to the right. Node X's left child replaces X, and the left child's right child becomes X's left child.

To rotate left, push node X down and to the left. Node X's right child replaces X, and the right child's left child becomes X's right child.

Different balancing algorithms exist. Treaps use a random priority in the nodes to randomize and balance the tree. AVL trees use a balance-factor. Red-Black trees use color to balance the tree.

#### Red-Black trees

Red-Black trees are ordered binary trees where each node uses a color attribute, either red or black, to keep the tree balanced. Rarely do balancing algorithms perfectly balance a tree but they come close. For a red-black tree, no leaf is more than twice as far from the root as any other. A red-black tree has the following properties:

- The root is black.
- All leaves (sentinel nodes) are black.
- Red nodes can only have black children.
- All paths from a node to its leaves contain the same number of black nodes.

The last property, in particular, keeps the tree height short and increases the breadth of the tree. By forcing each leaf to have the same black height, the tree will tend to spread horizontally, which increases performance.

The leaf nodes that are labeled “nil” are *sentinel* nodes. These nodes contain null or nil values, and are used to indicate the end of a *subtree*. They are crucial to maintaining the red-black properties and are key to a successful implementation. Sentinel nodes are always colored black. Therefore, standalone red nodes, such as “24” and “40” in Figure 6, automatically have two black child leaves. Sentinel nodes are not always displayed in red-black tree depictions but they are always implied.

For optimum performance, all data structures and algorithms used in an application should be evaluated and chosen based on the need of the application. Red-Black trees perform well. The average and worst-case insert, delete, and search time is *O*(lg *n*). In applications where the data is constantly changing, red-black trees can perform faster than arrays and linked lists.

## Using the Code

The project available for download includes a red-black tree implementation and a Test project that gives examples using the tree. Extract the zip file into a directory of your choice. The zipped file will create its own directory called *RedBlackCS*.

The project is contained with the `RedBlackCS`

namespace and consists of four classes:

`ReadBlack`

: The main class that implements the API and functionality.
`RedBlackEnumerator`

: Returns the keys or data objects of the tree in sorted order.
`RedBlackException`

: Distinguishes exceptions in the tree from .NET exceptions.
`RedBlackNode`

: Encapsulates a node.

To use the tree, include the *RedBlackCS.dll* as a Reference to the calling project.

To create a `RedBlack`

object, call the default constructor:

RedBlack redBlack = new RedBlack();

The `RedBlack`

's `Add`

method requires a key and data object passed as arguments.

public void Add(IComparable key, object data)

In order for the `RedBlack`

object to make the necessary key comparisons, the key object must implement the .NET `IComparable`

interface:

public class MyKey : IComparable
{
private int intMyKey;
public int Key
{
get
{
return intMyKey;
}
set
{
intMyKey = value;
}
}
public MyKey(int key)
{
intMyKey = key;
}
public int CompareTo(object key)
{
if(Key > ((MyKey)key).Key)
return 1;
else
if(Key < ((MyKey)key).Key)
return -1;
else
return 0;
}
}

#### Other Methods

Calling the `GetData()`

method passing a key object as an argument retrieves a data object from the tree.

public object GetData(IComparable key)

Nodes are removed by calling the `Remove()`

method.

public void Remove(IComparable key)

Additionally, the `RedBlack`

class contains several other methods that offer convenient functionality:

`GetMinKey()`

: Returns the minimum key value.
`GetMaxKey()`

: Returns the maximum key value.
`GetMinValue()`

: Returns the object having the minimum key value.
`GetMaxValue()`

: Returns the object having the maximum key value.
`GetEnumerator()`

: Returns a `RedBlackEnumerator`

used to iterate through the `Treap`

.
`Keys()`

: Returns a `RedBlackEnumerator`

used to iterate through the keys.
`Values()`

: Returns a `RedBlackEnumerator`

used to iterate through the data objects.
`RemoveMin()`

: Removes the node with the minimum key.
`RemoveMax()`

: Removes the node with the maximum key.

## Partial Output from the Test Project

The sample project demonstrates various method calls to the RedBlack tree and displays the effect of the calls by dumping the tree’s contents to the Console. Executing the sample project produces the following partial output:

## Points of Interest

The `RedBlackEnumerator`

returns the keys and/or the data objects contained, in ascending or descending order. To implement this functionality, I used the .NET `Stack`

class to keep the next node in sequence on the top of the `Stack`

. As the tree is traversed, each child node is pushed onto the stack until the next node in sequence is found. This keeps the child nodes towards the top of the stack and the parent nodes further down in the stack.

Also, unlike my Treap implementation, the `RedBlack`

class saves the last node retrieved (or added) in the event that the same key is requested. This probably won’t happen often, but if it does, it will save a tree walk searching for the key.

## Links

- For a mathematical description, see: MathWorld’s Red-Black Tree.
- An excellent tutorial is located at: Red Black Trees.
- The following web site contains implementations in Visual Basic and C. It was very helpful to me: Red-Black Trees.
- Finally, a demo that I used to test my implementation is found at Red/Black Tree Demo. This is a great tool to understand the red-black tree properties.

## Improvements

I’m sure there’re many. One in particular would be to replace the `IComparable`

interface with an `Int32`

. This removes the need for a separate class that implements the `IComparable`

interface since the `Int32`

class already implements the `IComparable`

interface. This would make the implementation less general but it would speed up performance, I think.

It would be nice if the test project displayed the tree in a graphical format, even a simple one.

Roy is a software developer who digs all aspects of software development, from design and architecture to implementation.