With C# 2.0, a powerful language feature is being introduced which all C++ programmers have been missing since the beginnings of this altogether so cleanly and beautifully designed language: generics. This article uses this new feature to implement a generic tree class. Tree structures have more than just one natural way of being traversed, this is also a good motivation to demonstrate the use of C# 2.0's new iterator feature. If you wonder how you can build the code or how you can start trying out the new language features, go look at Microsoft's site where you can download a beta of their new Express line of developers products.
For a general introduction into the topic of generics and/or iterators, please take a look at the specification of C# 2.0 by Microsoft which does a good job on explaining them in detail.
When using the code, you will be dealing with two classes:
NodeCollection. Before going into the details of the implementation, it's probably a good idea to scan the demo project which comes with the source archive. The demo project is a console application which takes a single parameter. This parameter should be a path pointing to a directory. The demo first loads the file hierarchy rooted inside that path into an internally kept tree structure using the
Node class, and then iterates over this tree structure once depth-first, then breadth-first. In the demo's
Run() method, you see how to specialize the
Node class for a given type, and construct an instance:
void Run(string rootdir)
Node<string> root = new Node<string>(rootdir);
The code above creates an instance of the
Node class specialized for containing data of type
string and initializes the instance with a value. The
Node class contains a property
Data which allows read-only access to the element type for which it is specialized. Exactly this property is what you initialize in the constructor code. If you want to access the children of a node, you use the
Children property which returns an instance of
NodeCollection. Each node has a
Parent property, too, which is read-only. If you add a node as the child of another, the first node's
Parent property will reflect this change. Newly constructed nodes return
null as their
Parent, meaning they are located in the root of the hierarchy. The
NodeCollection, on the other hand, has a read-only property
Owner returning the
Node whose children it contains.
NodeCollection is not meant to be used as a general-purpose container for
Nodes, thus its constructor is not publicly accessible.
Node contains other useful properties and methods: the
Root property gets the root of a node (wouldn't you have guessed ;-)),
IsAncestorOf() takes another node and returns whether the instance node is an ancestor of the passed node,
IsDescendantOf() checks for exactly the opposite condition, and
DoesShareHierarchyWith() takes another node and returns whether the instance node and the passed node are equal or have an ancestor-descendant relationship (
true) or not (
GetBreadthFirstEnumerator() return iterators performing a depth-first or breadth-first traversal. Note that the iterators returned by these methods iterate over the
Data values of the nodes, not the nodes themselves, meaning that
.Current will return the value of
Data of the currently visited node.
The implementation of this small library makes use of two new features of C# 2.0: generics and iterators. Generics are used to allow the nodes to function as a type-safe container. Without them, the
Data property would have to be of type object, requiring clients to cast to the correct type and allowing to instantiate
Node with objects of different types. Iterators, on the other hand, are used for implementing depth-first and breadth-first traversal in a compellingly intuitive way. Let's take a look at their implementation:
public IEnumerator<Element> GetDepthFirstEnumerator()
yield return mData;
foreach (Node<Element> kid in mChildren)
IEnumerator<Element> kidenumerator= kid.GetDepthFirstEnumerator();
yield return kidenumerator.Current;
IEnumerator<> is defined in
System.Collections.Generic and defines a typesafe iterator, i.e., an iterator whose
Current property is strongly typed.
Now, our depth-first traversal first returns the data of the current node itself, then iterates over the children nodes, gets their enumerators, and returns the values visited. For example, if you build up a structure like:
Node<string> root= new Node<string>("Gramps");
Node<string> son= new Node<string>("Son");
Node<string> daughter= new Node<string>("Daughter");
a call like:
foreach (string s in root.GetDepthFirstEnumerator())
will result in:
root's data, i.e., "Gramps"; then it starts iterating over its children, the first one being
son.GetDepthFirstEnumerator() gets called and yields its data "Son" before starting to iterate over its own children. Both children yield their data "Grandson" and "Granddaughter", but no more because they don't have children themselves. Now,
son.GetDepthFirstEnumerator() is done, and
root.GetDepthFirstEnumerator() goes on to
daughter, which, having no children of its own, yields only its own data "Daughter".
Breadth-first traversal is a little more complicated, but still C# 2.0's iterator facility allows an incredibly easy implementation:
public IEnumerator<Element> GetBreadthFirstEnumerator()
Queue<Node<Element>> todo = new Queue<Node<Element>>();
while (0 < todo.Count)
Node<Element> n = todo.Dequeue();
foreach (Node<Element> kid in n.mChildren)
yield return n.mData;
Assuming the same node structure as for the depth-first example, the code piece:
foreach (string s in root.GetBreadthFirstEnumerator())
produces this output:
In this case, recursion is unrolled into a loop. First,
root.GetBreadthFirstEnumerator() creates a queue and puts
root inside. Then, getting into the
while loop, it unqueues
root. Now, it iterates over
daughter and puts them into the queue. Then, it yields
root's data "Gramps". Next time inside the loop, it pops
son from the queue, pushes
son's children into the queue, and yields
son's data "Son". Next time, it pops
daughter from the queue which has no children, so it yields
daughter's data "Daugther" and proceeds. Now, it pops the two children of
son, both of which don't have children of their own, thus do not add anything to the queue. Their data "Grandson" and "Granddaugther" gets yielded, and next time checking the loop-condition, the queue is empty and traversal thus finished.
Conclusion and To-Do
This little class gets you a generic tree container, if you need a quick and simple solution. It shows off the possibilities the new language features of C# 2.0 facilitate. In order to create a really useful, general purpose tree library, though, the code would have to be more adaptable to specific needs of the clients. Here's a short list of ideas - and who knows, maybe my next article will have some or even all of them implemented:
- At the moment,
Node is implemented for trees with an arbitrary number of children. But very often, you know that each node will have a fixed number of children (binary trees, quad trees etc.). For such scenarios, the presented implementation is clearly inefficient as it makes use internally of a framework container (
System.Containers.Generic.List) where it should be using an array. So, it would be nice if nodes were parameterizable with a strategy for keeping children. Ideally, there would be two predefined strategies in the library, one using the list container, another one for arrays of static size, and clients could implement an interface for other cases.
- Considering that quite often there is the need to visualize a tree structure, it would make sense to write an adaptor for filling a
TreeView from a
- As for now, both traversal algorithms do not supply clients with any information about where exactly they are in the tree. Now, in game applications, for example, it's usually important to know how deep a depth-first traversal has gone currently. Besides, sometimes they want to parameterize how deep traversal will proceed, or they want to add nodes as they get deeper into the tree.
2004-08-10: first release.