Optimizing building trees from a database






4.50/5 (12 votes)
Jan 20, 2005
4 min read

92492

665
How a different way of looking at a problem can result in better performance.
Introduction
Some time ago, I had to write some code that created an object tree from a DataView
. Pretty straight forward job, the code worked and I didn't bother with it for a while.
Then recently I started using the ANTS Profiler and found that the tree construction was actually rather slow. I examined my old code again and came up with a way of speeding it up significantly...
This article explains how the recursive character of the DataView
can be used to get a big performance improvement.
Note: the method I present doesn't rely on incremental sorting of the DataView
. Sorted backwards or even a mess will work. This is a very welcome property if the tree data is from an external, uncontrollable source.
What's in the ZIP?
The ZIP-file contains the tree building library code, and the test application code. Also included is the tree DataSet
in XML format. The tree consists of 745 nodes, going about five levels deep. The test application is a simple console application that creates TreeNode
s using the slow and the fast way and shows the loading time in seconds. It does this five times to get a better idea of the performance boost.
From the tree's point of view
In my original code, I handled the problem from the tree's point of view. The tree always has a root node, so I started with that node, and searched for its child nodes in the DataView
. For every child node, I searched for their respective child nodes as well, and so on.
The DataSet
consists of a collection of records with three columns: NodeID
, ParentNodeID
and NodeText
. The data is recursive: The ParentNodeID
refers to the NodeID
of the parent node. The database makes sure that a node can't be attached to a non-existent parent node. Because of this rule, a root node is made by having the node point to itself.
public class TreeNode
{
// Declarations
private int id; // The ID of the node
private TreeNode parentNode; // The parent node of the node
private string text; // The payload of the node
private ArrayList childNodes; // Contains the childnodes of the node
// Constructor
public TreeNode(int id, TreeNode parentNode)
{
// Initialize the FastTreeNode
this.id = id;
this.parentNode = parentNode;
this.text = "";
childNodes = new ArrayList();
// Check if a parent was supplied
if (parentNode != null)
{
//Yes, then let the parentnode know it has a new child
parentNode.ChildNodes.Add(this);
}
}
// Properties
public int ID
{
get {return id;}
}
public TreeNode ParentNode
{
get {return parentNode;}
set
{
parentNode = value;
// Let the parent node know it has a new child,
// if possible, and if necessary
if (parentNode != null && !parentNode.ChildNodes.Contains(this))
parentNode.ChildNodes.Add(this);
}
}
public string Text
{
get {return text;}
set {text = value;}
}
public ArrayList ChildNodes
{
get {return childNodes;}
}
}
The TreeNode
class basically reflects the database structure. It has an ID, and it refers to a parent. A root node has a null
parent. As a simple payload, the node has a Text
property.
public TreeNode LoadTree(DataView dataTree)
{
// Prepare the resulting rootnode
TreeNode rootNode = null;
// Loop through the dataview
foreach (DataRowView dataNode in dataTree)
{
// Store the current record in typed variables
int nodeID = (int)dataNode["NodeID"];
int parentNodeID = (int)dataNode["ParentNodeID"];
string text = dataNode["NodeText"].ToString();
// Check if the current node is the rootnode
if (nodeID == parentNodeID)
{
// Yes, so create the node, and start
// searching for it's child nodes
rootNode = new TreeNode(nodeID, null);
rootNode.Text = text;
LoadChildNodes(rootNode, dataTree);
}
}
// Return the result
return rootNode;
}
private void LoadChildNodes(TreeNode parentNode, DataView dataTree)
{
// Loop throught the dataview
foreach (DataRowView dataNode in dataTree)
{
// Store the current record in typed variables
int nodeID = (int)dataNode["NodeID"];
int parentNodeID = (int)dataNode["ParentNodeID"];
string text = dataNode["NodeText"].ToString();
// Check if the current node is not the root node,
// and if the current node
// is a child node of the supplied parent node
if (nodeID != parentNodeID && parentNodeID == parentNode.ID)
{
// Yes, create the node
// and start searching for it's child nodes.
TreeNode node = new TreeNode(nodeID, parentNode);
node.Text = text;
LoadChildNodes (node, dataTree);
}
}
}
The LoadTree
method calls the recursive LoadChildNodes
method. This makes the code quite compact, however, there's one big problem with it: for every node, it scans the entire DataView
, including already handled nodes. So with the test DataSet
, 555770 records will be read. Combined with the fact that the DataView
doesn't read too fast, the total time required to place all nodes is considerable.
Also, if you try to load a very deeply branched tree, it might cause stack overflow problems.
From the DataView's point of view
The DataView
has a linear character: it starts at record 0 and ends at record x. Just going through the records from 0 to x to create each node doesn't seem possible at first: if its parent node was not created earlier, it's impossible to create the current node. Or is it?
The trick is to let go of the idea that each object you create should be complete. I don't know anything about the parent node, but I do know the ID it will have. So I can create an new node with only the parent's ID and no parent node, and store it in a Hashtable
. If I find the ID of the empty node later on in the DataSet
, I don't create it but get the stored node from the Hashtable
instead and fill in the empty properties.
public TreeNode LoadTree(DataView dataTree)
{
// Loop through all records in the dataview
foreach (DataRowView dataNode in dataTree)
{
// Store the current record in some typed variables
int nodeID = (int)dataNode["NodeID"];
int parentNodeID = (int)dataNode["ParentNodeID"];
string text = dataNode["NodeText"].ToString();
// Check if the node was already created
if (nodeList.Contains(nodeID))
{
// Yes, fill in the missing properties
TreeNode node = (TreeNode)nodeList[nodeID];
node.Text = text;
// If the node is not a root node,
// and there's no parent yet, look up or create
if (nodeID != parentNodeID && node.ParentNode == null)
{
TreeNode parentNode = null;
// Check if the parentnode was already created
if (nodeList.Contains(parentNodeID))
{
// Yes, so use that one
parentNode = (TreeNode)nodeList[parentNodeID];
}
else
{
// The parentnode doesn't exist yet, so create a partial parentnode.
parentNode = new TreeNode(parentNodeID, null);
nodeList.Add(parentNodeID, parentNode);
}
node.ParentNode = parentNode;
}
}
else
{
// New node, check if it's a root node
if (nodeID == parentNodeID)
{
// Yes, so no need for looking up or creating a parentnode
TreeNode node = new TreeNode(nodeID, null);
node.Text = text;
nodeList.Add(nodeID, node);
}
else
{
// New child node
TreeNode parentNode = null;
// Check if the parentnode was already created
if (nodeList.Contains(parentNodeID))
{
// Yes, so use that one
parentNode = (TreeNode)nodeList[parentNodeID];
}
else
{
// No, so create a partial parentnode.
parentNode = new TreeNode(parentNodeID, null);
nodeList.Add(parentNodeID, parentNode);
}
// Create the new node
TreeNode node = new TreeNode(nodeID, parentNode);
node.Text = text;
nodeList.Add(nodeID, node);
}
}
}
// Find the rootnode
IDictionaryEnumerator nodeEnumerator = nodeList.GetEnumerator();
while (nodeEnumerator.MoveNext())
{
TreeNode node = (TreeNode)nodeEnumerator.Value;
if (node.ParentNode == null)
return node;
}
// Return nothing if no rootnode was found
return null;
}
The LoadTree
method here is a bit longer but it reads every record only once. Instead of searching nodes in the DataView
, they're searched using the much faster Hashtable
. The only additional thing that needs to be done at the end is locating the root node, whereas the SlowTree version has it ready at hand: it was the starting point. However, the extra time this extra code requires isn't really noticeable because of the speed with which the tree was built.
The results
On my 1.8GHz laptop, the test application requires just a little above two seconds to get the tree using the slow method. Using the fast method, it only takes about 0.004 seconds to create the same tree. That's about 500 times faster!
Note: of course, the test tree is a very simple tree. In real-life applications, each tree node might have a bigger payload than just a text. This will probably slow it down a little, but it will still be much faster than the original slow tree. I've seen improvements up to 400 times faster in my applications.
Conclusion
Applications can have a bad performance because:
- it calls (fast) methods very often, or
- it calls methods that are very slow, or, worst of all,
- it calls methods that are very slow very often.
The solution offered in this article avoids this by keeping the calls to the slowest method (reading records from a DataView
) to an absolute minimum. Admittedly, this is only possible because of the close relationship between the child- and parent objects, and some clever using of the little information we have about the objects.
Acknowledgements
I'd like to thank:
- Daniel Strigl for his easy-to-use HighPerformanceTimer class[^] , which I used to take the measurements.
- My colleague Jan, who originally came up with the described idea.