Click here to Skip to main content
15,867,568 members
Articles / Programming Languages / C#
Article

Optimizing building trees from a database

Rate me:
Please Sign up or sign in to vote.
4.50/5 (13 votes)
20 Jan 20054 min read 89.8K   662   54   19
How a different way of looking at a problem can result in better performance.

Introduction

Some time ago, I had to write some code that created an object tree from a DataView. Pretty straight forward job, the code worked and I didn't bother with it for a while.

Then recently I started using the ANTS Profiler and found that the tree construction was actually rather slow. I examined my old code again and came up with a way of speeding it up significantly...

This article explains how the recursive character of the DataView can be used to get a big performance improvement.

Note: the method I present doesn't rely on incremental sorting of the DataView. Sorted backwards or even a mess will work. This is a very welcome property if the tree data is from an external, uncontrollable source.

What's in the ZIP?

The ZIP-file contains the tree building library code, and the test application code. Also included is the tree DataSet in XML format. The tree consists of 745 nodes, going about five levels deep. The test application is a simple console application that creates TreeNodes using the slow and the fast way and shows the loading time in seconds. It does this five times to get a better idea of the performance boost.

From the tree's point of view

In my original code, I handled the problem from the tree's point of view. The tree always has a root node, so I started with that node, and searched for its child nodes in the DataView. For every child node, I searched for their respective child nodes as well, and so on.

The DataSet consists of a collection of records with three columns: NodeID, ParentNodeID and NodeText. The data is recursive: The ParentNodeID refers to the NodeID of the parent node. The database makes sure that a node can't be attached to a non-existent parent node. Because of this rule, a root node is made by having the node point to itself.

C#
public class TreeNode
{
  // Declarations
  private int id;               // The ID of the node
  private TreeNode parentNode;  // The parent node of the node
  private string text;          // The payload of the node
  private ArrayList childNodes; // Contains the childnodes of the node

  // Constructor
  public TreeNode(int id, TreeNode parentNode)
  {
    // Initialize the FastTreeNode
    this.id = id;
    this.parentNode = parentNode;
    this.text = "";
    childNodes = new ArrayList();

    // Check if a parent was supplied
    if (parentNode != null)
    {
      //Yes, then let the parentnode know it has a new child
      parentNode.ChildNodes.Add(this);
    }
  }

  // Properties
  public int ID
  {
    get {return id;}
  }

  public TreeNode ParentNode
  {
    get {return parentNode;}
    set
    {
      parentNode = value;
      // Let the parent node know it has a new child,
      // if possible, and if necessary
      if (parentNode != null && !parentNode.ChildNodes.Contains(this))
        parentNode.ChildNodes.Add(this);
    }
  }

  public string Text
  {
    get {return text;}
    set {text = value;}
  }

  public ArrayList ChildNodes
  {
    get {return childNodes;}
  }
}

The TreeNode class basically reflects the database structure. It has an ID, and it refers to a parent. A root node has a null parent. As a simple payload, the node has a Text property.

C#
public TreeNode LoadTree(DataView dataTree)
    {
      // Prepare the resulting rootnode
      TreeNode rootNode = null;

      // Loop through the dataview
      foreach (DataRowView dataNode in dataTree)
      {
        // Store the current record in typed variables
        int nodeID = (int)dataNode["NodeID"];
        int parentNodeID = (int)dataNode["ParentNodeID"];
        string text = dataNode["NodeText"].ToString();

        // Check if the current node is the rootnode
        if (nodeID == parentNodeID)
        {
          // Yes, so create the node, and start
          // searching for it's child nodes
          rootNode = new TreeNode(nodeID, null);
          rootNode.Text = text;
          LoadChildNodes(rootNode, dataTree);
        }
      }

      // Return the result
      return rootNode;
    }

    private void LoadChildNodes(TreeNode parentNode, DataView dataTree)
    {
      // Loop throught the dataview
      foreach (DataRowView dataNode in dataTree)
      {
        // Store the current record in typed variables
        int nodeID = (int)dataNode["NodeID"];
        int parentNodeID = (int)dataNode["ParentNodeID"];
        string text = dataNode["NodeText"].ToString();

        // Check if the current node is not the root node,
        // and if the current node
        // is a child node of the supplied parent node
        if (nodeID != parentNodeID && parentNodeID == parentNode.ID)
        {
          // Yes, create the node
          // and start searching for it's child nodes.
          TreeNode node = new TreeNode(nodeID, parentNode);
          node.Text = text;
          LoadChildNodes (node, dataTree);
        }
      }
    }

The LoadTree method calls the recursive LoadChildNodes method. This makes the code quite compact, however, there's one big problem with it: for every node, it scans the entire DataView, including already handled nodes. So with the test DataSet, 555770 records will be read. Combined with the fact that the DataView doesn't read too fast, the total time required to place all nodes is considerable.

Also, if you try to load a very deeply branched tree, it might cause stack overflow problems.

From the DataView's point of view

The DataView has a linear character: it starts at record 0 and ends at record x. Just going through the records from 0 to x to create each node doesn't seem possible at first: if its parent node was not created earlier, it's impossible to create the current node. Or is it?

The trick is to let go of the idea that each object you create should be complete. I don't know anything about the parent node, but I do know the ID it will have. So I can create an new node with only the parent's ID and no parent node, and store it in a Hashtable. If I find the ID of the empty node later on in the DataSet, I don't create it but get the stored node from the Hashtable instead and fill in the empty properties.

C#
public TreeNode LoadTree(DataView dataTree)
    {
      // Loop through all records in the dataview
      foreach (DataRowView dataNode in dataTree)
      {
        // Store the current record in some typed variables
        int nodeID = (int)dataNode["NodeID"];
        int parentNodeID = (int)dataNode["ParentNodeID"];
        string text = dataNode["NodeText"].ToString();

        // Check if the node was already created
        if (nodeList.Contains(nodeID))
        {
          // Yes, fill in the missing properties
          TreeNode node = (TreeNode)nodeList[nodeID];
          node.Text = text;
          // If the node is not a root node,
          // and there's no parent yet, look up or create
          if (nodeID != parentNodeID && node.ParentNode == null)
          {
            TreeNode parentNode = null;
            // Check if the parentnode was already created
            if (nodeList.Contains(parentNodeID))
            {
              // Yes, so use that one
              parentNode = (TreeNode)nodeList[parentNodeID];
            }
            else
            {
              // The parentnode doesn't exist yet, so create a partial parentnode.
              parentNode = new TreeNode(parentNodeID, null);
              nodeList.Add(parentNodeID, parentNode);
            }
            node.ParentNode = parentNode;
          }
        }
        else
        {
          // New node, check if it's a root node
          if (nodeID == parentNodeID)
          {
            // Yes, so no need for looking up or creating a parentnode
            TreeNode node = new TreeNode(nodeID, null);
            node.Text = text;
            nodeList.Add(nodeID, node);
          }
          else
          {
            // New child node
            TreeNode parentNode = null;
            // Check if the parentnode was already created
            if (nodeList.Contains(parentNodeID))
            {
              // Yes, so use that one
              parentNode = (TreeNode)nodeList[parentNodeID];
            }
            else
            {
              // No, so create a partial parentnode.
              parentNode = new TreeNode(parentNodeID, null);
              nodeList.Add(parentNodeID, parentNode);
            }

            // Create the new node
            TreeNode node = new TreeNode(nodeID, parentNode);
            node.Text = text;
            nodeList.Add(nodeID, node);
          }
        }
      } 
     
      // Find the rootnode
      IDictionaryEnumerator nodeEnumerator = nodeList.GetEnumerator();
      while (nodeEnumerator.MoveNext())
      {
        TreeNode node = (TreeNode)nodeEnumerator.Value;
        if (node.ParentNode == null)
          return node;
      }

      // Return nothing if no rootnode was found
      return null;
    }

The LoadTree method here is a bit longer but it reads every record only once. Instead of searching nodes in the DataView, they're searched using the much faster Hashtable. The only additional thing that needs to be done at the end is locating the root node, whereas the SlowTree version has it ready at hand: it was the starting point. However, the extra time this extra code requires isn't really noticeable because of the speed with which the tree was built.

The results

On my 1.8GHz laptop, the test application requires just a little above two seconds to get the tree using the slow method. Using the fast method, it only takes about 0.004 seconds to create the same tree. That's about 500 times faster!

Note: of course, the test tree is a very simple tree. In real-life applications, each tree node might have a bigger payload than just a text. This will probably slow it down a little, but it will still be much faster than the original slow tree. I've seen improvements up to 400 times faster in my applications.

Conclusion

Applications can have a bad performance because:

  • it calls (fast) methods very often, or
  • it calls methods that are very slow, or, worst of all,
  • it calls methods that are very slow very often.

The solution offered in this article avoids this by keeping the calls to the slowest method (reading records from a DataView) to an absolute minimum. Admittedly, this is only possible because of the close relationship between the child- and parent objects, and some clever using of the little information we have about the objects.

Acknowledgements

I'd like to thank:

  • Daniel Strigl for his easy-to-use HighPerformanceTimer class[^] , which I used to take the measurements.
  • My colleague Jan, who originally came up with the described idea.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here


Written By
Software Developer
Netherlands Netherlands
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

 
GeneralSlow Tree Can Approach Fast Tree Pin
zaighum8-Apr-05 21:50
zaighum8-Apr-05 21:50 
GeneralFaster SlowTree with DataTable.Select Pin
tw6531-Jan-05 0:46
tw6531-Jan-05 0:46 
GeneralFasterTree Pin
NewKidX28-Jan-05 2:11
NewKidX28-Jan-05 2:11 
GeneralRe: FasterTree Pin
NielsHoldijk29-Jan-05 6:16
NielsHoldijk29-Jan-05 6:16 
GeneralLoad them dynamicly Pin
Tom Janssens20-Jan-05 20:58
Tom Janssens20-Jan-05 20:58 
GeneralRe: Load them dynamicly Pin
NielsHoldijk20-Jan-05 21:59
NielsHoldijk20-Jan-05 21:59 
GeneralAn even faster way Pin
Sebastien Lorion20-Jan-05 18:10
Sebastien Lorion20-Jan-05 18:10 
GeneralRe: An even faster way Pin
NielsHoldijk20-Jan-05 21:42
NielsHoldijk20-Jan-05 21:42 
GeneralRe: An even faster way Pin
Sebastien Lorion21-Jan-05 3:11
Sebastien Lorion21-Jan-05 3:11 
GeneralRe: An even faster way Pin
NielsHoldijk22-Jan-05 0:59
NielsHoldijk22-Jan-05 0:59 
With your suggestion in mind I've created a 'SortedTree' without a lookup Hashtable to compare it. It is indeed faster, but with only about 1 or 2 thousands of a second. I expect the memory usage to be lower as well, although very little because, if I'm not mistaken, the FastTree's hashtable only stores very compact references instead of the actual node. Oh, and the SortedTree code is shorter as well.

If the tree is pre-sorted both methods actually work pretty much the same, because the FastTree method will never have to create a unknown parent. I think the speed difference is in the extra loop at the end of the FastTree code to find the root node. The SortedTree doesn't need it because it started with that root node.

To prevent undermining my own article, I also shuffled the xml file a bit. Both the SlowTree and the FastTree still came up with the root node, whereas the SortedTree only came up with 'null reference' exceptions.

So I'll stick to my FastTree. I'm happy to pay 0.002 seconds for reliability Smile | :) Especially when you get the tree data from an external system where you can't control the contents, I don't think that the extra safety is a luxury...

I'll add a note to the article that it also works with messed up trees.

Niels
GeneralRe: An even faster way Pin
Sebastien Lorion22-Jan-05 10:15
Sebastien Lorion22-Jan-05 10:15 
GeneralRe: An even faster way Pin
NielsHoldijk23-Jan-05 3:38
NielsHoldijk23-Jan-05 3:38 
GeneralRe: An even faster way Pin
Ashaman26-Jan-05 3:18
Ashaman26-Jan-05 3:18 
GeneralRe: An even faster way Pin
GWSyZyGy28-Jan-05 4:00
GWSyZyGy28-Jan-05 4:00 
GeneralRe: An even faster way Pin
Sebastien Lorion31-Jan-05 18:06
Sebastien Lorion31-Jan-05 18:06 
GeneralRe: An even faster way Pin
Sebastien Lorion31-Jan-05 18:12
Sebastien Lorion31-Jan-05 18:12 
GeneralRe: An even faster way Pin
Sebastien Lorion31-Jan-05 18:25
Sebastien Lorion31-Jan-05 18:25 
GeneralRe: An even faster way Pin
TheLionClaws3-Aug-05 12:10
TheLionClaws3-Aug-05 12:10 
GeneralTest Pin
BringerOfRocketyDeath20-Jan-05 12:20
sussBringerOfRocketyDeath20-Jan-05 12:20 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.