Click here to Skip to main content
Click here to Skip to main content

Optimizing building trees from a database

, 20 Jan 2005
Rate this:
Please Sign up or sign in to vote.
How a different way of looking at a problem can result in better performance.

Introduction

Some time ago, I had to write some code that created an object tree from a DataView. Pretty straight forward job, the code worked and I didn't bother with it for a while.

Then recently I started using the ANTS Profiler and found that the tree construction was actually rather slow. I examined my old code again and came up with a way of speeding it up significantly...

This article explains how the recursive character of the DataView can be used to get a big performance improvement.

Note: the method I present doesn't rely on incremental sorting of the DataView. Sorted backwards or even a mess will work. This is a very welcome property if the tree data is from an external, uncontrollable source.

What's in the ZIP?

The ZIP-file contains the tree building library code, and the test application code. Also included is the tree DataSet in XML format. The tree consists of 745 nodes, going about five levels deep. The test application is a simple console application that creates TreeNodes using the slow and the fast way and shows the loading time in seconds. It does this five times to get a better idea of the performance boost.

From the tree's point of view

In my original code, I handled the problem from the tree's point of view. The tree always has a root node, so I started with that node, and searched for its child nodes in the DataView. For every child node, I searched for their respective child nodes as well, and so on.

The DataSet consists of a collection of records with three columns: NodeID, ParentNodeID and NodeText. The data is recursive: The ParentNodeID refers to the NodeID of the parent node. The database makes sure that a node can't be attached to a non-existent parent node. Because of this rule, a root node is made by having the node point to itself.

  public class TreeNode
  {
    // Declarations
    private int id;               // The ID of the node
    private TreeNode parentNode;  // The parent node of the node
    private string text;          // The payload of the node
    private ArrayList childNodes; // Contains the childnodes of the node

    // Constructor
    public TreeNode(int id, TreeNode parentNode)
    {
      // Initialize the FastTreeNode
      this.id = id;
      this.parentNode = parentNode;
      this.text = "";
      childNodes = new ArrayList();

      // Check if a parent was supplied
      if (parentNode != null)
      {
        //Yes, then let the parentnode know it has a new child
        parentNode.ChildNodes.Add(this);
      }
    }

    // Properties
    public int ID
    {
      get {return id;}
    }

    public TreeNode ParentNode
    {
      get {return parentNode;}
      set
      {
        parentNode = value;
        // Let the parent node know it has a new child,
        // if possible, and if necessary
        if (parentNode != null && !parentNode.ChildNodes.Contains(this))
          parentNode.ChildNodes.Add(this);
      }
    }

    public string Text
    {
      get {return text;}
      set {text = value;}
    }

    public ArrayList ChildNodes
    {
      get {return childNodes;}
    }
  }

The TreeNode class basically reflects the database structure. It has an ID, and it refers to a parent. A root node has a null parent. As a simple payload, the node has a Text property.

public TreeNode LoadTree(DataView dataTree)
    {
      // Prepare the resulting rootnode
      TreeNode rootNode = null;

      // Loop through the dataview
      foreach (DataRowView dataNode in dataTree)
      {
        // Store the current record in typed variables
        int nodeID = (int)dataNode["NodeID"];
        int parentNodeID = (int)dataNode["ParentNodeID"];
        string text = dataNode["NodeText"].ToString();

        // Check if the current node is the rootnode
        if (nodeID == parentNodeID)
        {
          // Yes, so create the node, and start
          // searching for it's child nodes
          rootNode = new TreeNode(nodeID, null);
          rootNode.Text = text;
          LoadChildNodes(rootNode, dataTree);
        }
      }

      // Return the result
      return rootNode;
    }

    private void LoadChildNodes(TreeNode parentNode, DataView dataTree)
    {
      // Loop throught the dataview
      foreach (DataRowView dataNode in dataTree)
      {
        // Store the current record in typed variables
        int nodeID = (int)dataNode["NodeID"];
        int parentNodeID = (int)dataNode["ParentNodeID"];
        string text = dataNode["NodeText"].ToString();

        // Check if the current node is not the root node,
        // and if the current node
        // is a child node of the supplied parent node
        if (nodeID != parentNodeID && parentNodeID == parentNode.ID)
        {
          // Yes, create the node
          // and start searching for it's child nodes.
          TreeNode node = new TreeNode(nodeID, parentNode);
          node.Text = text;
          LoadChildNodes (node, dataTree);
        }
      }
    }

The LoadTree method calls the recursive LoadChildNodes method. This makes the code quite compact, however, there's one big problem with it: for every node, it scans the entire DataView, including already handled nodes. So with the test DataSet, 555770 records will be read. Combined with the fact that the DataView doesn't read too fast, the total time required to place all nodes is considerable.

Also, if you try to load a very deeply branched tree, it might cause stack overflow problems.

From the DataView's point of view

The DataView has a linear character: it starts at record 0 and ends at record x. Just going through the records from 0 to x to create each node doesn't seem possible at first: if its parent node was not created earlier, it's impossible to create the current node. Or is it?

The trick is to let go of the idea that each object you create should be complete. I don't know anything about the parent node, but I do know the ID it will have. So I can create an new node with only the parent's ID and no parent node, and store it in a Hashtable. If I find the ID of the empty node later on in the DataSet, I don't create it but get the stored node from the Hashtable instead and fill in the empty properties.

public TreeNode LoadTree(DataView dataTree)
    {
      // Loop through all records in the dataview
      foreach (DataRowView dataNode in dataTree)
      {
        // Store the current record in some typed variables
        int nodeID = (int)dataNode["NodeID"];
        int parentNodeID = (int)dataNode["ParentNodeID"];
        string text = dataNode["NodeText"].ToString();

        // Check if the node was already created
        if (nodeList.Contains(nodeID))
        {
          // Yes, fill in the missing properties
          TreeNode node = (TreeNode)nodeList[nodeID];
          node.Text = text;
          // If the node is not a root node,
          // and there's no parent yet, look up or create
          if (nodeID != parentNodeID && node.ParentNode == null)
          {
            TreeNode parentNode = null;
            // Check if the parentnode was already created
            if (nodeList.Contains(parentNodeID))
            {
              // Yes, so use that one
              parentNode = (TreeNode)nodeList[parentNodeID];
            }
            else
            {
              // The parentnode doesn't exist yet, so create a partial parentnode.
              parentNode = new TreeNode(parentNodeID, null);
              nodeList.Add(parentNodeID, parentNode);
            }
            node.ParentNode = parentNode;
          }
        }
        else
        {
          // New node, check if it's a root node
          if (nodeID == parentNodeID)
          {
            // Yes, so no need for looking up or creating a parentnode
            TreeNode node = new TreeNode(nodeID, null);
            node.Text = text;
            nodeList.Add(nodeID, node);
          }
          else
          {
            // New child node
            TreeNode parentNode = null;
            // Check if the parentnode was already created
            if (nodeList.Contains(parentNodeID))
            {
              // Yes, so use that one
              parentNode = (TreeNode)nodeList[parentNodeID];
            }
            else
            {
              // No, so create a partial parentnode.
              parentNode = new TreeNode(parentNodeID, null);
              nodeList.Add(parentNodeID, parentNode);
            }

            // Create the new node
            TreeNode node = new TreeNode(nodeID, parentNode);
            node.Text = text;
            nodeList.Add(nodeID, node);
          }
        }
      } 
     
      // Find the rootnode
      IDictionaryEnumerator nodeEnumerator = nodeList.GetEnumerator();
      while (nodeEnumerator.MoveNext())
      {
        TreeNode node = (TreeNode)nodeEnumerator.Value;
        if (node.ParentNode == null)
          return node;
      }

      // Return nothing if no rootnode was found
      return null;
    }

The LoadTree method here is a bit longer but it reads every record only once. Instead of searching nodes in the DataView, they're searched using the much faster Hashtable. The only additional thing that needs to be done at the end is locating the root node, whereas the SlowTree version has it ready at hand: it was the starting point. However, the extra time this extra code requires isn't really noticeable because of the speed with which the tree was built.

The results

On my 1.8GHz laptop, the test application requires just a little above two seconds to get the tree using the slow method. Using the fast method, it only takes about 0.004 seconds to create the same tree. That's about 500 times faster!

Note: of course, the test tree is a very simple tree. In real-life applications, each tree node might have a bigger payload than just a text. This will probably slow it down a little, but it will still be much faster than the original slow tree. I've seen improvements up to 400 times faster in my applications.

Conclusion

Applications can have a bad performance because:

  • it calls (fast) methods very often, or
  • it calls methods that are very slow, or, worst of all,
  • it calls methods that are very slow very often.

The solution offered in this article avoids this by keeping the calls to the slowest method (reading records from a DataView) to an absolute minimum. Admittedly, this is only possible because of the close relationship between the child- and parent objects, and some clever using of the little information we have about the objects.

Acknowledgements

I'd like to thank:

  • Daniel Strigl for his easy-to-use HighPerformanceTimer class[^] , which I used to take the measurements.
  • My colleague Jan, who originally came up with the described idea.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

About the Author

NielsHoldijk
Software Developer
Netherlands Netherlands
No Biography provided

Comments and Discussions

 
GeneralFaster SlowTree with DataTable.Select Pinmembertw6531-Jan-05 0:46 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web02 | 2.8.140721.1 | Last Updated 20 Jan 2005
Article Copyright 2005 by NielsHoldijk
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid