|
|
Comments and Discussions
|
|
 |

|
I think slow tree can approach to fast tree if you dont pass through all nodes but only child nodes.If you use filter before searching then theres no big difference...
|
|
|
|

|
Your SlowTree is realy slow
Starting the test...
SlowTree: 1,40959060439246
FastTree: 0,00388066081024264
SlowTree: 1,44099208139582
FastTree: 0,00261122572840962
SlowTree: 1,40443464183297
FastTree: 0,00406448305580737
SlowTree: 1,42344348234203
FastTree: 0,00319314326262137
SlowTree: 1,41355256045112
FastTree: 0,00258608286807402
Press enter to continue...
My SlowTree is faster
Starting the test...
SlowTree: 0,0713822566834612
FastTree: 0,0060739563268516
SlowTree: 0,0322873437825198
FastTree: 0,00300876228682696
SlowTree: 0,0326753819270326
FastTree: 0,00305541626100524
SlowTree: 0,0469537329465058
FastTree: 0,00263469239805618
SlowTree: 0,032997769269558
FastTree: 0,00299283847528108
Press enter to continue...
change the change the call in Class1.cs Line 34 to
TreeNode slowRoot = slowTree.LoadTree(data.Table);
and merge code in SlowTree.cs
public TreeNode LoadTree(DataTable dataTree)
{
// Prepare the resulting rootnode
TreeNode rootNode = null;
// Loop through the dataview
foreach (DataRow dataNode in dataTree.Select("NodeID = ParentNodeID"))
{
// Yes, so create the node, and start searching for it's child nodes
rootNode = new TreeNode((int)dataNode["NodeID"], null);
rootNode.Text = dataNode["NodeText"].ToString();
LoadChildNodes(rootNode, dataTree);
}
// Return the result
return rootNode;
}
private void LoadChildNodes(TreeNode parentNode, DataTable dataTree)
{
// Loop throught the dataview
foreach (DataRow dataNode in dataTree.Select("ParentNodeID ='" + parentNode.ID + "'"))
{
// Store the current record in typed variables
int nodeID = (int)dataNode["NodeID"];
int parentNodeID = (int)dataNode["ParentNodeID"];
string text = dataNode["NodeText"].ToString();
// Check if the current node is not the root node, and if the current node is a child node of the supplied parent node
if (nodeID != parentNodeID && parentNodeID == parentNode.ID)
{
// Yes, create the node and start searching for it's child nodes.
TreeNode node = new TreeNode(nodeID, parentNode);
node.Text = text;
LoadChildNodes (node, dataTree);
}
}
}
viele Grüße aus Berlin
Thomas
|
|
|
|

|
Hi,
You could gain even more performance if you separate the code that creates the nodes from the code that lays out the structure of the tree. In the first pass you create the nodes straight and store them into an array (to preserve the order from the datasource) as well as a hashtable (for futrther parent lookups). In the second pass iterate over the array and for each node there find it's matching parent using the hashtable. The only addition you need is to add ParentId property to the TreeNode class.
public class FasterTree
{
private Hashtable nodeHash;
private TreeNode[] nodeList;
public TreeNode LoadTree(DataView dataTree)
{
TreeNode root = null;
nodeHash = new Hashtable(dataTree.Count);
nodeList = new TreeNode[dataTree.Count];
for (int i = 0; i < dataTree.Count; i++)
{
DataRowView dataNode = dataTree[i];
int nodeID = (int)dataNode["NodeID"];
int parentNodeID = (int)dataNode["ParentNodeID"];
string text = dataNode["NodeText"].ToString();
TreeNode node = new TreeNode(nodeID, null);
node.ParentID = parentNodeID; node.Text = text;
nodeHash.Add(nodeID, node);
nodeList[i] = node;
}
for (int i = 0; i < nodeList.Length; i++)
{
TreeNode node = nodeList[i];
if (node.ID == node.ParentID)
root = node;
else
node.ParentNode = (TreeNode)nodeHash[node.ParentID];
}
return root;
}
}
The effect is that you do the necessary minimum of parent lookups (only one for each node, always hits). Also the code is simpler. And it's indeed faster. The only drawback might be the slightly increased memory footprint.
Cheers,
NewKid.
|
|
|
|

|
I tested your version, and it's faster indeed. I'd like to comment about the reason for using the array though: Rather than for preserving the original order (not necessary), I'd use it's speed advantage over the Hashtable as a argument.
There's one thing I'm wondering about, and it may be a bit far fetched. What if the 'parent must exist' constraint doesn't exist in the datasource? Ugly, but possible. The parent lookup in the second pass would fail because the requested key doesn't exist in the hashtable. Easily fixed with a Contains check, but there is a difference with the FastTree implementation then.
The FastTree would create valid, complete treenodes pointing to rootnodes. The FasterTree would create the same treenode as a rootnode itself.
I'm inclined to think that the FastTree is more accurate in this case: if the node says it has a parent, it can't possibly be a rootnode.
Of course this would only matter if not one rootnode was returned, but all rootnodes instead. I guess this would be something like a trunkless tree.
Perhaps things like pathfinding algorithms have this problem?
Niels
|
|
|
|

|
Why do you need to load all the treenodes at once ? I suppose a user doesn't need them all every time he/she uses the prog.
Add a dummy treenode to each visible node, delte the dummy and load the children when the node is opened.
Not only will your loading be faster, but you will also avoid problematic memory usage...
Tom
Core
|
|
|
|

|
I used the tree in a web page. Because reloading the page every time a user opens a node would make it really slow I decided to put the whole thing in layers. And then you need the entire tree...
Also, to replace the dummy child node with the actual children you'd need to scan the data again. I think that would slow it down actually. The only thing I can imagine is that the slowness of the tree will be spread over every clicked node, instead of over the entire tree at once. This might be less noticable to the user, I guess.
Niels
|
|
|
|

|
If the childID is always greater than the parentID, you can simply sort the view on the ID column and then fill the tree, with the first row being the root.
Sébastien
Intelligence shared is intelligence squared.
Homepage : http://sebastienlorion.com
|
|
|
|

|
I agree, but it's not always possible to guarantee that the database will store the nodes in the proper order. Also, by sticking to that rule it would not be possible to move nodes from one parent to another parent if that other parent has a higher ID than the child ID.
Niels
|
|
|
|

|
Hence the starting if of my comment. In my experience, users can rarely move nodes around, they usually only renames, adds or removes them. And as long as the ID column is a generated sequence, you are in business.
Also, I would suggest that you use a IDataReader if you can.
With the code you present, you will have 3 full copies of the data at the end. Using a data reader, you will save one and if pre-sorting the nodes is possible, you will save another.
Sébastien
Intelligence shared is intelligence squared.
Homepage : http://sebastienlorion.com
|
|
|
|

|
With your suggestion in mind I've created a 'SortedTree' without a lookup Hashtable to compare it. It is indeed faster, but with only about 1 or 2 thousands of a second. I expect the memory usage to be lower as well, although very little because, if I'm not mistaken, the FastTree's hashtable only stores very compact references instead of the actual node. Oh, and the SortedTree code is shorter as well.
If the tree is pre-sorted both methods actually work pretty much the same, because the FastTree method will never have to create a unknown parent. I think the speed difference is in the extra loop at the end of the FastTree code to find the root node. The SortedTree doesn't need it because it started with that root node.
To prevent undermining my own article, I also shuffled the xml file a bit. Both the SlowTree and the FastTree still came up with the root node, whereas the SortedTree only came up with 'null reference' exceptions.
So I'll stick to my FastTree. I'm happy to pay 0.002 seconds for reliability Especially when you get the tree data from an external system where you can't control the contents, I don't think that the extra safety is a luxury...
I'll add a note to the article that it also works with messed up trees.
Niels
|
|
|
|

|
1-2 ms for a total of what ? It's the ratio that matters.
The xml file contains only 750 nodes. You can compare BubbleSort and QuickSort for N=10 and call them even, but now do it for N=1000000.
Anyway, your function is already linear in time and space so you won't see any big improvements no matter how hard you try. The SlowTree was in O(N^2), so that's why you got such a speed gain even with a low node count. I did write my comment a bit fast because you are right about the HashTable memory footprint.
For the kick of it, I wrote a quick implementation of the so-called SortedTree. One which uses a DataView and another which uses a IDataReader. I compared both to FastTree with a set of ~400000 nodes generated from the files on my drives. I included the LoadData() method in the benchmark because I sorted the nodes directly on the DB with an ORDER BY clause.
With DataView, I get ~10% speed gain and no significant memory gain. With DataReader, I get ~20% speed gain and ~20% memory gain. Not a lot, but still a nice improvement.
Sébastien
Intelligence shared is intelligence squared.
Homepage : http://sebastienlorion.com
|
|
|
|

|
The 1-2 ms was compared to the FastTree. With the FastTree getting 6-7 ms (it performed a little slower with the same code and data Probably some other process hogging the CPU at the time) that would mean a 30% gain. I didn't test with such huge tree but 30% would be noticable then, I admit.
The DataReader implementation should be faster because it cuts out the middle man. I'd like to note that there is a (small) catch with it. 2 DataReaders on a single connection won't work. So every active DataReader should have it's own exclusive connection. If multiple users execute the same code at the same time you might run out of connections. We had problems with that in our web applications, and decided to write a database handler which returns dataviews to be able to release the connection as soon as possible.
I realize that the FastTree is a bit overkill if the tree data can be sorted. Could you send me your 'SortedTree' implementation? I'd like to compare it to my implementation and then include it in the article, with the FastTree as an alternative for messy tree data.
Niels
|
|
|
|

|
This thread was a very interesting read. In my experience, any tree that could be guaranteed to be sortable, with ParentID's always being lower in value than the ID of their children, evenually could not be so guaranteed. I have written a similar tree population mechanism that initially sorts by partentid and childid, in case I'm lucky, but then keeps track of those nodes that may need to be reattempted to be inserted into the tree after the first pass. I will change this particular part to use your partially populated node hashtable instead because that's a great idea.
The only other thing my population method allows is the text of a node to hold those records that cannot be inserted into the tree because their parent is non-null and is not otherwise in the dataset. This should never happen but not every project gets the same attention to referential integrity, so I like to help my apps help me to discover issues.
Great article.
-Kevin Buchan
|
|
|
|

|
I agree that faster is not always better. Reliability is far more important.
There are two additional problems to contend with when building trees -- one is the possibility of having orphan branches in the data (missing parent), and the second is the possibility of circular references (NodeA -> NodeB -> NodeA). Without testing, I'd bet the "slow" tree method would either blow up with null reference exceptions, or run forever and ever and ever .... On the other hand, both the "fast" and the "sorted" methods have a chance of handling the problems (since they iterate the data just once), but the "fast" method will likely handle them better since each node is not required to have a parent at creation. The "sorted" method expects the parent node to already exist.
2¢
|
|
|
|

|
Orphan branches should lead to a new root since the definition of a root is a node with no parent.
Also, by definition a tree does not contain cycle, otherwise it is a graph which is quite different.
Sébastien
Intelligence shared is intelligence squared.
Homepage : http://sebastienlorion.com
|
|
|
|

|
It all depends what you mean by reliability. Do you want your application to mask potential database integrity failures or to output an error when it encounters corrupted data. Sure thing, if you know your input might need some correction, use whatever means it takes to handle exceptions, but otherwise, assert that the data is well formed (programming by contract).
Sébastien
Intelligence shared is intelligence squared.
Homepage : http://sebastienlorion.com
|
|
|
|

|
Sorry for the delay ... Here is the code :
public static TreeNode LoadTree(DataView dataTree, string idColumn, string parentColumn, string dataColumn, string sort)
{
if (dataTree == null)
throw new ArgumentNullException("dataTree");
if (dataTree.Table == null)
throw new ArgumentException("No data available (DataView.Table property is null).", "dataTree");
if (idColumn == null)
throw new ArgumentNullException("idColumn");
if (parentColumn == null)
throw new ArgumentNullException("parentColumn");
if (dataColumn == null)
throw new ArgumentNullException("dataColumn");
return LoadTree(dataTree, dataTree.Table.Columns.IndexOf(idColumn), dataTree.Table.Columns.IndexOf(parentColumn), dataTree.Table.Columns.IndexOf(dataColumn), sort);
}
public static TreeNode LoadTree(DataView dataTree, int idColumn, int parentColumn, int dataColumn, string sort)
{
if (dataTree == null)
throw new ArgumentNullException("dataTree");
if (dataTree.Table == null)
throw new ArgumentException("No data available (DataView.Table property is null).", "dataTree");
if (idColumn < 0)
throw new ArgumentOutOfRangeException("idColumn", idColumn, "Must be a positive integer.");
if (parentColumn < 0)
throw new ArgumentOutOfRangeException("parentColumn", parentColumn, "Must be a positive integer.");
if (dataColumn < 0)
throw new ArgumentOutOfRangeException("dataColumn", dataColumn, "Must be a positive integer.");
if (sort != null && sort != "" && dataTree.Sort != sort)
dataTree.Sort = sort;
Hashtable nodeList = null;
TreeNode root = null;
if (dataTree.Count > 0)
{
DataRowView row = dataTree[0];
root = new TreeNode((int) row[idColumn], null);
root.Text = row[dataColumn].ToString();
}
if (dataTree.Count > 1)
{
nodeList = new Hashtable();
nodeList.Add(root.ID, root);
}
for (int i = 1; i < dataTree.Count; i++)
{
DataRowView row = dataTree[i];
int ID = (int) row[idColumn];
object parentID = row[parentColumn];
string data = row[dataColumn].ToString();
TreeNode parentNode = (TreeNode) nodeList[parentID];
if (parentNode == null)
throw new InvalidOperationException(String.Format("Cannot find parent with ID = '{0}'.", parentID));
TreeNode node = new TreeNode((int) ID, parentNode);
node.Text = data.ToString();
nodeList.Add(ID, node);
}
return root;
}
public static TreeNode LoadTree(IDataReader dataTree, string idColumn, string parentColumn, string dataColumn)
{
if (dataTree == null)
throw new ArgumentNullException("dataTree");
if (idColumn == null)
throw new ArgumentNullException("idColumn");
if (parentColumn == null)
throw new ArgumentNullException("parentColumn");
if (dataColumn == null)
throw new ArgumentNullException("dataColumn");
return LoadTree(dataTree, dataTree.GetOrdinal(idColumn), dataTree.GetOrdinal(parentColumn), dataTree.GetOrdinal(dataColumn));
}
public static TreeNode LoadTree(IDataReader dataTree, int idColumn, int parentColumn, int dataColumn)
{
if (dataTree == null)
throw new ArgumentNullException("dataTree");
if (idColumn < 0)
throw new ArgumentOutOfRangeException("idColumn", idColumn, "Must be a positive integer.");
if (parentColumn < 0)
throw new ArgumentOutOfRangeException("parentColumn", parentColumn, "Must be a positive integer.");
if (dataColumn < 0)
throw new ArgumentOutOfRangeException("dataColumn", dataColumn, "Must be a positive integer.");
Hashtable nodeList = new Hashtable();
TreeNode root = null;
if (dataTree.Read())
{
root = new TreeNode((int) dataTree[idColumn], null);
root.Text = dataTree[dataColumn].ToString();
nodeList.Add(root.ID, root);
}
while (dataTree.Read())
{
int ID = (int) dataTree[idColumn];
object parentID = dataTree[parentColumn];
string data = dataTree[dataColumn].ToString();
TreeNode parentNode = (TreeNode) nodeList[parentID];
if (parentNode == null)
throw new InvalidOperationException(String.Format("Cannot find parent with ID = '{0}'.", parentID));
TreeNode node = new TreeNode((int) ID, parentNode);
node.Text = data.ToString();
nodeList.Add(ID, node);
}
return root;
}
If you can obtain the level of the node (using LEVEL keyword in Oracle for example, see http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96540/sql_elements6a.htm#9547[^]), you can also make further optimizations, like using a Queue instead of a Hashtable, hence minimizing even more the memory requirement.
Sébastien
Intelligence shared is intelligence squared.
Homepage : http://sebastienlorion.com
|
|
|
|

|
What's the point of this crap without a treeview control. This console is crap.
|
|
|
|
|
 |
|
|
General News Suggestion Question Bug Answer Joke Rant Admin
|
How a different way of looking at a problem can result in better performance.
| Type | Article |
| Licence | |
| First Posted | 20 Jan 2005 |
| Views | 72,912 |
| Bookmarked | 52 times |
|
|