Click here to Skip to main content
Click here to Skip to main content

Implementing a Tree Structure with Database

By , 5 Mar 2011
 

Introduction

Tree structures are very useful in implementing hierarchical structures which are helpful for software developers to develop applications which are more realistic and tangible to the customers who will use them. For example, in implementing "departments structures" or "products tree" or "organization charts" with unknown level of depth tree nodes, it's inevitable to use these structures in database.

In this article, we will review stored procedures and functions needed to maintain a tree structure in a database. In this case, we have used Microsoft SQL Server 2008 as our DBMS. We will see how to List, Add, Edit, Delete, Move and Copy nodes in the tree with SQL and how to maintain our tree by removing orphan nodes. These procedures will help programmers to easily maintain tree structures in their Windows or web applications. After having these means, it's up to the programmer to provide a user friendly interface to work with tree structures.

Background

A tree structure consists of multiple nodes that every node has a parent node in such a way that we do not have any ring in our node relationships. We should have at least an Identification field for each node and a pointer to its parent node. It's obvious that root nodes don't have any parent. We could add extra information to each node's data structure regarding the problem we are handling.

Using the Code

Creation

First of all, we should create a Tree table in our database which is named TreeNodes. We have an identification field and a title and a pointer to the parent identification.

CREATE TABLE [dbo].[TreeNodes](
	[TN_ID] [int] IDENTITY(1,1) NOT NULL,
	[TN_Title] [nvarchar](200) NOT NULL,
	[TN_ParentID] [int] NULL,
 CONSTRAINT [PK_TreeNodes] PRIMARY KEY CLUSTERED
(
	[TN_ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, _
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[TreeNodes]  WITH CHECK ADD  CONSTRAINT _
[FK_TreeNodes_TreeNodes] FOREIGN KEY([TN_ParentID])
REFERENCES [dbo].[TreeNodes] ([TN_ID])
GO

ALTER TABLE [dbo].[TreeNodes] CHECK CONSTRAINT [FK_TreeNodes_TreeNodes]
GO

Insertion

Now we should implement our code to easily Add Edit Remove and Move nodes in the tree.

For insertion operation, we have only two conditions:

  1. Nodes that haven't any parent
  2. Nodes that have a parent

So this is our insertion stored procedure:

CREATE PROCEDURE [dbo].[sp_insertTreeNode]
@parentid int,
@nt nvarchar(200),
@newid int output
AS
insert into TreeNodes (TN_Title, TN_ParentID)
   values(@nt, @parentid)
set @newid = SCOPE_IDENTITY()

We return @newid which is filled by newly inserted identification value, to the caller for further operations on it.

Update

After that, we have Update procedure to update the information in a node:

CREATE PROCEDURE [dbo].[sp_updateTreeNode]
@tn_id int,
@newTitle nvarchar(200)
AS
update TreeNodes set tn_title = @newTitle
where tn_id = @tn_id      

Deletion

Then, we should be able to delete a node. We should take care of deleting its children firstly otherwise we have orphans in our database. We will delete child nodes recursively.

In recursive calling of stored procedures and user defined functions, there are two obstacles you may encounter. Firstly, you may get maximum nested calls error and secondly you may get global cursor name conflict error if you have used them. I couldn't find a solution to overcome 32 level limit of nested procedure calls in SQL Server 2008 and for solving the latter problem, you may consider using this configuration command:

alter database DBName set CURSOR_DEFAULT  LOCAL

So here is our delete procedure:

CREATE PROCEDURE [dbo].[sp_deleteTreeNode]
@tn_id int
AS
if (@@NESTLEVEL=1)
	alter database DBName set CURSOR_DEFAULT  LOCAL

declare @child_count int

select @child_count=  count(*) from treenodes where tn_parentid=@tn_id

if @child_count>0
begin
	--firstly delete child nodes recursively
	declare @child_id int
	DECLARE child_nodes  CURSOR FOR
		SELECT tn_id FROM treenodes
		WHERE tn_parentid = @tn_id

	OPEN child_nodes

	FETCH NEXT FROM child_nodes INTO @child_id

	WHILE @@FETCH_STATUS = 0
	BEGIN
	   exec sp_deleteTreeNode @child_id

	   FETCH NEXT FROM child_nodes INTO @child_id
	END

	CLOSE child_nodes
	DEALLOCATE child_nodes
end

--delete a node without any child
delete from treenodes where tn_id= @tn_id

It first deletes child nodes recursively and then removes the requested node from the tree.

Note that setting CURSOR_DEFAULT to LOCAL prevents errors in recursive stored procedures that have a static cursor name.

You may also consider using cascaded deletes if your DBMS supports and also using triggers on deletion to remove child nodes as an alternative method to direct removing of child nodes.

Move

For moving a node:

CREATE PROCEDURE [dbo].[sp_moveTreeNode]
@tn_id int,
@newParent int
AS
update TreeNodes set TN_ParentID = @newParent
where tn_id = @tn_id

List

For listing nodes in applications, we should list the childs of a node and then go one level more deep. Retrieving nodes recursively and building the tree in each depth is a very clear method to implement.

CREATE PROCEDURE [dbo].[sp_listNodes]
@tn_id as int
AS
select * from TreeNodes
where TN_ParentID= @tn_id
order by TN_ParentID

For getting the list of root nodes, we should call it with @tn_id parameter set to null.

Subtree List

Sometimes it's useful to have the list of whole children of a node:

CREATE FUNCTION [dbo].[fn_listSubTreeNodes]
(
@tn_id int
)
RETURNS
@res TABLE
(
	cn_id int
)
AS
BEGIN
	if @tn_id<1
		RETURN

	insert into @res values(@tn_id);

	declare @child_count int
	select @child_count=  count(*) from treenodes where tn_parentid=@tn_id
	if @child_count>0
	begin
		declare @child_id int
		DECLARE child_nodes  CURSOR FOR
			SELECT tn_id FROM treenodes
			WHERE tn_parentid = @tn_id

		OPEN child_nodes

		FETCH NEXT FROM child_nodes INTO @child_id

		WHILE @@FETCH_STATUS = 0
		BEGIN

			insert @res
			select * from DBName.dbo.fn_listSubTreeNodes(@child_id)

		   FETCH NEXT FROM child_nodes INTO @child_id
		END

		CLOSE child_nodes
		DEALLOCATE child_nodes
	end

	RETURN
END

In this case, we have used table valued functions to accumulate results in a recursive manner. Consider that the observing node is included in the results. You may use @@NESTLEVEL variable to avoid adding the root node to the result set when it is one.

Thanks to Mika Wendelius who noted me on using Common Table Expressions, here is an alternative implementation for listing subtree nodes:

CREATE PROCEDURE [dbo].[sp_listSubTreeNodes]
@tn_id as int
AS
BEGIN
with subtree(tn_id , tn_title , tn_parentid)
as
(
select * from TreeNodes where TN_ID = @tn_id
union all
select e.TN_ID, e.TN_Title ,e.TN_ParentID from TreeNodes _
as e inner join subtree on subtree.TN_ID=e.TN_ParentID
)
select * from subtree
END

Copy

Some users may want to duplicate some parts of our tree in another part of it so we should have copyTree procedure:

CREATE PROCEDURE [dbo].[sp_copyTree]
@src_tn_id as int,
@tgt_tn_id as int
AS
declare @newTarget as int

if (@@NESTLEVEL=1)
	alter database DBName set CURSOR_DEFAULT  LOCAL

insert into TreeNodes(TN_ParentID, TN_Title) values _
(@tgt_tn_id, (select tn_title from TreeNodes where TN_ID=@src_tn_id))
set @newTarget = SCOPE_IDENTITY()

declare @child_count int
select @child_count=count(*) from treenodes where tn_parentid=@src_tn_id

if @child_count>0
begin
	declare @child_id int
	DECLARE child_nodes  CURSOR FOR
		SELECT tn_id FROM treenodes
		WHERE tn_parentid = @src_tn_id

	OPEN child_nodes

	FETCH NEXT FROM child_nodes INTO @child_id

	WHILE @@FETCH_STATUS = 0
	BEGIN
	   exec sp_copyTree @child_id , @newTarget
	   FETCH NEXT FROM child_nodes INTO @child_id
	END

	CLOSE child_nodes
	DEALLOCATE child_nodes
end

It starts from the source node and copies all of its children to their corresponding new parents.

Orphanage!

The nodes that their parents do not exist in database are orphans. You may not have orphans by using constraints in your database but if in some cases you felt that there may be some in your database as a result of application failures or lack of constraints, then you should decide on deleting orphan nodes cruelly or making them adopted by a new parent ! If you are the cruel person, here is the solution:

CREATE PROCEDURE [dbo].[sp_removeOrphans]
AS
declare @orphanCount int

select @orphanCount = COUNT(*) from TreeNodes where _
(TN_ParentID not in (select TN_ID from TreeNodes))  and  (TN_ParentID is not null)
while (@orphanCount >0)
begin
	delete TreeNodes where (TN_ParentID not in _
	(select TN_ID from TreeNodes))  and  (TN_ParentID is not null)
	select @orphanCount = COUNT(*) from TreeNodes where _
	(TN_ParentID not in (select TN_ID from TreeNodes))  _
	and  (TN_ParentID is not null)
end

It just deletes as many orphans as it sees!
But if some day, you wanted to restore them you should just set their root node parent to null or new existing node.

Finally

It needs a user friendly interface to list, add, delete, edit, move or copy nodes in a database tree.

Remember the constraint of this tree is nested procedure calls level which is 32 as a default for SQL Server 2008 and global cursor names which was solved by this command:

alter database DBName set CURSOR_DEFAULT  LOCAL

And consider that this model is not the most optimum designed model for tree structures but it could inspire some people to use them in their databases.
For other methods of implementing trees, consider the Member 3377871 comment at the end of this page.

Thank you very much.

Points of Interest

I got into trouble when SQL server showed me some errors in executing recursive procedures, but I managed to Google the problem and finally found the solution to overcome conflicts between global cursor names.

Also, it makes every programmer hesitant that nested procedure calls limit is 32 in SQL server 2008 and subsequently makes our tree depth limited to 32 levels.

History

  • 25th February, 2011: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Amir Mahfoozi
Architect
United States United States
Member
He is a software programmer since commodore 64 era !
 
Produced many software products with Delphi and SQL Server.
After developing some projects with ASP.NET and C# now he is an enthusiast programmer of WPF beside designing software systems.

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
GeneralMy vote of 5memberangshuman.dash7 May '13 - 0:22 
good...
GeneralRe: My vote of 5memberAmir Mahfoozi7 May '13 - 0:52 
Thank you.
QuestionWhat can I say you saved my projectmembersanthosh196927 Sep '12 - 0:41 
What can I say, you saved my project- Thanks a lot Genius.
AnswerRe: What can I say you saved my projectmemberAmir Mahfoozi27 Sep '12 - 0:59 
Hi Santosh,
It's my pleasure to hear about that. Thank you. Smile | :)
Generalcomparison of different types of implementations of hierarchical data [modified]memberAmir Mahfoozi13 Mar '11 - 9:11 
These lines are extracted from here :
 
http://msdn.microsoft.com/en-us/library/bb677173.aspx[^]
 

When to Use Alternatives to hierarchyid

 

Parent/Child might be superior when the following conditions exist:

 
  • The size of the key is very critical. For the same number of nodes, a hierarchyid value is equal to or larger than an integer-family (smallint, int, bigint) value. This is only a reason to use Parent/Child in rare cases, because hierarchyid has significantly better locality of I/O and CPU complexity than the common table expressions required when you are using a Parent/Child structure.

  • Queries rarely query across sections of the hierarchy. In other words if queries usually address only a single point in the hierarchy. In these cases co-location is not important. For example, Parent/Child is superior if the organization table is only used for running payroll for individual employees.
  • Non-leaf subtrees move frequently and performance is very important. In a parent/child representation changing the location of a row in a hierarchy affects a single row. Changing the location of a row in a hierarchyid usage affects n rows, where n is number of nodes in the sub-tree being moved.
 

Using XML data type can be superior when all the following are true:

 
  • The complete hierarchy is always stored and retrieved.
  • The data is consumed in XML format by the application.
  • Predicate searches are extremely limited and not performance critical.
 

Key Properties of hierarchyid

 
A value of the hierarchyid data type represents a position in a tree hierarchy. Values for hierarchyid have the following properties:
 
  • Extremely compact

    The average number of bits that are required to represent a node in a tree with n nodes depends on the average fanout (the average number of children of a node). For small fanouts, (0-7) the size is about 6*logAn bits, where A is the average fanout. A node in an organizational hierarchy of 100,000 people with an average fanout of 6 levels takes about 38 bits. This is rounded up to 40 bits, or 5 bytes, for storage.


  •  
  • Comparison is in depth-first order

    Given two hierarchyid values a and b, a<b means a comes before b in a depth-first traversal of the tree. Indexes on hierarchyid data types are in depth-first order, and nodes close to each other in a depth-first traversal are stored near each other. For example, the children of a record are stored adjacent to that record.


  •  
  • Support for arbitrary insertions and deletions

    By using the GetDescendant method, it is always possible to generate a sibling to the right of any given node, to the left of any given node, or between any two siblings. The comparison property is maintained when an arbitrary number of nodes is inserted or deleted from the hierarchy. Most insertions and deletions preserve the compactness property. However, insertions between two nodes will produce hierarchyid values with a slightly less compact representation.
 

Limitations of hierarchyid

 
The hierarchyid data type has the following limitations:
 
  • A column of type hierarchyid does not automatically represent a tree. It is up to the application to generate and assign hierarchyid values in such a way that the desired relationship between rows is reflected in the values. Some applications might not even want to have a column of type hierarchyid represent a tree. Perhaps the values are references to location in a hierarchy defined in another table.
  •  
  • It is up to the application to manage concurrency in generating and assigning hierarchyid values. There is no guarantee that hierarchyid values in a column are unique unless the application uses a unique key constraint or enforces uniqueness itself through its own logic.
  •  
  • Hierarchical relationships represented by hierarchyid values are not enforced like a foreign key relationship. It is possible and sometimes appropriate to have a hierarchical relationship where A has a child B, and then A is deleted leaving B with a relationship to a nonexistent record. If this behavior is unacceptable, the application must query for descendants before deleting parents.

modified on Saturday, March 26, 2011 10:31 PM

RantThis is not the way to do a tree structure!memberMember 337787126 Feb '11 - 11:34 
I don't know how many times I've had to deal with tree structures implemented like this. I know I made this same mistake when I was starting out with databases. On the surface it looks like the way to do it, but there are several much better ways to do it. The problem with this approach is that traversing the tree is a very expensive operation. If the tree is only 2 or 3 levels deep this may not be a problem, but as soon as the tree starts to get deep it does not scale well (I remember getting yelled at by a DBA for doing this Wink | ;) ).
 
One of the easiest ways to implement a tree structure that is easy to traverse is to use a structure as follows:
 
create table my_tree (
  id int IDENTITY(1,1) NOT NULL,
  path VARCHAR(200) not null,
  title nvarchar(200) not null
)
 
We make a trade off between disk space and access speed. This would allow us to easily store a tree with 50 levels or more (make the path column longer to allow deeper trees).
 

idpathtitle
1/this is the root
2/1this is the first item under the root
3/1another item under the root
4/1/2an item under the first item under the root
5/1/2/4a deeper item
 
To get the whole tree you can then do the following:
 
select * from my_tree order by  path+'/'+cast(id as varchar)
 
The expression in the order by could be eliminated if the path were updated to contain the full path to the node rather than the path to the parent. To do this nicely we would need to pre-allocate the node ID or use some other method for identifying the nodes that does not relate to the ID column.
 
To get a sub-tree:
 
select * from my_tree where path like '/1/2/%' order by  path+'/'+cast(id as varchar)
 
Similarly deleting a sub-tree becomes:
 
delete from my_tree where path like '/1/2/%'
 
Moving items around in the tree becomes a little more complicated.
 
update my_tree set path='/1/3/'+substring(path, 6, 1000) where path like '/1/2/%'
 
There are a couple of even more efficient ways of implementing trees, but they are a lot more complicated to implement.
See here an example Smile | :)
GeneralRe: This is not the way to do a tree structure!memberAmir Mahfoozi27 Feb '11 - 5:25 
Of course I didn't tell that this is the best way to implement trees.
And you know that there are so many ways to consider and implement regarding to our problem.
 
For example your mentioned method is not capable of implementing jungles which have multiple root nodes
but with some assumptions or some manipulations on the way each node path is stored it could handle jungles.
 
All in all, I have added your comment to the revised version of the text
 
Thank you for your comment Wink | ;)

modified on Sunday, February 27, 2011 1:06 PM

GeneralRe: This is not the way to do a tree structure!memberDaniel Parnell27 Feb '11 - 21:18 
The main issue I have is that people starting out with programming will create tree structures using the approach. I am caused daily pain by this structure along with several other database design issues in my day job Wink | ;)
GeneralRe: This is not the way to do a tree structure!memberimgen2 Mar '11 - 0:20 
I'm a little confused here.
select * from my_tree where path like '/1/2%' order by  path+'/'+cast(id as varchar)
what this expression does? Finding the tree of root node with the ID 1? But what does the '2' do here? But anyway, this is an interesting idea. Definitely better in performance compared to the method used in the article. Thx.
GeneralRe: This is not the way to do a tree structure!memberDaniel Parnell2 Mar '11 - 0:37 
The path in the tree is just a string. The expression '/1/2%' is using the SQL LIKE operation to search for any path that starts with '/1/2'. There is a bit of an error there as it should be '/1/2/%' because '/1/20' also matches '/1/2%' Wink | ;)
GeneralRe: This is not the way to do a tree structure!memberimgen2 Mar '11 - 1:38 
Cool. Thanks. That's what I thought (of the error). And the link you presented is great, although needs a little rethinkingSmile | :) .
GeneralRe: This is not the way to do a tree structure... lets not be too hasty here ;)memberMikie9 Mar '11 - 1:38 
While your suggestion *may* have some advantages when it comes to scaling it does not have anywhere near the same rigor and referential integrity that the example in the original article had. The original with its parent/child relationship is far more likely to maintain database integrity should anyone try deleting or renaming parts of the tree they shouldn't whereas your solution appears to just ignore this problem at a database level and leaves it to client code or using the supplied stored procedures to ensure that the right changes are applied. For example, if I were to implement some code to delete the "/1/2" node in your solution I'd end up with orphan nodes whereas the original is far less likely to allow this to happen.
 
Every solution is different but I'd generally always favor integrity - in the database - over speed.
 
I've worked on at least one solution that uses built-in functions/stored procedures to help with the problem of traversing the structures and return the full path for any given record (by iterating parent records etc). This solution significantly outperformed what you've proposed here with the added benefits of using far less disk space when scaled up to millions of records.
GeneralFew questionsmemberMika Wendelius26 Feb '11 - 4:43 
Hi,
 
Trees are very useful in several situations and your article covers part of the basic operations used with trees. At this point I have few questions concerning the article:
- Is there a reason why you haven't enforced constraints between nodes and let the SQL Server handle for example deletions?
- Orphans shouldn't be allowed at all (this goes back to constraints)
- Why use recursive iterations in procedures instead of using recursive SQL statements (CTE)?
The need to optimize rises from a bad design.My articles[^]

GeneralRe: Few questionsmemberAmir Mahfoozi27 Feb '11 - 5:15 
Hi Mika Smile | :)
 
Of course these are basic operations related to a tree structure and there may be other operations that weren't mentioned above.
Table definition should had a foreign key constraint and I had missed to add it.
Also I got some errors after adding delete triggers which was caused by self referencing foreign keys Confused | :confused: I didn't try to get around the problem but If someone tell me the reason it would be appreciated.
 
After adding the foreign key constraint there shouldn't be any orphans in the table.
I couldn't apply cascading delete to the foreign key. Is it related to self referenced foreign key ? If we had cascading delete for the foreign key, Of course the need for recursive deleting would dissolve.
 
And despite the fact that there are so many works to do to have well maintained database tree, I was intended to provide the basic tree operations with SQL statements which are available in most common DBMSs or at least could be deployed to other SQL platforms with a little change.
 
And about CTEs, you are right. I could use them in implementing ListSubTreeNodes and I will add as an alternation to it ASAP.
But when we want to do a recursive operation and we want to do something in each step I think it's not a good Idea to use CTEs because CTEs do the whole recursion operation at once and provide the final result but for example in deleting we want to traverse to the leafs and then return back and delete parent nodes.
 
Finally thanks for you valuable notes on the article.
GeneralRe: Few questionsmemberMika Wendelius4 Mar '11 - 9:14 
Hi Amir,
 
Amir Mahfoozi wrote:
Also I got some errors after adding delete triggers which was caused by self
referencing foreign keys Confused | :confused: I didn't try to get around the
problem but If someone tell me the reason it would be appreciated.


 
Amir Mahfoozi wrote:
I couldn't apply cascading delete to the foreign key. Is it related to self
referenced foreign key ? If we had cascading delete for the foreign key, Of
course the need for recursive deleting would dissolve.


 
Most likely the problem you're encountering is this: http://support.microsoft.com/kb/321843[^]
 
I believe that this could be an interesting article for you: http://technet.microsoft.com/en-us/library/bb677173.aspx[^]
 
Best regards Smile | :)
 
mika
The need to optimize rises from a bad design.My articles[^]

GeneralRe: Few questionsmemberAmir Mahfoozi5 Mar '11 - 5:18 
Hi Mika
 
Very nice and useful link you have sent.
Despite the means that MS have provided for tree data structures ,everybody should think and design the best data structure for each individual problem. Some of them work best in some situations and not well in some other situations.
May be someday someone invent a tree structure that could be definitely used in any situation without any doubt about its performance.
For current variations of representing tree structures take a look this:
http://en.wikipedia.org/wiki/Tree_structure#Representing_trees
May be some day you or someone else add a new method to them. Smile | :)

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web02 | 2.6.130516.1 | Last Updated 5 Mar 2011
Article Copyright 2011 by Amir Mahfoozi
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid