Click here to Skip to main content
15,860,943 members
Articles / Database Development / SQL Server

Implementing a Tree Structure with Database

Rate me:
Please Sign up or sign in to vote.
4.82/5 (10 votes)
5 Mar 2011CPOL5 min read 77.6K   19   16
This article describes implementing and manipulating a tree structure by means of SQL

Introduction

Tree structures are very useful in implementing hierarchical structures which are helpful for software developers to develop applications which are more realistic and tangible to the customers who will use them. For example, in implementing "departments structures" or "products tree" or "organization charts" with unknown level of depth tree nodes, it's inevitable to use these structures in database.

In this article, we will review stored procedures and functions needed to maintain a tree structure in a database. In this case, we have used Microsoft SQL Server 2008 as our DBMS. We will see how to List, Add, Edit, Delete, Move and Copy nodes in the tree with SQL and how to maintain our tree by removing orphan nodes. These procedures will help programmers to easily maintain tree structures in their Windows or web applications. After having these means, it's up to the programmer to provide a user friendly interface to work with tree structures.

Background

A tree structure consists of multiple nodes that every node has a parent node in such a way that we do not have any ring in our node relationships. We should have at least an Identification field for each node and a pointer to its parent node. It's obvious that root nodes don't have any parent. We could add extra information to each node's data structure regarding the problem we are handling.

Using the Code

Creation

First of all, we should create a Tree table in our database which is named TreeNodes. We have an identification field and a title and a pointer to the parent identification.

SQL
CREATE TABLE [dbo].[TreeNodes](
	[TN_ID] [int] IDENTITY(1,1) NOT NULL,
	[TN_Title] [nvarchar](200) NOT NULL,
	[TN_ParentID] [int] NULL,
 CONSTRAINT [PK_TreeNodes] PRIMARY KEY CLUSTERED
(
	[TN_ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, _
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[TreeNodes]  WITH CHECK ADD  CONSTRAINT _
[FK_TreeNodes_TreeNodes] FOREIGN KEY([TN_ParentID])
REFERENCES [dbo].[TreeNodes] ([TN_ID])
GO

ALTER TABLE [dbo].[TreeNodes] CHECK CONSTRAINT [FK_TreeNodes_TreeNodes]
GO

Insertion

Now we should implement our code to easily Add Edit Remove and Move nodes in the tree.

For insertion operation, we have only two conditions:

  1. Nodes that haven't any parent
  2. Nodes that have a parent

So this is our insertion stored procedure:

SQL
CREATE PROCEDURE [dbo].[sp_insertTreeNode]
@parentid int,
@nt nvarchar(200),
@newid int output
AS
insert into TreeNodes (TN_Title, TN_ParentID)
   values(@nt, @parentid)
set @newid = SCOPE_IDENTITY()

We return @newid which is filled by newly inserted identification value, to the caller for further operations on it.

Update

After that, we have Update procedure to update the information in a node:

SQL
CREATE PROCEDURE [dbo].[sp_updateTreeNode]
@tn_id int,
@newTitle nvarchar(200)
AS
update TreeNodes set tn_title = @newTitle
where tn_id = @tn_id      

Deletion

Then, we should be able to delete a node. We should take care of deleting its children firstly otherwise we have orphans in our database. We will delete child nodes recursively.

In recursive calling of stored procedures and user defined functions, there are two obstacles you may encounter. Firstly, you may get maximum nested calls error and secondly you may get global cursor name conflict error if you have used them. I couldn't find a solution to overcome 32 level limit of nested procedure calls in SQL Server 2008 and for solving the latter problem, you may consider using this configuration command:

SQL
alter database DBName set CURSOR_DEFAULT  LOCAL

So here is our delete procedure:

SQL
CREATE PROCEDURE [dbo].[sp_deleteTreeNode]
@tn_id int
AS
if (@@NESTLEVEL=1)
	alter database DBName set CURSOR_DEFAULT  LOCAL

declare @child_count int

select @child_count=  count(*) from treenodes where tn_parentid=@tn_id

if @child_count>0
begin
	--firstly delete child nodes recursively
	declare @child_id int
	DECLARE child_nodes  CURSOR FOR
		SELECT tn_id FROM treenodes
		WHERE tn_parentid = @tn_id

	OPEN child_nodes

	FETCH NEXT FROM child_nodes INTO @child_id

	WHILE @@FETCH_STATUS = 0
	BEGIN
	   exec sp_deleteTreeNode @child_id

	   FETCH NEXT FROM child_nodes INTO @child_id
	END

	CLOSE child_nodes
	DEALLOCATE child_nodes
end

--delete a node without any child
delete from treenodes where tn_id= @tn_id

It first deletes child nodes recursively and then removes the requested node from the tree.

Note that setting CURSOR_DEFAULT to LOCAL prevents errors in recursive stored procedures that have a static cursor name.

You may also consider using cascaded deletes if your DBMS supports and also using triggers on deletion to remove child nodes as an alternative method to direct removing of child nodes.

Move

For moving a node:

SQL
CREATE PROCEDURE [dbo].[sp_moveTreeNode]
@tn_id int,
@newParent int
AS
update TreeNodes set TN_ParentID = @newParent
where tn_id = @tn_id

List

For listing nodes in applications, we should list the childs of a node and then go one level more deep. Retrieving nodes recursively and building the tree in each depth is a very clear method to implement.

SQL
CREATE PROCEDURE [dbo].[sp_listNodes]
@tn_id as int
AS
select * from TreeNodes
where TN_ParentID= @tn_id
order by TN_ParentID

For getting the list of root nodes, we should call it with @tn_id parameter set to null.

Subtree List

Sometimes it's useful to have the list of whole children of a node:

SQL
CREATE FUNCTION [dbo].[fn_listSubTreeNodes]
(
@tn_id int
)
RETURNS
@res TABLE
(
	cn_id int
)
AS
BEGIN
	if @tn_id<1
		RETURN

	insert into @res values(@tn_id);

	declare @child_count int
	select @child_count=  count(*) from treenodes where tn_parentid=@tn_id
	if @child_count>0
	begin
		declare @child_id int
		DECLARE child_nodes  CURSOR FOR
			SELECT tn_id FROM treenodes
			WHERE tn_parentid = @tn_id

		OPEN child_nodes

		FETCH NEXT FROM child_nodes INTO @child_id

		WHILE @@FETCH_STATUS = 0
		BEGIN

			insert @res
			select * from DBName.dbo.fn_listSubTreeNodes(@child_id)

		   FETCH NEXT FROM child_nodes INTO @child_id
		END

		CLOSE child_nodes
		DEALLOCATE child_nodes
	end

	RETURN
END

In this case, we have used table valued functions to accumulate results in a recursive manner. Consider that the observing node is included in the results. You may use @@NESTLEVEL variable to avoid adding the root node to the result set when it is one.

Thanks to Mika Wendelius who noted me on using Common Table Expressions, here is an alternative implementation for listing subtree nodes:

SQL
CREATE PROCEDURE [dbo].[sp_listSubTreeNodes]
@tn_id as int
AS
BEGIN
with subtree(tn_id , tn_title , tn_parentid)
as
(
select * from TreeNodes where TN_ID = @tn_id
union all
select e.TN_ID, e.TN_Title ,e.TN_ParentID from TreeNodes _
as e inner join subtree on subtree.TN_ID=e.TN_ParentID
)
select * from subtree
END

Copy

Some users may want to duplicate some parts of our tree in another part of it so we should have copyTree procedure:

SQL
CREATE PROCEDURE [dbo].[sp_copyTree]
@src_tn_id as int,
@tgt_tn_id as int
AS
declare @newTarget as int

if (@@NESTLEVEL=1)
	alter database DBName set CURSOR_DEFAULT  LOCAL

insert into TreeNodes(TN_ParentID, TN_Title) values _
(@tgt_tn_id, (select tn_title from TreeNodes where TN_ID=@src_tn_id))
set @newTarget = SCOPE_IDENTITY()

declare @child_count int
select @child_count=count(*) from treenodes where tn_parentid=@src_tn_id

if @child_count>0
begin
	declare @child_id int
	DECLARE child_nodes  CURSOR FOR
		SELECT tn_id FROM treenodes
		WHERE tn_parentid = @src_tn_id

	OPEN child_nodes

	FETCH NEXT FROM child_nodes INTO @child_id

	WHILE @@FETCH_STATUS = 0
	BEGIN
	   exec sp_copyTree @child_id , @newTarget
	   FETCH NEXT FROM child_nodes INTO @child_id
	END

	CLOSE child_nodes
	DEALLOCATE child_nodes
end

It starts from the source node and copies all of its children to their corresponding new parents.

Orphanage!

The nodes that their parents do not exist in database are orphans. You may not have orphans by using constraints in your database but if in some cases you felt that there may be some in your database as a result of application failures or lack of constraints, then you should decide on deleting orphan nodes cruelly or making them adopted by a new parent ! If you are the cruel person, here is the solution:

SQL
CREATE PROCEDURE [dbo].[sp_removeOrphans]
AS
declare @orphanCount int

select @orphanCount = COUNT(*) from TreeNodes where _
(TN_ParentID not in (select TN_ID from TreeNodes))  and  (TN_ParentID is not null)
while (@orphanCount >0)
begin
	delete TreeNodes where (TN_ParentID not in _
	(select TN_ID from TreeNodes))  and  (TN_ParentID is not null)
	select @orphanCount = COUNT(*) from TreeNodes where _
	(TN_ParentID not in (select TN_ID from TreeNodes))  _
	and  (TN_ParentID is not null)
end

It just deletes as many orphans as it sees!
But if some day, you wanted to restore them you should just set their root node parent to null or new existing node.

Finally

It needs a user friendly interface to list, add, delete, edit, move or copy nodes in a database tree.

Remember the constraint of this tree is nested procedure calls level which is 32 as a default for SQL Server 2008 and global cursor names which was solved by this command:

SQL
alter database DBName set CURSOR_DEFAULT  LOCAL

And consider that this model is not the most optimum designed model for tree structures but it could inspire some people to use them in their databases.
For other methods of implementing trees, consider the Member 3377871 comment at the end of this page.

Thank you very much.

Points of Interest

I got into trouble when SQL server showed me some errors in executing recursive procedures, but I managed to Google the problem and finally found the solution to overcome conflicts between global cursor names.

Also, it makes every programmer hesitant that nested procedure calls limit is 32 in SQL server 2008 and subsequently makes our tree depth limited to 32 levels.

History

  • 25th February, 2011: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
Australia Australia
He is a software programmer since commodore 64 era !

Produced many software products with Delphi and SQL Server.
After developing some projects with ASP.NET and C# now he is an enthusiast programmer of WPF beside designing software systems.

Comments and Discussions

 
GeneralMy vote of 5 Pin
angshuman.dash7-May-13 0:22
angshuman.dash7-May-13 0:22 
GeneralRe: My vote of 5 Pin
Amir Mahfoozi7-May-13 0:52
Amir Mahfoozi7-May-13 0:52 
QuestionWhat can I say you saved my project Pin
santhosh196927-Sep-12 0:41
santhosh196927-Sep-12 0:41 
AnswerRe: What can I say you saved my project Pin
Amir Mahfoozi27-Sep-12 0:59
Amir Mahfoozi27-Sep-12 0:59 
Generalcomparison of different types of implementations of hierarchical data [modified] Pin
Amir Mahfoozi13-Mar-11 9:11
Amir Mahfoozi13-Mar-11 9:11 
These lines are extracted from here :

http://msdn.microsoft.com/en-us/library/bb677173.aspx[^]

When to Use Alternatives to hierarchyid



Parent/Child might be superior when the following conditions exist:




  • The size of the key is very critical. For the same number of nodes, a hierarchyid value is equal to or larger than an integer-family (smallint, int, bigint) value. This is only a reason to use Parent/Child in rare cases, because hierarchyid has significantly better locality of I/O and CPU complexity than the common table expressions required when you are using a Parent/Child structure.

  • Queries rarely query across sections of the hierarchy. In other words if queries usually address only a single point in the hierarchy. In these cases co-location is not important. For example, Parent/Child is superior if the organization table is only used for running payroll for individual employees.

  • Non-leaf subtrees move frequently and performance is very important. In a parent/child representation changing the location of a row in a hierarchy affects a single row. Changing the location of a row in a hierarchyid usage affects n rows, where n is number of nodes in the sub-tree being moved.



Using XML data type can be superior when all the following are true:




  • The complete hierarchy is always stored and retrieved.

  • The data is consumed in XML format by the application.

  • Predicate searches are extremely limited and not performance critical.




Key Properties of hierarchyid



A value of the hierarchyid data type represents a position in a tree hierarchy. Values for hierarchyid have the following properties:


  • Extremely compact

    The average number of bits that are required to represent a node in a tree with n nodes depends on the average fanout (the average number of children of a node). For small fanouts, (0-7) the size is about 6*logAn bits, where A is the average fanout. A node in an organizational hierarchy of 100,000 people with an average fanout of 6 levels takes about 38 bits. This is rounded up to 40 bits, or 5 bytes, for storage.



  • Comparison is in depth-first order

    Given two hierarchyid values a and b, a<b means a comes before b in a depth-first traversal of the tree. Indexes on hierarchyid data types are in depth-first order, and nodes close to each other in a depth-first traversal are stored near each other. For example, the children of a record are stored adjacent to that record.



  • Support for arbitrary insertions and deletions

    By using the GetDescendant method, it is always possible to generate a sibling to the right of any given node, to the left of any given node, or between any two siblings. The comparison property is maintained when an arbitrary number of nodes is inserted or deleted from the hierarchy. Most insertions and deletions preserve the compactness property. However, insertions between two nodes will produce hierarchyid values with a slightly less compact representation.


Limitations of hierarchyid



The hierarchyid data type has the following limitations:


  • A column of type hierarchyid does not automatically represent a tree. It is up to the application to generate and assign hierarchyid values in such a way that the desired relationship between rows is reflected in the values. Some applications might not even want to have a column of type hierarchyid represent a tree. Perhaps the values are references to location in a hierarchy defined in another table.
  • It is up to the application to manage concurrency in generating and assigning hierarchyid values. There is no guarantee that hierarchyid values in a column are unique unless the application uses a unique key constraint or enforces uniqueness itself through its own logic.
  • Hierarchical relationships represented by hierarchyid values are not enforced like a foreign key relationship. It is possible and sometimes appropriate to have a hierarchical relationship where A has a child B, and then A is deleted leaving B with a relationship to a nonexistent record. If this behavior is unacceptable, the application must query for descendants before deleting parents.


modified on Saturday, March 26, 2011 10:31 PM

RantThis is not the way to do a tree structure! Pin
Daniel Parnell26-Feb-11 11:34
Daniel Parnell26-Feb-11 11:34 
GeneralRe: This is not the way to do a tree structure! Pin
Amir Mahfoozi27-Feb-11 5:25
Amir Mahfoozi27-Feb-11 5:25 
GeneralRe: This is not the way to do a tree structure! Pin
Daniel Parnell27-Feb-11 21:18
Daniel Parnell27-Feb-11 21:18 
GeneralRe: This is not the way to do a tree structure! Pin
imgen2-Mar-11 0:20
imgen2-Mar-11 0:20 
GeneralRe: This is not the way to do a tree structure! Pin
Daniel Parnell2-Mar-11 0:37
Daniel Parnell2-Mar-11 0:37 
GeneralRe: This is not the way to do a tree structure! Pin
imgen2-Mar-11 1:38
imgen2-Mar-11 1:38 
GeneralRe: This is not the way to do a tree structure... lets not be too hasty here ;) Pin
Mikie9-Mar-11 1:38
Mikie9-Mar-11 1:38 
GeneralFew questions Pin
Wendelius26-Feb-11 4:43
mentorWendelius26-Feb-11 4:43 
GeneralRe: Few questions Pin
Amir Mahfoozi27-Feb-11 5:15
Amir Mahfoozi27-Feb-11 5:15 
GeneralRe: Few questions Pin
Wendelius4-Mar-11 9:14
mentorWendelius4-Mar-11 9:14 
GeneralRe: Few questions Pin
Amir Mahfoozi5-Mar-11 5:18
Amir Mahfoozi5-Mar-11 5:18 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.