Click here to Skip to main content
15,860,943 members
Articles / Database Development / SQL Server
Article

Improve hierarchy performance using nested sets

Rate me:
Please Sign up or sign in to vote.
4.36/5 (21 votes)
17 May 20033 min read 248.9K   1.3K   62   23
This article describes how to use nested sets to improve performance of hierarchies within SQL Server and other relational databases

Introduction

This article and accompanying code demonstrates the theory and practice behind "Nested Sets".  The theory is applicable with any type hierarchy in any database, however the example code was written using SQL Server 2000 and every attempt has been made to make it compatible with most major SQL providers.

The Theory

Nested sets rely on a demoralization of parts of the hierarchy, They can be used to efficiently use a hierarchy without relying on the Parent / Child link.  How a person usually views or represents a hierarchy is by using parent child relationship, this can be seen below:

 Image 1

The above hierarchy shows a make-believe company organisational structure, this data can be compiled into a database table and represented very easily, the SQL Script below shows how to create the table and populate the data:

SQL
-- Create the SQL Table
CREATE TABLE Employee
(
  EmployeeID  INT  IDENTITY(1, 1) NOT NULL,
  ParentID    INT NULL,
  JobTitle    VARCHAR(50) NOT NULL,
  FirstName   VARCHAR(50) NOT NULL, 
  PRIMARY KEY(EmployeeID),
  FOREIGN KEY ParentID REFERENCES Employee (EmployeeID)
)

GO

And populate the data:

SQL
-- Set identity insert ON so we can insert the employee ID
SET IDENTITY_INSERT Employee ON 
SET NOCOUNT ON

INSERT INTO Employee (EmployeeID, ParentID, JobTitle, FirstName) 
  VALUES (1, NULL, 'Managing Director', 'Bill')
-- Employees under Bill
INSERT INTO Employee (EmployeeID, ParentID, JobTitle, FirstName) 
  VALUES (2, 1, 'Customer Services', 'Angela')
INSERT INTO Employee (EmployeeID, ParentID, JobTitle, FirstName) 
  VALUES (3, 1, 'Development Manager', 'Ben')
-- Employess under angela
INSERT INTO Employee (EmployeeID, ParentID, JobTitle, FirstName) 
  VALUES (4, 2, 'Assistant 1', 'Henry')
INSERT INTO Employee (EmployeeID, ParentID, JobTitle, FirstName) 
  VALUES (5, 2, 'Assistant 2', 'Nicola')
-- Employees under ben
INSERT INTO Employee (EmployeeID, ParentID, JobTitle, FirstName) 
  VALUES (6, 3, 'Snr Developer', 'Kerry')
INSERT INTO Employee (EmployeeID, ParentID, JobTitle, FirstName) 
  VALUES (7, 3, 'Assistant', 'James')
-- Emplyees under kerry
INSERT INTO Employee (EmployeeID, ParentID, JobTitle, FirstName) 
  VALUES (8, 6, 'Jrn Developer', 'Tim')
SET NOCOUNT OFF

SET IDENTITY_INSERT Employee OFF

The above SQL code will create the table needed to represent the above hierarchy.  Most hierarchy are developed using the above ParentID method, and usually it is enough to get by with especially if the hierarchy has a fixed amount of levels.  However if a large part of an application is based around the hierarchy and the hierarchy is dynamic, then the above method becomes less efficient and very costly to use.  Various common hierarchy operations become difficult.  For instance finding the leaf node of the hierarchy using the ParentID becomes very difficult and can not easily be retrieved with a simple SELECT statement.  To find any parent of a child at any level of the hierarchy you must use the ParentID of each node, this is also difficult and inefficient to retrieve with a simple select statement. 

This is where nested sets become infinitely useful.  If we were to represent the above hierarchy slightly differently we can gain a lot of advantages.  Below is the same hierarchy represented as nested sets:

 Image 2

Although at first this is slightly more confusing to read visually, to a computer system this is a cinch.  The above diagram is indicating that each node is aware of all its descendents, and vice versa.  For example, we can see easily enough from the diagram that Ben and Angela are children of Bill because they are contained within Bill.  We can also see that Tim And James are children of Ben because they are contained within Ben. This ultimately gives us a link between each member of the hierarchy.

For us to represent this is data terms we need to store two more values per node, these are the extent values of the set. These are shown below on the same diagram in blue and red.

 Image 3

Lets call these Left (Red Values) and Right (Blue Values)These values should be stored on the Employee table along with the ParentID and this is all we need to fully represent a nested sets hierarchy.

SQL
-- Alter the employee table to include the left and right extent values
ALTER TABLE Employee ADD [LeftExtent] INT NULL
ALTER TABLE Employee ADD [RightExtent] INT NULL

GO

By using the Left and Right values of the database you can pick out some simple rules and simple queries:

SQL
-- If Left is between parent left and parent right then the node is a child
SELECT * FROM Employee 
    CROSS JOIN (
                       SELECT [LeftExtent], [RightExtent] 
                       FROM Employee WHERE Firstname = 'Ben'
        ) AS Parent
WHERE Employee.[LeftExtent] 
BETWEEN Parent.[LeftExtent] and Parent.[RightExtent]

-- All Leaf nodes (an item with any children) can be 
-- identified by Left = Right - 1
SELECT * FROM Employee WHERE [LeftExtent] = [RightExtent] - 1

More advanced queries can be used, but before these are explained the Left and Right values must be populated.  Below is the SQL code used to to populate the Left and Right values, the explanation for how it work is contained in the comments:

SQL
-- Create the stack table
DECLARE @tmpStack TABLE 
  (
    ID INT IDENTITY(1, 1), 
    EmployeeID INT NOT NULL, 
    LeftExtent INT NULL, 
    RightExtent INT NULL
  )

-- what we do is start from a parent, in this instance
-- we want to start from the very top which is the NULL parent
DECLARE @parentid AS INTEGER
SET @parentid = NULL 

-- our counter is the variable used to set the left and right extent
DECLARE @counter AS INTEGER
SET @counter = 1

-- what we do is go check the parentid for children, if it 
-- has children we push
-- it into the stack and move on to the next child
DECLARE @id AS INTEGER
DECLARE @oldid AS INTEGER

SET NOCOUNT ON

WHILE 1 = 1
BEGIN
SET @id = NULL

  -- get the first row which is a child of @partentid, which has not already
  -- been pushed onto the stack
  INSERT INTO @tmpStack
    (
      EmployeeID, 
      LeftExtent
    )
  SELECT TOP 1 EmployeeID, @counter 
  FROM Employee 
  WHERE ISNULL(ParentID, 0) = ISNULL(@parentid,0) 
  AND EmployeeID NOT IN (SELECT EmployeeID FROM @tmpStack)



  -- just check we are getting the right identity value
  -- this value does not get reset once its reas so we 
  -- need to see if its the same as the older one
  SET @id = SCOPE_IDENTITY()
  IF ISNULL(@id, 0) = ISNULL(@oldid, 0)
    SET @id = NULL
    ELSE
  BEGIN
    SET @oldid = @id
    SET @counter = @counter + 1
  END


  -- if we have children (i.e if this operation inserted a row, 
  -- then we carry on
  IF @id IS NULL
  BEGIN
    -- no rows left, which means we want to pop the last item
    SELECT TOP 1 @id = ID FROM @tmpStack
    WHERE RightExtent IS NULL ORDER BY ID DESC
    
    -- there are no items left to pop, so exit the procedure
    IF @id IS NULL
      BREAK

    UPDATE @tmpStack
    SET RightExtent = @counter
    WHERE ID = @id

    SET @counter = @counter + 1 

    SELECT @parentid = ParentID 
    FROM Employee WHERE EmployeeID = 
      (SELECT EmployeeID FROM @tmpStack WHERE ID = @id)
  END
  ELSE
  BEGIN
    -- this is easy enough, move on to the next level
    -- we want the parent id of the item just inserted 
    SELECT @parentid = EmployeeID FROM @tmpStack WHERE ID = @id
  END
END

-- And update all the Items in Hier_Dept
-- in the original hierarchy table
UPDATE e
SET LeftExtent = ts.LeftExtent,
RightExtent = ts.RightExtent
FROM Employee e
INNER JOIN @tmpStack ts
ON ts.EmployeeID = e.EmployeeID

GO

The logic to the above code is simple enough, if you imagine the hierarchy as its original structure, the left and right values need to be numbered as if you were tracing a line around the outside of the tree e.g.:

Image 4

Below are some more examples or using the left and right values to query the data:

SQL
-- Which level of the hierarchy are the employees on
SELECT COUNT(P2.EmployeeID) AS level, 
  P1.FirstName,
  P1.LeftExtent ,
  P1.RightExtent
FROM Employee AS P1
INNER JOIN Employee AS P2 
  ON (P1.LeftExtent BETWEEN P2.LeftExtent AND P2.RightExtent)
GROUP BY P1.FirstName, P1.LeftExtent, P1.RightExtent
ORDER BY Level ASC

-- counting the employees of a manager (children of a node)
SELECT p1.FirstName,
COUNT(p2.EmployeeID) Employees
FROM Employee p1
INNER JOIN Employee p2
ON (p2.LeftExtent BETWEEN p1.LeftExtent AND p1.RightExtent 
    AND p2.LeftExtent <> p1.LeftExtent)
GROUP BY p1.FirstName
ORDER BY Employees DESC 

That's about it.  Nested sets in a nut-shell.  I hope you have found the code useful and interesting.  If you need more detailed information about the theory there are plenty of SQL websites around which will have some information on this subject.  Any corrections please don't hesitate to email me!

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here


Written By
United Kingdom United Kingdom
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

 
QuestionField LeftExtent Pin
jonny_g715-Apr-13 8:42
jonny_g715-Apr-13 8:42 
Generalnot only insertion but also moving nodes Pin
Alejandro Xalabarder2-May-10 0:09
Alejandro Xalabarder2-May-10 0:09 
Generalfloating-point Pin
Scoby925-Aug-07 20:23
Scoby925-Aug-07 20:23 
GeneralRe: floating-point Pin
jjdeluxe2-Nov-07 8:21
jjdeluxe2-Nov-07 8:21 
GeneralRe: floating-point Pin
Alejandro Xalabarder2-May-10 2:17
Alejandro Xalabarder2-May-10 2:17 
GeneralGreat in theory but... Pin
DaveSadler26-Mar-06 17:35
DaveSadler26-Mar-06 17:35 
AnswerRe: Great in theory but... Pin
Jeremy Tierman7-Feb-08 15:56
Jeremy Tierman7-Feb-08 15:56 
GeneralRe: Great in theory but... Pin
Rajeev Jayaram9-Jul-12 5:54
Rajeev Jayaram9-Jul-12 5:54 
GeneralError in Syntax Pin
seshukanuri18-Dec-05 17:11
seshukanuri18-Dec-05 17:11 
GeneralPerformance Pin
Paul Odeon7-May-05 6:21
sussPaul Odeon7-May-05 6:21 
QuestionHow to insert new records into set's? Pin
Ishigava5-Aug-03 23:15
Ishigava5-Aug-03 23:15 
AnswerRe: How to insert new records into set's? Pin
Hugo Hallman2-Jan-04 3:15
Hugo Hallman2-Jan-04 3:15 
GeneralRe: How to insert new records into set's? Pin
Mike Osky11-Jan-04 12:49
Mike Osky11-Jan-04 12:49 
AnswerRe: How to insert new records into set's? Pin
Keith Bogart27-Aug-04 8:57
Keith Bogart27-Aug-04 8:57 
GeneralGood Joe's idea :-) Pin
oleg_cherkasenko24-May-03 1:27
oleg_cherkasenko24-May-03 1:27 
GeneralWould be shameful if this wasn't well-known. Pin
mrluc26-Oct-04 7:13
mrluc26-Oct-04 7:13 
GeneralRe: Would be shameful if this wasn't well-known. Pin
Anonymous3-May-05 11:21
Anonymous3-May-05 11:21 
GeneralRe: Would be shameful if this wasn't well-known. Pin
James Simpson25-Jul-05 8:11
James Simpson25-Jul-05 8:11 
My intention was never to claim any intellectual rights over this theory I simply wanted to make codeproject user aware it existed. I never claimed I created the concept. I wanted to bring the theory onto codeproject with some example code. You are right, I could have simply created an article with a link to other articles about this subject but I wanted to write my own as an attempt to better my article writing. I have not simply ripped of any specific article, I have written my own article based on my understanding on the concept as explained to me by a fellow developer.

I was trying to share this idea rather than claim ownership of it.

James Simpson
Web Developer
imebgo@hotmail.com

P S - This is what part of the alphabet would look like if Q and R were eliminated
Mitch Hedberg

GeneralRe: Would be shameful if this wasn't well-known. Pin
Anonymous2-Oct-05 13:17
Anonymous2-Oct-05 13:17 
GeneralRe: James you did a great job Pin
josephsj18-Nov-08 23:33
josephsj18-Nov-08 23:33 
GeneralRe: Would be shameful if this wasn't well-known. Pin
Jeff Moden3-Jun-12 14:14
Jeff Moden3-Jun-12 14:14 
GeneralRe: Would be shameful if this wasn't well-known. Pin
Member 432945423-Dec-14 8:02
Member 432945423-Dec-14 8:02 
GeneralRe: Would be shameful if this wasn't well-known. Pin
Jeff Moden25-Sep-17 9:49
Jeff Moden25-Sep-17 9:49 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.