Click here to Skip to main content
11,578,524 members (60,685 online)
Click here to Skip to main content

Expanding SQL Hierarchies: The Dualistic Approach

, 18 Oct 2006 MIT 17.3K 15
Rate this:
Please Sign up or sign in to vote.
This article discusses a method for retrieveing hierarchical data from a relational database

Introduction

Yes, this is the same old story of hierarchical structures stored in relational databases. Webmail content, company hierarchy, task dependencies - you name it. I came across this issue when I was designing a mini-forum component for a web portal and could not find a solution in the net. So, here is my approach. I used SQL Server 2000 and 2005.

Background

Suppose you have a table that stores some entities (messages, employees, etc.). Every entity instance can be a parent for many other instances. Some instances do not have children - they are leaf nodes in the tree hierarchy. Some instances do not have parents - they are root nodes. Consider a simple table:

create table Messages
(
    [MessageID] [uniqueidentifier] primary key clustered NOT NULL ,
    [ParentID] [uniqueidentifier] NULL ,
    [ReceivedDate] [datetime] NOT NULL,
    [Subject] varchar(256) NULL,
    CONSTRAINT [FK_Messages_Messages] FOREIGN KEY ([ParentID]) <BR>    REFERENCES Messages ([MessageID])
)

Yes, primary key is a GUID, so you won't be tempted to use it for ordering and sorting purposes. No, there is no "NestedLevel" field that holds generation number. Yes, MessageID is a clustered primary key, so records are stored in the database in random order. Let's not cheat. Parent-child relations are defined by a pair of GUIDs, and all sorting is performed using datetime field. That's it. Subject field is just a cosmetic feature.

Let's populate our table with some meaningful data:

insert into Messages (MessageID, ParentID, ReceivedDate, Subject) <BR>values ('F24B7175-1905-4783-A8D6-0DA6F1759F2C', NULL, getdate(), 'Message 1')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject) <BR>values ('E10B2FC6-E860-4BFF-8642-29A318CDB6CE',  <BR>'F24B7175-1905-4783-A8D6-0DA6F1759F2C', dateadd(day, 1, getdate()),  <BR>'Message 1-1')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject) <BR>values ('E931AC59-4356-48AD-A587-9F0246532733',  <BR>'F24B7175-1905-4783-A8D6-0DA6F1759F2C', dateadd(day, 2, getdate()),  <BR>'Message 1-2')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject)  <BR>values ('FAEF6F5B-54AF-452F-A72E-3DBD21F643FD',  <BR>'E931AC59-4356-48AD-A587-9F0246532733', dateadd(day, 2, getdate()),  <BR>'Message 1-2-1')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject)  <BR>values ('5B1D05CE-7889-4BD9-8ABC-20A1269668F5',  <BR>'FAEF6F5B-54AF-452F-A72E-3DBD21F643FD', dateadd(day, 2, getdate()),  <BR>'Message 1-2-1-1')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject)  <BR>values ('A3E5F551-6531-4D12-8A44-7C45C81C181A',  <BR>'FAEF6F5B-54AF-452F-A72E-3DBD21F643FD', dateadd(day, 3, getdate()),  <BR>'Message 1-2-1-2')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject)  <BR>values ('298D6A02-5BB6-4EDD-8030-7E15CD16A021',  <BR>'A3E5F551-6531-4D12-8A44-7C45C81C181A', dateadd(day, 3, getdate()),  <BR>'Message 1-2-1-2-1')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject)  <BR>values ('CBCE69FF-19C1-47E7-842A-4944BA146BAA',  <BR>'A3E5F551-6531-4D12-8A44-7C45C81C181A', dateadd(day, 4, getdate()),  <BR>'Message 1-2-1-2-2')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject)  <BR>values ('3C521C3A-AA40-49D0-83FA-E5101913A0E3',  <BR>'CBCE69FF-19C1-47E7-842A-4944BA146BAA', dateadd(day, 4, getdate()),  <BR>'Message 1-2-1-2-2-1')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject)  <BR>values ('50D2CA2B-A484-482F-A41F-ADFF1A636BE3',  <BR>'A3E5F551-6531-4D12-8A44-7C45C81C181A', dateadd(day, 5, getdate()),  <BR>'Message 1-2-1-2-3')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject)  <BR>values ('71AD6FBD-EE92-473A-A038-C59CB7FF658A',  <BR>'E931AC59-4356-48AD-A587-9F0246532733', dateadd(day, 3, getdate()),  <BR>'Message 1-2-2')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject)  <BR>values ('C300611C-D8EF-419B-8736-97AAE62108D8',  <BR>'F24B7175-1905-4783-A8D6-0DA6F1759F2C', dateadd(day, 3, getdate()),  <BR>'Message 1-3')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject)  <BR>values ('A4E8E001-99F1-450F-A4D7-3E0A491A085E',  <BR>'C300611C-D8EF-419B-8736-97AAE62108D8', dateadd(day, 3, getdate()),  <BR>'Message 1-3-1')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject)  <BR>values ('CD7F8A67-B735-4590-A92C-D91F0341DD7F',  <BR>'C300611C-D8EF-419B-8736-97AAE62108D8', dateadd(day, 4, getdate()), <BR>'Message 1-3-2')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject)  <BR>values ('31A05919-1BCB-403E-807A-56B50B776C4E', NULL, getdate(), 'Message 2')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject)  <BR>values ('A731F5DC-88F7-49E7-BCBB-F912C69BFC7C',  <BR>'31A05919-1BCB-403E-807A-56B50B776C4E', dateadd(day, 1, getdate()),  <BR>'Message 2-1')
        

We are on a quest to retrieve these records so they can be easily presented, say, on a webmail application page:

Id                                   Indent      Subject             
------------------------------------ ----------- -------------------
F24B7175-1905-4783-A8D6-0DA6F1759F2C 1           Message 1
E10B2FC6-E860-4BFF-8642-29A318CDB6CE 2           Message 1-1
E931AC59-4356-48AD-A587-9F0246532733 2           Message 1-2
FAEF6F5B-54AF-452F-A72E-3DBD21F643FD 3           Message 1-2-1
5B1D05CE-7889-4BD9-8ABC-20A1269668F5 4           Message 1-2-1-1
A3E5F551-6531-4D12-8A44-7C45C81C181A 4           Message 1-2-1-2
298D6A02-5BB6-4EDD-8030-7E15CD16A021 5           Message 1-2-1-2-1
CBCE69FF-19C1-47E7-842A-4944BA146BAA 5           Message 1-2-1-2-2
3C521C3A-AA40-49D0-83FA-E5101913A0E3 6           Message 1-2-1-2-2-1
50D2CA2B-A484-482F-A41F-ADFF1A636BE3 5           Message 1-2-1-2-3
71AD6FBD-EE92-473A-A038-C59CB7FF658A 3           Message 1-2-2
C300611C-D8EF-419B-8736-97AAE62108D8 2           Message 1-3
A4E8E001-99F1-450F-A4D7-3E0A491A085E 3           Message 1-3-1
CD7F8A67-B735-4590-A92C-D91F0341DD7F 3           Message 1-3-2
31A05919-1BCB-403E-807A-56B50B776C4E 1           Message 2
A731F5DC-88F7-49E7-BCBB-F912C69BFC7C 2           Message 2-1
        

Do you want it fast?

Search the net: it's full of samples that use cursors, recursion and stack tables. They work. But I want it to work fast. I love joins. SQL Server is good at making joins, let's use them.

We can come up with some SQL code that links all messages on the first few levels.

-- Retrieve the whole hierarchy from level 1 to level 4
-- Temp table @t: store UIDs of 3 parent 
declare @t table (
l1 uniqueidentifier, l2 uniqueidentifier, l3 uniqueidentifier,  <BR>l4 uniqueidentifier,
id uniqueidentifier, depth int)

-- Populate @t: expand msg hierarchy for levels 1-4
insert into @t
select m1.MessageID, m2.MessageID, m3.MessageID, m4.MessageID, null, 1
from Messages as m1
left outer join Messages m2 on m1.MessageID=m2.ParentID
left outer join Messages m3 on m2.MessageID=m3.ParentID
left outer join Messages m4 on m3.MessageID=m4.ParentID
where m1.ParentID is NULL

-- Calculate node level for each node and get tree depth
declare @depth int
update @t set depth = depth + 1 where l2 is not null
update @t set depth = depth + 1 where l3 is not null
update @t set depth = depth + 1 where l4 is not null
select @depth = max(depth) from @t

-- Since we have made several joins, we have only leaf nodes of level 4 in  <BR>-- @t. Add missing leaf nodes of level 1
insert into @t select distinct l1, NULL, NULL, NULL, NULL, 1 from @t <BR>where l2 is not NULL
-- Add missing leaf nodes of level 2
insert into @t select distinct l1, l2, NULL, NULL, NULL, 2 from @t<BR>where l3 is not NULL
-- Add missing leaf nodes of level 3
insert into @t select distinct l1, l2, l3, NULL, NULL, 3 from @t <BR>where l4 is not NULL

-- Populate id field, get the rightmost msg id from @t
update @t set id=coalesce(l4, l3, l2, l1)

-- Final select: order items
select id, depth, m.Subject
from @t as t
left outer join Messages as m1 on m1.MessageID=t.l1
left outer join Messages as m2 on m2.MessageID=t.l2
left outer join Messages as m3 on m3.MessageID=t.l3
left outer join Messages as m4 on m4.MessageID=t.l4
left outer join Messages as m on m.MessageID=t.id
order by m1.ReceivedDate, l1, m2.ReceivedDate, l2, m3.ReceivedDate, l3, <BR>      m4.ReceivedDate, l4
        

It seems to work fine. Look at the results, all levels from 1 to 4 are displayed in the proper order:

id                                   depth       Subject         
------------------------------------ ----------- -----------
F24B7175-1905-4783-A8D6-0DA6F1759F2C 1           Message 1
E10B2FC6-E860-4BFF-8642-29A318CDB6CE 2           Message 1-1
E931AC59-4356-48AD-A587-9F0246532733 2           Message 1-2
FAEF6F5B-54AF-452F-A72E-3DBD21F643FD 3           Message 1-2-1
5B1D05CE-7889-4BD9-8ABC-20A1269668F5 4           Message 1-2-1-1
A3E5F551-6531-4D12-8A44-7C45C81C181A 4           Message 1-2-1-2
71AD6FBD-EE92-473A-A038-C59CB7FF658A 3           Message 1-2-2
C300611C-D8EF-419B-8736-97AAE62108D8 2           Message 1-3
A4E8E001-99F1-450F-A4D7-3E0A491A085E 3           Message 1-3-1
CD7F8A67-B735-4590-A92C-D91F0341DD7F 3           Message 1-3-2
31A05919-1BCB-403E-807A-56B50B776C4E 1           Message 2
A731F5DC-88F7-49E7-BCBB-F912C69BFC7C 2           Message 2-1        

Do you want it big?

Major problem: the number of levels this code can handle is hardcoded. Looks like it's time to pull out the ace from your sleeve and write a simple code generator that builds SQL code that makes ten, twenty, thirty joins and brings you the whole hierarchy in one shot. That's what I did. And it worked. But:

  • only for modest amounts of data and depths; 30-messages-deep forum thread is not something unusual, and 20 joins is a lot of work for SQL Server already;
  • only for limited depths: the query that used 19 joins didn't fit into varchar(8000) variable and could not be executed.

I am not even going to publish this code here, believe me: it works, but it looks really scary. The third reason for not using code generator was ... well, you may call it religious: I just hate dynamic SQL. Every time I have to put exec(@sql) call I have a feeling that I unleash a small beast that eventually will make the programmer's life miserable.

Do you want it to work?

Time to remember about cursors and recursion. I have implemented a recursive function because SQL Server allows table-valued functions and the code looks smooth. Your SQL dialect may offer other opportunities for recursion, so do not hesitate to try them. The idea is simple: pull out a few (four) hierarchy levels in one shot, analyze, drill down if needed using cursor and recursion.

CREATE FUNCTION BrowseMessages
( @rootMessageID uniqueidentifier,
  @rootDepth int,
  @isDrillDown bit )
RETURNS @RtnValue table (Id uniqueidentifier, Depth int, <BR>        Subject varchar(256))
AS
begin
    declare @t table (
    l1 uniqueidentifier, l2 uniqueidentifier, l3 uniqueidentifier, <BR>    l4 uniqueidentifier,
    id uniqueidentifier, depth int)
    insert into @t
    select
    m1.MessageID, m2.MessageID, m3.MessageID, m4.MessageID, null, 1
    from Messages as m1
    left outer join Messages m2 on m1.MessageID=m2.ParentID
    left outer join Messages m3 on m2.MessageID=m3.ParentID
    left outer join Messages m4 on m3.MessageID=m4.ParentID
    -- Add your filtering and paging here
    where (@rootMessageID is NULL and m1.ParentID is NULL) or <BR>          (@rootMessageID is NOT NULL and m1.ParentID=@rootMessageID)
    
    -- Get tree depth
    declare @depth int
    update @t set depth = depth + 1 where l2 is not null
    update @t set depth = depth + 1 where l3 is not null
    update @t set depth = depth + 1 where l4 is not null
    select @depth = max(depth) from @t
    
    -- Add missing leaf nodes of levels 1,2,3
    insert into @t select distinct l1, NULL, NULL,  NULL, NULL, 1 from @t <BR>    where l2 is not NULL
    insert into @t select distinct l1, l2, NULL, NULL, NULL, 2 from @t <BR>    where l3 is not NULL
    insert into @t select distinct l1, l2, l3, NULL, NULL, 3 from @t <BR>    where l4 is not NULL
    
    -- Populate id field
    update @t set id=coalesce(l4, l3, l2, l1)
    
    -- Collect result nodes in @tt and:
    -- - add @rootDepth
    -- - order them
    declare @tt table (Id uniqueidentifier, Depth int, Subject varchar(256))
    insert into @tt
    select Id=id, Depth=depth+@rootDepth, m.Subject
    from @t as t
    left outer join Messages as m1 on m1.MessageID=t.l1
    left outer join Messages as m2 on m2.MessageID=t.l2
    left outer join Messages as m3 on m3.MessageID=t.l3
    left outer join Messages as m4 on m4.MessageID=t.l4
    left outer join Messages as m on m.MessageID=t.id
    order by m1.ReceivedDate, l1, m2.ReceivedDate, l2, m3.ReceivedDate, <BR>         l3, m4.ReceivedDate, l4

    -- Check depth: if it doesn't exceed 3, then we do not want to drill <BR>    -- down further
    declare @maxCurrentDepth int
    select @maxCurrentDepth=max(Depth)-@rootDepth from @tt
    
    -- Drill down or simply select the whole bunch
    if (@isDrillDown <> 1 or @maxCurrentDepth < 4)
    begin
        insert into @RtnValue select * from @tt
    end
    else
    begin
        declare crs cursor read_only fast_forward for select Id, Depth, <BR>                Subject from @tt
        open crs
        declare @nextId uniqueidentifier, @nextDepth int, <BR>                @nextSubject varchar(256)
        while 1=1
        begin
            fetch next from crs into @nextId, @nextDepth, @nextSubject
            if @@fetch_status <> 0 break
            insert into @RtnValue values (@nextId, @nextDepth, @nextSubject)
            if (@nextDepth-@rootDepth = 4)
            begin
                insert into @RtnValue select * from <BR>                                     BrowseMessages(@nextId, @nextDepth, 1)
            end
        end
        close crs
        deallocate crs
    end
    return
end
        

It works (it retrieved those nodes at levels 5 and 6):

select * from BrowseMessages(NULL, 0, 1)



Id                                   Depth       Subject            
------------------------------------ ----------- -------------------
F24B7175-1905-4783-A8D6-0DA6F1759F2C 1           Message 1
E10B2FC6-E860-4BFF-8642-29A318CDB6CE 2           Message 1-1
E931AC59-4356-48AD-A587-9F0246532733 2           Message 1-2
FAEF6F5B-54AF-452F-A72E-3DBD21F643FD 3           Message 1-2-1
5B1D05CE-7889-4BD9-8ABC-20A1269668F5 4           Message 1-2-1-1
A3E5F551-6531-4D12-8A44-7C45C81C181A 4           Message 1-2-1-2
298D6A02-5BB6-4EDD-8030-7E15CD16A021 5           Message 1-2-1-2-1
CBCE69FF-19C1-47E7-842A-4944BA146BAA 5           Message 1-2-1-2-2
3C521C3A-AA40-49D0-83FA-E5101913A0E3 6           Message 1-2-1-2-2-1
50D2CA2B-A484-482F-A41F-ADFF1A636BE3 5           Message 1-2-1-2-3
71AD6FBD-EE92-473A-A038-C59CB7FF658A 3           Message 1-2-2
C300611C-D8EF-419B-8736-97AAE62108D8 2           Message 1-3
A4E8E001-99F1-450F-A4D7-3E0A491A085E 3           Message 1-3-1
CD7F8A67-B735-4590-A92C-D91F0341DD7F 3           Message 1-3-2
31A05919-1BCB-403E-807A-56B50B776C4E 1           Message 2
A731F5DC-88F7-49E7-BCBB-F912C69BFC7C 2           Message 2-1
        
        

What's next?

Tune-up this code for your particular needs. Say, if you have a few root nodes and deep hierarchy, consider increasing the number of joins. If you have thousands of root nodes with just one level of children, consider joining only first three level instead of four. Add paging and filtering. Use other helpful fields you have in you hierachy tables. Use proper indexing. If you anticipate extremely deep trees and 4x32=128 levels is not enough (4 is the number of joins, and SQL Server 2000 limits the number of nesting calls or nesting levels to 32), increase the number of joins or do not use recursive functions.

Enjoy!

License

This article, along with any associated source code and files, is licensed under The MIT License

Share

About the Author

Sergey Sorokin
Web Developer
Canada Canada
No Biography provided

You may also be interested in...

Comments and Discussions

 
Generalsimplicity Pin
LucidObscurity20-May-08 10:00
memberLucidObscurity20-May-08 10:00 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web03 | 2.8.150603.1 | Last Updated 18 Oct 2006
Article Copyright 2006 by Sergey Sorokin
Everything else Copyright © CodeProject, 1999-2015
Layout: fixed | fluid