SQL code to traverse self-referencing tables with parent-child relationships

Alexei Fimine

4.60/5 (5 votes)

Jul 7, 2009

GPL3

1 min read

48769

SQL code to traverse self-referencing tables with parent-child relationships. Recursive SQL written without special proprietary commands.

Introduction

Often our solutions require implementing a self referencing table, where records are related to the records of the same table, creating a parent-child based hierarchy. In the most generic way, it is achieved through a field “PARENT_ID” which has an N to 1 relationship with the Primary Key field in the table. For example, the folder (directory) structure in your PC is structured this way.

In the normal programming setting where we would use a binary tree, reading data from this type of a structure involves recursive traversal. However, there is no recursive SQL in conventional terms. There are special commands in various versions of SQL (such as WITH in SQL 2005) which facilitate traversing parent-child relations. However, often, we may need as much independence from the DBMS as possible.

Background

While the idea is not new, I was surprised that I could not find a ready solution which would be simple and did not use special functions of a particular DBMS. After searching the web and analyzing some similar solutions, I came up with my own.

While I cannot present it in the way I used it, here I present the adapted generic version, which can be modified for use in any setting.

Using the code

It's just SQL which works in MS SQL Server 2000. This article is for intermediate to advanced programmers, and thus you should be able to know where to plug the code. Moreover, this article presents an idea rather than plug and play code.

-- While your table may be whatever you want it to be
-- but at least it needs two fields: FOLDER_ID int and PARENT_ID int
-- name them whatever you want though…
-- here is a sample table:

CREATE TABLE FOLDER
(
  FOLDER_ID int,
  FOLDER_DESCRIPTION char(128),
  PARENT_ID int
)

-- This stored procedure will “recursively” traverse
-- the Folder table and select all children based on the
-- the id of the Root that we pass in. We say “recursively”
-- in quotes because it’s not literally recursion
-- but it does exactly what regular recursion would do in traversing a tree;

CREATE  PROCEDURE [dbo].[p_FOLDERSelectTree]
      @FOLDER_ROOT_ID int -- this one accepts the root
                          -- FOLDER ID (the father of all children...)
      -- in fact, we can have multiple Roots such
      -- as @FOLDER_ROOT_ID_1 int, @FOLDER_ROOT_ID_2 int etc…
      -- then, we’ll have results for as many “trees” as many roots we have.

AS

SET NOCOUNT ON
-- Declare variables used for logic:

CREATE TABLE #CHILDREN(FOLDER_ID int );
-- the temp table to collect all the FOLDERs that
-- “grow” from the Root. If you are willing to use this
-- in MS Access, you will need to create a “real” table
-- on the spot and then drop it.

DECLARE @NewInsertCount int;
-- counter to pass the number of records inserted the last itiration
 

-- insert the root into our Temp Table):
INSERT INTO #CHILDREN(FOLDER_ID)
SELECT FOLDER_ID FROM FOLDER WHERE FOLDER_ID = @FOLDER_ROOT_ID;
-- here we could simply insert @FOLDER_ROOT_ID instead
-- of selecting from FOLDER. However, this makes sure that
-- the Root exists. It becomes more clear from the next line
-- where we initiate the counter - if it is 0 then no root
-- was inserted and thus there is no point of entering the loop
-- and it will go straight to the return:

SET @NewInsertCount = @@ROWCOUNT;

-- the tree traversal:

WHILE @NewInsertCount > 0
-- enter the loop if there was something
-- inserted in the previous statement;

BEGIN
    -- I am not going to describe all the SQL, but the idea is
    -- that we insert into #CHILDREN all Folder Ids which are “children” to
    INSERT INTO #CHILDREN(FOLDER_ID) 
        SELECT FOLDER_ID FROM FOLDER WHERE EXISTS
               (SELECT FOLDER_ID  FROM #CHILDREN
                WHERE FOLDER.PARENT_ID = #CHILDREN.FOLDER_ID)
         AND  NOT EXISTS
              (SELECT FOLDER_ID FROM #CHILDREN
               WHERE FOLDER.FOLDER_ID = #CHILDREN.FOLDER_ID);    
    
         SET @NewInsertCount = @@ROWCOUNT;
         -- if the value is 0 then there were no new
         -- children inserted thus no "grandchildren" expected;
END

SELECT * FROM #CHILDREN ORDER BY FOLDER_ID; -- that's what we're looking for.
-- this temp table should contain
-- all “children” and sub children of the root.

Now, whenever you need to use the tree from another Stored Procedure, you have to do the following:

 -- a temporary table to hold children records (dependants);
CREATE TABLE #CHILDREN (FOLDER int); 

INSERT INTO #CHILDREN EXEC p_FOLDERSelectTree 
       @FOLDER _ROOT_ID = @IN_FOLDER_ID; 
-- get all the dependants, by passing the ID
-- of the Root, i.e. the record of which you
-- are willing to get all the dependents;