
Introduction
I've often wondered how to use recursion in SQL. This article shows how to use Common Table Expressions (CTE or CTE's) to do this.
Background
SQL Server 2005 introduces a neat little thingy called CTE's. These can allow to you to do most of the things that derived tables can - but it also brings with it a more powerful capability - you can now write recursive queries in SQL using set-based logic. This capability is best explained with an example, so let's get down to it right away.
Printing an employee hierarchy
Look at the Employees table in the Northwind database. You will see that each employee has a ReportsTo column value set to an Employee ID. Let's say, your task is to print out an employee hierarchy for Employee ID 5. What do you do? Resort to ADO.NET? With SQL Server 2005, you can use CTE's to do the same. Look at the code below:
WITH CTE_EMPLOYEE_HEIRARCHY AS
(
SELECT E.EmployeeID, E.ReportsTo AS Supervisor, E.FirstName, E.LastName
FROM EMPLOYEES E WHERE E.EmployeeID = 5
UNION ALL
SELECT E1.EmployeeID, E1.ReportsTo AS Supervisor, E1.FirstName, E1.LastName
FROM CTE_EMPLOYEE_HEIRARCHY
JOIN EMPLOYEES E1
ON E1.ReportsTo = CTE_EMPLOYEE_HEIRARCHY.EmployeeID
)
SELECT * FROM CTE_EMPLOYEE_HEIRARCHY
Let's look at each little block. The 'WITH CTE_EMPLOYEE_HEIRARCHY
' simply names your CTE as CTE_EMPLOYEE_HEIRARCHY
. The first SELECT
statement is the 'anchor'.
SELECT E.EmployeeID, E.ReportsTo AS Supervisor, E.FirstName, E.LastName
FROM EMPLOYEES E WHERE E.EmployeeID = 5
It kicks off the recursive query by returning an employee row with ID = 5.
Next, the second SELECT
gets the employees that report to EMP ID 5, but that's only in the first run. Notice that this SELECT
statement refers to the CTE itself:
SELECT E1.EmployeeID, E1.ReportsTo AS Supervisor, E1.FirstName, E1.LastName
FROM CTE_EMPLOYEE_HEIRARCHY
JOIN EMPLOYEES E1
ON E1.ReportsTo = CTE_EMPLOYEE_HEIRARCHY.EmployeeID
So this statement will return all the employees who report to Employee ID = 5. In this example, there's only one level of reportees (6,7, and 9). But if Employee IDs 6,7, or 9 would have had more reportees, in the next recursion step, the CTE name would be replaced by Employee IDs 6,7, and 9, and you would get the next level of reportees.
So where does the recursion stop? By default, the recusion stops at a recursive level of 100 (configurable), or when an empty result set is returned, whichever is earlier - as in the case of the above example, when the second recursive call to the CTE name would get resolved to 6,7, and 9, but the resultant empty set of reportees would be empty because 6,7, and 9 are the humble software devs who have nobody to delegate work to.