12,549,161 members (45,498 online)
alternative version

8K views
9 bookmarked
Posted

# Calculating Tree Cost in Microsoft SQL Server

, 2 Jan 2014 CPOL
 Rate this:
This article explains database schema design for storing tree structure and an algorithm to compute its cost.

## Introduction

Tree structures are one of the most popular data structures used in today's world. They are not only effective in organizing data but provide an efficient lookup. Implementing trees using programming languages like C, C++, Java, C#.NET and many other languages is not uncommon. However, implementing tree like structure in database is not very popular. Here, I am first going to present a simple database schema design to store trees having N child nodes (commonly termed as N-ary trees). Then I will explain a technique to calculate cost of the tree. Readers can extend this structure by implementing it with various attributes as per their needs. I hope that you will find it helpful.

## Background

A tree is a hierarchical structure which is derived from a single node called as root node. Each node (except leaf nodes) can have upto N children (in N-ary tree) . Nodes that do not have any children are termed as leaf nodes. In this tutorial, we assume a tree where each node is associated with a weight. Each link in a tree is associated with a link weight and a node's "cost" is determined as:

`Node cost j = Node weight j + Sum of(Link weight of child i * Node weight of child i)`

## Designing the Schema

The database schema consists of two tables viz. Nodes and Links. Nodes table is designed as shown below:

Each node consists of `NodeWeight` which is used for calculation of its parent node's cost. Cost is calculated using the above formula.

Links table is designed as below:

`LinkWeight` defines the cost of link connecting the node with its parent. The `Links` table shown above consists of two tree structures. For simiplicity, I have shown the tree structure in the below image for its pictorial representation.

## Motivation

Tree structures similar to which are described above are very useful in storing hierarchical data in databases. A very popular application of such schema would be Bill of Material. However, to avoid stereotyping of this application to bill of materials, I will explain a very general design. Such database structures can also be traversed using common tabular expressions (CTE) however, on large databases CTE can incur high computation cost. The scheme explained in this article is lightweight since it uses `Stack` as the data structure to traverse the tree.

## Using the Code

The algorithm shown below for calculating tree cost can be divided into two sections. The procedure accepts a node ID as input and calculates cost for that Node. Passing root node ID will give cost for its subtree; for example, passing ID of F will generate cost for subtree rooted at F. Let's start by creating temporary tables to store intermediate data.

### Extracting Rooted Tree

```CREATE PROCEDURE GetTreeCost
(
@NodeID INT
)
AS
BEGIN
CREATE TABLE #TREE
(
RecordID INT IDENTITY(1,1),
ParentNodeID INT,
NodeID INT,
NodeWeight DECIMAL(18,2),
LevelID INT,
ProcessingComplete BIT
)

CREATE TABLE #STACK
(
ParentNodeID INT,
NodeID INT,
LevelID INT
)```

First section can be labeled as "Tree formation". Given a node id, we first extract tree structure rooted at that node in a temporary table `#TREE`.

```INSERT INTO #STACK
SELECT -1, @NodeID, NodeWeight FROM Nodes WHERE NodeID = @NodeID

INSERT INTO #TREE
SELECT -1, @NodeID, NodeWeight, 1, 0, 0 FROM Nodes WHERE NodeID = @NodeID

WHILE((SELECT COUNT(*) FROM #STACK) > 0)
BEGIN
SELECT TOP 1 @CurrentNodeID = NodeID, _
@CurrentParentID = ParentNodeID, @CurrentLevelID = LevelID FROM #STACK

DELETE FROM #STACK WHERE NodeID = @CurrentNodeID AND _
ParentNodeID = @CurrentParentID AND LevelID = @CurrentLevelID

IF ((SELECT COUNT(*) FROM Links L WHERE L.ParentNodeID = @CurrentNodeID) > 0)
BEGIN
INSERT INTO #STACK
SELECT @CurrentNodeID, L.NodeID, @CurrentLevelID + 1
FROM Links L WHERE L.ParentNodeID = @CurrentNodeID

INSERT INTO #TREE(ParentNodeID, NodeID, _
SELECT @CurrentNodeID, L.NodeID, P.NodeWeight, _
INNER JOIN Nodes P ON L.NodeID = P.NodeID WHERE L.ParentNodeID = @CurrentNodeID
END
END```

To extract the tree structure for a given node ID, a simple logic is used which is built on the `Stack` abstraction. Initially, we add the passed node ID in Stack. Since this node is not child of any node, parent ID for this record is kept as `-1`. Finally, a level is maintained for tree nodes which is `0` for root node in stack as well as tree.

Now, we iterate until there are elements present in `Stack`. The algorithm for extracting tree can be formulated as below:

1. Pop the top node from Stack (`select `and `delete`)
2. Check whether selected node is a parent node using the `Links` table
3. If no, then continue
4. Else, add children of the node in Tree as well as push them on Stack with `LevelID = CurrentLevelID + 1`
5. Continue

After extracting tree, `#TEMP` looks as shown in the below figure:

`ProcessingComplete` field is used during cost computation phase.

### Calculating Tree Cost

Having tree ready with us in the `#TREE` temporary table, we can now proceed with calculating cost of the tree. Readers are advised to refer to the formula stated above for calculating tree cost. Code for cost computation looks somewhat complicated at first, but it is eventually a simple algorithm.

```-- read from tree starting from leaf
WHILE((SELECT COUNT(*) FROM #TREE WHERE ProcessingComplete = 0) > 0)
BEGIN
DECLARE @MaxLevelID INT
DECLARE @RecordID INT
DECLARE @CurrentNodeWeight DECIMAL(18,2)

SELECT @MaxLevelID = MAX(LevelID) FROM #TREE WHERE ProcessingComplete = 0

SELECT @RecordID = RecordID, @CurrentNodeID = NodeID, _
@CurrentParentID = ParentNodeID, @CurrentNodeWeight = NodeWeight,
@UpLinkWeight = LinkWeight FROM #TREE WHERE LevelID = @MaxLevelID AND ProcessingComplete = 0

IF @CurrentParentID <> -1 -- this is not the top level Node
BEGIN
UPDATE #TREE SET NodeWeight = NodeWeight + @CurrentNodeWeight * @UpLinkWeight
WHERE NodeID = @CurrentParentID
END

UPDATE #TREE SET ProcessingComplete = 1 WHERE RecordID = @RecordID
END```

`ProcessingComplete` field is marked when a row is processed. The algorithm proceeds in a Bottom up manner and processes nodes level wise. We first process the leaf nodes (nodes which are at lowest level in tree, i.e., having highest level ID). Since they are at leaf level, they do not have any children. We can compute the cost for its parent using the formula given above. The cost update is done at the following line:

```UPDATE #TREE SET NodeWeight = NodeWeight + @CurrentNodeWeight * @CurrentQuantity
WHERE NodeID = @CurrentParentID```
After adding a node's cost to its parent, we mark the node as `Processed`. We continue this until all the nodes in tree are processed. Since we compute cost in bottom up manner, cost computation reaches intermediate nodes when all the nodes below its level are "Processed".

Finally, we get a fully calculated tree structure as shown in the below image:

Final cost for node 5 (E) can be seen in above figure as `106.69` which is calculated from its children.

## Points of Interest

Tree structures are very simple to design and efficient to store data. A popular application of such kind of tree structure is Bill of Materials. Using CTE approach to deal with such structures is popular, however, this handcrafted traversal of the structure makes it lightweight and cost effective.

## History

• v1.0.0 | Initial version

## Share

 Student United States
I am currently a Graduate Student at North Carolina State University

## You may also be interested in...

 Pro Pro

 First Prev Next
 Seems like Huffman Coding leiyangge3-Jan-14 1:03 leiyangge 3-Jan-14 1:03
 Re: Seems like Huffman Coding mihirj3-Jan-14 1:36 mihirj 3-Jan-14 1:36
 Last Visit: 31-Dec-99 18:00     Last Update: 22-Oct-16 7:00 Refresh 1

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Web02 | 2.8.161021.1 | Last Updated 2 Jan 2014