Click here to Skip to main content
13,558,882 members
Click here to Skip to main content
Add your own
alternative version


66 bookmarked
Posted 17 Aug 2002
Licenced MIT

Modeling Hierarchies

, 17 Aug 2002
Rate this:
Please Sign up or sign in to vote.
A solution to the problem of storing hierarchical data in a relational database.

The Dogma

Storing hierarchical data in a relational database seems to be a common problem.

Although much has been written on the topic, the technique I am describing here is not widely known. It seems most authors have accepted the basic recursive pointer model, shown in Figure 1 as optimal.

Figure 1. Solution with a recursive pointer

Some authors, such as Martin Fowler or David C. Hay improved this model in various ways. For example, they separate the structure from the data as shown in Figure 2. They go as far as introducing knowledge level to the model.

But as analysts they seem more interested in dealing with the trivial and achieving theoretical beauty of the model rather then proposing its optimal physical implementation.

Figure 2. Separating the structure from the data

Others, such as Ken Henderson, the author of the Guru's Guide to Transact SQL don't question the model - instead, they emphasize on techniques to work with it.

The Real Life Problem

That was not what I needed to solve my problem. I was developing a data exchange kernel.

Each data exchange logged hundreds of events. Classical logging with a flat table was not appropriate. Not only were there too many events but also they were on different level of abstraction.

For example when the order confirmation identifier could not be matched to the original order this was a business logic error. But when the recipient’s public key could not be found this was purely technical problem.

In addition, on a daily basis many data exchanges with different customers occurred. Having them all in one log would not provide a localized chronological overview.

Even advanced filtering would not be sufficient to make flat log user friendly in this situation.

Therefore I decided to implement hierarchical logging. Each data exchange invocation would be logged as single entry on the top level. When you would expand it, you could explore the details. You could then drill-down to get even more detailed logging information.

To highlight usability again, the user would not be happy if he had to drill-down entire tree to find out if an error occurred.

Thus errors had to be propagated from lower levels to higher levels. When data exchange had an error or a warning on the lowest level (but the error caused parent operation to fail and transaction to rollback!), the parent icon in the tree control indicated that children have errors, etc.

So on the top level you could see which data exchanges were successful and which failed. For effective troubleshooting you could then drill-down the hierarchy to locate the cause of an error.

A search option was also required. When performed on top it needed to search top level and all child levels, but only in given sub-tree.

If classic recursive pointer model (or its mutation) would be used then all these requirements would result in expensive and complex queries with self-joins and perhaps temporary tables to retrieve the data.

The Solution

To enable execution of these operations with single joins the data model was extended as shown in Figure 3. 

Figure 3. Introducing Depth field

With it I introduce redundancy to the model. For each parent I store all of its children (and grand children) with appropriate depth for each child. Figure 4 shows how I store sample hierarchy to my table.












































Figure 4. Storing hierarchical data with depth

One might ask about the combinatorial explosion? When we store all children for each parent, surely we introduce a lot of overhead?

We do introduce some overhead, but the database does not grow exponentially. For each child we simply have to store all the parent relationships. That means that for each record on the 5th level, we need 4 entries in the Structure table. No more. And how many levels can you have in a tree control while still letting the user feel in control?


Following are samples of how you may leverage the new structure. In samples, I will use very simple SQL (no INNER JOIN, etc.) to show that with the new structure it is possible to work effectively even with simple SQL server engines.

I will use T-SQL to increase the clarity and readability of code, when necessary.

Get Root Nodes

To get the root node simply find all nodes that do not have any entries in the structure table acting as children.

FROM [table]
  (SELECT * FROM structure WHERE structure.childId=[table].keyid)

Implementing Search

You would like to make a search for certain data throughout hierarchy starting with (but not including) parent. Here is how you can do it.

DECLARE @parent_node int
SET @parent_node=1
DECLARE @searched_data varchar(2)
SET @searched_data='e'
FROM [table],structure
  [table].Data=@searched_data AND
  [table].KeyId=structure.ChildId AND

Implementing Propagation

Suppose you would like to implement propagation. In my description of the hierarchical log I mentioned that all errors are propagated from lower levels to higher levels.

Following sample demonstrates similar problem. Instead of checking if given parent has children with errors it checks if given parent has children that contain certain sub-string. It returns a number > 0 if it yes and 0 if no.

DECLARE @parent_node int
SET @parent_node=2
DECLARE @searched_data varchar(2)
SET @searched_data='e'
FROM [table],structure
  [table].Data=@searched_data AND
  [table].KeyId=structure.ChildId AND


I needed a solution for real life problem, but was unable to find it in the existing engineering arsenal.

Describe model is not an optimal solution for all cases, but in my case the new structure offered so many benefits over the classic model, that the price of overhead is reasonable.

Here are some other cases, where the model will work for you:

  • When you need to execute joins with entire sub-trees,
  • When you need to perform an operation on a sub-tree, for example, calculate the bill of material, and
  • When you need to propagate data downward from the parent to the children or upwards from the children to the parents.

I do hope you find it as helpful as I did. :-)


This article, along with any associated source code and files, is licensed under The MIT License


About the Author

Tomaž Štih
Founder Wischner Ltd
United Kingdom United Kingdom
No Biography provided

You may also be interested in...

Comments and Discussions

GeneralOr you can just use a single column and Binary Trees... Pin
adamfowleruk12-Jan-09 23:42
memberadamfowleruk12-Jan-09 23:42 
GeneralHierarchical Database Pin
manojmmj29-Jun-05 1:32
membermanojmmj29-Jun-05 1:32 
QuestionHave you seen the modified preorder alternative? Pin
dross128-Jul-03 9:10
memberdross128-Jul-03 9:10 
GeneralMaintenance Issues Pin
Brett Smith10-Jun-03 1:28
memberBrett Smith10-Jun-03 1:28 
While the solution you have come up with may suite your particular needs right now you have made a major assumption. The levels you need will never be greater than 5. If this is true & the real world never changes then the design seems solid.

BUT ... life never sits still for long. If it does change, then the person taking over maintenance in the future is going to have to do a lot of work to, eg, go to six levels. Rebuild the table, take the system off-line, modify all the code to handle the sixth level, etc.

The advantage over the recursive model is that it needs no further maintenance once set up. As with all generic code, it's harder to write in the first place, and can take longer to run, but the long term costs are often much smaller.

Consider the real world situation of Companies & their owners. The recursive model allows for a company to own another to own another, etc. Each company is on the same table with a key, "Owner", pointing to the same table. There can be many 10's of links of ownership - even circular references. To try and have a horizontal link for all the possible parents, when the majority doesn't have any is terribly waisteful and no simpler than the recursive model.

Personally I like to model my database designs around the real world without limitations, which all to easily change. This then often highlights logical errors. Another example are Hospital Patient & Relatives. They can appear to be two entities, until you consider that a relative can also be a patient (and visa versa). As such the real entity is Person, where Relative & Patient are attributes of the Entity Person. In this case we again see the recursive relationship.

Clearly in these examples the recursion is via a many to many linkage table.

I quite accept your assumption for particular situations. I just don't think this design can be considered a generic model and wanted to highlight to other readers how the fundamental assumption of 5 levels has a huge bearing on the outcome.

GeneralRe: Maintenance Issues - Not as bad as it seems. Pin
James Simpson21-Dec-04 4:55
memberJames Simpson21-Dec-04 4:55 
Generalwonderful! Pin
Adamsun22-Aug-02 23:20
memberAdamsun22-Aug-02 23:20 
GeneralAn Interesting Approach Pin
Marc Clifton18-Aug-02 4:00
memberMarc Clifton18-Aug-02 4:00 
GeneralRe: An Interesting Approach Pin
wayward18-Aug-02 6:37
memberwayward18-Aug-02 6:37 
GeneralRe: An Interesting Approach Pin
tstih19-Aug-02 2:01
membertstih19-Aug-02 2:01 
GeneralRe: An Interesting Approach Pin
Martinac22-Aug-02 0:05
memberMartinac22-Aug-02 0:05 
GeneralRe: An Interesting Approach Pin
Marc Clifton25-Aug-02 12:09
memberMarc Clifton25-Aug-02 12:09 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Terms of Use | Mobile
Web04-2016 | 2.8.180515.1 | Last Updated 18 Aug 2002
Article Copyright 2002 by Tomaž Štih
Everything else Copyright © CodeProject, 1999-2018
Layout: fixed | fluid