Click here to Skip to main content
Click here to Skip to main content

Modeling Hierarchies

By , 17 Aug 2002
 

The Dogma

Storing hierarchical data in a relational database seems to be a common problem.

Although much has been written on the topic, the technique I am describing here is not widely known. It seems most authors have accepted the basic recursive pointer model, shown in Figure 1 as optimal.

Figure 1. Solution with a recursive pointer

Some authors, such as Martin Fowler or David C. Hay improved this model in various ways. For example, they separate the structure from the data as shown in Figure 2. They go as far as introducing knowledge level to the model.

But as analysts they seem more interested in dealing with the trivial and achieving theoretical beauty of the model rather then proposing its optimal physical implementation.

Figure 2. Separating the structure from the data

Others, such as Ken Henderson, the author of the Guru's Guide to Transact SQL don't question the model - instead, they emphasize on techniques to work with it.

The Real Life Problem

That was not what I needed to solve my problem. I was developing a data exchange kernel.

Each data exchange logged hundreds of events. Classical logging with a flat table was not appropriate. Not only were there too many events but also they were on different level of abstraction.

For example when the order confirmation identifier could not be matched to the original order this was a business logic error. But when the recipient’s public key could not be found this was purely technical problem.

In addition, on a daily basis many data exchanges with different customers occurred. Having them all in one log would not provide a localized chronological overview.

Even advanced filtering would not be sufficient to make flat log user friendly in this situation.

Therefore I decided to implement hierarchical logging. Each data exchange invocation would be logged as single entry on the top level. When you would expand it, you could explore the details. You could then drill-down to get even more detailed logging information.

To highlight usability again, the user would not be happy if he had to drill-down entire tree to find out if an error occurred.

Thus errors had to be propagated from lower levels to higher levels. When data exchange had an error or a warning on the lowest level (but the error caused parent operation to fail and transaction to rollback!), the parent icon in the tree control indicated that children have errors, etc.

So on the top level you could see which data exchanges were successful and which failed. For effective troubleshooting you could then drill-down the hierarchy to locate the cause of an error.

A search option was also required. When performed on top it needed to search top level and all child levels, but only in given sub-tree.

If classic recursive pointer model (or its mutation) would be used then all these requirements would result in expensive and complex queries with self-joins and perhaps temporary tables to retrieve the data.

The Solution

To enable execution of these operations with single joins the data model was extended as shown in Figure 3. 

Figure 3. Introducing Depth field

With it I introduce redundancy to the model. For each parent I store all of its children (and grand children) with appropriate depth for each child. Figure 4 shows how I store sample hierarchy to my table.

Table

Key

Data

1

a

2

b

3

c

4

d

5

e

6

f

Structure

ParentKey

ChildKey

Depth

1

2

1

1

3

1

1

4

2

1

5

3

1

6

1

3

4

1

3

5

2

4

5

1

Figure 4. Storing hierarchical data with depth

One might ask about the combinatorial explosion? When we store all children for each parent, surely we introduce a lot of overhead?

We do introduce some overhead, but the database does not grow exponentially. For each child we simply have to store all the parent relationships. That means that for each record on the 5th level, we need 4 entries in the Structure table. No more. And how many levels can you have in a tree control while still letting the user feel in control?

Benefits

Following are samples of how you may leverage the new structure. In samples, I will use very simple SQL (no INNER JOIN, etc.) to show that with the new structure it is possible to work effectively even with simple SQL server engines.

I will use T-SQL to increase the clarity and readability of code, when necessary.

Get Root Nodes

To get the root node simply find all nodes that do not have any entries in the structure table acting as children.

SELECT *
FROM [table]
WHERE NOT EXISTS
  (SELECT * FROM structure WHERE structure.childId=[table].keyid)

Implementing Search

You would like to make a search for certain data throughout hierarchy starting with (but not including) parent. Here is how you can do it.

DECLARE @parent_node int
SET @parent_node=1
DECLARE @searched_data varchar(2)
SET @searched_data='e'
SELECT *
FROM [table],structure
WHERE
  [table].Data=@searched_data AND
  [table].KeyId=structure.ChildId AND
  structure.ParentId=@parent_node

Implementing Propagation

Suppose you would like to implement propagation. In my description of the hierarchical log I mentioned that all errors are propagated from lower levels to higher levels.

Following sample demonstrates similar problem. Instead of checking if given parent has children with errors it checks if given parent has children that contain certain sub-string. It returns a number > 0 if it yes and 0 if no.

DECLARE @parent_node int
SET @parent_node=2
DECLARE @searched_data varchar(2)
SET @searched_data='e'
SELECT COUNT(*)
FROM [table],structure
WHERE
  [table].Data=@searched_data AND
  [table].KeyId=structure.ChildId AND
  structure.ParentId=@parent_node

Summary

I needed a solution for real life problem, but was unable to find it in the existing engineering arsenal.

Describe model is not an optimal solution for all cases, but in my case the new structure offered so many benefits over the classic model, that the price of overhead is reasonable.

Here are some other cases, where the model will work for you:

  • When you need to execute joins with entire sub-trees,
  • When you need to perform an operation on a sub-tree, for example, calculate the bill of material, and
  • When you need to propagate data downward from the parent to the children or upwards from the children to the parents.

I do hope you find it as helpful as I did. :-)

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

About the Author

Tomaž Štih
Chief Technology Officer
Slovenia Slovenia
Member
No Biography provided

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
GeneralOr you can just use a single column and Binary Trees...memberadamfowleruk12 Jan '09 - 23:42 
I've come up with a good way to do hierarchies using a single string column to store a unique Binary Tree ID for the category.
 
So this tree has the following Binary Tree IDs (BTIs):-
 
0 All
01 Mountaineering
011 Mountaineering > Navigation
0110 Mountaineering > Assessment
010 Orienteering
 

(A 0 means 'same level' a 1 means 'child')
The advantage of this method is to find any Mountaineering result you just do:-
SELECT * FROM categories WHERE bti LIKE '011%';
 
(01 being mountaineering, the extra 1 meaning 'child').
 
A full explanation can be found here:-
http://web.me.com/adamfowleruk/adamslife/Blog/Entries/2009/1/13_A_Strategy_for_Hierarchical_Searches_in_a_single_SQL_entity_table.html[^][^]
 
Hope this helps.
GeneralHierarchical Databasemembermanojmmj29 Jun '05 - 1:32 
Hello,
I'm in a project where i need to design table to maintain hierarchy.I am using vb.net as front end & MS SQL SERVER 2000 as backend.
Below is the problem stated:
I have to maintain data of employees right from CEO,Business Unit head, Practice Manager, Program Manager,Project Manager, Project Leader,Developer(this is the hierarchy).i need to generate reports,some other functionalities to perform.Some of the Reports are:List Of High Performer Employees,Budget Report,Attrition Report,so on. Functionalities include: Nominate for training, issue incentives, etc.CEO can view reports of all Business Unit heads,Practice Manager under each Business Unit head,so on. Business Unit head can view reports & work on only employees under him, he cant view reports of employees under another Business Unit head.So the same rule applies to all levels.
I thought of having a table "EMPLOYEE", having empid,reporting_manager_id,practice_manager_id,program_manager_id,so on. But many fields will have null value(Ex:If record is of Business Unit head then only his id,CEO_id will have value rest others practice_manager_id,program_manager_id,project_manager_id, so on will be null).I thought this to be a odd idea.
please give the idea for table design,queries to fetch value.
Kindly guide me,thanks in advance

QuestionHave you seen the modified preorder alternative?memberdross128 Jul '03 - 9:10 
I do have been fumbling all over this topic for a while now... I work with a few applications that need to store hierarchies, usually a user defined category structure. I had always used the adjacency (parentID, selfjoining) model, until I came to a large institution that wanted something similar, yet more complex. The main problem was how often I needed to return a tree branch as an entire tree. Until now, I had always been fetching the entire tree from the db and spitting it out, which was easy. Now I have to fetch only a branch, which may be 1 level down or 15 levels down.
 
The following solution can be confusing, and it does have some of the issues related to yours (inserting a node requires an update on every node in the tree, but SQL databases are much more optimized for that than recursion (which would be needed if I were to accomplish the above using the adjacency model).
 
Check out these articles:
 
http://www.intelligententerprise.com/001020/celko.shtml
 

http://www.sitepoint.com/article/1105
 
What I did was use a combination of both techniques. I use the left and right fields (mainly for selecting branches), but I maintain a parentID for ease of inserts. My front end code (currently using Flash Remoting, but that's another story!), only has to pass a parentID when inserting a node. A trigger in SQL, however, does the LEFT and RIGHT update for all nodes. It actually works pretty good, and I'm getting ready to performance test it (seeing how long an insert takes with 50,000 nodes should be interesting). The Celko article is very helpful as it contains a lot of the select and model conversion code.
GeneralMaintenance IssuesmemberBrett Smith10 Jun '03 - 1:28 
While the solution you have come up with may suite your particular needs right now you have made a major assumption. The levels you need will never be greater than 5. If this is true & the real world never changes then the design seems solid.
 
BUT ... life never sits still for long. If it does change, then the person taking over maintenance in the future is going to have to do a lot of work to, eg, go to six levels. Rebuild the table, take the system off-line, modify all the code to handle the sixth level, etc.
 
The advantage over the recursive model is that it needs no further maintenance once set up. As with all generic code, it's harder to write in the first place, and can take longer to run, but the long term costs are often much smaller.
 
Consider the real world situation of Companies & their owners. The recursive model allows for a company to own another to own another, etc. Each company is on the same table with a key, "Owner", pointing to the same table. There can be many 10's of links of ownership - even circular references. To try and have a horizontal link for all the possible parents, when the majority doesn't have any is terribly waisteful and no simpler than the recursive model.
 
Personally I like to model my database designs around the real world without limitations, which all to easily change. This then often highlights logical errors. Another example are Hospital Patient & Relatives. They can appear to be two entities, until you consider that a relative can also be a patient (and visa versa). As such the real entity is Person, where Relative & Patient are attributes of the Entity Person. In this case we again see the recursive relationship.
 
Clearly in these examples the recursion is via a many to many linkage table.
 
I quite accept your assumption for particular situations. I just don't think this design can be considered a generic model and wanted to highlight to other readers how the fundamental assumption of 5 levels has a huge bearing on the outcome.
 
Cheers
GeneralRe: Maintenance Issues - Not as bad as it seems.memberJames Simpson21 Dec '04 - 4:55 
Great article, very well explained ( I was actually going to write on the same subject )
 
This is still a recursive model. A hierarchy is a hierarchy and is recursive, the method mentioned above is a different approach to representing the same type of data.
 
You don't loose much overhead if you implement both parent/child and the sequence table. but what you do gain is the ability to find leaf nodes, find all the children, siblings etc...
 
To maintain the sequence table a couple of triggers on the hierarchy table can help that:
 
Append leaf node needs to generate its sequence records (one for each level)
 
Delete Node needs to remove each sequence record associated with itself.
 
Move Node is a bit trickier, you will need to update the sequence for each node and its child node.
 
A Well indexed table and a couple of good triggers makes this a very strong method for working with hierarchy data.
 
Just my two cents.
 
James
 
James Simpson
Web Developer
imebgo@hotmail.com
 
P S - This is what part of the alphabet would look like if Q and R were eliminated
Mitch Hedberg

Generalwonderful!memberAdamsun22 Aug '02 - 23:20 
Poke tongue | ;-P ;P
 
YOU WILL WHEN YOU BELIEVE!
GeneralAn Interesting ApproachmemberMarc Clifton18 Aug '02 - 4:00 
I'm not sure I would have handled the propagation of errors to the top level the same way. Why not create a separate error table that indicated the child key and the root key? This way you could easily determine if an error needed to be flagged at the root level. The user interface could then pop up with a list of errors, and the user could either view the error or you could open up the tree to the specific error level.
 
I disagree that the parent needs a link to all grandchildren, greatgrandchildren, etc., keys. This makes managing the structure a nightmare. I had to implement a similar structure for as part of a satellite design tool. I needed to track thousands of components and sub-components, allowing the engineer to customize sub-assemblies. Your architecture would create millions of entries at the top level to track all the sub-children at the top level assembly.
 
I also had to implement a dynamic "attribute-value" system for this architecture. You might find that usefull also--each child points to a data descriptor that indicates the type of kind of data fields the key contains. This abstracts the data interface, so now you can have "status" messages, "warning" messages, "error" messages, etc. Then, instead of searching for an error based on some value in the string (which will eventually get you in trouble and becomes very slow), you could search for a particular data type.
 
With all that said, it's nice to see someone putting some thought into a very complex issue--one that, as you indicated in your article, has a lot of theory behind it but not a lot of real world implementation, as far as I've seen.
 
Interestingly, the first diagram, where you have the key referencing itself, is not allowed in some database implementations (as I recall, Oracle allows it, but Access doesn't. I don't know about SQL Server--I felt the second diagram was a better implementation anyways).
 
Sorry for the long reply!
 
Marc
GeneralRe: An Interesting Approachmemberwayward18 Aug '02 - 6:37 
If you are doing more queries than updates, then Joe Celko's "SQL for Smarties" book has a very elegant nested set based solution to storing trees in a database. It basically involves giving each tree node a left and right number. These are numbered depth wise in incrementing order.
 
To search for children you query (Pseudo SQL):
SELECT *
FROM nodes n1
WHERE left_index
BETWEEN parent.left_index and parent.right_index)
 
and to count children:
SELECT (parent.right_index - parent.left_index) / 2
 
Again, no recursion, no massive tables of nodes.
 
We have used it very sucessfully to implement permission hierarchies.

GeneralRe: An Interesting Approachmembertstih19 Aug '02 - 2:01 
: I'm not sure I would have handled the propagation of
: errors to the top level the same way. Why not create
: a separate error table that indicated the child key
: and the root key?
 
I agree with you to some extent. Your solution is an optimization and would work better in this particular case, but you lose genericity. For example it does not offer solution to the search sub-tree problem and it needs one table for each data you would like to propagate.
 
: I disagree that the parent needs a link to
: all grandchildren, greatgrandchildren, etc.,
: keys. This makes managing the structure a nightmare.
 
For each new item you need (item depth-1) entries in the structure table. As long as you need a tree to show to the user this works, because showing more then 8 level tree to the user makes no sense anyway. But when you would like to nest 100 levels in the tree (could be when the consumer of the tree is a machine which processes it) then for each item in 100th level you need 99 entries in the structure table.
Still, the growth is not exponential.
 
Managing such a tree is not so complex. For adding you can in general reuse all grand-parent to parent entries in the structure table and just add one more.
 
For modifying you delete with criteria (where ChildId=your id) and add. It depends on where the stress is coming from - from the data feed or from the select statements to query the data. Mine is the latter. If it was the first then this would be considered expensive.
 
But, to conclude, I agree with your point that each model is only as good as effective it is in solving the problem it was supposed to solve. Smile | :)
 
Kind Regards,
Tomaz
GeneralRe: An Interesting ApproachmemberMartinac22 Aug '02 - 0:05 
I really appreciated your comment and it was obvious that you faced - and solved - the same problem.
 
I would like to know more about the dynamic "attribute-value" system you are talking about. The problem I have is what probably happens each time one tries to store a hierarchical structure of objects in a relationnal database. These objects inherit from the same base object, but may have different attributes.
 
Can you tell me how is you "data descriptor", and if it can handle different data types (and not only status, warning or error messages).

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web02 | 2.6.130523.1 | Last Updated 18 Aug 2002
Article Copyright 2002 by Tomaž Štih
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid