Click here to Skip to main content
13,192,207 members (32,535 online)
Click here to Skip to main content
Add your own
alternative version

Stats

64.8K views
55 bookmarked
Posted 23 Oct 2006

How to manipulate hierarchical information in flat relational database tables

, 27 Jan 2017
Rate this:
Please Sign up or sign in to vote.
This article describes a technique to quickly retrieve and present hierarchical information from a flat relational database table.

History of Algorithm

When I was at TCS (TATA Consultancy Services) in Mumbai India in 2002 I was setting up a Linux COE (Center of Excellence) for GE (General Electric) account.  This position was non billable.  The purpose was to create delivery capability for Linux within this customer account.  One challenge I faced was that coming up with Linux capability required a document management software to be purchased to store the documents in an easily accessible manner that would help create, store and disemminate knowledge to other people on how to deliver and execute Linux based projects.  However since I had no budget I could not purchase any document management software.  So with free ASP, Notepad, IIS, SQL Server Express and Gimp I created a website to hold documents.  The first system I created was simple.  The parent folders or categories and documents were shown on the home page.  Clicking on a folder or category name or document opened it up in the next page.  This was horrible and slow.  So I racked my brains for a couple of months on how to do it better.  Finally I came up with this algorithm which was 1.10.8 based.  Wrote the horrible ASP ultra complicated code in Notepad (no budget for Visual Studio license) built the working document management website.  All the other COE's started using my website too as they liked it and all needed a Document Management system which they had no budget to buy.

Then in 2006 I quit and tried to pursue a Masters in Computer Science at Iowa State University in Ames Iowa.  One of the requirements that I had to publish for my masters thesis a unique computer science invention/idea to get the degree.  So I asked my Indian PhD professor if I can use this old 2002 idea of mine.  He made one of his PhD students search to see if the idea was already done.  The student found a published paper by another professor which said the same 1.2.10.8 idea.  Recently looking back at this algorithm article I found a big flaw in it and posted the correct way which is ZZZA way.  The point to be noted for which I am saying all this is I see comments left on the article with a rating of 1 by assume non PHD professors claiming that this algorithm is no good.  It actually is very good is published by real PHD Computer Science people published in reputable journals and not my untrained non PHD brain.  So please think a little hard before posting these kinds of comments.

Introduction

A simple table in a relational database is the Employees table. This has an employee ID, and a Reports to ID which is an employee ID. The normal way to fill your tree with this hierarchical information is to query for the root nodes, i.e., nodes with no parents; in this case, null in Reports to. Then, subsequently query the table for all children of the node, i.e., Reports to ID equals the root employee ID. Keep doing this until you come to leaf nodes with empty results. I will describe a better technique that avoids repetitive queries and has only one query to the table.

How to Represent the Data in the Relational Database Side

One part of the technique is to create a new key column in your relational table which has the hierarchical information in it. An example is as below: Table name: TreeTable.

NodeParentMy Key
xNULLA
yNULLB
a1xA.A
a2yB.A
a3a1A.A.A
a4a2B.A.A

As you can see from this example, there are random inserts going on, so the data is out of order. However, with a basic query that orders by the key column, this data will be in hierarchical sequence. An example follows: Select Node, Parent, [My Key] from TreeTable order by [My Key]. This produces the following result set:

NodeParentMy Key
xNULLA
a1xA.A
a3a1A.A.A
yNULLB
a2yB.A
a4a2B.A.A

The symbology is not pertinent here. The data in My Key column just requires that you use symbols for the node that are unique at the level of the tree that it exists at, and these symbols should sort in order according to the sort order of the database. This means that when you insert into a level, you should use the next available symbol, which in this case is the next alphabetical character. The delimiter again could be any symbol; I have chosen '.' for convenience. The key at any level combines the key of the parent before it, as can readily be seen. Now the advantage is this puts almost no load on any existing relational database as it exists today, as sorting is very basic and all have implemented the Order by clause with great efficiency. The next step is how the client processes this information.

When coming to the end of the symbol range in this case alphabet Z you would append to Z the next series of symbols in this case alphabet A which would result in ZA.  Coming to the end of this range you would go ZZA as next series.

Client Processing

The client merely recurses over each row, checking to see if the My Key symbol in the current row starts with the My Key value from the previous row. If it does, it recurses to fill the tree at the next level. If it does not, it falls back down to the level below and checks for a match until it lands up at the root node level and paints it as a root node. A very simple algorithm on the client. And, in fractions of a second, you can fill a treeview or any other type of hierarchical visual control, or also any data structure that you might want to fill. Let's talk about inserting data into this tree or updating it. This is, of course, very trivial. All you need to do on an insert is check what the My Key value of the parent node is and append it to the beginning of the My Key value of the node you are inserting into the table, and append a unique identifier that has not been used at this level of the tree. In my example case, it would be the next highest alphabetical character. It is insignificant really, and could just be the identity value of the current record you are inserting. As simple as that, which is always unique and implemented by a database like SQL Server. Or a sequence in Oracle. An update is merely doing the same thing over; you will change the My Key value if the update changes the parent of the node you are updating, in exactly the same fashion as an insert. You can transform an identity numeric value into an alphabetical value.

Summary

This technique allows you to efficiently present hierarchical data from a flat relational table with just one query to the relational database instead of multiple queries. It is much faster than existing techniques, and you should employ it wherever you have a relational table with hierarchical information in it.

Where do I Use it?

When I invented this technique in 2002, I was working with a huge tree that had unlimited growth potential for which the existing techniques that I knew of did not stand up to. An example of where this would be useful is for the parts explosion of an aircraft, the employees table that holds organizational chart information for a Fortune 500 company or one with many employees, or a scientific work which involves huge trees. It is useful with small trees when you wish to display the tree in its whole in one shot very quickly. I have shown how to insert, update, delete, and move branches so that you can use the system out of the box. But these are merely there for support, and not the reason to use the technique. This should answer the question, "Where do I use it?"

How to select

To select the basic information is very simple and an example is given below:

select * from treetable order by [Key]

Treetable being the name of the table from now on and [Key] being the name of the column containing the key like ZZH.JA.ZZZD for example.

If you want to select a branch in this example everything under the parent key 'A' then:

select * from treetable where [Key] like 'A.%' order by [Key]

How to order the tree nodes by some other column

So you have inserted data and keys into the table and you want to order within a level alphabetically by some other data column or numerically etc.  This is a non trivial task.  So there are multiple approaches.  The easiest one is you use the select order by technique to get the data out to code side.  Then you sort within a key level in code and fill your treeview control.  You can also query the old way which is inefficient but works for each level and sort within that node level all the children by the data column(s).  To do this use the following query:

select * from TreeData order by 
    replace(iif(charindex('.', reverse([Key]), 1) > 0, substring([Key], 1, len([Key]) - 
    charindex('.', reverse([Key]), 1) + 1), [Key] + '@'), '.', '~'), Data

TreeData being the name of the table from now on and [Key] being the name of the column containing the key like ZZH.JA.ZZZD for example.  Data being the column holding in my example alphabetic data a simple list of names.  So what this does is search for the last '.' delimiter symbol and cut it off from the Key giving just the parent node this is then sorted by Data.  So effectively within a node all the node's children are ordered or sorted alphabetically.  To sort the nodes the '.' delimeter or symbol is replaced with '~' delimiter or symbol.  This is because I have chose the capital case alphabet for my symbol set and it is default english collation so '~' will sort or order after A-Z.  For root nodes which have no '.' symbol in them '@' symbol is used because it sorts or oders before A-Z and ~.

To sort of course a single nodes children the SQL becomes much simpler:

select * from TreeData where [Key] like 'A.%' order by 
    replace(iif(charindex('.', reverse([Key]), 1) > 0, substring([Key], 1, len([Key]) - 
    charindex('.', reverse([Key]), 1) + 1), [Key] + '@'), '.', '~'), Data

Essentially you have the parent nodes key and you want to sort the tree nodes under it properly so the addition is the like transact-sql keyword that will pickup all the children.

The limitation of this ordering is that the algorithm to draw it is broken as the 'ZA.X' orders before 'ZA.A.D'.  So this query will not work for drawing the tree by reading row after row of table rows serially.  What needs to be done is that the normal select is used and then code is required client side out of the database to sort within a node level the data by another column.

So the correct and best way is not a shortcut.  You use the standard select algorithm way and instead of filling a treeview you fill a List<>() and a class.  You then in client side C# code order each nodes children by the data you want to sort at.  This is normal code so will not show this traditional code.

How to Create a new root or child node

Creating a new node requires creating a new key.  This can be done client side for a inserting a root node by first querying all root nodes sorted or ordered and picking up the last row from the resulting query.  Look at its alphabet progression and append the next alphabet and make the new key and insert it into the table.

For root nodes use this query:

select [Key] from TreeData where charindex('.', [Key], 1) = 0 order by [Key]

For child nodes for example you want all the child nodes of parent with key 'A' and you want to find out the last value and increment the alphabet to make the new key in code then the base query would be the following:

select iif(charindex('.', substring([Key], len('A.') + 1, len([Key]) - len('A.'))) > 0, 
    substring(substring([Key], len('A.') + 1, len([Key]) - len('A.')), 1, 
    charindex('.', substring([Key], len('A.') + 1, len([Key]) - len('A.'))) - 1), 
    substring([Key], len('A.') + 1, len([Key]) - len('A.'))) as [ChildNodeKey] from TreeData 
    where [Key] like 'A.%' order by [ChildNodeKey]

What this is doing it looks for all Key values starting with 'A.' so all child nodes of the node whose key is 'A'.  Then it removes the key from the child nodes key.  Then it searches for next '.' from start of string and removes that as well.  What is left is only the child nodes without their children.  Now you order it and the last record alphabet needs to be incremented.  Dont forget in code to follow the ZZZA rule that is use the last symbol in the symbol set in my case alphabets and repeat it and append to it.

There is another way to append an Identity column id to the table and have an algorithm calculate the next Key value from the numeric value of the identity for this new record.

select replicate('Z', abs((IDENT_CURRENT(TreeData) + 1) / 26)) + char(64 + ((IDENT_CURRENT(TreeData) + 1) % 26))

So basically takes the current identity value of the table increments it by one.  The result is then divided by the symbol range which in my case is capital A-Z so 26.  This gives the number of Z's at the start of the key.  Then modulo gives the last alphabet to append.  So for example if the identity is 80 then the resulting new row key is 'ZZZB'.  The absolute of 80 divided by 26 being 3 and the remainder modulo being 2.  There is a problem with this way so I do not recommend using Identity to Key conversion as it is wasteful.  The problem is if you are at a branch level like 'ZZG.ZA.J' and identity is some large number and you just want to append next symbol in sequence which would be 'ZZG.ZA.K' then as you can see there would be problems.  A workaround is that you calculate the number of nodes at the parent level and apply the same calculation above to append the correct next value in this case 'K' at the sub level.  So to calculate the number of child nodes of a given node to come up with the number to then use in this calculation to get the sub key at that parent nodes level use the following:

select unique count(*), iif(charindex('.', substring([Key], len('A.') + 1, len([Key]) - len('A.'))) > 0,   substring(substring([Key], len('A.') + 1, len([Key]) - len('A.')), 1,   charindex('.', substring([Key], len('A.') + 1, len([Key]) - len('A.'))) - 1),   substring([Key], len('A.') + 1, len([Key]) - len('A.'))) as [ChildNodeKey] from TreeData   where [Key] like 'A.%' order by [ChildNodeKey]

Also if you are going to do this client side and not in one insert statement then you should probably lock the table for inserts while doing this select and then an insert.  Best way to do this is in a single insert sql statement.

How to delete a node

Deleteing one node only is trivial you just delete the node for the key.  You should never delete a node without deleting its child nodes as orphaned nodes will break the drawing algorithm. To delete a node and all its child nodes then you would use the following statement to delete for example all child nodes of the node with key 'A':

delete from TreeData where [Key] like 'A.%' or [Key] = 'A'

 

How to update a node

Updating a node is trivial as you just update the associated columns for a particular key of that node.  However if you want to update the key itself which means you want to move the node and all its children to another parent node then that too is trivial.  What you would do is take the key of the node you want to update move in the tree and figure out the key of its parent node which is everything preceeding the last dot.  You would remove this and append the key of the parent target node you wish to move the node and all its children to.

WPF C# example code to use the algorithm

Here is some example C# code to use the algorithm.  Essentially what happens in it you loop through the ordered records and for each record you check if its a root node and if it is draw it.  If it is not a root node then you loop through the key values looking for matching key that contains the parent key.  If it does not you go up the parent key heirarchy until you find a match and draw it at the correct level or land up at root and draw it at root node level.

using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Data;
using System.Windows.Documents;
using System.Windows.Input;
using System.Windows.Media;
using System.Windows.Media.Imaging;
using System.Windows.Navigation;
using System.Windows.Shapes;
using static WpfApplication1.TreeAlgorithmDataSet;

namespace WpfApplication1
{
    /// <summary>
    /// Interaction logic for MainWindow.xaml
    /// </summary>
    public partial class MainWindow : Window
    {
        class Node
        {
            public string Key { get; set; }
            public TreeViewItem ParentTvi { get; set; }

            public Node(string key, TreeViewItem parentTvi)
            {
                Key = key;
                ParentTvi = parentTvi;
            }
        }

        TreeViewItem parentTvi = null;

        public MainWindow()
        {
            InitializeComponent();
        }

        private void Window_Loaded(object sender, RoutedEventArgs e)
        {

            WpfApplication1.TreeAlgorithmDataSet treeAlgorithmDataSet = ((WpfApplication1.TreeAlgorithmDataSet)(this.FindResource("treeAlgorithmDataSet")));
            // Load data into the table TreeData. You can modify this code as needed.
            WpfApplication1.TreeAlgorithmDataSetTableAdapters.TreeDataTableAdapter treeAlgorithmDataSetTreeDataTableAdapter = new WpfApplication1.TreeAlgorithmDataSetTableAdapters.TreeDataTableAdapter();
            treeAlgorithmDataSetTreeDataTableAdapter.Fill(treeAlgorithmDataSet.TreeData);
            System.Windows.Data.CollectionViewSource treeDataViewSource = ((System.Windows.Data.CollectionViewSource)(this.FindResource("treeDataViewSource")));
            treeDataViewSource.View.MoveCurrentToFirst();
            DataTable dt = treeAlgorithmDataSet.Tables["TreeData"];
            EnumerableRowCollection<DataRow> query = from keydata in dt.AsEnumerable()
                                                     orderby keydata.Field<string>("Key")
                                                     select keydata;
            foreach (DataRow tdr in query)
            {
                string key = tdr.Field<string>("Key").ToString();
                TreeViewItem tvi = new TreeViewItem();
                Label lbl = new Label();
                lbl.Content = key + "    " + tdr.Field<string>("Data").ToString();
                tvi.Items.Add(lbl);
                tvi.Tag = new Node(key, parentTvi);
                if (parentTvi == null || key.IndexOf('.') == 0)
                {
                    parentTvi = tvi;
                    tvi.Tag = new Node(key, null);
                    treeView.Items.Add(tvi);
                }
                else
                {
                    bool found = false;
                    while (parentTvi != null && key.Contains(".") && !found)
                    {
                        found = Loop(parentTvi, tvi, key);
                        if(found)
                        {
                            parentTvi = tvi;
                        }
                    }
                    if(!found)
                    {
                        parentTvi = tvi;
                        tvi.Tag = new Node(key, null);
                        treeView.Items.Add(tvi);
                    }
                }
            }
        }

        bool Loop(TreeViewItem parentTvi2, TreeViewItem tvi, string key)
        {
            bool found = false;
            string parentTagKey = ((Node)parentTvi2.Tag).Key;
            if (key.Contains(parentTagKey) + ".")
            {
                tvi.Tag = new Node(key, parentTvi2);
                parentTvi2.Items.Add(tvi);
                parentTvi2 = tvi;
                found = true;
            }
            else
            {
                parentTvi = ((Node)parentTvi2.Tag).ParentTvi;
            }
            return found;
        }
    }
}

The following C# example uses Linq to do the insert for a new record node to the parent node "A".

            DataTable dt = treeAlgorithmDataSet.Tables["TreeData"];
            var insertquery = from keydata in dt.AsEnumerable()
                              where keydata.Field<string>("Key").StartsWith("A.")
                              orderby
                              keydata.Field<string>("Key").IndexOf("A.") > -1 ?
                              keydata.Field<string>("Key").Substring("A.".Length, keydata.Field<string>("Key").Length - "A.".Length).IndexOf(".") > 0 ?
                              keydata.Field<string>("Key").Substring("A.".Length, keydata.Field<string>("Key").Length - "A.".Length).Substring(0,
                              keydata.Field<string>("Key").Substring("A.".Length, keydata.Field<string>("Key").Length - "A.".Length).IndexOf(".") - 1) :
                              keydata.Field<string>("Key").Substring("A.".Length, keydata.Field<string>("Key").Length - "A.".Length) : null
                              select new
                              {
                                  FullKey = keydata.Field<string>("Key"),
                                  Key = keydata.Field<string>("Key").IndexOf("A.") > -1 ?
                                        keydata.Field<string>("Key").Substring("A.".Length, keydata.Field<string>("Key").Length - "A.".Length).IndexOf(".") > 0 ?
                                        keydata.Field<string>("Key").Substring("A.".Length, keydata.Field<string>("Key").Length - "A.".Length).Substring(0,
                                        keydata.Field<string>("Key").Substring("A.".Length, keydata.Field<string>("Key").Length - "A.".Length).IndexOf(".") - 1) :
                                        keydata.Field<string>("Key").Substring("A.".Length, keydata.Field<string>("Key").Length - "A.".Length) : null
                              };
            string newinsertkey = insertquery.Count() == 0 ? "A.A" : (insertquery.Last().Key.Last<char>() == 'Z' ? insertquery.Last().FullKey + "A" :
                insertquery.Last().FullKey.Substring(0, insertquery.Last().FullKey.Length - 1) + ((char)((int)insertquery.Last().Key.Last<char>() + 1)).ToString());
            DataRow dr = dt.NewRow();
            dr.SetField<string>(dt.Columns["Key"], newinsertkey);
            dr.SetField<string>(dt.Columns["Data"], "Fritjof Capra");
            dt.Rows.Add(dr);
            dt.AcceptChanges();

Modifications

  • Order by another column has a traditional solution now using the algorithm's select for base fill of a List<>().
  • Due to some confusion coming up in the comments I have removed the old numeric 10.8.1 way which had shortcomings.
  • Uploaded new zip with code for insert added in.
  • Minor fix to insert example code.
  • Minor change to fix algorithm code example.
  • Added new example C# WPF example code section.
  • Added new section for CRUD and select operations.
  • The change from numeric values to alphabets solves the problem when looking at the raw data in the order it was inserted.  Numerics would sort 1.1, 10.1, 2.1 this problem is solved with using alphabets.
  • One of the comments was why copy the data, so I have eliminated this and you work directly with the resulset.
  • There was a bug in which wrong nodes were matching, thanks to using the basic StartsWith of the String class. This has been eliminated, and a special function now handles checking if the node is a subnode of the current node.
  • There was a problem with the way nodes were being dragged and dropped, and this was being done incorrectly and has been fixed now.
  • Additional code for using the DataReader has been added for the industrial strength solution.
  • Added sorting of the nodes as the method does not show any sorting as is.
  • Showed how to display a branch of the tree with the same code.
  • There was a problem with using -, as 1-1 was being ordered by SQL Server 2005 after 11. So I have switched to using a dot as the delimiter as 1.1 orders before 11. Please update your code if your current database has the same problem. Please remember, if dot does not work for you, search for another character that will order 1<some char>1 before 11.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Akshay Srinivasan2
Architect
India India
I have been coding since 1984 in a variety of languages. I originally started as a game programmer and then switched to business programming from 1990. I still program games.

You may also be interested in...

Comments and Discussions

 
Question[My vote of 1] Joe Celcko already solve this, but... Pin
Jordi Martínez8-Dec-16 4:53
memberJordi Martínez8-Dec-16 4:53 
AnswerRe: [My vote of 1] Joe Celcko already solve this, but... Pin
Akshay Srinivasan220-Dec-16 7:47
memberAkshay Srinivasan220-Dec-16 7:47 
QuestionFormat Pin
Nelek5-Dec-16 18:56
protectorNelek5-Dec-16 18:56 
AnswerRe: Format Pin
Akshay Srinivasan221-Dec-16 9:40
memberAkshay Srinivasan221-Dec-16 9:40 
QuestionClosure Tables are a more refined approach that maintain full referential integrity. Pin
JR.o5-Dec-16 6:01
memberJR.o5-Dec-16 6:01 
QuestionAnother approach using SQL hierarchical queries, depending on your database server Pin
e.prost30-Nov-16 7:44
membere.prost30-Nov-16 7:44 
AnswerRe: Another approach using SQL hierarchical queries, depending on your database server Pin
Akshay Srinivasan22-Dec-16 4:17
memberAkshay Srinivasan22-Dec-16 4:17 
GeneralDeletion is not complete Pin
Jens Scheidtmann2-Nov-06 23:13
memberJens Scheidtmann2-Nov-06 23:13 
GeneralRe: Deletion is not complete Pin
Akshay Srinivasan23-Nov-06 10:35
memberAkshay Srinivasan23-Nov-06 10:35 
GeneralRe: Deletion is not complete Pin
Jens Scheidtmann4-Nov-06 1:47
memberJens Scheidtmann4-Nov-06 1:47 
GeneralRe: Deletion is not complete Pin
Akshay Srinivasan28-Nov-06 12:14
memberAkshay Srinivasan28-Nov-06 12:14 
GeneralA better approach Pin
RobLans23-Oct-06 7:53
memberRobLans23-Oct-06 7:53 
GeneralRe: A better approach Pin
Akshay Srinivasan223-Oct-06 10:46
memberAkshay Srinivasan223-Oct-06 10:46 
GeneralRe: A better approach Pin
Phoebus[osiacat_ru]23-Oct-06 21:37
memberPhoebus[osiacat_ru]23-Oct-06 21:37 
GeneralRe: A better approach Pin
Akshay Srinivasan224-Oct-06 1:22
memberAkshay Srinivasan224-Oct-06 1:22 
GeneralRe: A better approach Pin
Akshay Srinivasan224-Oct-06 3:50
memberAkshay Srinivasan224-Oct-06 3:50 
GeneralRe: A better approach Pin
Akshay Srinivasan224-Oct-06 3:53
memberAkshay Srinivasan224-Oct-06 3:53 
QuestionCleaner method? Pin
Mark Nischalke23-Oct-06 6:05
memberMark Nischalke23-Oct-06 6:05 
AnswerRe: Cleaner method? Pin
Akshay Srinivasan223-Oct-06 6:17
memberAkshay Srinivasan223-Oct-06 6:17 
GeneralRe: Cleaner method? Pin
Mark Nischalke23-Oct-06 6:51
memberMark Nischalke23-Oct-06 6:51 
GeneralRe: Cleaner method? Pin
Akshay Srinivasan223-Oct-06 10:44
memberAkshay Srinivasan223-Oct-06 10:44 
GeneralRe: Cleaner method? Pin
Mark Nischalke23-Oct-06 11:13
memberMark Nischalke23-Oct-06 11:13 
GeneralRe: Cleaner method? Pin
Akshay Srinivasan223-Oct-06 12:15
memberAkshay Srinivasan223-Oct-06 12:15 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Terms of Use | Mobile
Web04 | 2.8.171017.2 | Last Updated 27 Jan 2017
Article Copyright 2006 by Akshay Srinivasan2
Everything else Copyright © CodeProject, 1999-2017
Layout: fixed | fluid