Click here to Skip to main content
Click here to Skip to main content

The Deep Table

By , 4 May 2010
 

Introduction

When working with relational databases, it is almost impossible not to talk about SQL. With this language, a huge variety of operations and queries can be performed on a database and, depending on its implementation and use, in a very efficient way. Nevertheless, although it is clearly a human readable language by design, it soon starts being complicated to be read. For example, the select query:

SELECT id, date, customerId FROM tSales

reflects perfectly the intention of the query, that is:

"want from the table tSales: id, data and customerId"

But, this other query:

SELECT id, date, customerId, tCustomer.name FROM tSales, _
	tCustomer WHERE tSales.customerId = tCustomer.id

Although it is also explicit, somehow upsets the developer's eyes. We specify the way the record from the table tCustomer has to be accessed, while in a given schema it usually would be the only way to do that. In this case, the intention is:

"want from the table tSales: id, data, customerId and the customer's name"

Something like "customer's name" could be implemented if we previously had defined the relationship tSales <-> tCustomer. But not in SQL, since this kind of relationship between tables cannot be included in the database schema, at least in a way that could be used in SELECT queries. But as usual, when the language cannot achieve something, we have the code generation.

Background

This article will only treat the problem of building SELECT statements and more specifically for joining tables. Well, the problem is not new and therefore there are many tools and libraries in different languages that help in this matter. Usually graphical solutions are preferred, but when the database schema becomes large (real world) the graphic itself starts to be another problem, and in general does not provide much better documentation than the generated and "unreadable" query.

Here, it is not really presented a tool for all databases in the market, but the idea of a "deep select" against only one table (deep table), the algorithm to convert it in a standard SQL and the way a graphical interface can use it to facilitate the development as it is made in the attached demo.

The Deep Table

Assuming we have defined table connections in a database, the meaning of the deep table is almost trivial: A table where not only its columns can be selected but also the columns of the connected tables and if these are connected as well, the columns of their connected tables and so on.

For example, having the database schema:

table tISOCountries  (isoA2, isoA3, number, name)
table tCompanies     (id, name, countryId)
table tProducts      (prodID, providerID, name, price)
table tSales         (id, date, customerId, productId, quantity)

We want to be able to express a deep query over the table tSales like:

Deep select on table tSales:

id
date
customer name
customer country name
product name
quantity
product provider name
product provider country name

That is, just a set of "deep columns" where 'customer', 'product', 'provider' and 'country' are the supposed connections, and the last item a real column of a real table. This syntax is pretty readable, at least in English language while in Spanish for example it would be better a reverse order of the words. For example,

 deep column : customer country name
 english     : Customer's country name
 spanish     : name del country del customer = Nombre del pais del cliente

Although this reverse order could be interesting as well, we will keep it in English (don't worry. ;)

A query that can be generated with the previous deep select in SQL92 syntax is:

SELECT
   tSales.id AS id ,
   tSales.date AS date ,
   customer.name AS customer_name ,
   customer_country.name AS customer_country_name ,
   product.name AS product_name ,
   product_provider.name AS product_provider_name ,
   product_provider_country.name AS product_provider_country_name ,
   tSales.quantity AS quantity
FROM
   tSales ,
   tCompanies AS customer ,
   tISOCountries AS customer_country ,
   tProducts AS product ,
   tCompanies AS product_provider ,
   tISOCountries AS product_provider_country
WHERE
   tSales.customerId == customer.id  AND
   customer.countryId == customer_country.isoA2  AND
   tSales.productId == product.prodID  AND
   product.providerID == product_provider.id  AND
   product_provider.countryId == product_provider_country.isoA2

And a demo data result of the query:

Possible result table

Note that the generated column names as well as the table aliases are pretty readable. Not all the systems that generate code have this feature which is very important in the development.

Connections

A connection between two tables is nothing new, for instance SQL92 describes these connections using foreign keys. However, although syntactically and conceptually can be the same as the connection I present here, the use of the foreign key is strictly for Referential Integrity, that is, used as constraint on write operations like INSERT, DELETE and UPDATE. The connections that deep table needs are only thought for SELECT queries, therefore its definition is less restrictive than foreign keys. On the other side, if the foreign keys were given in the schema, there is nothing against using them as additional connections, thus avoiding double definitions in these cases.

Like in foreign keys, a connection can be defined with a set of columns of a table (source table) that reference a record of another table (target table). The following structure defines an entry of a connection:

    connectionName, sourceTable, sourceKey, targetName, targetKey

where connectionName is what we will use in the deep selection (deep columns), and it is defined with two tables and the column(s) that connect both. If the connection needs more than one column, then it is enough to use more entries with the same connectionName, sourceTable and targetTable.

    connectionName, sourceTable, sourceKey  , targetName   , targetKey
       customer      , tSales     , customerId , tCompanies   , id
       product       , tSales     , productId  , tProducts    , prodID
       provider      , tProducts  , providerID , tCompanies   , id
       country       , tCompanies , countryId  , tISOCountries, isoA2

Here is a graphical representation of these connections according to the given schema:

Connections graphically

It is also possible to think about not specifying the sourceTable within a connection, for example:

    connectionName, sourceTable, sourceKey  , targetName   , targetKey
       country       ,            , countryId  , tISOCountries, isoA2

In this case, all tables having a column countryId are connected to the table tISOCountries automatically. This is an "open connection" and this is how it can be represented:

Open connection

Algorithm for Deep Select

The main goal of the "deep table" is to generate a SQL select statement like the one presented above in the section "The deep table". For this generation, we do not even need the real schema of the database, from a set of connections and a "deep select" (set of deep columns of a table), we can generate the SQL as a text like it will be described here.

For the algorithm, it is convenient to define the "deep column" structure as follows:

    baseTable, [ connection1, ..., connectionN, ] columnName

With this structure, the base table is the same for all deep columns, in our example it would be tSales.

If we take a look at the select query of the given example, we see that the problem can be reduced to getting three lists, one for each clause : SELECT, FROM and WHERE. Initializing the FROM list with "baseTable", the rest can be done applying the following algorithm to each deep column:

Deep column reduction algorithm

After having the three lists, building the final SQL is straightforward.

Note that duplicate entries specially in the "FROM" list has to be avoided. Also note that with this algorithm, all aliases will start with the initial alias (baseTable). For instance, it would produce "tSales_customer.name AS tSales_customer_name" instead of "customer.name AS customer_name" as it is given in the example. If you prefer shorter alias names, it is possible to remove the prefixes starting with "baseTable_" during the algorithm.

Deep Schema

If having connections as defined in this article, it is convenient to work with an extended database schema that reflects these connections between tables. In our example, a simple extended schema would be:

table tISOCountries :

   isoA2,
   isoA3,
   number,
   name,

table tCompanies :

   o--o Connection [country] to table [tISOCountries] through (countryId)
   id,
   name,
   countryId,

table tProducts :

   o--o Connection [provider] to table [tCompanies] through (providerID)
   prodID,
   providerID,
   name,
   price,

table tSales :

   o--o Connection [customer] to table [tCompanies] through (customerId)
   o--o Connection [product] to table [tProducts] through (productId)
   id,
   date,
   customerId,
   productId,
   quantity,

It can also be useful for a tool to develop "recursively" all connections of a table, either into a list form or in a more graphic way like using a tree. For example, for the table tSales:

Deep table schema

Here, it has to be taken into account that a recursive connection (table connected to itself) or some other cyclic recursion would generate an infinite list which has to be avoided. At least in the list form while in the tree, it can be accepted if this is loaded dynamically on demand (e.g. when opening a node).

A tool with these elements can generate or even execute a deep SQL just by selecting the desired columns.

Conclusion

Without adding new big concepts, just the deep column as a way to access related information from a table (e.g. "connection connection ... colName") we can perform lots of queries that would result much more complex using directly SQL because of the associated joins. All in a quite natural way (e.g. customer name = customer's name) and producing queries with readable column names (customer_name). It is also a way of seeing a record of a table containing not only values (e.g. "id", "date") but also "objects", where the connections are the objects, for instance "customer name" can be considered as the access to the field "name" of the object "customer" (e.g. customer.name).

Our generated SQL make use of INNER JOINS, note though that this can be considered a matter of implementation. The algorithm can be adapted in order to generate other join syntaxes.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Alejandro Xalabarder
Germany Germany
Member
I have a Telecomunications Engineering degree but since I learnt the first language (Pascal), almost I haven't stopped developing software. Mainly with C and C++ but I have touched many other languages and scripts. I use also to develop for fun (reinventing the wheel, of course!), currently almost all of such programs and tools are written in java (visit www.elxala.de).

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
GeneralNice concept - but has been done beforememberColin Eberhardt28 Apr '10 - 3:03 
Hi,
 
Nice article, well written and explained.
 
Have you had any experience with Hibernate before? I think you will find that the Hibernate Query Language (HQL) through its implicit join syntax achieves what you are describing in this article:
 
http://docs.jboss.org/hibernate/core/3.3/reference/en/html/queryhql.html#queryhql-joins-forms[^]
 
(I haven't checked, but I am guessing HQL syntax is the same in nHibernate)
 
Regards,
Colin E.
GeneralRe: Nice concept - but has been done before [modified]memberAlejandro Xalabarder28 Apr '10 - 11:05 
Hi Colin,
 
Thank you for your feedback,
 
>>has been done before
I do not work with Hibernate and I did not know about the implicit join from Hibernate
but for sure it has been done before! After reading your comment I googeled a little
and found also this link http://vista.intersystems.com/csp/docbook/DocBook.UI.Page.cls?KEY=TJAV_ImplicitJoin. There must be for sure many similar aproaches. Actually I developed many years ago the same thing for a database called Btrieve (very very old).
It is quite intuitive and I am sure that neither me, Hibernate nor Cache SQL has invented this concept.
 
But, why not having something like this with pure standard SQL ? Even I am not sure if any
standard from SQL not defines already something like this Frown | :( , I am not a SQL standard gurú at all (should I?). Now I am working more with sql and I missed this feature so I found a way to do it, I implemented it (in few days I will release the demo) and I wrote this article, that's it.
 
Anyway, I get your point. I think I will mention in the article some of these aproaches.
 

Sincerely
Alejandro

modified on Wednesday, April 28, 2010 5:22 PM

Generalhas been done before ... but HOWmemberAlejandro Xalabarder28 Apr '10 - 12:37 
Hi again Colin,
 
Well, I've started right now searching "similar aproaches". I am sure they exist
but with the first two tries : your Hibernate's implicit join and my link "Cache SQL"
I have a problem, at least in order to say that they are similar aproaches.
 
Accessing a field with "." from an object is not new, I agree with you, I learnt this
in the very first lessons of Pascal Wink | ;)
 
These two languages seems to work with objects and querying them with a QL systems, all right.
Since they are object oriented this feature is expected, but the system I describe in my
article does not need from any extra structure (connection of two tables cannot be seen as
a new structure). I just do it with plain tables that any SQL engine has. I think is quite different.
 
At least it is not as fast as I expected to find how the implicit join of Hibernate works, I mean how it really works, when is possible to use it etc. Furthermore, I find things like this in formus:
 
"The (N)Hibernate documentation says that it is better to make use of the explicit join syntax and that is working correct. So do not use implicit joins (well, atleast not in version 2.0.0.GA)!"
 
This is dated in 2008, not new but also not so old.
 
Are you developing with Hibernate? can you post a good link that explains the implicit joins ?
GeneralRe: has been done before ... but HOWmemberColin Eberhardt28 Apr '10 - 21:38 
Hi Alejandro,
 
Glad my comment was of interest to you!
 
I have not come across CacheSQL before, but have used Hibernate extensively (albeit a few years back). Yes, you are correct, Hibernate is an O/R mapper, so you must map your tables to objects in order to use the query language (although it does much more than just permit querying of database backed objects!). However, hibernate does not load all the data into memory then query the objects. The query, with its implicit join syntax, is typically translated directly into SQL and executed on the database.
 
I am not aware of any documentation that described how it parses the Hibernate Query Language (HQL). My knowledge of it comes from turning on the debugging in applications and watching the SQL queries executed on the database as a result of my HQL queries.
 
I like your idea of taking the simplest approach, applying the implicit join concept direct to SQL. And, no, this is not present in any form in the ANSI SQL standard.
 
Anyhow, interesting article.
 
Have fun,
Colin E.
GeneralHow deep is HIBERNATE ?memberAlejandro Xalabarder5 May '10 - 12:24 
Hi Colin,
 
now that I've taken a deeper look at "Hibernate's implicit join", I have realized that
it has nothing to do with my deep table. Since I guess you have skills with Hibernate
I suposse that you have read too fast my article. Did you even try to imagine how to
implement with HIBERNATE the sample I present in the article ? (4 connections)
 
I guess it is not possible, as well as other features like self-connection.
 
So my aproach it's definitively not implementing the "Hibernate's implicit join" concept.
Also I would not call it implicit join at all, as you know it exists also the term "SQL implicit join" which is applyied to the SQL92 join style (btw the one I use in the article
in the generated SQL).
 
regards,
Alejandro
QuestionWhere is the source code?memberosy25 Apr '10 - 23:41 
Nice article, btw.
AnswerRe: Where is the source code?memberAlejandro Xalabarder26 Apr '10 - 12:55 
Hi,
I did not plan to post a specific code but a demo
program. It is based on open source so you will be able to see the code anyway. I will attach it to the article in the next days, together with the reference to the sourceforge project and telling which class implements the algorithm of the article.
GeneralDemo program and demo C++ availablememberAlejandro Xalabarder1 May '10 - 11:22 
I've just finished and posted the demos for the article:
one executable (jar file) and one simple C++ sample.
The algorithm described in the article is implemented in the C++ demo in the
file deepTableSample.cpp in the method deepSQL(..)
and in java it is located in then class de.elxala.db.dbMore::resolveConnections
the sources of the java code can be found at Sourceforge gastona
(the zip containing the sources)
 
enjoy them! Wink | ;)

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web04 | 2.6.130523.1 | Last Updated 4 May 2010
Article Copyright 2010 by Alejandro Xalabarder
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid