The Deep Table

Alejandro Xalabarder

4.27/5 (10 votes)

Apr 25, 2010

CPOL

7 min read

38497

195

A natural way of joining tables using SQL

Introduction

When working with relational databases, it is almost impossible not to talk about SQL. With this language, a huge variety of operations and queries can be performed on a database and, depending on its implementation and use, in a very efficient way. Nevertheless, although it is clearly a human readable language by design, it soon starts being complicated to be read. For example, the select query:

SELECT id, date, customerId FROM tSales

reflects perfectly the intention of the query, that is:

"want from the table tSales: id, data and customerId"

But, this other query:

SELECT id, date, customerId, tCustomer.name FROM tSales, _
	tCustomer WHERE tSales.customerId = tCustomer.id

Although it is also explicit, somehow upsets the developer's eyes. We specify the way the record from the table tCustomer has to be accessed, while in a given schema it usually would be the only way to do that. In this case, the intention is:

"want from the table tSales: id, data, customerId and the customer's name"

Something like "customer's name" could be implemented if we previously had defined the relationship tSales <-> tCustomer. But not in SQL, since this kind of relationship between tables cannot be included in the database schema, at least in a way that could be used in SELECT queries. But as usual, when the language cannot achieve something, we have the code generation.

Background

This article will only treat the problem of building SELECT statements and more specifically for joining tables. Well, the problem is not new and therefore there are many tools and libraries in different languages that help in this matter. Usually graphical solutions are preferred, but when the database schema becomes large (real world) the graphic itself starts to be another problem, and in general does not provide much better documentation than the generated and "unreadable" query.

Here, it is not really presented a tool for all databases in the market, but the idea of a "deep select" against only one table (deep table), the algorithm to convert it in a standard SQL and the way a graphical interface can use it to facilitate the development as it is made in the attached demo.

The Deep Table

Assuming we have defined table connections in a database, the meaning of the deep table is almost trivial: A table where not only its columns can be selected but also the columns of the connected tables and if these are connected as well, the columns of their connected tables and so on.

For example, having the database schema:

table tISOCountries  (isoA2, isoA3, number, name)
table tCompanies     (id, name, countryId)
table tProducts      (prodID, providerID, name, price)
table tSales         (id, date, customerId, productId, quantity)

We want to be able to express a deep query over the table tSales like:

Deep select on table tSales:

id
date
customer name
customer country name
product name
quantity
product provider name
product provider country name

That is, just a set of "deep columns" where 'customer', 'product', 'provider' and 'country' are the supposed connections, and the last item a real column of a real table. This syntax is pretty readable, at least in English language while in Spanish for example it would be better a reverse order of the words. For example,

 deep column : customer country name
 english     : Customer's country name
 spanish     : name del country del customer = Nombre del pais del cliente

Although this reverse order could be interesting as well, we will keep it in English (don't worry. ;)

A query that can be generated with the previous deep select in SQL92 syntax is:

SELECT
   tSales.id AS id ,
   tSales.date AS date ,
   customer.name AS customer_name ,
   customer_country.name AS customer_country_name ,
   product.name AS product_name ,
   product_provider.name AS product_provider_name ,
   product_provider_country.name AS product_provider_country_name ,
   tSales.quantity AS quantity
FROM
   tSales ,
   tCompanies AS customer ,
   tISOCountries AS customer_country ,
   tProducts AS product ,
   tCompanies AS product_provider ,
   tISOCountries AS product_provider_country
WHERE
   tSales.customerId == customer.id  AND
   customer.countryId == customer_country.isoA2  AND
   tSales.productId == product.prodID  AND
   product.providerID == product_provider.id  AND
   product_provider.countryId == product_provider_country.isoA2

And a demo data result of the query:

Note that the generated column names as well as the table aliases are pretty readable. Not all the systems that generate code have this feature which is very important in the development.

Connections

A connection between two tables is nothing new, for instance SQL92 describes these connections using foreign keys. However, although syntactically and conceptually can be the same as the connection I present here, the use of the foreign key is strictly for Referential Integrity, that is, used as constraint on write operations like INSERT, DELETE and UPDATE. The connections that deep table needs are only thought for SELECT queries, therefore its definition is less restrictive than foreign keys. On the other side, if the foreign keys were given in the schema, there is nothing against using them as additional connections, thus avoiding double definitions in these cases.

Like in foreign keys, a connection can be defined with a set of columns of a table (source table) that reference a record of another table (target table). The following structure defines an entry of a connection:

    connectionName, sourceTable, sourceKey, targetName, targetKey

where connectionName is what we will use in the deep selection (deep columns), and it is defined with two tables and the column(s) that connect both. If the connection needs more than one column, then it is enough to use more entries with the same connectionName, sourceTable and targetTable.

    connectionName, sourceTable, sourceKey  , targetName   , targetKey
       customer      , tSales     , customerId , tCompanies   , id
       product       , tSales     , productId  , tProducts    , prodID
       provider      , tProducts  , providerID , tCompanies   , id
       country       , tCompanies , countryId  , tISOCountries, isoA2

Here is a graphical representation of these connections according to the given schema:

It is also possible to think about not specifying the sourceTable within a connection, for example:

    connectionName, sourceTable, sourceKey  , targetName   , targetKey
       country       ,            , countryId  , tISOCountries, isoA2

In this case, all tables having a column countryId are connected to the table tISOCountries automatically. This is an "open connection" and this is how it can be represented:

Algorithm for Deep Select

The main goal of the "deep table" is to generate a SQL select statement like the one presented above in the section "The deep table". For this generation, we do not even need the real schema of the database, from a set of connections and a "deep select" (set of deep columns of a table), we can generate the SQL as a text like it will be described here.

For the algorithm, it is convenient to define the "deep column" structure as follows:

    baseTable, [ connection1, ..., connectionN, ] columnName

With this structure, the base table is the same for all deep columns, in our example it would be tSales.

If we take a look at the select query of the given example, we see that the problem can be reduced to getting three lists, one for each clause : SELECT, FROM and WHERE. Initializing the FROM list with "baseTable", the rest can be done applying the following algorithm to each deep column:

After having the three lists, building the final SQL is straightforward.

Note that duplicate entries specially in the "FROM" list has to be avoided. Also note that with this algorithm, all aliases will start with the initial alias (baseTable). For instance, it would produce "tSales_customer.name AS tSales_customer_name" instead of "customer.name AS customer_name" as it is given in the example. If you prefer shorter alias names, it is possible to remove the prefixes starting with "baseTable_" during the algorithm.

Deep Schema

If having connections as defined in this article, it is convenient to work with an extended database schema that reflects these connections between tables. In our example, a simple extended schema would be:

table tISOCountries :

   isoA2,
   isoA3,
   number,
   name,

table tCompanies :

   o--o Connection [country] to table [tISOCountries] through (countryId)
   id,
   name,
   countryId,

table tProducts :

   o--o Connection [provider] to table [tCompanies] through (providerID)
   prodID,
   providerID,
   name,
   price,

table tSales :

   o--o Connection [customer] to table [tCompanies] through (customerId)
   o--o Connection [product] to table [tProducts] through (productId)
   id,
   date,
   customerId,
   productId,
   quantity,

It can also be useful for a tool to develop "recursively" all connections of a table, either into a list form or in a more graphic way like using a tree. For example, for the table tSales:

Here, it has to be taken into account that a recursive connection (table connected to itself) or some other cyclic recursion would generate an infinite list which has to be avoided. At least in the list form while in the tree, it can be accepted if this is loaded dynamically on demand (e.g. when opening a node).

A tool with these elements can generate or even execute a deep SQL just by selecting the desired columns.

Conclusion

Without adding new big concepts, just the deep column as a way to access related information from a table (e.g. "connection connection ... colName") we can perform lots of queries that would result much more complex using directly SQL because of the associated joins. All in a quite natural way (e.g. customer name = customer's name) and producing queries with readable column names (customer_name). It is also a way of seeing a record of a table containing not only values (e.g. "id", "date") but also "objects", where the connections are the objects, for instance "customer name" can be considered as the access to the field "name" of the object "customer" (e.g. customer.name).

Our generated SQL make use of INNER JOINS, note though that this can be considered a matter of implementation. The algorithm can be adapted in order to generate other join syntaxes.