14,218,216 members

# SQL Wizardry Part Eight - Tally Tables

Rate this:
21 Jan 2014CPOL
A description of the best way to create tally tables, and how to use them

## Introduction

In the SQL world, there is a common acronym. That is RBAR, which stands for 'row by agonising row'. SQL is set based, that means that you operate on sets of data by defining rules to make clear what those sets are. Any time you move away from this, and process data using procedural code, that works one row at a time, you are at great risk of slowing your processes down.

The alternative to this quite often, is what's called a 'tally table'. This is just a table that has a sequence of numbers in it. I've used such a table in previous articles, and today I'm going to dig in to the code that creates it, and talk about some additional uses for such a table, beyond the ones I've already shown you.

## Creating the table

Here is the first block of SQL from the sample file:

```WITH
E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),                          -- 1*10^1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 1*10^2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), -- 1*10^2 or 10000 rows
E8(N) AS (SELECT 1 FROM E4 a, E4 b) -- 1*10^2 or 100000000 rows

select 'E1', count(1) from e1 union all
select 'E2', count(1) from e2 union all
select 'E4', count(1) from e4 union all
select 'E8', count(1) from e8
;```

It returns the following:

```E1	10
E2	100
E4	10000
E8	100000000```

The comma in SQL is short hand for a cross join. This is not a join in terms of connecting based on a common value, it simply takes every value in table a, and for each value, returns every value in table b. The effect of this is clearly exponential. And so, we start with a union all block that creates 10 rows. Then we do a cross join, which returns 10 * 10, or, 100 rows. A cross join on this table returns 100 * 100, or 10000 rows. Cross joining this final table returns 100000000 rows, which should be enough in most situations. The final select shows this, by returning the row names and the number of rows in each.

Note that this is SUPER fast. There's all sorts of stuff on the web about ways to create tally tables, please just trust me that this is the most efficient way of doing so.

The second block of SQL looks like this:

```WITH
E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),                          -- 1*10^1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 1*10^2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), -- 1*10^2 or 10000 rows
E8(N) AS (SELECT 1 FROM E4 a, E4 b), -- 1*10^2 or 100000000 rows
E(N) as(select row_number() over (order by n) from e8)

select distinct top 100 e8.n as 'E8', e.n as e from e, e8
;```

The trouble with the first block, is that we ended up with a table that had 100000000 1s in it. Now we use row_number() to create a sequence of numbers, which is what we need to make this really useful. The select does select 100 values from e8, but usually, I just cut back on the number of tables created, if I need less than 10000 rows. So, in this case, getting rid of E4, E8 and the top statement would have been more efficient.

## Generating a binary sequence

The simplest thing we can do with a tally table, is to use the sequence with some maths to generate a sequence other than 1,2,3,4, etc. Here is a binary sequence.

```;WITH
E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),                          -- 1*10^1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 1*10^2 or 100 rows
E(N) as(select row_number() over (order by n) from e2)

select power(2, n) from e where n < 30
;```

## Date ranges

Another thing we may need to do from time to time, is generate a date range. If you need a range that is not inclusive of all days, we can mix this example with the example about to create a sequence of numbers that does not increase by 1, or add a where clause to skip values we want to skip, if the rule is not purely numeric. Here's the simple version. Note, you don't NEED to return the day, month and year, I'm simply doing so here for the purpose of illustration

```DECLARE @BeginDate DATE = '2011-01-01', @EndDate DATE = '2012-06-30'

;WITH
E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),                          -- 1*10^1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 1*10^2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), -- 1*10^2 or 10000 rows
E8(N) AS (SELECT 1 FROM E4 a, E4 b), -- 1*10^2 or 100000000 rows
E(N) as(select row_number() over (order by n) from e8)

FROM E
WHERE N <= DATEDIFF(DD, @BeginDate, @EndDate) + 1
;```

## Finding missing dates

So, what do we do now that we can generate a date range ? Anyone following these articles will know, I like to use the AdventureWorks database to get data to work against. Please use it now and run this:

```DECLARE @DateStart DATETIME = '2005-07-01 00:00:00.000'

;WITH
E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),                          -- 1*10^1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 1*10^2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), -- 1*100^2 or 10000 rows
E(N) as(select row_number() over (order by n) from e4)

right join E t on dateadd(dd, t.N, @DateStart) = o.OrderDate
where o.orderdate is null and dateadd(dd, t.N, @DateStart) < getdate()```

We could pull the lowest date out in our SQL, but it probably won't change, so it's more efficient to get it and hard code it ( assuming this became a proc that was used regularly ). We do a couple of things here. We use an outer join to get null values to identify where the join failed. We use the tally table to generate dates. Adventureworks does use DateTime just for dates, and I am assuming that time component is always empty, this is easy to fix, but it would clutter the example. The only other thing we do, is make sure we don't count past 'today', because we don't care that we didn't get an order in the future.  The end result is the days on which no orders were received

Another thing we can do to make this even more useful, is to cull values that are not relevant to us. This version only returns days we didn't get an order, that are not a Saturday or Sunday.

```DECLARE @DateStart DATETIME = '2005-07-01 00:00:00.000'

;WITH
E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),                          -- 1*10^1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 1*10^2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), -- 1*100^2 or 10000 rows
E(N) as(select row_number() over (order by n) from e4)

right join E t on dateadd(dd, t.N, @DateStart) = o.OrderDate
where o.orderdate is null
and dateadd(dd, t.N, @DateStart) < getdate()
and datepart(dw, dateadd(dd, t.N, @DateStart)) not in (1,7)```

## Find substring positions

Having to deal with a unit of data ( a string ) as a group of items, is always bad news in SQL and is best avoided. But, often we're called to work on existing systems and cannot change them. The following SQL will return the positions of a delimiter in a string. It would be trivial to make the delimiter multichar if we wanted to. Just change the 1 in the substring to len(@delimiter).

```DECLARE @val varchar(max) = 'Christian;Donna;Hannah;Calvin;Jenny;Joe', @delimiter char(1) = ';'

;WITH
E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),                          -- 1*10^1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 1*10^2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), -- 1*100^2 or 10000 rows
E(N) as(select row_number() over (order by n) from e4)

SELECT n AS [Index]
FROM E
WHERE n <= LEN(@val)
AND SUBSTRING(@val, n, 1) = @delimiter
ORDER BY N
go```

## Extracting substrings

Now it's simple to write code that returns those substrings, instead of positions. Here's the SS2012 version:

```DECLARE @val varchar(max) = 'Christian;Donna;Hannah;Calvin;Jenny;Joe', @delimiter char(1) = ';'

;WITH
E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),                          -- 1*10^1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 1*10^2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), -- 1*100^2 or 10000 rows
E(N) as(select row_number() over (order by n) from e4),
ind(N) as
(
SELECT n AS [Index]
FROM E
WHERE n <= LEN(@val)
AND SUBSTRING(@val, n, 1) = @delimiter
),
words(start, [end]) as
(
select lag(n, 1, -1) over (order by n) + 1, n - (lag(n, 1, -1)  over (order by n)) - 1 from ind
)

select substring(@val, start, [end]) from words as Names
GO```

If you run this, you'll find it does not emit the last string. This is because it works by finding the delimiters. There's two ways around this that I can see. First, always put a delimiter on the end of the string. Second, treat the end of the string like a delimiter in the first place. This SQL takes the second approach, and returns the delimiter length past the last index as the final position ( so we skip back to remove the delimiter, and end up with the full string ). Thanks to Duke Carey for pointing this out )

```DECLARE @val varchar(max) = 'Christian;Donna;Hannah;Calvin;Jenny;Joe', @delimiter char(1) = ';'

;WITH
E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),                          -- 1*10^1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 1*10^2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), -- 1*100^2 or 10000 rows
E(N) as(select row_number() over (order by n) from e4),
ind(N) as
(
SELECT n AS [Index]
FROM E
WHERE n <= LEN(@val) + len(@delimiter)
AND (
SUBSTRING(@val, n, 1) = @delimiter or n = len(@val) + len(@delimiter)
)
),
words(start, [end]) as
(
select lag(n, 1, -1) over (order by n) + 1, n - (lag(n, 1, -1)  over (order by n)) - 1 from ind
)

select substring(@val, start, [end]) from words as Names
GO```

and here's a version for older SQL Server versions

```DECLARE @val varchar(max) = 'Christian;Donna;Hannah;Calvin;Jenny;Joe', @delimiter char(1) = ';'

;WITH
E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),                          -- 1*10^1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 1*10^2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), -- 1*100^2 or 10000 rows
E(N) as(select row_number() over (order by n) from e4),
ind(N, r) as
(
SELECT n AS [Index], row_number() over (order by n)
FROM E
WHERE n <= LEN(@val)
AND SUBSTRING(@val, n, 1) = @delimiter
),
words(start, [end]) as
(
select isnull(ind1.n, -1) + 1, isnull(ind2.n,len(@val) + 1)  from ind ind1 full join ind ind2 on ind1.r = ind2.r-1
)

select substring(@val, start, [end] - start) from words as Names```

## Conclusion

I've kind of changed my mind what I'd like to have called this series. I think a better name is 'Thinking in SQL', because if you're used to writing procedural code, it can be a challenge to stop thinking that way, and to think in terms of set based logic. A tally table is simply a way of using the database engine to do repetitive tasks the way it does things best, by working in sets. The next time you're tempted to write code that increases a counter one step at a time, either to count how often something is done, or to apply a sequence of numbers to an operation, stop and think if perhaps a tally table is a better way to solve your problem. I have considered creating a table based function to create my tally tables, but, a table based function creates a table in tempdb, which is exactly the thing I am setting out to avoid, by creating them the way that I do. If you have any further insight you'd like to offer on this possibility, I'm all ears, but for now I'm playing it safe and creating them on the fly.

## Share

 Software Developer (Senior) Australia
Programming computers ( self taught ) since about 1984 when I bought my first Apple ][. Was working on a GUI library to interface Win32 to Python, and writing graphics filters in my spare time, and then building n-tiered apps using asp, atl and asp.net in my job at Dytech. After 4 years there, I've started working from home, at first for Code Project and now for a vet telemedicine company. I owned part of a company that sells client education software in the vet market, but we sold that and I worked for the owners for five years before leaving to get away from the travel, and spend more time with my family. I now work for a company here in Hobart, doing all sorts of Microsoft based stuff in C++ and C#, with a lot of T-SQL in the mix.

 First Prev Next
 Going to need to ponder this one Chris Maunder23-Oct-16 16:11 Chris Maunder 23-Oct-16 16:11
 Very good article. Lee Ludden6-Mar-14 9:20 Lee Ludden 6-Mar-14 9:20
 Re: Very good article. Christian Graus6-Mar-14 11:02 Christian Graus 6-Mar-14 11:02
 My vote of 5 TinTinTiTin21-Jan-14 23:27 TinTinTiTin 21-Jan-14 23:27
 Very interesting article Duke Carey21-Jan-14 7:28 Duke Carey 21-Jan-14 7:28
 Re: Very interesting article Christian Graus21-Jan-14 7:46 Christian Graus 21-Jan-14 7:46
 Wow - thanks. I missed that. I will update the code to solve that, later today. An easy fix would be to append the delimiter, but I'll probably do that AND write a section on other ways to do it ( I have some ideas already ). Christian Graus My new article series is all about SQL !!!
 Re: Very interesting article Duke Carey21-Jan-14 8:12 Duke Carey 21-Jan-14 8:12
 Re: Very interesting article Christian Graus21-Jan-14 10:46 Christian Graus 21-Jan-14 10:46
 Re: Very interesting article Duke Carey22-Jan-14 2:38 Duke Carey 22-Jan-14 2:38
 Re: Very interesting article Christian Graus22-Jan-14 10:22 Christian Graus 22-Jan-14 10:22
 Last Visit: 26-Jun-19 2:53     Last Update: 26-Jun-19 2:53 Refresh 1