![]() |
Database »
Database »
SQL Server
Intermediate
License: The Code Project Open License (CPOL)
Dynamic Pivoting in Sql ServerBy Scott ClewellPivoting is great for data analisis, but not all data pivots nicely. You can use T-Sql to solve this problem. |
SQL, SQL Server (SQL 2000, SQL 2005, SQL CE), DBA
|
||||||||
|
Advanced Search Add to IE Search |
|
|
|
||||||||||||||||
Pivoting data is sometimes necessary to create BI data sets. It can be easy to do, and it also can be a hard task to complete, especially if you do not know how wide your data is going to be when it pivots out. Using T-Sql, you can dynamically create the pivot queries instead of spend hours trying to find the values that need to be pivoted, or wirting out a long agerate query.
I have two examples: One pivots data from system tables and counts the number of columns it has with a certin data type. The second one finds the average number of days between dates on a table that is used for activity tracking. I'll break down the code and explain what is going on. If you are not fimilar with the pivot and unpivot syntaxes, read up on those before going on.
In this example, we are using the following system tables: sys.columns, sys.objects, and sys.types. We are going to find the number of columns that that tables have with the data types which they use. We could hard-code all of the data-types, then have a bunch of unused columns with 0 in them for every table, or we could dynamically make a list of the data-types that are used, the pivot from there.
First, we'll select the data into a global temporary table (##Tables).
if object_id('tempdb..##Tables') is not null
drop table ##Tables
select
c.Name As ColumnName
,t.Name As DataType
,o.Name As TableName
Into ##Tables
from sys.columns c
join sys.objects o on
o.object_id = c.object_id
And o.type_desc = 'USER_TABLE'
join sys.types t on
t.system_type_id = c.system_type_id
What we have in ##Tables is a list of all of the columns with their data type and parent table.
Next, a cursor is used and goes through the distinct data-types that we have in ##Table. We add each data-type that is in the table into @DtSql, which will be used later.
Declare @sql varchar(max)
,@DtSql varchar(max)
,@DataType varchar(100)
,@c bit
Set @DtSql = ''
Set @c = 0
Declare data_type cursor
for Select Distinct DataType
From ##Tables
Order By 1
Open data_type
fetch next from data_type
into @DataType
While @@Fetch_Status = 0
Begin
if(@c = 1)
Set @dtSql = @DtSql + ',[' + @DataType + ']'
else
Begin
Set @DtSql = @DtSql + '[' + @DataType + ']'
Set @c = 1
End
Fetch next from data_type
into @DataType
End
close data_type
deallocate data_typeFinally, we wirte out @Sql with the pivot syntax, and pivot on the list of values in @DtSql. We execute the @Sql and get the results.
Set @Sql = 'Select TableName, ' + @DtSql
Set @Sql = @Sql + '
From
(Select ColumnName, DataType, TableName From ##Tables) p
Pivot
(
Count(ColumnName)
For DataType in (' + @DtSql + ' )
) as pvt'
print @sql
exec(@Sql)
The final results will be similar to the below table.
| TableName | Bit | Varchar | Int | DateTime |
| Dates | 0 | 10 | 4 | 2 |
| Test | 3 | 0 | 2 | 0 |
The seniro for this next example is: You have a table that tracks activitys when accounts log on. You are asked to find the average number of days between logins for the accounts (and your ignorantly nod your head, not knowning the mess your about to get int).
The problem: There is no way to hard code the values to pivot on, making the pivot syntax difficult, if not unuseable. The complexity of finding average number of days between activities can sound easy, not realizing that your data is stored vertically, in this situation, and you need to have it pivot horozontally, then unpivot once you have the days between activities. Sounds confusing, eh?
Here is a small break down of what the tables will be doing:
They will start out vertically, with 1 activity per row:
| Id | Date |
| 1 | 20080701 |
| 1 | 20080705 |
| 1 | 20080710 |
| 2 | 20080615 |
| 2 | 20080627 |
| 2 | 20080702 |
| 2 | 20080711 |
| 2 | 20080714 |
Next, the activities will be laid out horozontally:
| Id | 5 | 4 | 3 | 2 | 1 |
| 1 | 20080701 | 20080705 | 20080710 | NULL | NULL |
| 2 | 20080615 | 20080627 | 20080702 | 20080711 | 20080714 |
After laying the table out horozontally, we will use the DateDiff function to find the difference in days:
| Id | DateDiff5 | DateDiff4 | DateDiff3 | DateDiff2 |
| 1 | 4 | 5 | 0 | 0 |
| 2 | 12 | 5 | 9 | 3 |
Finally, we unpivot and get the averages.
Now lets get into the code.
For this example, we're going to be using a small data set. The following will create a golbal temporary table and populate it.
--temp is the table we're going to use
if object_id('tempdb..##Temp') is not null
drop table ##Temp
Create Table ##Temp
(
Id int
,Date datetime
)
--Insert test values
Insert Into ##Temp
select 1, '20080701'
union all select 1, '20080705'
union all select 1, '20080710'
union all select 2, '20080615'
union all select 2, '20080627'
union all select 2, '20080702'
union all select 2, '20080711'
union all select 2, '20080714'
Next, we define the parameters that will be used, then find the maximum number of activities that happened for a single id. This number is how wide the table will be. We find that number by a simple subquery.
--Declare variables
Declare @Sql varchar(max)
,@Counter int
,@Max int
--Set counter to the most activity
Set @Counter = (Select Max(cnt)
From (Select Count(*) as cnt From ##temp group by Id) t)
Set @Max = @Counter
After we get the maximum width, the table will be created. A while loop is used to create columns with the counter number as the column name. The sql statement is dynamically created, then executed.
--Create a table with the width of the most activity
Set @Sql = 'Create Table ##Activity ( id int '
While @Counter > 0
Begin
Set @Sql = @Sql + ',[' + cast(@counter as varchar) + '] datetime'
Set @Counter = @Counter - 1
End
Set @Sql = @Sql + ')'
--Make sure table doesn't exist
if object_id('tempdb..##Activity') is not null
Drop Table ##Activity
Exec (@sql)Once the table is created, it wil now get populated by the following code. The code will look into the table with the data (##temp), and get the minimum date that has not been inserted yet. It will insert all values for the id's at once, which is done by sub-quering. The queries are executed in the while loop. After this code runs, the ##Activity table will look like the second one in the breakdown. Notice that we reset the counter back to the maximum width. This will be done before any step that loops.
--Insert id's into the table that was created
Insert Into ##Activity (id)
Select distinct Id From ##Temp
--Reset Counter
Set @Counter = (Select Max(cnt)
From (Select Count(*) as cnt From ##temp group by Id) t)
--Loop through and insert dates
While @Counter > 0
Begin
If (@Counter = @Max)
Begin
Set @Sql = 'Update ##Activity Set [' + Cast(@Counter as varchar)+ '] '
Set @Sql = @Sql + '=FirstActivity from ##Activity a '
Set @Sql = @Sql + 'Left Join (Select a.Id, min(Date) as FirstActivity From
##Activity a '
Set @Sql = @Sql + 'Left Join ##Temp t on a.Id = t.Id '
Set @Sql = @Sql + 'Group By a.id) t on a.id = t.Id'
Exec(@Sql)
Set @Counter = @Counter - 1
End
Else
Begin
Set @Sql = 'Update ##Activity Set [' + Cast(@Counter as varchar) + '] '
Set @Sql = @Sql + '= Activity from ##Activity a '
Set @Sql = @Sql + 'Left Join (Select a.Id, Min(Date) as Activity '
Set @Sql = @Sql + 'From ##Activity a Left Join ##Temp t on '
Set @Sql = @Sql + 'a.id = t.id where Date > '
Set @Sql = @Sql + '[' + Cast(@Counter + 1 as varchar) + '] '
Set @Sql = @Sql + 'group by a.id) t on a.id = t.id'
Exec(@Sql)
Set @Counter = @Counter - 1
End
End
Next, we use the DateDiff function to find the difference between days in the horozontal table. The Coalesce function is used if the date is null (indicating that width is shorter for that id) and sets the difference to 0. Coalescing the nulls to 0 will be accounted for when we do the final average. As usual, a loop is used to loop through all of the columns that have been dynamically created. Since the Insert Into syntax is being used, there will be another table created.
--Reset Counter
Set @Counter = (Select Max(cnt)
From (Select Count(*) as cnt From ##temp group by Id) t)
--Create Table with days between activity
Set @Sql = 'Select id '
While @Counter > 1
Begin
Set @Sql = @Sql + ',Coalesce(DateDiff(dd,'
Set @Sql = @Sql + '[' + Cast(@Counter as varchar) + '], '
Set @Sql = @Sql + '[' + Cast(@Counter - 1 as varchar) + ']),0) '
Set @Sql = @Sql + 'As [DateDiff' + Cast(@Counter as varchar) + ']'
Set @Counter = @Counter - 1
End
Set @Sql = @Sql + ' into ##Averages from ##Activity'
if object_id('tempdb..##Averages') is not null
drop table ##Averages
Exec(@sql)
The final part to this puzzle is to unpivot the data. Since we have a table that is in the right format, we can use the unpivot syntax. While loops are used twice to go through the columns: one time for the upper unpivot syntax, and one time for the lower unpivot syntax. The unpivot query is a sub-query of the Avg query. In the query that gets the average, a where clause is used to average only the columns that have a value greater than 0.
--Reset Counter
Set @Counter = (Select Max(cnt)
From (Select Count(*) as cnt From ##temp group by Id) t)
--Unpivot
Set @Sql = 'Select id, Avg(Cast(avgs as float))'
Set @sql = @Sql + ' from (select id, DiffDays,avgs '
Set @Sql = @Sql + ' from (select id'
while @Counter > 1
Begin
Set @Sql = @Sql + ',[datediff' + Cast(@Counter as varchar) + ']'
Set @Counter = @counter - 1
End
--Reset Counter
Set @Counter = (Select Max(cnt)
From (Select Count(*) as cnt From ##temp group by Id) t)
Set @Sql = @Sql + ' from ##averages) a unpivot (avgs for diffdays in ('
While @Counter > 1
Begin
if(@counter = @Max)
Begin
Set @Sql = @Sql + '[datediff' + Cast(@counter as varchar) + ']'
Set @Counter = @Counter - 1
End
Else
Begin
Set @Sql = @Sql + ',[datediff' + cast(@Counter as varchar) + ']'
Set @Counter = @Counter - 1
End
End
Set @Sql = @Sql + ')) as unpvt) f where avgs > 0 group by id'
Exec (@Sql)After all of that has executed, the results will turn out to be:
| Id | AverageDays |
| 1 | 4.5 |
| 2 | 7.5 |
If you haven't noticed yet, we haven't used the pivot syntax. Why is that?
Since we were building the table horozontally as we went on, we didn't need to use the pivot statement. Also, the pivot syntax wouldn't have helped much because we would have ended up doing the same steps with the population and the finding the difference in days. The unpivot statement was used because we have a table that was in a good format for that statement.
The evil software vendors don't always make data sets that can be used for quick and easy BI analisis that your evil boss wants.
Thanks for reading this far (I know it's quite a big article). Good luck!
if object_id('tempdb..##Tables') is not null
drop table ##Tables
select
c.Name As ColumnName
,t.Name As DataType
,o.Name As TableName
Into ##Tables
from sys.columns c
join sys.objects o on
o.object_id = c.object_id
And o.type_desc = 'USER_TABLE'
join sys.types t on
t.system_type_id = c.system_type_id
Declare @sql varchar(max)
,@DtSql varchar(max)
,@DataType varchar(100)
,@c bit
Set @DtSql = ''
Set @c = 0
Declare data_type cursor
for Select Distinct DataType
From ##Tables
Order By 1
Open data_type
fetch next from data_type
into @DataType
While @@Fetch_Status = 0
Begin
if(@c = 1)
Set @dtSql = @DtSql + ',[' + @DataType + ']'
else
Begin
Set @DtSql = @DtSql + '[' + @DataType + ']'
Set @c = 1
End
Fetch next from data_type
into @DataType
End
close data_type
deallocate data_type
print @DtSql
Set @Sql = 'Select TableName, ' + @DtSql
Set @Sql = @Sql + '
From
(Select ColumnName, DataType, TableName From ##Tables) p
Pivot
(
Count(ColumnName)
For DataType in (' + @DtSql + ' )
) as pvt'
print @sql
exec(@Sql)
--temp is the table we're going to use
if object_id('tempdb..##Temp') is not null
drop table ##Temp
Create Table ##Temp
(
Id int
,Date datetime
)
--Insert test values
Insert Into ##Temp
select 1, '20080701'
union all select 1, '20080705'
union all select 1, '20080710'
union all select 2, '20080615'
union all select 2, '20080627'
union all select 2, '20080702'
union all select 2, '20080711'
union all select 2, '20080714'
--Declare variables
Declare @Sql varchar(max)
,@Counter int
,@Max int
--Set counter to the most activity
Set @Counter = (Select Max(cnt)
From (Select Count(*) as cnt From ##temp group by Id) t)
Set @Max = @Counter
--Create a table with the width of the most activity
Set @Sql = 'Create Table ##Activity ( id int '
While @Counter > 0
Begin
Set @Sql = @Sql + ',[' + cast(@counter as varchar) + '] datetime'
Set @Counter = @Counter - 1
End
Set @Sql = @Sql + ')'
--Make sure table doesn't exist
if object_id('tempdb..##Activity') is not null
Drop Table ##Activity
Exec (@sql)
--Insert id's into the table that was created
Insert Into ##Activity (id)
Select distinct Id From ##Temp
--Reset Counter
Set @Counter = (Select Max(cnt)
From (Select Count(*) as cnt From ##temp group by Id) t)
--Loop through and insert dates
While @Counter > 0
Begin
If (@Counter = @Max)
Begin
Set @Sql = 'Update ##Activity Set [' + Cast(@Counter as varchar)+ '] '
Set @Sql = @Sql + '=FirstActivity from ##Activity a '
Set @Sql = @Sql + 'Left Join (Select a.Id, min(Date) as FirstActivity From
##Activity a '
Set @Sql = @Sql + 'Left Join ##Temp t on a.Id = t.Id '
Set @Sql = @Sql + 'Group By a.id) t on a.id = t.Id'
Exec(@Sql)
Set @Counter = @Counter - 1
End
Else
Begin
Set @Sql = 'Update ##Activity Set [' + Cast(@Counter as varchar) + '] '
Set @Sql = @Sql + '= Activity from ##Activity a '
Set @Sql = @Sql + 'Left Join (Select a.Id, Min(Date) as Activity '
Set @Sql = @Sql + 'From ##Activity a Left Join ##Temp t on '
Set @Sql = @Sql + 'a.id = t.id where Date > '
Set @Sql = @Sql + '[' + Cast(@Counter + 1 as varchar) + '] '
Set @Sql = @Sql + 'group by a.id) t on a.id = t.id'
Exec(@Sql)
Set @Counter = @Counter - 1
End
End
--Reset Counter
Set @Counter = (Select Max(cnt)
From (Select Count(*) as cnt From ##temp group by Id) t)
--Create Table with days between activity
Set @Sql = 'Select id '
While @Counter > 1
Begin
Set @Sql = @Sql + ',Coalesce(DateDiff(dd,'
Set @Sql = @Sql + '[' + Cast(@Counter as varchar) + '], '
Set @Sql = @Sql + '[' + Cast(@Counter - 1 as varchar) + ']),0) '
Set @Sql = @Sql + 'As [DateDiff' + Cast(@Counter as varchar) + ']'
Set @Counter = @Counter - 1
End
Set @Sql = @Sql + ' into ##Averages from ##Activity'
if object_id('tempdb..##Averages') is not null
drop table ##Averages
Exec(@sql)
--Reset Counter
Set @Counter = (Select Max(cnt)
From (Select Count(*) as cnt From ##temp group by Id) t)
--Unpivot
Set @Sql = 'Select id, Avg(Cast(avgs as float))'
Set @sql = @Sql + ' from (select id, DiffDays,avgs '
Set @Sql = @Sql + ' from (select id'
while @Counter > 1
Begin
Set @Sql = @Sql + ',[datediff' + Cast(@Counter as varchar) + ']'
Set @Counter = @counter - 1
End
--Reset Counter
Set @Counter = (Select Max(cnt)
From (Select Count(*) as cnt From ##temp group by Id) t)
Set @Sql = @Sql + ' from ##averages) a unpivot (avgs for diffdays in ('
While @Counter > 1
Begin
if(@counter = @Max)
Begin
Set @Sql = @Sql + '[datediff' + Cast(@counter as varchar) + ']'
Set @Counter = @Counter - 1
End
Else
Begin
Set @Sql = @Sql + ',[datediff' + cast(@Counter as varchar) + ']'
Set @Counter = @Counter - 1
End
End
Set @Sql = @Sql + ')) as unpvt) f where avgs > 0 group by id'
Exec (@Sql)
7/14/2008 - Published Article.
| You must Sign In to use this message board. | ||||||||
|
||||||||
|
||||||||
|
||||||||
|
||||||||
General
News
Question
Answer
Joke
Rant
Admin
|
PermaLink |
Privacy |
Terms of Use
Last Updated: 14 Jul 2008 Editor: |
Copyright 2008 by Scott Clewell Everything else Copyright © CodeProject, 1999-2009 Web21 | Advertise on the Code Project |