Click here to Skip to main content
Click here to Skip to main content

Tagged as

Correlated Sub-Query vs. Case Statement

, 7 Mar 2010
Rate this:
Please Sign up or sign in to vote.
A simple case study to compare the efficiency of using a Sub-Query vs a Case statement to perform data aggregation.

Introduction

This is a simple case study to compare the efficiency of using a correlated Sub-Query or a case statement to perform data aggregation.

Background

Here is a common scenario: I have a news web-site that displays stories grouped by the usual categories:

  • World news
  • Local news
  • Sport
  • Lifestyle
  • Etc.

When a user leaves the site, the number of page views for each news item are stored in the database.

 INSERT PageHits( News_ID, HitDate, NoHits )
   VALUES(@News_ID, @Date, @Hits)

Now, the powers that control the pay-check have asked for some reporting to be done on this information. Namely, they want to know the number of page hits by news category for each date in a range.

Date Local News World News Local Sport World Sport Business Lifestyle Crime Weather Total Hits
2010-02-25 1 2 2 4 0 0 0 0 9
2010-02-26 1 36 2 0 0 5 0 0 44
2010-02-27 35 10 0 0 0 2 0 0 47

To demonstrate the steps required, let's build a simple sample database named NewsSite:

  USE Master
  GO

  CREATE DATABASE NewsSite
  Go

  -- Now add some tables...

  USE NewsSite
  GO

  CREATE TABLE Category(
    ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Name VARCHAR(30) UNIQUE NOT NULL
  )
  GO

  CREATE TABLE NewsItem(
    ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Cat_ID INT NOT NULL FOREIGN KEY REFERENCES Category(ID),
    ItemDate DATETIME NOT NULL DEFAULT GETDATE(),
    Name VarChar(100) NOT NULL
  )
  GO

  CREATE TABLE NewsHits(
    ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Date DateTime Not Null Default GETDATE(),
    News_ID INT NOT NULL FOREIGN KEY REFERENCES NewsItem(ID),
    Hits INT NOT NULL
  )
  GO

Now we add some dummy data (I will skip some of the boring stuff, but you will get the picture...

USE NewsSite
GO

-- Category Table
INSERT Category(Name)
  VALUES('Local News')
INSERT Category(Name)
  VALUES('World News')
INSERT Category(Name)
  VALUES('Local Sport')
INSERT Category(Name)
  VALUES('World Sport')
INSERT Category(Name)
  VALUES('Business')
INSERT Category(Name)
  VALUES('Lifestyle')
INSERT Category(Name)
  VALUES('Crime')
INSERT Category(Name)
  VALUES('Weather') 

-- NewsItem Table
INSERT NewsItem(Cat_ID, ItemDate, ItemTitle)
  VALUES(1, '2/27/2010','Man Bites Dog')
INSERT NewsItem(Cat_ID, ItemDate, ItemTitle)
  VALUES(1, '2/27/2010','Rat Bites Cat')
INSERT NewsItem(Cat_ID, ItemDate, ItemTitle)
  VALUES(3, '2/27/2010','Rugby League Player Not in Trouble')
INSERT NewsItem(Cat_ID, ItemDate, ItemTitle)
  VALUES(4, '2/27/2010','NFL Player Goes to Jail')
-- ...
-- ...
INSERT NewsItem(Cat_ID, ItemDate, ItemTitle)
  VALUES(5, '2/27/2010','USD Plummets')
INSERT NewsItem(Cat_ID, ItemDate, ItemTitle)
  VALUES(7, '2/27/2010','Murder in East LA')
INSERT NewsItem(Cat_ID, ItemDate, ItemTitle)
  VALUES(8, '2/27/2010','Chile Tsunami Warning') 

-- NewsHits Table
INSERT NewsHits(Date, News_ID, Hits)
  VALUES('2/25/2010', 1, 2)
INSERT NewsHits(Date, News_ID, Hits)
  VALUES('2/25/2010', 3, 1)
INSERT NewsHits(Date, News_ID, Hits)
  VALUES('2/25/2010', 7, 4)
-- ...
-- ...
INSERT NewsHits(Date, News_ID, Hits)
  VALUES('2/27/2010', 2, 2)
INSERT NewsHits(Date, News_ID, Hits)
  VALUES('2/26/2010', 4, 7)
Go

The Correlated Sub_Query

Here it is in all it's glory - avert your eyes if they are easily damaged:

USE NewsSite
GO

-- Outer Query
SELECT nh.Date,  CASE ISNULL(ln.LocalNews, 0)
                  WHEN 0 THEN 0
                   ELSE ln.LocalNews
                   END AS LocalNews,
                CASE ISNULL(wn.WorldNews, 0)
                   WHEN 0 THEN 0
                   ELSE wn.WorldNews
                   END AS WorldNews,
                CASE ISNULL(ls.LocalSport, 0)
                   WHEN 0 THEN 0
                   ELSE ls.LocalSport
                   END AS LocalSport,
                CASE ISNULL(ws.WorldSport, 0)
                   WHEN 0 THEN 0
                   ELSE ws.WorldSport
                   END AS WorldSport,
                CASE ISNULL(b.Business, 0)
                   WHEN 0 THEN 0
                   ELSE b.Business
                   END AS Business,
                CASE ISNULL(l.Lifestyle, 0)
                   WHEN 0  THEN 0
                   ELSE l.Lifestyle
                   END AS LifeStyle,
                CASE ISNULL(c.Crime, 0)
                   WHEN 0  THEN 0
                   ELSE c.Crime
                   END AS Crime,
                CASE ISNULL(w.Weather, 0)
                   WHEN 0  THEN 0
                   ELSE w.Weather
                   END AS Weather,
                SUM(nh.Hits) As TotalHits
                
 FROM NewsHits nh
  -- Inner Queries...
  LEFT JOIN (SELECT nhh.Date, SUM(nhh.Hits) As LocalNews
              FROM NewsHits nhh
                JOIN NewsItem nii
                  ON nhh.News_ID = nii.ID
                JOIN Category cc
                  ON nii.Cat_ID = cc.ID
              WHERE cc.Name = 'Local News'
              GROUP BY nhh.Date) ln
  ON nh.Date = ln.Date
 
  LEFT JOIN ( SELECT nhh.Date, SUM(nhh.Hits)  As WorldNews
                FROM NewsHits nhh
                  JOIN NewsItem nii
                    ON nhh.News_ID = nii.ID
                  JOIN Category cc
                    ON nii.Cat_ID = cc.ID
                WHERE cc.Name = 'World News'
                GROUP BY nhh.Date) wn
  ON nh.Date = wn.Date
 
  LEFT JOIN ( SELECT nhh.Date, SUM(nhh.Hits) As LocalSport
                FROM NewsHits nhh
                  JOIN NewsItem nii
                    ON nhh.News_ID = nii.ID
                  JOIN Category cc
                    ON nii.Cat_ID = cc.ID
                WHERE cc.Name = 'Local Sport'
                GROUP BY nhh.Date) ls
  ON nh.Date = ls.Date
 
  LEFT JOIN ( SELECT nhh.Date, SUM(nhh.Hits) As WorldSport
                FROM NewsHits nhh
                  JOIN NewsItem nii
                    ON nhh.News_ID = nii.ID
                  JOIN Category cc
                    ON nii.Cat_ID = cc.ID
                WHERE cc.Name = 'World Sport'
                GROUP BY nhh.Date) ws
  ON nh.Date = ws.Date
 
  LEFT JOIN ( SELECT nhh.Date, SUM(nhh.Hits) As Business
               FROM NewsHits nhh
                 JOIN NewsItem nii
                   ON nhh.News_ID = nii.ID
                 JOIN Category cc
                   ON nii.Cat_ID = cc.ID
               WHERE cc.Name = 'Business'
               GROUP BY nhh.Date) b
  ON nh.Date = b.Date

  LEFT JOIN ( SELECT nhh.Date, SUM(nhh.Hits) As Crime
                FROM NewsHits nhh
                  JOIN NewsItem nii
                    ON nhh.News_ID = nii.ID
                  JOIN Category cc
                    ON nii.Cat_ID = cc.ID
                WHERE cc.Name = 'Crime'
                GROUP BY nhh.Date) c
  ON nh.Date = c.Date

  LEFT JOIN ( SELECT nhh.Date, SUM(nhh.Hits) As Weather
                FROM NewsHits nhh
                  JOIN NewsItem nii
                    ON nhh.News_ID = nii.ID
                  JOIN Category cc
                    ON nii.Cat_ID = cc.ID
                WHERE cc.Name = 'Weather'
                GROUP BY nhh.Date) w
  ON nh.Date = w.Date

  LEFT JOIN ( SELECT nhh.Date, SUM(nhh.Hits) As LifeStyle
                FROM NewsHits nhh
                  JOIN NewsItem nii
                    ON nhh.News_ID = nii.ID
                  JOIN Category cc
                    ON nii.Cat_ID = cc.ID
                WHERE cc.Name = 'Lifestyle'
                GROUP BY nhh.Date) l
  ON nh.Date = l.Date

WHERE nh.Date  BETWEEN '2/24/2010' AND '2/28/2010'
GROUP BY nh.Date, ln.LocalNews, wn.WorldNews, ls.LocalSport,
         ws.WorldSport, b.Business, c.Crime, w.Weather,
         l.LifeStyle
GO

Ugly, isn't it? And according to my more learned friends, not a particularly brutal version of the species!

It is pretty simple to follow, the outer query returns the results of the sub-queries for each date in the range. Run the sample script and you will see output similar to the table above. I was quite happy to use this (or something similar but worse looking) in a production environment when it was kindly pointed out that, in this case at least, I may be better served by using a simpler CASE statement construct.

The CASE Statement

USE NewsSite
Go

SELECT nh.[Date],
       SUM(CASE WHEN c.Name = 'Local News'
                THEN nh.Hits ELSE 0 END) AS 'Local News',
       SUM(CASE WHEN c.Name = 'World News'
                THEN nh.Hits ELSE 0 END) AS 'World News',
       SUM(CASE WHEN c.Name = 'Local Sport'
                THEN nh.Hits ELSE 0 END) AS 'Local Sport',
       SUM(CASE WHEN c.Name = 'World Sport'
                THEN nh.Hits ELSE 0 END) AS 'World Sport',
       SUM(CASE WHEN c.Name = 'Business'
                THEN nh.Hits ELSE 0 END) AS 'Business',
       SUM(CASE WHEN c.Name = 'Lifestyle'
                THEN nh.Hits ELSE 0 END) AS 'Lifestyle',
       SUM(CASE WHEN c.Name = 'Crime'
                THEN nh.Hits ELSE 0 END) AS 'Crime',
       SUM(CASE WHEN c.Name = 'Weather'
                THEN nh.Hits ELSE 0 END) AS 'Weather',
       SUM(nh.Hits) As [Total Hits]
FROM NewsHits nh
  JOIN NewsItem ni
    ON ni.ID = nh.News_ID
  JOIN Category c
    ON ni.Cat_ID = c.ID
WHERE nh.[Date] BETWEEN '2/24/2010' AND '2/28/2010'
GROUP BY nh.[Date]
GO

That's a lot better, easier on the eye, and easier to understand at a glance.

The clincher comes when examining the execution times of the queries:

SP Results CSQ CASE
Number of INSERT, DELETE and UPDATE statements 0 0
Rows affected by INSERT, DELETE and UPDATE statements 0 0
Number of SELECT statements 1 1
Rows returned by SELECT statements 3 3
Number of Transactions 0 0
Network Statistics    
Number of server roundtrips 1 1
TDS packets sent from client 3 1
TDS packets received from server 1 1
Bytes sent from client 8748 1990
Bytes received from server 435 445
Time Statistics    
Client processing time (ms) 1.2 4.3
Total execution time (ms) 9.6 5.2
Wait time on server replies (ms) 8.4 0.9

The CASE statement runs nearly twice as fast as the Correlated Sub-Query example (9.6 ms vs 5.5 ms).

The CSQ spent 1.2 ms processing on the client and then 8.4 sec waiting on the server, whereas the CASE example spends 4.2 ms processing on the client and then a miniscule 0.9 ms waiting for the server.

This may not be significant in this small example, but in a situation where there may be thousands or more records, the CSQ starts to fade into the distance in this race.

Using a similar structure on a large dataset, the CASE example was more than 100X faster than the CSQ. Granted, after adding an index or two, I was able to reduce this to about 16:1.

Don't Jump Too Fast...

While I was able to substitute a different method in this case, it may not always be possible because of joined table structures, etc. and you may be 'forced' to use the Correlated Sub-Query instead. If you do go down this path, remember to set the appropriate indexes to ensure your query runs as efficiently as possible.

Points of Interest

This article came about because of a short discussion in the General Database Forum, and shows how the first "good' idea you may have when it comes to a solution may not always be the best one.

Oh, and CodeProject is a great technical resource, frequented by some talented and generous people - thanks to Mycroft Holmes and i.jrussell for pointing me down this path.  

History

  • Version 1

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Andy_L_J
Other
Australia Australia
Born in New Zealand and transported to Australia for crimes against good taste in the eighties.
 
Employed by large Manufacturing Co to Coordinate Production Processes, Integrate new technologies, Implement new, in-house MRP/ERP.
 
Out of my depth, challenged and love it.
 
Summer Soccer: http://www.pipekingsharkssummersoccer.com.au
Follow on   Twitter

Comments and Discussions

 
GeneralPivoting Pinmemberchristopherliu30-Mar-10 11:41 
GeneralRe: Pivoting PinmemberAndy_L_J2-Apr-10 13:22 
GeneralRe: Pivoting PinmemberChristopher Liu6-Apr-10 6:01 
GeneralRe: Pivoting PinmemberAndy_L_J6-Apr-10 19:59 
GeneralI didn't know you could do that with a CASE statement. PinmemberAshaman19-Mar-10 2:58 
GeneralMy vote of 1 PinmemberSandeep Mewara8-Mar-10 1:54 
GeneralRe: My vote of 1 PinmemberAndy_L_J8-Mar-10 7:13 
GeneralRe: My vote of 1 [modified] PinmemberTheGuru378-Mar-10 15:11 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web02 | 2.8.140827.1 | Last Updated 8 Mar 2010
Article Copyright 2010 by Andy_L_J
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid