65.9K
CodeProject is changing. Read more.
Home

Use SQL Server's Ranking Functions Wisely

starIconstarIconstarIconstarIcon
emptyStarIcon
starIcon

4.88/5 (5 votes)

May 2, 2013

CPOL

3 min read

viewsIcon

40950

Beware of What We Ask Them to Do

Introduction

SQL Server's Ranking functions can be very useful and handy if we use them in a right way and thoughtfully; otherwise, they become evil very fast! In this tip I will talk about my experience with NTILE function but the same story can be expanded to all other ranking functions.

Background

  After spending several hours on why a script was generating different results against same sets of data, I discovered the evil. It was a set of unwise NTILE function calls. For those who may not be familiar with this function, please check this MSDN page. All rank functions contain an over clause which tells the function which column(s) to be used as reference.

The Problem

Here is an example for NTILE and how it becomes evil -

Imagine the following table of data (named MyTable):

Name Cost
C 1
B 1
D 4
A 1
F 3
E 2

I picked simle names A to F in order to make it easier to follow the scenario.

Now assume we have the following script:

SELECT [Name], [Cost], NTILE(6) OVER(ORDER BY [Cost]) AS [Rank] FROM [MyTable]

Running the script agains MyTable you will have the following result set:

Name Cost Rank
C 1 1
B 1 2
A 1 3
E 2 4
F 3 5
D 4 6

As you can see, NTILE does exactly what it is asked to do. Let's run the same script against the same table with slightly different data order:

Name Cost
B 1
A 1
E 2
C 1
D 4
F 3

As you can see, the tables contents are the same but only the order of data has changed. After running the very same script, you will get the following result set:

Name Cost Rank
B 1 1
A 1 2
C 1 3
E 2 4
F 3 5
D 4 6

Now, let's compare the rankings in two result sets side by side:

Result 1 Result 2
Name Rank Rank
A 3 2
B 2 1
C 1 3
D 4 4
E 5 5
F 6 6

 Can you see the evil? Running the same script against the same data sets with different data order results in totally different outputs. NTILE by itself is not an evil but the way we assumed it would work was wrong. In script we are asking it to order the Cost column and then group them into 6 groups. The only order it considers is Cost because that's what we asked for. Who to blame?!

How to fix it?

When we use rank functions we need to be very careful. We need to consider many things and check what we are asking them to do wisely. Rank functions can receive several orders. What we need to do is that we need to make sure all required orderings are in place. That way we can make sure the data order does not affect our results. In this case in particular, one way to fix the discrepancy is to add a new order to our NTILE function:

SELECT [Name], [Cost], NTILE(6) OVER(ORDER BY [Cost], [Name]) AS [Rank] FROM [MyTable]

 We are adding Name column to the order clause. It does not matter if it as ASC or DESC; what matters is that we will get the same results if we run it against any data set with the same data but different order.

Name Cost Rank
A 1 1
B 1 2
C 1 3
E 2 4
F 3 5
D 4 6