SQL Server's Ranking functions can be very useful and handy if we use them in a right way and thoughtfully; otherwise, they become evil very fast! In this tip I will talk about my experience with
NTILE function but the same story can be expanded to all other ranking functions.
After spending several hours on why a script was generating different results against same sets of data, I discovered the evil. It was a set of unwise
NTILE function calls. For those who may not be familiar with this function, please check this MSDN page. All rank functions contain an over clause which tells the function which column(s) to be used as reference.
Here is an example for
NTILE and how it becomes evil -
Imagine the following table of data (named
I picked simle names
F in order to make it easier to follow the scenario.
Now assume we have the following script:
SELECT [Name], [Cost], NTILE(6) OVER(ORDER BY [Cost]) AS [Rank] FROM [MyTable]
Running the script agains
MyTable you will have the following result set:
As you can see,
NTILE does exactly what it is asked to do. Let's run the same script against the same table with slightly different data order:
As you can see, the tables contents are the same but only the order of data has changed. After running the very same script, you will get the following result set:
Now, let's compare the rankings in two result sets side by side:
|Result 1||Result 2|
Can you see the evil? Running the same script against the same data sets with different data order results in totally different outputs.
NTILE by itself is not an evil but the way we assumed it would work was wrong. In script we are asking it to order the
Cost column and then group them into 6 groups. The only order it considers is
Cost because that's what we asked for. Who to blame?!
How to fix it?
When we use rank functions we need to be very careful. We need to consider many things and check what we are asking them to do wisely. Rank functions can receive several orders. What we need to do is that we need to make sure all required orderings are in place. That way we can make sure the data order does not affect our results. In this case in particular, one way to fix the discrepancy is to add a new order to our
SELECT [Name], [Cost], NTILE(6) OVER(ORDER BY [Cost], [Name]) AS [Rank] FROM [MyTable]
We are adding Name column to the order clause. It does not matter if it as
DESC; what matters is that we will get the same results if we run it against any data set with the same data but different order.