Click here to Skip to main content
Click here to Skip to main content

Calculating simple running totals in SQL Server

By , 15 Dec 2011
 

Introduction

One typical question is, how to calculate running totals in SQL Server. There are several ways of doing it and this article tries to explain a few of them.

Test environment

First we need a table for the data. To keep things simple, let's create a table with just an auto incremented id and a value field.

--------------------------------------------------------------------
-- table for test 
--------------------------------------------------------------------
CREATE TABLE RunTotalTestData (
   id    int not null identity(1,1) primary key,
   value int not null
);

And populate it with some data:

--------------------------------------------------------------------
-- test data
--------------------------------------------------------------------
INSERT INTO RunTotalTestData (value) VALUES (1);
INSERT INTO RunTotalTestData (value) VALUES (2);
INSERT INTO RunTotalTestData (value) VALUES (4);
INSERT INTO RunTotalTestData (value) VALUES (7);
INSERT INTO RunTotalTestData (value) VALUES (9);
INSERT INTO RunTotalTestData (value) VALUES (12);
INSERT INTO RunTotalTestData (value) VALUES (13);
INSERT INTO RunTotalTestData (value) VALUES (16);
INSERT INTO RunTotalTestData (value) VALUES (22);
INSERT INTO RunTotalTestData (value) VALUES (42);
INSERT INTO RunTotalTestData (value) VALUES (57);
INSERT INTO RunTotalTestData (value) VALUES (58);
INSERT INTO RunTotalTestData (value) VALUES (59);
INSERT INTO RunTotalTestData (value) VALUES (60);

The scenario is to fetch a running total when the data is ordered ascending by the id field.

Correlated scalar query

One very traditional way is to use a correlated scalar query to fetch the running total so far. The query could look like:

--------------------------------------------------------------------
-- correlated scalar
--------------------------------------------------------------------
SELECT a.id, a.value, (SELECT SUM(b.value)
                       FROM RunTotalTestData b
                       WHERE b.id <= a.id)
FROM   RunTotalTestData a
ORDER BY a.id;

When this is run, the results are:

id   value   running total
--   -----   -------------
1    1       1
2    2       3
3    4       7
4    7       14
5    9       23
6    12      35
7    13      48
8    16      64
9    22      86
10   42      128
11   57      185
12   58      243
13   59      302
14   60      362

So there it was. Along with the actual row values, we have a running total. The scalar query simply fetches the sum of the value field from the rows where the ID is equal or less than the value of the current row. Let us look at the execution plan:

600_CorrelatedScalarPlan.jpg

What happens is that the database fetches all the rows from the table and using a nested loop, it again fetches the rows from which the sum is calculated. This can also be seen in the statistics:

Table 'RunTotalTestData'. Scan count 15, logical reads 30, physical reads 0...

Using join

Another variation is to use join. Now the query could look like:

--------------------------------------------------------------------
-- using join 
--------------------------------------------------------------------
SELECT a.id, a.value, SUM(b.Value)
FROM   RunTotalTestData a,
       RunTotalTestData b
WHERE b.id <= a.id
GROUP BY a.id, a.value
ORDER BY a.id;

The results are the same but the technique is a bit different. Instead of fetching the sum for each row, the sum is created by using a GROUP BY clause. The rows are cross joined restricting the join only to equal or smaller ID values in B. The plan:

600_SubsetJoinPlan.jpg

The plan looks somewhat different and what actually happens is that the table is read only twice. This can be seen more clearly with the statistics.

Table 'RunTotalTestData'. Scan count 2, logical reads 31...

The correlated scalar query has a calculated cost of 0.0087873 while the cost for the join version is 0.0087618. The difference isn't much but then again it has to be remembered that we're playing with extremely small amounts of data.

Using conditions

In real-life scenarios, restricting conditions are often used, so how are conditions applied to these queries. The basic rule is that the condition must be defined twice in both of these variations. Once for the rows to fetch and the second time for the rows from which the sum is calculated.

If we want to calculate the running total for odd value numbers, the correlated scalar version could look like the following:

--------------------------------------------------------------------
-- correlated scalar, subset
--------------------------------------------------------------------
SELECT a.id, a.value, (SELECT SUM(b.value)
                       FROM RunTotalTestData b
                       WHERE b.id <= a.id
                       AND b.value % 2 = 1)
FROM  RunTotalTestData a
WHERE a.value % 2 = 1
ORDER BY a.id;

The results are:

id   value   runningtotal
--   -----   ------------
1    1       1
4    7       8
5    9       17
7    13      30
11   57      87
13   59      146

And with the join version, it could be like:

--------------------------------------------------------------------
-- with join, subset
--------------------------------------------------------------------
SELECT a.id, a.value, SUM(b.Value)
FROM   RunTotalTestData a,
       RunTotalTestData b
WHERE b.id        <= a.id
AND   a.value % 2  = 1
AND   b.value % 2  = 1
GROUP BY a.id, a.value
ORDER BY a.id;

When actually having more conditions, it can be quite painful to maintain the conditions correctly. Especially if they are built dynamically.

Calculating running totals for partitions of data

If the running total needs to be calculated to different partitions of data, one way to do it is just to use more conditions in the joins. For example, if the running totals would be calculated for both odd and even numbers, the correlated scalar query could look like:

--------------------------------------------------------------------
-- correlated scalar, partitioning
--------------------------------------------------------------------
SELECT a.value%2, a.id, a.value, (SELECT SUM(b.value) 
                               FROM RunTotalTestData b
                               WHERE b.id <= a.id
                               AND b.value%2 = a.value%2)
FROM   RunTotalTestData a
ORDER BY a.value%2, a.id;

The results:

even   id   value   running total
----   --   -----   -------------
0      2    2       2
0      3    4       6
0      6    12      18
0      8    16      34
0      9    22      56
0      10   42      98
0      12   58      156
0      14   60      216
1      1    1       1
1      4    7       8
1      5    9       17
1      7    13      30
1      11   57      87
1      13   59      146

So now the partitioning condition is added to the WHERE clause of the scalar query. When using the join version, it could be similar to:

--------------------------------------------------------------------
-- with join, partitioning
--------------------------------------------------------------------
SELECT a.value%2, a.id, a.value, SUM(b.Value)
FROM   RunTotalTestData a,
       RunTotalTestData b
WHERE b.id      <= a.id
AND   b.value%2  = a.value%2
GROUP BY a.value%2, a.id, a.value
ORDER BY a.value%2, a.id;

With SQL Server 2012

SQL Server 2012 makes life much more simpler. With this version, it's possible to define an ORDER BY clause in the OVER clause.

So to get the running total for all rows, the query would look:

--------------------------------------------------------------------
-- Using OVER clause
--------------------------------------------------------------------
SELECT a.id, a.value, SUM(a.value) OVER (ORDER BY a.id)
FROM   RunTotalTestData a
ORDER BY a.id;

The syntax allows to define the ordering of the partition (which in this example includes all rows) and the summary is calculated in that order.

To define a condition for the data, it doesn't have to be repeated anymore. The running total for odd numbers would look like:

--------------------------------------------------------------------
-- Using OVER clause, subset
--------------------------------------------------------------------
SELECT a.id, a.value, SUM(a.value) OVER (ORDER BY a.id)
FROM   RunTotalTestData a
WHERE a.value % 2 = 1
ORDER BY a.id;

And finally, partitioning would be:

--------------------------------------------------------------------
-- Using OVER clause, partition
--------------------------------------------------------------------
SELECT a.value%2, a.id, a.value, SUM(a.value) OVER (PARTITION BY a.value%2 ORDER BY a.id)
FROM   RunTotalTestData a
ORDER BY a.value%2, a.id;

What about the plan? It's looking very different. For example, the simple running total for all rows looks like:

600_OverOrderBy.jpg

And the statistics:

Table 'Worktable'. Scan count 15, logical reads 85, physical reads 0...
Table 'RunTotalTestData'. Scan count 1, logical reads 2, physical reads 0...

Even though the scan count looks quite high at first glance, it isn't targeting the actual table but a worktable. The worktable is used to store intermediate results which are then read in order to create the calculated results.

The calculated cost for this query is now 0.0033428 while previously with the join version, it was 0.0087618. Quite an improvement.

References

History

  • December 16, 2011: Created.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Mika Wendelius
Architect
Finland Finland
I've been a programmer since mid 80's using languages like assembler, C/C++, PL/I (mainframe environment), pascal, VB (I know, I know, no comments please) and C# and utilizing different techniques and tools.
 
However I'm specialized in databases and database modeling. Mostly I have used products like Oracle (from version 6), SQL Server (from version 4.2), DB2 and Solid Server (nowadays an IBM product).
 
For the past 10+ years my main concerns have been dealing with different business processes and how to create software to implement and improve them. At my spare time (what ever that actually means) I'm also teaching and consulting on different areas of database management, development and database oriented software design.

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
GeneralMy vote of 5memberhadeed147229-Dec-12 0:05 
very easy Solution...
GeneralRe: My vote of 5mvpMika Wendelius29-Dec-12 0:07 
Thank you Smile | :)
The need to optimize rises from a bad design.My articles[^]

QuestionExcellent. but side questionmemberfronjm0513-Nov-12 14:41 
In reference to what you said here:
 
"The scenario is to fetch a running total when the data is ordered ascending by the id field."
 
I am having a hard time finding an example online that will return a RANDOM set of data, with a running total like this. So based on your example, is it even possible to SELECT top 3 RANDOM records from your table, where the running total does not exceed xx? Thank you.
AnswerRe: Excellent. but side questionmvpMika Wendelius29-Dec-12 0:10 
By using aggregates, not as far as I know. Aggregates are feeded by the results from the select so they do not restrict the rows, just control the calculation.
 
However, you could use a Table Valued Function for that. For example, have a look at Using Table-Valued Functions in SQL Server[^]
The need to optimize rises from a bad design.My articles[^]

QuestionSome SuggessionsmemberAsif Rehman30-Oct-12 5:02 
Its an excellent article. I've already rated it 5.
Here are some suggestions
1. Plan images are very fade and of poor quality. Very difficult to analyze for reader.
2. At the end of the article, comparison of all approaches in cross table type with extra field like performance % would make this article even more useful for the reader.
 
3. Please write next article as soon as possible as I cannot wait for your next article. Big Grin | :-D
AnswerRe: Some SuggessionsmvpMika Wendelius30-Oct-12 6:37 
Thanks for your comments, they are much appreciated Smile | :)
 
At least for the bullets 1&2 I'll try to update few of my articles in the near future. I've got very good suggestions so I'll try to make the modifications as soon as possible.
 
Thanks again!
The need to optimize rises from a bad design.My articles[^]

GeneralMy vote of 5mvpLuc Pattyn9-Feb-12 1:25 
Good to know.
 
Thumbs Up | :thumbsup:
GeneralRe: My vote of 5mvpMika Wendelius9-Feb-12 7:22 
Thanks Smile | :)
The need to optimize rises from a bad design.My articles[^]

GeneralRe: My vote of 5mvpLuc Pattyn9-Feb-12 8:02 
You're welcome.
 
Smile | :)
Luc Pattyn [My Articles] Nil Volentibus Arduum

Fed up by FireFox memory leaks I switched to Opera and now CP doesn't perform its paste magic, so links will not be offered. Sorry.

GeneralMy vote of 5memberandrew458218-Dec-11 8:26 
Excellent
GeneralRe: My vote of 5memberMika Wendelius18-Dec-11 8:30 
Thank you Smile | :)
The need to optimize rises from a bad design.My articles[^]

GeneralMy vote of 5mvpthatraja15-Dec-11 17:39 
Good one. BTW please use the attribute lang=xml in all pre Tags(only for SQL blocks)
GeneralRe: My vote of 5memberMika Wendelius15-Dec-11 18:59 
Thanks Smile | :) I'll check the pre tags.
The need to optimize rises from a bad design.My articles[^]

GeneralRe: My vote of 5memberMika Wendelius16-Dec-11 21:27 
Hi,
 
I checked the pre tags and all the SQL blocks use lang="SQL" and all the results lang="text". Do you have trouble in seeing them correctly?
The need to optimize rises from a bad design.My articles[^]

GeneralRe: My vote of 5mvpthatraja20-Dec-11 6:19 
Mika Wendelius wrote:
Do you have trouble in seeing them correctly?
Still all text displaying in plain text format. For example keywords must be in blue color in code block. Check this Tip/Trick to see that Record Count of Tables in SQL Server[^]
thatraja

My Dad had a Heart Attack on this day so don't...
Pompeyboy3 here
| Nobody remains a virgin, Life screws everyone Sigh | :sigh:

GeneralRe: My vote of 5memberMika Wendelius20-Dec-11 6:38 
This is really odd. If I open this in IE9, no coloring but everything is fine in Chrome. But what's odd is that if I open it in IE again, everything is fine Confused | :confused: Can you confirm this behaviour?
The need to optimize rises from a bad design.My articles[^]

GeneralRe: My vote of 5mvpthatraja20-Dec-11 7:35 
Did you change anything? Now the syntax coloring is fine.
thatraja

My Dad had a Heart Attack on this day so don't...
Pompeyboy3 here
| Nobody remains a virgin, Life screws everyone Sigh | :sigh:

GeneralRe: My vote of 5memberMika Wendelius20-Dec-11 7:47 
nope, nothing done. Actually I see this problem randomly with other articles also Confused | :confused:
Perhaps time to see the good doctor Unsure | :~
Smile | :)
The need to optimize rises from a bad design.My articles[^]

GeneralRe: My vote of 5mvpthatraja20-Dec-11 7:50 
Mika Wendelius wrote:
Perhaps time to see the good doctor Unsure | :~
You or me or both? Big Grin | :-D
thatraja

My Dad had a Heart Attack on this day so don't...
Pompeyboy3 here
| Nobody remains a virgin, Life screws everyone Sigh | :sigh:

GeneralRe: My vote of 5memberMika Wendelius20-Dec-11 8:04 
Certainly me Big Grin | :-D
The need to optimize rises from a bad design.My articles[^]

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web01 | 2.6.130617.1 | Last Updated 15 Dec 2011
Article Copyright 2011 by Mika Wendelius
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid