Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / SQL

SQL GROUP By and the Column 'name' is invalid in the select list because... error

4.49/5 (19 votes)
2 Jul 2016CPOL2 min read 236.9K  
Why do I get an error Column 'name' is invalid in the select list because t is not contained in either an aggregate function or the GROUP BY clause.? Well, it's pretty simple when you think about what GROUP BY does.

Introduction

We often get questions asking how to fix the SQL error "Column 'name' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause." but it's not an error in the SQL - it's an error in what you are trying to do. Let's look at GROUP BY and see what it does, and why you get the error.

Background

If we have a couple of tables:

Transactions

ID    UserID    TimeStamp                   Value
1          1    2016-01-01 14:37:00.000    123.87
2          1    2016-01-01 15:43:00.000    -84.15
3          1    2016-01-02 09:41:00.000     92.34
4          2    2016-01-05 12:14:00.000     37.88
5          3    2016-01-31 17:27:00.000    102.44
6          1    2016-02-02 00:00:00.000    453.92
7          2    2016-02-14 00:00:00.000     87.32

Users

ID    UserName
1     Joe White
2     Mike Green
3     Sarah Brown
4     Gillian Black

We can easily do queries on these to give us activity reports:

SQL
SELECT u.UserName, t.TimeStamp, t.Value FROM Transactions t JOIN Users u ON u.ID = t.UserID

And we get:

UserName     TimeStamp                   Value
Joe White    2016-01-01 14:37:00.000    123.87
Joe White    2016-01-01 15:43:00.000    -84.15
Joe White    2016-01-02 09:41:00.000     92.34
Mike Green   2016-01-05 12:14:00.000     37.88
Sarah Brown  2016-01-31 17:27:00.000    102.44
Joe White    2016-02-02 00:00:00.000    453.92
Mike Green   2016-02-14 00:00:00.000     87.32

But what if we want a balance for each user instead?

That's not complicated - that's what GROUP BY is there for

So we use a GROUP BY:

SQL
SELECT u.UserName, SUM(t.Value) AS Balance
FROM Transactions t
JOIN Users u ON u.ID = t.UserID
GROUP BY u.UserName

And we get a balance for all active users:

UserName     Balance
Joe White     585.98
Mike Green    125.20
Sarah Brown   102.44

What the GROUP BY has done is collect all the related rows (in this case all those with the same UserName) together and then apply an Aggregate Function (in this case SUM) to each group:

UserName     TimeStamp                   Value
Joe White    2016-01-01 14:37:00.000    123.87 
Joe White    2016-01-01 15:43:00.000    -84.15 
Joe White    2016-01-02 09:41:00.000     92.34 
Joe White    2016-02-02 00:00:00.000    453.92 
                                        ------ 
                                        585.98 

Mike Green   2016-01-05 12:14:00.000     37.88 
Mike Green   2016-02-14 00:00:00.000     87.32 
                                        ------ 
                                        125.20 

Sarah Brown  2016-01-31 17:27:00.000    102.44 
                                        ------ 
                                        102.44

So each group has got two unique elements: the UserName ("Joe White", "Mike Green", "Sarah Brown") and the Balance (585.98, 125.20, 102.44)

But...each group also has a bunch of TimeStamp values. What happens if we try to use that as well?

SQL
SELECT u.UserName, t.TimeStamp, (t.Value) AS Balance
FROM Transactions t
JOIN Users u ON u.ID = t.UserID
GROUP BY u.UserName

Try it, and you get an error: 

Msg 8120, Level 16, State 1, Line 1
Column 'Transactions.TimeStamp' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

Why? Think about it for a moment, and it's pretty obvious: Which TimeStamp value do you want SQL to use? It's got a choice of 5 for Joe White, 2 for Mike Brown, and only one for poor Sarah! It can't decide between them, because it doesn't know what information you are trying to get, and it doesn't want to give you the wrong info.

You can only list columns which are in the GROUP BY list, or in a function which returns a single value for a range of rows - called an "Aggregate function" because it aggregates (or combines) information.

For SQL Server, the Aggregate functions are:

AVG             Average of values
CHECKSUM_AGG    Returns the checksum of the values
COUNT	        Number of items
COUNT_BIG	    Number of items as a BIGINT
GROUPING	    }
GROUPING_ID	    } - See https://msdn.microsoft.com/en-us/library/ms178544.aspx
MAX	            Largest value
MIN             Smallest value
STDEV           }
STDEVP          } - Returns the statistical standard deviation
SUM             Total
VAR             }
VARP            } - Returns the statistical variance

If you try to use any other function, you will get an error.

So think about your data, think about what you are trying to report on, and find the appropriate function.

Don't think "oh, that's easy! I can just add to the GROUP BY list!"

SQL
SELECT u.UserName, t.TimeStamp SUM(t.Value) AS Balance
FROM Transactions t
JOIN Users u ON u.ID = t.UserID
GROUP BY u.UserName, t.TimeStamp

Because that creates more groups, not allows you to use more columns in your SELECT LIST.

When you use more than one grouping field, SQL creates a group for each of the rows with distinct values in any of the named fields: in our case it will put each row in a separate group as there are no two identical timestamp values!

History

2016-07-02 First version

2012-08-23 Spelling error: "agregate" for "aggregate". Six times. I got it right once!:O

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)