In the last blog post, we have discussed about COALESCE function in SQL Server. You can read that article here. In this article we will go over UNION and UNION ALL Operators in SQL Server.
In this article, we will discuss about following:
- Purpose of UNION and UNION ALL.
- Differences between UNION and UNION ALL.
- Differences between JOIN and UNION.
UNION and UNION ALL operators in SQL Server are used to combine the result sets of two or more SELECT queries.
Let’s understand what is meant by this with an example. We have 2 tables here, tblIndiaCustomers and tblUSCustomers. Both of these tables have identical columns – Id, Name and Email.
Now if we issue a select query against these 2 tables and execute them, we will get 2 separate result sets. We will get 2 rows each from tblIndiaCustomers and tblUSCustomers.
Now we want to combine both of these result sets into one result set. How do we do that? We can use UNION or UNION ALL Operators. Let’s first use UNION ALL and see what will happen. When we execute the query, it is going to combine the rows from tblndiaCustomers with rows from tbUSCustomers. So while using UNION ALL, we are combining the result sets from these two queries. Here in output, we will get 4 rows.
Then instead of using UNION ALL, just use UNION and see what will happen. Now we will get only 3 rows. While looking at the output, we can see that the the duplicate record – Name: Thomas, Email: T@T.Com is eliminated. While using UNION ALL, we will get all the rows including duplicates. While using UNION Operator, the output is sorted as well.
What are the differences between UNION and UNION ALL Operators?
- UNION removes duplicate rows, whereas UNION ALL doesn’t.
- UNION have to perform distinct sort to remove duplicates, which makes it less faster than UNION ALL.
The distinct sort operation happening in UNION is time consuming. If you want to see that, you can turn on the Estimated Execution Plan in SQL Server Management Studio(SSMS). To do that, you can either press CTRL+L or you can click on Display Estimated Execution Plan icon in SSMS.
When we use UNION and click on this icon, we can see that the distinct sort is actually taking 63% of time.
On the other hand, when we use UNION ALL and click on Display Estimated Execution Plan, there is no distinct sort at all.
So for this reason, UNION is a little bit slower than UNION ALL.
Other important point to ponder is that for UNION and UNION ALL to work, the Number, Data types and the Order of the columns in the SELECT statements should be same. This makes sense as well. For example, write the query like below.
SELECT Id, Name From tblIndiaCustomers
SELECT Id, Name, Email From tblUSCustomers
In the above query, the first SELECT statement is giving 2 columns and the second SELECT statement is giving 3 columns. How can we combine them?! We can’t do that. So while executing the query, we will get an error.
It is not enough to have equal number of columns, but they should be in the right order as well. If we write query like below, it doesn’t make any sense.
SELECT Name, Email, Id From tblIndiaCustomers
SELECT Id, Name, Email From tblUSCustomers
How can we combine Id with Email?! So while executing the query, it is trying to convert Name to integer and fails.
So while using UNION and UNION ALL Operators, the number of columns have to be same, data types have to be same and they have to be in same order.
Sorting the results of UNION and UNION ALL
- ORDER BY clause should be used only on the last SELECT statement in the UNION query.
While executing the above query, we will get the results sorted by Name. If we use ORDER BY clause in the first query, let’s see what is going to happen. While executing the query, it will give an error stating Incorrect syntax near the keyword ‘UNION’.
This makes sense as well. How does UNION ALL work? It will take the rows from first table and combine them with the rows from second table. It doesn’t make sense to take sorted rows from first table and combine. When they are combined, the result will again become unsorted. That’s why, after getting all the results, sort them using ORDER BY clause. So ORDER BY should be with the last SELECT statement.
Differences between JOIN and UNION
- UNION combines the result set of two or more select queries into a single result set which includes all the rows from all the queries in the UNION, where as JOIN retrieves data from two or more tables based on logical relationships between the tables.
- In short, UNION combines rows from 2 or more tables, where JOIN combines columns from 2 or more tables.
Reference: Arun Ramachandran (http://BestTEchnologyBlog.Com)