This post discusses several types of subqueries, and how each can be switched to another form, such as a join.
In this puzzle, we’re going to learn how to rewrite a subquery using inner joins. Knowing about a subquery versus inner join can help you with interview questions and performance issues. Though subqueries have unique abilities, there are times when it is better to use other SQL constructs such as join
s.
By reading this article, you’ll learn about several types of subqueries, and how each can be switched to another form, such as a join
.
Solving puzzles is a great way to learn SQL. Nothing beats practicing what you’ve learned. Once you have figured out the puzzle, post your answer in the comments so we all can learn from one another.
SQL Puzzle Question
A remedy for query confusion…
A coworker just learned about subqueries and wrote some SQL to retrieve employee names and birthdates from the AdventureWorks
database. The problem is, they want to change it and now it’s hard to read!
Can you help them simplify the following SQL?
SELECT E.HireDate,
(SELECT FirstName
FROM Person.Person P1
WHERE P1.BusinessEntityID = E.BusinessEntityID),
(SELECT LastName
FROM Person.Person P2
WHERE P2.BusinessEntityID = E.BusinessEntityID),
E.BirthDate
FROM HumanResources.Employee E
WHERE (SELECT PersonType
FROM Person.Person T
WHERE T.BusinessEntityID = E.BusinessEntityID) = 'EM'
ORDER BY HireDate,
(SELECT FirstName
FROM Person.Person P1
WHERE P1.BusinessEntityID = E.BusinessEntityID)
What statement would you write to make it easier to read, and perhaps run more efficiently?
Subquery versus Inner Join Answer
Before we begin, let’s talk about the existing query… what is it?
You’ll see the query combines data from two different tables. It is doing so using subqueries in both the FROM
and WHERE
clauses.
SELECT E.HireDate,
(SELECT FirstName
FROM Person.Person P1
WHERE P1.BusinessEntityID = E.BusinessEntityID)</span>,
(SELECT LastName
FROM Person.Person P2
WHERE P2.BusinessEntityID = E.BusinessEntityID)</span>,
E.BirthDate
FROM HumanResources.Employee E
WHERE (SELECT PersonType
FROM Person.Person T
WHERE T.BusinessEntityID = E.BusinessEntityID)</span> = 'EM'
ORDER BY HireDate,
(SELECT FirstName
FROM Person.Person P1
WHERE P1.BusinessEntityID = E.BusinessEntityID)</span>
Also, you see that each subquery’s WHERE
clause is restricting the rows returned to those equal to the Employee.BusinessEntityID
. This is what people call a correlated subquery.
I also want to point out that the queries in the FROM
must return a single value (scalar). If they don’t, then an error is thrown.
As you can imagine, this is dangerous, as it can be difficult to guarantee a query return at most one row. I know were safe in this case I know this as the matching condition occurs between each table’s primary keys.
Subquery versus Inner Join – Converting the Query
If I was writing this query, I would use an INNER JOIN
. Here is the query I would write:
SELECT E.HireDate,
P.FirstName,
P.LastName,
E.BirthDate
FROM HumanResources.Employee E
INNER JOIN Person.Person P
ON P.BusinessEntityID = E.BusinessEntityID
WHERE P.PersonType = 'EM'
ORDER BY E.HireDate,
P.FirstName
Which is Easier to Read?
This shouldn’t be much of a debate, the INNER JOIN
is much shorter, and I think to the point. The join
clause speaks for itself. You know it is relating two tables together; whereas, with the subquery, it is so apparent.
The INNER JOIN
version is easier to maintain.
Also, with the subquery method, you’ll see much of the code repeated. This may not seem like a big deal, now, but if you ever have to change the query, it is, as now when you make a change you need to be sure to make the same change in several locations.
SubQuery or Inner Join? Which is More Efficient?
Here is the query plan for the sub query version:
I’ve highlighted the effect of the four subqueries. For this statement, each query results in a nested loop. These aren’t good.
It means if you have ten rows in two tables each, then you need to on average, iterate through the second table 50 times (100/2) for each row in the first to find a match. This means, instead of a seek taking two or three operations to find a match, the match could take upward of 100 * 50 = 500 seeks to find.
Nested loops are a fact of life, but less is better.
And here is the version for the INNER JOIN
:
SET SHOWPLAN_ALL ON
SELECT E.HireDate,
P.FirstName,
P.LastName,
E.BirthDate
FROM HumanResources.Employee E
INNER JOIN Person.Person P
ON P.BusinessEntityID = E.BusinessEntityID
WHERE P.PersonType = 'EM'
ORDER BY E.HireDate,
P.FirstName
Here, you see there is only one nested loop. For sure, that is better than four.
After observing both the SQL and query plans for each set of statements, you can see that INNER JOIN
is superior in several ways; however, check out that simplified plan!
The query’s true task is to combine columns from two tables; this is what INNER JOINS
excel at. Sure, there are time subqueries make sense, and can be used to do things you can’t with joins, but in this case, it doesn’t make sense to use one.
Hello my name is Kris. I’m here because I am passionate about helping non-techie people to overcome their fear of learning SQL.
I know what it is like to not know where to start or whether the time spent learning is worth the effort. That is why I am here to help you to:
- Get started in an easy to follow step-by-step manner.
- Use your time wisely so you focus on what is important to learn to get the most value from your time.
- Answer your questions. Really! Just post a comment and I’ll respond. I’m here to help.
It wasn’t long ago that I was helping a colleague with some reporting. She didn’t know where to start and soon got overwhelmed and lost as she didn’t know SQL.
I felt really bad, as she was under pressure to get some summary information to her boss, the built-in reports were falling short, and to make them better would require her to know SQL. At that time that seemed impossible! It in dawned on me, it doesn’t have to be that way.
Then I discovered a way for anyone with the desire to easily learn SQL. I worked with my co-worker, started to teach her what I learned and soon she was able to write reports and answer her boss’ questions without getting stressed or ploughing hours into manipulating data in Excel.
It hasn’t always been easy. Sometimes the information seems abstract or too conceptual. In this case I’ve found out that a visual explanation is best. I really like to use diagrams or videos to explain hard-to-grasp ideas.
Having video, pictures, and text really help to reinforce the point and enable learning.
And now I want to help you get the same results.
The first step is simple, click here http://www.essentialsql.com/get-started-with-sql-server/