Before we start discussing what a subquery is, we need to have a sample database schema.
Diagram was created with Vertabelo
What is a subquery?
A subquery is a
SELECT statement with another SQL statement, like in the example below:
where id in
where provider_id = 156);
Subqueries are further classified as either a correlated subquery or a nested subquery. They are usually constructed in such a way to return:
- a table:
from (select product_category, avg(price) as average_price
group by product_category) average;
- or a value :
where value > (select avg(value)
from purchase )
Nested subqueries are subqueries that don't rely on an outer query. In other words, both queries in a nested subquery may be run as separate queries.
This type of subquery could be used almost everywhere, but it usually takes one of these formats:
WHERE [NOT] IN (subquery)
where city in (select city
The example subquery returns all clients that are from the same city as the product providers.
IN operator checks if the value is within the table and retrieves the matching rows.
Subqueries are correlated when the inner and outer queries are interdependent, that is, when the outer query is a query that contains a subquery and the subquery itself is an inner query. Users that know programming concepts may compare it to a nested loop structure.
Let's start with a simple example. The inner query calculates the average value and returns it. In the outer query’s
where clause, we filter only those purchases which have a value greater than the inner query’s returned value.
Subquery Correlated in Where Clause
from purchase p1
where date > '2013-07-15'
and value > (select avg(value)
from purchase p2
where p1.date = p2.date)
The query returns purchases after
15/07/2014 with a total price greater than the average value from the same day.
The equivalent example, but with joining tables.
from purchase p1, purchase p2
where p1.date = p2.date
and p1.date > '2013-07-15'
group by p1.id
having p1.value > avg(p2.value);
This example can also be written as a
select statement with a subquery correlated in a
The subquery returns the table that contains the average value for each purchase for each day. We join this result with the
Purchase table on column '
date' to check the condition
date > '15/07/2014'.
from purchase, (select date, avg(value) as average_value
where date > '2013-07-15'
group by date) average
where purchase.date = average.date
and purchase.date > '2013-07-15'
and purchase.value > average.average_value;
Usually, this kind of subquery should be avoided because indexes can't be used on a temporary table in memory.
When a subquery is used, the query optimizer performs additional steps before the results from the subquery are used. If a query that contains a subquery can be written using a
join, it should be done this way.
Joins usually allow the query optimizer to retrieve the data in a more efficient way.
You can find an extended article as well as more examples here.