JOIN Instead of Repeating a Subquery






4.73/5 (4 votes)
Rather than use multiple subqueries to extract TOP 1 data, this shows you how to use a single TOP 1 JOIN.
There is sometimes a need to
JOIN
to a table, but only include the first result that matches the JOIN
condition so the result records aren't duplicated by the JOIN
. The typical way to do that is to use a subquery. However, if the fields on the table are used in more than one place, that can become cumbersome to add the same subquery several times, as in the following example:
-- Some example table variables with sample data.
DECLARE @FirstTable table(FirstName varchar(20))
DECLARE @SecondTable table
(ID int,
FirstName varchar(20),
LastName varchar(20),
FirstNameHashCode varchar(20),
FirstNameFrequency int)
INSERT INTO @FirstTable(FirstName) VALUES('Billy')
INSERT INTO @FirstTable(FirstName) VALUES('Super')
INSERT INTO @SecondTable(ID, FirstName, LastName, FirstNameHashCode, FirstNameFrequency)
VALUES(1, 'Billy', 'Mays', '0XB$22', 22)
INSERT INTO @SecondTable(ID, FirstName, LastName, FirstNameHashCode, FirstNameFrequency)
VALUES(2, 'Billy', 'TheKid', '0XB$22', 22)
INSERT INTO @SecondTable(ID, FirstName, LastName, FirstNameHashCode, FirstNameFrequency)
VALUES(3, 'Super', 'Man', 'SJ3JD', 40)
INSERT INTO @SecondTable(ID, FirstName, LastName, FirstNameHashCode, FirstNameFrequency)
VALUES(4, 'Super', 'Friends', 'SJ3JD', 40)
-- Multiple subqueries.
SELECT
(
-- Subquery is returning one column.
SELECT TOP 1
ST.FirstNameHashCode
FROM @SecondTable AS ST
WHERE
-- Match condition.
ST.FirstName = FT.FirstName
),
(
-- The same subquery, except it's returning a different column.
SELECT TOP 1
ST.FirstNameFrequency
FROM @SecondTable AS ST
WHERE
-- Match condition.
ST.FirstName = FT.FirstName
),
(
-- The query appears a third time, with the randomly chosen row.
SELECT TOP 1
ST.ID
FROM @SecondTable AS ST
WHERE
-- Match condition.
ST.FirstName = FT.FirstName
)
-- You'd have to repeat the entire subquery for each new column returned.
FROM @FirstTable AS FT
A better (at least in non-contrived cases) way to do this is to use what I call a "TOP 1 JOIN
":
-- Some example table variables with sample data.
DECLARE @FirstTable table(FirstName varchar(20))
DECLARE @SecondTable table
(ID int,
FirstName varchar(20),
LastName varchar(20),
FirstNameHashCode varchar(20),
FirstNameFrequency int)
INSERT INTO @FirstTable(FirstName) VALUES('Billy')
INSERT INTO @FirstTable(FirstName) VALUES('Super')
INSERT INTO @SecondTable(ID, FirstName, LastName, FirstNameHashCode, FirstNameFrequency)
VALUES(1, 'Billy', 'Mays', '0XB$22', 22)
INSERT INTO @SecondTable(ID, FirstName, LastName, FirstNameHashCode, FirstNameFrequency)
VALUES(2, 'Billy', 'TheKid', '0XB$22', 22)
INSERT INTO @SecondTable(ID, FirstName, LastName, FirstNameHashCode, FirstNameFrequency)
VALUES(3, 'Super', 'Man', 'SJ3JD', 40)
INSERT INTO @SecondTable(ID, FirstName, LastName, FirstNameHashCode, FirstNameFrequency)
VALUES(4, 'Super', 'Friends', 'SJ3JD', 40)
-- Uses a single JOIN instead of multiple subqueries.
SELECT
-- Three different columns are used from SecondTable.
ST.FirstNameHashCode,
ST.FirstNameFrequency,
ST.ID
-- More columns could be added without changing the below JOIN.
FROM @FirstTable AS FT
JOIN @SecondTable AS ST
-- Step 1: match condition is here so the query isn't slow (e.g., so indexes will be used).
-- This step could be skipped, but it would hurt performance.
ON ST.FirstName = FT.FirstName
-- Only include the first record, chosen via "TOP 1".
AND EXISTS
(
SELECT
-- Return some junk data so EXISTS will pass.
0
FROM @SecondTable AS STMany
JOIN
(
-- Step 3: only select first record that matches the condition.
SELECT TOP 1
*
FROM @SecondTable AS STInner
WHERE
-- Match condition.
STInner.FirstName = FT.FirstName
) AS STTop1
-- Step 4: only return record if the record matches the TOP 1 record (via the primary key).
ON STMany.ID = STTop1.ID
WHERE
-- Step 2: find the record associated with step 1 (via the primary key).
STMany.ID = ST.ID
)
There is a simpler version, but this version works with tables that have primary keys composed of more than one column, so it is more generally applicable. The above example code is entirely self-contained, so go ahead and give it a run to see what it returns.