Apply Operator in SQL Server Simplified






4.14/5 (5 votes)
This tip is a brief introduction to Apply operator in SQL server
Introduction
Lately, I used APPLY
in SQL server that will really help you ease out many of the complex scenarios where you can write complex SQL queries in a simple way. This tip will put some light on APPLY
operator and when it's more preferable over regular join
.
Quote:According to definition from https://technet.microsoft.com/en-us/library/ms175156%28v=sql.105%29.aspx:
The APPLY operator allows you to invoke a table-valued function for each row returned by an outer table expression of a query. The table-valued function acts as the right input and the outer table expression acts as the left input. The right input is evaluated for each row from the left input and the rows produced are combined for the final output. The list of columns produced by the APPLY operator is the set of columns in the left input followed by the list of columns returned by the right input.
So, basically it's a row-by-row join
of the left data-set with the right data-set. I will explain its basic syntax and then describe the scenario where APPLY
really helps ease out complexity.
Basically APPLY
has two forms, CROSS APPLY
and OUTER APPLY
. It's similar to JOIN
more or less but can do some more that makes it preferable in some scenarios.
APPLY
allows a correlated sub-query or table-valued function to be part of the FROM
clause.
Background
I read about it from https://technet.microsoft.com/en-us/library/ms175156%28v=sql.105%29.aspx. I just wanted to explain it in a simple way with an example.
Using the Code
-- Here is the code snippet
-- Lets start with a simple join and then do the same by Cross-Apply in subsequent query
SELECT *
FROM Employee E
INNER JOIN Department D ON E.DepartmentID = D.DepartmentID
--Basic syntax of Cross Apply
--The above query can be transformed to Cross apply syntax
SELECT *
FROM Employee E
CROSS APPLY (
SELECT *
FROM Department
WHERE Department.Departmentid = E.DepartmentID
) D
--To get the Departments which could have no employee (LEFT JOIN)
SELECT *
FROM Department E
LEFT JOIN Employee D ON E.DepartmentID = D.DepartmentID
-- The above query can be re-written to Outer apply syntax (same effect as a LEFT JOIN above)
-- The final result-set contains all the selected columns from the left table-expression
-- followed by all the columns of right table expression
SELECT *
FROM Department D
OUTER APPLY (
SELECT *
FROM Employee
WHERE Employee.Departmentid = D.DepartmentID
) E
--Now , you have got the idea of APPLY operator, which is doing exactly the same thing as
--JOINS till now , the next thing that comes into our mind is Why Apply when we
--already do that with JOIN.
--So , the next code snippet will show how can we avoid co-related queries by APPLY
--co-related query
SELECT EmployeeID
,FirstName
,(
SELECT Department.DepartmentID
FROM Department
WHERE DepartmentID = E.DepartmentID
) AS DepartmentID
FROM Employee E
--Apply with Table valued function (that accept Department ID and return its all Employees)
CREATE FUNCTION dbo.fn_GetAllEmployeeOfADepartment (@DeptID AS INT)
RETURNS TABLE
AS
RETURN (
SELECT *
FROM Employee E
WHERE E.DepartmentID = @DeptID
)
SELECT *
FROM Department
CROSS APPLY fn_GetAllEmployeeOfADepartment(Department.DepartmentID)
-- The above query can not be transformed into Simple Join Query , because in above query
-- we are passing DepartmentID as a parameter to the inner TVF ,which cannot be possible by JOINS
-- because in that we have two independent Result-sets and the
-- execution context of outer query is different from that of inner TVF (table valued func)
-- in other words ,you can not bind a value from the outer query to the function as a parameter..
-- THERE ARE SEVERAL OTHER SCENARIOS WHERE APPLY CAN BE USED TO EASE OUT THE COMPLEXITY OF
-- WRITING COMPLEX QUERIES
-- SUPPOSE WE HAVE 2 TABLES "RUNNERSRECORD" AND "RUNNERS"
--If we want the last 5 entries of running log(RunnerRecord) against every runner
--we can do this with ROW_NUMBER, PARTITION BY and a nested query:
SELECT *
FROM (
SELECT R.RunnerID
,FirstName
,LastName
,Email
,Distance
,EntryDate
,ROW_NUMBER() OVER (
PARTITION BY RR.RUNNERID ORDER BY RR.EntryDate DESC
) ROWID
FROM Runners R
INNER JOIN RunnersRecord rr ON R.RunnerID = RR.RunnerID
) AS T
WHERE T.ROWID <= 5
--The same result can be obtained in a much cleaner way with Apply without specifying the
--partitioning over runnerid because of row-by-row nature of APPLY operator
SELECT *
FROM Runners R -- for every row of outer runners table
CROSS APPLY (
SELECT TOP 5 * -- select top 5 items of record logs
FROM RunnersRecord RR
WHERE R.RunnerID = RR.RunnerID
ORDER BY RR.EntryDate DESC) RR
-- By using TOP inside a CROSS APPLY statement we can select the TOP 5 items for every row of
-- the outer Runners table.
-- Also, we don't have to specify the partitioning in case of Cross Apply - you can see we don't
-- have RowNum/RowID column in the above query result-set because Apply works row-by-row manner
-- As we further move to code snippet, there are other advantages of APPLY operator over normal
-- JOINS like creating multi field expressions ( create field "DayMostTravelled" and
-- "LargestDistance"
SELECT *, (
SELECT TOP 1 RR.EntryDate
FROM RunnersRecord RR
WHERE V.RunnerID = RR.RunnerID ORDER BY Distance DESC) AS DayMostTravelled ,
(
SELECT TOP 1 RR.Distance
FROM RunnersRecord RR
WHERE V.RunnerID = RR.RunnerID ORDER BY Distance DESC
) AS LargestDistance
FROM Runners V
-- The same result-set can be obtained via Cross apply
SELECT *
FROM Runners R
OUTER APPLY (
SELECT TOP 1 RR.EntryDate AS DayMostTravelled,Distance
FROM RunnersRecord RR
WHERE R.RunnerID = RR.RunnerID
ORDER BY Distance DESC
) CA
-- We can also use expression names created by first Outer apply variable to
-- next outer apply variable to form a chaining ..
SELECT R.*, C.AvgDistance, C.TotalDistance, D.DistanceLeft
FROM Runners R
OUTER APPLY (
SELECT Avg(Distance) AS AvgDistance, Sum(Distance) AS TotalDistance
FROM RunnersRecord RR
WHERE R.RunnerID = RR.RunnerID
) C
OUTER APPLY
(
--we can add a second outer apply to make a further calculation on the result of first
--hence perform same row - logic over multiple section
Select 500-C.TotalDistance as 'DistanceLeft'
) D
Conclusion
Apply
operator is just not an operator but an extension to SQL server that really helps out write complex logic in a simple query as discussed above. Comparing performance, it's slow due to row-by-row nature. For multi field expression, it's faster (but not that much). It's also useful while joining TVF
and used whenever you think of row-by-row logic. Hope it helps to understand Apply
operator in a very simplified way!
Thank you! Cheers. :)