Using Table-Valued Functions in SQL Server
This article shows a few examples of using Table-Valued Functions.
Introduction
Table-Valued Functions have been available since SQL Server 2005. Basically, a Table-Valued Function is a function that returns a table, so it can be used like a table in a query.
At first this sounds like nothing new, since views have been available for much longer. That's partly true. A function has the same benefits as a view when it comes to data protection enforced via privileges or to simplifying a query. However, a Table-Valued Function has at least a few advantages:
- Parameterization: a function can receive parameters, so the logic inside the function can adapt in ways that traditional predicate pushing cannot.
- Programmability: a view can contain a certain amount of logic (calculations, CASE structures, etc.), but it's still quite column-bound, so more complex logic is hard or impossible to express.
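As a quick sketch of the parameterization point (the Orders table here is hypothetical, and an inline function is used for brevity): a view can only be filtered from the outside, while a function receives the boundary as a parameter that its internal logic can use directly.

```sql
-- Hypothetical table assumed: Orders(Id int, Placed date)

-- A view fixes its filtering logic at creation time:
CREATE VIEW RecentOrders AS
SELECT Id, Placed
FROM Orders
WHERE Placed >= '20110101';   -- the cutoff is baked into the view
GO

-- A Table-Valued Function takes the cutoff as a parameter instead:
CREATE FUNCTION OrdersSince(@since date)
RETURNS TABLE
AS RETURN (
    SELECT Id, Placed
    FROM Orders
    WHERE Placed >= @since
);
GO

SELECT * FROM OrdersSince('20110301');
```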
Creating a Simple Table-Valued Function with (some kind of) Logic
First, let's create a small table to store some data:
CREATE TABLE TrackingItem (
Id int NOT NULL IDENTITY(1,1),
Issued date NOT NULL,
Category int NOT NULL
);
CREATE INDEX X_TrackingItem_Issued ON TrackingItem (Issued);
And then add a few rows of test data:
INSERT INTO TrackingItem (Issued, Category) VALUES ( DATEADD( day, 0, GETDATE()), 1);
INSERT INTO TrackingItem (Issued, Category) VALUES ( DATEADD( day, 1, GETDATE()), 2);
INSERT INTO TrackingItem (Issued, Category) VALUES ( DATEADD( day, 4, GETDATE()), 1);
INSERT INTO TrackingItem (Issued, Category) VALUES ( DATEADD( day, 4, GETDATE()), 2);
Now, suppose we need a result set that would:
- Include all the columns from TrackingItem
- Include an extra Modified (date) column
- Not have even numbers in Category
- Have the Modified column indicate when the changes to the data were made
- Return only TrackingItem rows having an Id greater than or equal to the parameter passed
The Table-Valued Function could look like this:
CREATE FUNCTION TrackingItemsModified(@minId int)
RETURNS @trackingItems TABLE (
Id int NOT NULL,
Issued date NOT NULL,
Category int NOT NULL,
Modified datetime NULL
)
AS
BEGIN
INSERT INTO @trackingItems (Id, Issued, Category)
SELECT ti.Id, ti.Issued, ti.Category
FROM TrackingItem ti
WHERE ti.Id >= @minId;
UPDATE @trackingItems
SET Category = Category + 1,
Modified = GETDATE()
WHERE Category%2 = 0;
RETURN;
END;
The function defines a new table variable called @trackingItems, which is stored in tempdb. The contents of this table become the function's return value when the function exits.
First, the function inserts all the desired rows from the TrackingItem table into the table variable. After that, the contents of the table variable are modified according to the specifications and returned to the caller.
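Because the function returns a table, it can also be applied once per row of another row source with CROSS APPLY. As a sketch, each value in a small derived table below is passed as the @minId parameter (the VALUES table constructor requires SQL Server 2008 or later):

```sql
-- Call TrackingItemsModified once for each minId value,
-- labeling each result row with the parameter that produced it
SELECT v.minId, tim.Id, tim.Category, tim.Modified
FROM (VALUES (1), (3)) AS v(minId)
CROSS APPLY TrackingItemsModified(v.minId) AS tim;
```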
Using the Function
The next step is to use the function. If we want to select all the rows having Id equal to 2 or more, the query would look like:
SELECT * FROM TrackingItemsModified(2);
And the results:
Id Issued Category Modified
-- ---------- -------- -----------------------
2 2011-03-11 3 2011-03-10 23:46:53.523
3 2011-03-14 1 NULL
4 2011-03-14 3 2011-03-10 23:46:53.523
As the result is a table, it can be used like one. For example, if we want to query all the original tracking items that don’t exist in this subset, the query could be:
SELECT *
FROM TrackingItem ti
WHERE ti.Id NOT IN (SELECT tim.Id
FROM TrackingItemsModified(2) tim)
And the results would be:
Id Issued Category
-- ---------- --------
1 2011-03-10 1
Generating Data
So, Table-Valued Functions can be used to return modified data from one or more tables in the database. But since they are programmable functions, they can also generate data.
One quite common problem is to query all the dates in a specified period from a table that doesn't have entries for every date. In our test data, there is a row for today and tomorrow, but the next few dates are missing. So if we want the number of tracking items for each of the next seven days, it isn't so simple. One typical solution is to create a table that contains all the necessary dates and then use that table in the query. A Table-Valued Function can be used as an alternative: if we pass the date range to a function, we can create the necessary data on the fly with a simple loop.
CREATE FUNCTION DatesBetween(@startDate date, @endDate date)
RETURNS @dates TABLE (
DateValue date NOT NULL
)
AS
BEGIN
WHILE (@startDate <= @endDate) BEGIN
INSERT INTO @dates VALUES (@startDate);
SET @startDate = DATEADD(day, 1, @startDate);
END;
RETURN;
END;
And the query for the TrackingItem counts would be:
SELECT d.DateValue,
(SELECT COUNT(*)
FROM TrackingItem ti
WHERE d.DateValue = ti.Issued) AS Items
FROM DatesBetween(DATEADD(day, 1, GETDATE()), DATEADD(day, 7, GETDATE())) d
ORDER BY d.DateValue;
So the results would be something like:
DateValue Items
---------- ------
2011-03-12 1
2011-03-13 0
2011-03-14 0
2011-03-15 2
2011-03-16 0
2011-03-17 0
2011-03-18 0
What About Performance?
Since this is a procedural approach, the performance won't be as good as with a well-written, set-based approach. However, since functions provide more flexibility from a programming point of view, let's have a look at larger amounts of data.
First, we’ll ensure that the statistics are fine and then see what happens if we take the same query but for a period of 50 years:
UPDATE STATISTICS TrackingItem WITH FULLSCAN;
SELECT d.DateValue,
(SELECT COUNT(*)
FROM TrackingItem ti
WHERE d.DateValue = ti.Issued) AS Items
FROM DatesBetween(DATEADD(YEAR, -25, GETDATE()), DATEADD(YEAR, 25, GETDATE())) d
ORDER BY d.DateValue;
The query plan looks like:

Not very good: we're going to scan TrackingItem in a loop, once for each day returned from the function. The execution statistics verify this:
Table 'TrackingItem'. Scan count 18264, logical reads 36528,
physical reads 0, read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
Table '#24927208'. Scan count 1, logical reads 30,
physical reads 0, read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 4087 ms, elapsed time = 4232 ms.
Now, if we add a clustered primary key to the function's return table (which is a good option in certain situations), we first re-create the function:
DROP FUNCTION DatesBetween;
CREATE FUNCTION DatesBetween(@startDate date, @endDate date)
RETURNS @dates TABLE (
DateValue date NOT NULL PRIMARY KEY CLUSTERED
)
AS
BEGIN
WHILE (@startDate <= @endDate) BEGIN
INSERT INTO @dates VALUES (@startDate);
SET @startDate = DATEADD(day, 1, @startDate);
END;
RETURN;
END;
So what happens with the exact same query:

The plan has changed since the new index is taken into account, but the approach is still the same.
Statistics:
Table 'TrackingItem'. Scan count 18264, logical reads 36528,
physical reads 0, read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
Table '#33D4B598'. Scan count 1, logical reads 30,
physical reads 0, read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 4165 ms, elapsed time = 4304 ms.
It seems that the problem lies inside the function. Let’s separate the date generation and see how it goes:
SELECT *
FROM DatesBetween(DATEADD(YEAR, -25, GETDATE()), DATEADD(YEAR, 25, GETDATE())) d
The plan:

And the statistics:
Table '#33D4B598'. Scan count 1, logical reads 30,
physical reads 0, read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 4103 ms, elapsed time = 4268 ms.
So what's happening? A few things:
- The statistics we see do not cover the whole operation, just fetching the results from the function (and, in the previous examples, joining the data). There's absolutely no way that a single scan reading just 30 pages would take 4 seconds; it's the data generation and the inserts into the table variable that take the time.
- A bad plan is chosen simply because the optimizer cannot know how many rows are going to be returned. There's no statistical data to use since we're not querying any table (in fact, this effect occurs with all multi-statement functions). So the optimizer assumes that the result set is going to be small.
Having a look at the statistics for the Table-Valued Function access verifies this (note the Estimated Number of Rows):

Unfortunately, there's no way to tell the optimizer (during the optimization phase) an estimate for the row count. That would be a great feature to have in the future.
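One common workaround (a sketch, not part of the original example) is to materialize the function's result into a temporary table first. Unlike a table variable, a temporary table has statistics, so the optimizer can estimate the row count when the dates are joined against TrackingItem:

```sql
-- Materialize the generated dates; #dates gets real statistics
SELECT d.DateValue
INTO #dates
FROM DatesBetween(DATEADD(YEAR, -25, GETDATE()),
                  DATEADD(YEAR, 25, GETDATE())) d;

-- The optimizer now knows roughly how many date rows exist
SELECT d.DateValue,
       (SELECT COUNT(*)
        FROM TrackingItem ti
        WHERE d.DateValue = ti.Issued) AS Items
FROM #dates d
ORDER BY d.DateValue;

DROP TABLE #dates;
```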
Let's look at it the other way round: use a small number of dates against a big table. First, let's add some rows to the TrackingItem table: 100'000 rows spread randomly over the next 50 years:
SET NOCOUNT ON
DECLARE @counter int = 0;
BEGIN
WHILE (@counter < 100000) BEGIN
INSERT INTO TrackingItem (Issued, Category)
VALUES ( DATEADD( day, RAND() * 365 * 50, GETDATE()), 1);
SET @counter = @counter + 1;
END;
END;
And now let's update the statistics again and query for a few days in the far future:
UPDATE STATISTICS TrackingItem WITH FULLSCAN;
SELECT d.DateValue,
(SELECT COUNT(*)
FROM TrackingItem ti
WHERE d.DateValue = ti.Issued) AS Items
FROM DatesBetween(DATEADD(MONTH, 101, GETDATE()),
                  DATEADD(MONTH, 102, GETDATE())) d
ORDER BY d.DateValue;
Results (partial):
DateValue Items
---------- -----
2019-08-11 7
2019-08-12 6
2019-08-13 2
2019-08-14 3
2019-08-15 5
2019-08-16 1
2019-08-17 5
2019-08-18 7
...
The plan:

And the statistics:
Table 'TrackingItem'. Scan count 32, logical reads 97,
physical reads 0, read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
Table '#33D4B598'. Scan count 1, logical reads 2,
physical reads 0, read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 104 ms.
So the plan was quite good when a relatively small number of rows came out of the function.
When Creating a Table-Valued Function in Management Studio
The easiest way to start creating a table-valued function (well, actually any function or procedure) is to use SQL Server Management Studio and begin with an anonymous T-SQL block. Let's take an example with the DatesBetween function. The following creates and fills a table with 10'000 dates:
DECLARE @startDate DATETIME = GETDATE();
DECLARE @endDate DATETIME = DATEADD(day, 9999, GETDATE());
DECLARE @dates TABLE (
DateValue date NOT NULL
);
BEGIN
WHILE (@startDate <= @endDate) BEGIN
INSERT INTO @dates VALUES (@startDate);
SET @startDate = DATEADD(day, 1, @startDate);
END;
END;
On my test machine, this takes roughly 5 seconds to complete. Not very good at all. The problem is that every time an insert is made, the database engine returns a message:
(1 row(s) affected)
Even worse, because I had both statistics time and statistics io set on, every insert also returns the statistics information:
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table '#286302EC'. Scan count 0, logical reads 1, physical reads 0,
read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
So I'm getting 10'000 of these messages in the Messages tab, and filling that tab takes time. If I turn both statistics settings off and add:
SET NOCOUNT ON
in the beginning of the script, the execution is done in less than a second.
This is good to bear in mind when using anonymous blocks in Management Studio. The same effect won't occur when the T-SQL block is inside a function and the function is called from Management Studio, since only the messages for the calling SQL statement are returned, not messages from inside the function.
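Putting the fix together, the same anonymous block runs in under a second with the extra messages suppressed:

```sql
-- Suppress per-statement row-count and statistics messages
SET NOCOUNT ON;
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;

DECLARE @startDate DATETIME = GETDATE();
DECLARE @endDate DATETIME = DATEADD(day, 9999, GETDATE());
DECLARE @dates TABLE (
    DateValue date NOT NULL
);
WHILE (@startDate <= @endDate) BEGIN
    INSERT INTO @dates VALUES (@startDate);
    SET @startDate = DATEADD(day, 1, @startDate);
END;
```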
Conclusions
A Table-Valued Function is a good alternative to a view or an extra table when parameterization is needed or complex logic is involved, and especially when the amount of data returned from the function is relatively small.
For example, if the date generator should be able to produce dates at different granularities (days, working days, weekends, etc.), the traditional approach would likely require several tables or tagged rows.
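As a sketch of that idea (the @weekdaysOnly flag and the function name are hypothetical, not part of the original example), the generator could take the granularity as a parameter instead of requiring separate tables:

```sql
CREATE FUNCTION DatesBetweenFiltered(@startDate date,
                                     @endDate date,
                                     @weekdaysOnly bit)
RETURNS @dates TABLE (
    DateValue date NOT NULL
)
AS
BEGIN
    WHILE (@startDate <= @endDate) BEGIN
        -- DATENAME(weekday, ...) follows the session language;
        -- 'Saturday'/'Sunday' assumes an English session
        IF (@weekdaysOnly = 0
            OR DATENAME(weekday, @startDate) NOT IN ('Saturday', 'Sunday'))
            INSERT INTO @dates VALUES (@startDate);
        SET @startDate = DATEADD(day, 1, @startDate);
    END;
    RETURN;
END;
```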
History
- March 11, 2011: Created
- March 12, 2011: Minor modifications and spelling corrections
- March 27, 2011: Added discussion about working with Management Studio and anonymous T-SQL blocks
- August 5, 2011: Script added