Click here to Skip to main content
15,867,568 members
Articles / Database Development / SQL Server

Using Table-Valued Functions in SQL Server

Rate me:
Please Sign up or sign in to vote.
4.92/5 (51 votes)
6 Aug 2011CPOL6 min read 586.9K   1.2K   65   33
This article shows few examples for using Table-Valued Functions.

Introduction

Table-Valued Functions have been around since SQL Server version 2005. Basically a Table-Valued Function is a function that returns a table, thus it can be used as a table in a query.

First sounds like nothing new since a view has been an available mechanism for a much longer time. That’s true, partly. Function has the same benefits as a view when it comes to data protection enforced via privileges or simplifying a query. However, a Table-Valued Function has at least few advantages:

  • Parameterization, a function can receive parameters so the logic inside the function can be adjusted better than using traditional predicate pushing.
  • Programmability, a view can have certain amount of logic (calculations, case-structures etc.) but it’s still quite column bound so more complex logic is hard or impossible to create.

Creating a Simple Table-Valued Function with (some kind of) Logic

First, let's create a small table to store some data:

SQL
CREATE TABLE TrackingItem (
   Id       int  NOT NULL IDENTITY(1,1),
   Issued   date NOT NULL,
   Category int  NOT NULL
);
CREATE INDEX X_TrackingItem_Issued ON TrackingItem (Issued);

And then add few rows for test data:

SQL
INSERT INTO TrackingItem (Issued, Category) VALUES ( DATEADD( day, 0, GETDATE()), 1);
INSERT INTO TrackingItem (Issued, Category) VALUES ( DATEADD( day, 1, GETDATE()), 2);
INSERT INTO TrackingItem (Issued, Category) VALUES ( DATEADD( day, 4, GETDATE()), 1);
INSERT INTO TrackingItem (Issued, Category) VALUES ( DATEADD( day, 4, GETDATE()), 2);

Now, if we would need a result set which would:

  • Include all the columns from TrackingTable
  • Include an extra Modified (date) column
  • Not have even numbers in Category
  • The Modified-column indicates when the changes into the data have been made
  • Return only TrackingItem-rows having the Id greater than or equal to the parameter passed

The Table-Valued Function could look like this:

SQL
CREATE FUNCTION TrackingItemsModified(@minId int)
RETURNS @trackingItems TABLE (
   Id       int      NOT NULL,
   Issued   date     NOT NULL,
   Category int      NOT NULL,
   Modified datetime NULL
) 
AS
BEGIN
   INSERT INTO @trackingItems (Id, Issued, Category)
   SELECT ti.Id, ti.Issued, ti.Category 
   FROM   TrackingItem ti
   WHERE  ti.Id >= @minId; 
   
   UPDATE @trackingItems
   SET Category = Category + 1,
       Modified = GETDATE()
   WHERE Category%2 = 0;
  
   RETURN;
END;

The function defines a new table called @trackingItems. This is a temporary table stored in the tempdb. The contents of this table will be return value for the function when the function exits.

First, the function inserts all the desired rows from the TrackingItem-table to the temporary table. After that, the contents of the temporary table are modified based on the specifications and then returned to the caller.

Using the Function

The next step is to use the function. If we want to select all the rows having Id equal to 2 or more, the query would look like:

SQL
SELECT * FROM TrackingItemsModified(2);

And the results:

Id  Issued      Category  Modified
--  ----------  --------  -----------------------
2   2011-03-11  3         2011-03-10 23:46:53.523
3   2011-03-14  1         NULL
4   2011-03-14  3         2011-03-10 23:46:53.523

As the result is a table, it can be used like one. For example, if we want to query all the original tracking items that don’t exist in this subset, the query could be:

SQL
SELECT *
FROM  TrackingItem ti
WHERE ti.Id NOT IN (SELECT tim.Id
                    FROM   TrackingItemsModified(2) tim)

An the results would be:

Id  Issued      Category
--  ----------  --------
1   2011-03-10  1

Generating Data

So, Table-Valued Functions can be used to return modified data from one or more tables in the database. But since they are programmable functions, they can also generate data.

One quite common problem is to query all dates from a specified period and then have some results from a table which doesn’t have entries for all the dates. In our test data, there is a row for today and tomorrow but the next few dates are missing. So, if we want to get the amount of tracking items for each day for the next seven days, it wouldn’t be so simple. One typical solution is to create a table that contains all the necessary dates and then use that table in the query. Table-Valued Function can be used as an alternative. If we pass the date range to a function, we can create the necessary data on-the-fly with a simple loop.

SQL
CREATE FUNCTION DatesBetween(@startDate date, @endDate date)
RETURNS @dates TABLE (
   DateValue date NOT NULL
) 
AS
BEGIN
   WHILE (@startDate <= @endDate) BEGIN
      INSERT INTO @dates VALUES (@startDate);
      SET @startDate = DATEADD(day, 1, @startDate);
   END;
   
   RETURN;
END;

And the query for the TrackingItem amounts would be:

SQL
SELECT d.DateValue,
       (SELECT COUNT(*)
        FROM   TrackingItem ti
        WHERE  d.DateValue = ti.Issued) AS Items
FROM DatesBetween(DATEADD(day, 1, GETDATE()), DATEADD(day, 7, GETDATE())) d
ORDER BY d.DateValue;

So the results would be something like:

SQL
DateValue   Items
----------  ------
2011-03-12  1
2011-03-13  0
2011-03-14  0
2011-03-15  2
2011-03-16  0
2011-03-17  0
2011-03-18  0

What About Performance

Since this is a procedural approach. The performance won’t be as good as it would be using a good, set-based approach. However, since functions can provide more flexibility from the programming point of view, let’s have a look at larger amounts of data.

First, we’ll ensure that the statistics are fine and then see what happens if we take the same query but for a period of 50 years:

SQL
UPDATE STATISTICS TrackingItem WITH FULLSCAN;
 
SELECT d.DateValue,
       (SELECT COUNT(*)
        FROM   TrackingItem ti
        WHERE  d.DateValue = ti.Issued) AS Items
FROM DatesBetween(DATEADD(YEAR, -25, GETDATE()), DATEADD(YEAR, 25, GETDATE())) d
ORDER BY d.DateValue;

The query plan looks like:

Plan1.jpg

Not very good, we’re going to do scans to tracking items in a loop based on the days returned from the function. Execution statistics verify this:

Table 'TrackingItem'. Scan count 18264, logical reads 36528, 
      physical reads 0, read-ahead reads 0, lob logical reads 0, 
      lob physical reads 0, lob read-ahead reads 0.
Table '#24927208'. Scan count 1, logical reads 30, 
      physical reads 0, read-ahead reads 0, lob logical reads 0, 
      lob physical reads 0, lob read-ahead reads 0.
 
SQL Server Execution Times:
CPU time = 4087 ms, elapsed time = 4232 ms.

Now, if we add a clustered primary key index to the table (which is a great opportunity in certain situations), we will first re-create the function:

SQL
DROP FUNCTION DatesBetween;
 
CREATE FUNCTION DatesBetween(@startDate date, @endDate date)
RETURNS @dates TABLE (
   DateValue date NOT NULL PRIMARY KEY CLUSTERED
) 
AS
BEGIN
   WHILE (@startDate <= @endDate) BEGIN
      INSERT INTO @dates VALUES (@startDate);
      SET @startDate = DATEADD(day, 1, @startDate);
   END;
 
   RETURN;
END;

So what happens with the exact same query:

Plan2.jpg

The plan has changed since the new index is taken into account, but the approach is still the same.

Statistics:

Table 'TrackingItem'. Scan count 18264, logical reads 36528, 
      physical reads 0, read-ahead reads 0, lob logical reads 0, 
      lob physical reads 0, lob read-ahead reads 0.
Table '#33D4B598'. Scan count 1, logical reads 30, 
      physical reads 0, read-ahead reads 0, lob logical reads 0, 
      lob physical reads 0, lob read-ahead reads 0.
 
SQL Server Execution Times:
CPU time = 4165 ms, elapsed time = 4304 ms.

It seems that the problem lies inside the function. Let’s separate the date generation and see how it goes:

SQL
SELECT * 
FROM DatesBetween(DATEADD(YEAR, -25, GETDATE()), DATEADD(YEAR, 25, GETDATE())) d

The plan:

Plan3.jpg

And the statistics:

Table '#33D4B598'. Scan count 1, logical reads 30, 
      physical reads 0, read-ahead reads 0, lob logical reads 0, 
      lob physical reads 0, lob read-ahead reads 0.
 
SQL Server Execution Times:
CPU time = 4103 ms, elapsed time = 4268 ms.

So what’s happening? Few things:

  • The statistics we see do not include the whole operation, just getting the results from the function (and in the previous examples, joining the data). There’s absolutely no way that a single scan reading just 30 pages would take 4 seconds. It’s the data generation and inserts to the temporary table that's taking the time.
  • A bad plan is chosen simply because the optimizer cannot know how many rows are going to be returned. There’s no statistical data to use since we’re not querying any table (actually this effect would happen in all functions that contain multiple statements). So, the optimizer assumes that the result set is going to be small.

Having a look at the statistics for the Table-Valued Function access verifies this (note the Estimated Number of Rows):

stat1.jpg

Unfortunately, there’s no way to tell the optimizer (during the optimization phase) an estimate for the row count. This feature would be a great thing to have in the future.

Let’s take a look other way round: Use small amounts of dates against a big table. First, let’s add a few rows to the TrackingItem-table. 100’000 rows randomly for the next 50 years:

SQL
SET NOCOUNT ON
DECLARE @counter int = 0;
BEGIN
   WHILE (@counter < 100000) BEGIN
      INSERT INTO TrackingItem (Issued, Category) 
      VALUES ( DATEADD( day, RAND() * 365 * 50, GETDATE()), 1);
   
      SET @counter = @counter + 1;
   END;
END;

And now let’s update the statistics again and query for few days in the far future:

SQL
UPDATE STATISTICS TrackingItem WITH FULLSCAN;
 
SELECT d.DateValue,
       (SELECT COUNT(*)
        FROM   TrackingItem ti
        WHERE  d.DateValue = ti.Issued) AS Items
FROM DatesBetween(DATEADD(MONTH, 101, GETDATE()), _
			DATEADD(MONTH, 102, GETDATE())) d
ORDER BY d.DateValue;

Results (partial):

DateValue   Items
----------  -----
2019-08-11  7
2019-08-12  6
2019-08-13  2
2019-08-14  3
2019-08-15  5
2019-08-16  1
2019-08-17  5
2019-08-18  7
...

The plan:

Plan4.jpg

And the statistics:

Table 'TrackingItem'. Scan count 32, logical reads 97, 
      physical reads 0, read-ahead reads 0, lob logical reads 0, 
      lob physical reads 0, lob read-ahead reads 0.
Table '#33D4B598'. Scan count 1, logical reads 2, 
      physical reads 0, read-ahead reads 0, lob logical reads 0, 
      lob physical reads 0, lob read-ahead reads 0.
 
SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 104 ms.

So the plan was quite good when there were relatively small amounts of rows coming out of the function.

When Creating a Table Valued Function in Management Studio

The easiest way to start creating a table valued function (well, actually any function or procedure) is to use SQL Server Management Studio and start creating an anonymous T-SQL block. Let's take an example with the DatesBetween function. The following will create and fill a table with 10'000 dates:

SQL
DECLARE @startDate DATETIME = GETDATE();
DECLARE @endDate DATETIME = DATEADD(day, 9999, GETDATE());
DECLARE @dates 
   TABLE (
      DateValue date NOT NULL
   );
BEGIN
   WHILE (@startDate <= @endDate) BEGIN
      INSERT INTO @dates VALUES (@startDate);
      SET @startDate = DATEADD(day, 1, @startDate);
   END;
END; 

On my test machine, this takes roughly 5 seconds to complete. Not very good at all. The problem is that every time the insert is made, the database engine returns a message:

(1 row(s) affected)  

Even worse, because I had both statistics time and io set on, every insert returns information about the statistics:

SQL Server Execution Times: 
   CPU time = 0 ms,  elapsed time = 0 ms.
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
Table '#286302EC'. Scan count 0, logical reads 1, physical reads 0, 
                   read-ahead reads 0, lob logical reads 0, 
                   lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms. 

So I'm getting 10'000 of these messages in the message tab and filling the messages takes time. So if I set both statistics off and use:

SQL
SET NOCOUNT ON 

in the beginning of the script, the execution is done in less than a second.

This is good to bear in mind when using anonymous blocks in the Management Studio. The same effect won't happen when the T-SQL block is inside a function and the function is called from the Management Studio since only the messages for the calling SQL statement will be returned, not from the inside of the function.

Conclusions

Table-Valued Function is a good alternative for a view or an extra table when parameterization is needed or complex logic is included and especially when the amount of data returning from the function is relatively small.

For example, if the date generator should be able to generate dates for different granularities (days, working days, weekends, etc.), we would possibly need several tables or tagged rows if the traditional approach is used.

History

  • March 11, 2011: Created
  • March 12, 2011: Minor modifications and spelling corrections
  • March 27, 2011: Added discussion about working with Management Studio and anonymous T-SQL blocks
  • August 5, 2011: Script added

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Architect
Europe Europe
Biography provided

Comments and Discussions

 
QuestionPlease clarify a point for me in your example of TrackingTable Pin
Member 143531702-May-19 15:28
Member 143531702-May-19 15:28 
QuestionCalling TVF by passing UDTT Pin
Member 1434957730-Apr-19 3:58
Member 1434957730-Apr-19 3:58 
GeneralMy vote of 1 Pin
Jeff Moden14-Sep-15 14:41
Jeff Moden14-Sep-15 14:41 
GeneralRe: My vote of 1 Pin
Wendelius14-Sep-15 17:42
mentorWendelius14-Sep-15 17:42 
GeneralRe: My vote of 1 Pin
Jeff Moden15-Sep-15 3:28
Jeff Moden15-Sep-15 3:28 
GeneralMy vote of 5 Pin
Sunasara Imdadhusen28-May-14 1:33
professionalSunasara Imdadhusen28-May-14 1:33 
GeneralRe: My vote of 5 Pin
Wendelius3-Sep-14 7:50
mentorWendelius3-Sep-14 7:50 
GeneralTVF MOD FOR TSQL 2005 Pin
rs473411-Dec-13 6:20
rs473411-Dec-13 6:20 
GeneralMy vote of 5 Pin
Member 102493735-Sep-13 2:28
Member 102493735-Sep-13 2:28 
GeneralMy vote of 5 Pin
CHill6011-Jul-13 7:29
mveCHill6011-Jul-13 7:29 
GeneralMy vote of 5 Pin
csharpbd6-Apr-13 10:44
professionalcsharpbd6-Apr-13 10:44 
QuestionVery good article about Table-Valued functions Pin
Suyaraj M4-Apr-13 3:29
Suyaraj M4-Apr-13 3:29 
AnswerRe: Very good article about Table-Valued functions Pin
Wendelius3-Sep-14 7:52
mentorWendelius3-Sep-14 7:52 
QuestionHow would a CTE compare? Pin
dojohansen8-Nov-12 4:42
dojohansen8-Nov-12 4:42 
GeneralMy vote of 4 Pin
nhjewel13-Feb-12 18:31
nhjewel13-Feb-12 18:31 
GeneralRe: My vote of 4 Pin
Wendelius23-Feb-12 4:02
mentorWendelius23-Feb-12 4:02 
GeneralGood article! Pin
Wonde Tadesse6-Aug-11 7:29
professionalWonde Tadesse6-Aug-11 7:29 
Good article. And deserves 5.
Wonde Tadesse
MCTS

GeneralRe: Good article! Pin
Wendelius6-Aug-11 10:04
mentorWendelius6-Aug-11 10:04 
GeneralMy vote of 5 Pin
TweakBird14-Apr-11 7:33
TweakBird14-Apr-11 7:33 
GeneralRe: My vote of 5 Pin
Wendelius14-Apr-11 8:44
mentorWendelius14-Apr-11 8:44 
GeneralGood stuff again, take another 5! Pin
Nish Nishant27-Mar-11 5:12
sitebuilderNish Nishant27-Mar-11 5:12 
GeneralRe: Good stuff again, take another 5! Pin
Wendelius27-Mar-11 5:46
mentorWendelius27-Mar-11 5:46 
GeneralThis is a really great article about TVFs Pin
WestieBoy14-Mar-11 16:46
WestieBoy14-Mar-11 16:46 
GeneralRe: This is a really great article about TVFs Pin
Wendelius14-Mar-11 18:55
mentorWendelius14-Mar-11 18:55 
GeneralGETDATE() in TVF Pin
WestieBoy14-Mar-11 21:26
WestieBoy14-Mar-11 21:26 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.