Click here to Skip to main content
15,867,141 members
Articles / Programming Languages / SQL
Article

Use Table-Valued Functions as Arrays in SQL Server

Rate me:
Please Sign up or sign in to vote.
4.77/5 (23 votes)
6 Jun 2007CPOL8 min read 278.1K   801   88   37
Provide array functionality in SQL Server using T-SQL or the CLR

SQL using IN

The SQL IN statement allows returning a set of records based on a variable -- i.e. unknown at design time -- number of keys. However, IN doesn't allow parameters. This leaves us with either fixed lists or dynamic SQL statements. The problem with fixed lists is that they don't provide the flexibility needed at times. However, dynamic SQL requires the statement to be compiled and a new plan to be cached for each new set of keys. The result is that CPU and memory resources are required to compile and cache queries that will never be reused. See my article on parameterized SQL for more details.

It would be nice to be able to pass an array of values as a parameter to a stored procedure or ad hoc statement, but T-SQL doesn't support arrays because it is a set-based language. When I first learned about table variables in SQL 2000, I thought they were the answer. I quickly discovered, however, that you cannot define them as parameters. They only exist as local variables and their scope is limited to the batch where they are declared. However, I did eventually discover I could use table-valued functions to create and return a table variable.

Table-valued functions

Table valued functions have been available since SQL 2000. A table-valued function is a user-defined function that returns a result set as a table variable. The output can be the result of some SELECT statement or a temp table populated within the function. A user-defined function, i.e. table-valued or scalar, takes a set of parameters just like a stored procedure and returns a result without side-effects. DML operations are not allowed. This means we can pass a parameter to a table-valued function that returns a result set without resorting to dynamic SQL. The temp table returned by our function will act as an array of values that can be referenced by the IN clause. Also, because there's no dynamic SQL involved, the function itself will not cause our stored procedure to be recompiled or our ad hoc batch to be compiled every time it is submitted.

SQL
CREATE FUNCTION [dbo].[function_string_to_table]
(
    @string VARCHAR(MAX),
    @delimiter CHAR(1)
)
RETURNS @output TABLE(
    data VARCHAR(256)
)
BEGIN

    DECLARE @start INT, @end INT
    SELECT @start = 1, @end = CHARINDEX(@delimiter, @string)

    WHILE @start < LEN(@string) + 1 BEGIN
        IF @end = 0 
            SET @end = LEN(@string) + 1

        INSERT INTO @output (data) 
        VALUES(SUBSTRING(@string, @start, @end - @start))
        SET @start = @end + 1
        SET @end = CHARINDEX(@delimiter, @string, @start)
    END

    RETURN

END

The above function takes VARCHAR(MAX) and CHAR(1) as parameters. The VARCHAR is the list of keys, concatenated and sent as a parameter. The CHAR allows for the use of any desired character as a delimiter, i.e. comma, pipe, colon, semi-colon or whatever you may choose. The function uses the @delimiter parameter to parse the @string parameter and insert each delimited value into the defined temp table (@output). Now we can use the function as follows:

SQL
--JUST RETURN THE KEYS
SELECT * FROM dbo.function_string_to_table('key1|key2|key3|key4|key5', '|')

--USE THE FUNCTION AS A SUBQUERY
SELECT * FROM table1 WHERE sID IN (SELECT * 
    FROM dbo.function_string_to_table('key1|key2|key3|key4|key5', '|'))

As an additional benefit, you can now pass an "array" to an SQL server-stored procedure or batch. Take the following stored procedure as an example:

SQL
CREATE PROCEDURE procedure_takes_array
    @string VARCHAR(MAX),
    @delimiter CHAR(1)
AS

    SELECT * FROM table1 WHERE sID IN 
    (SELECT * FROM dbo.function_string_to_table(@string, @delimiter))

GO

This can then be called from a client application as follows:

C#
void GetData(string[] keys)
{
    using (SqlConnection oconn = new SqlConnection
        ("Application Name=SQLArrayTest;Server=(local);
    Database=MaxPreps_v2;Trusted_Connection=Yes;"))
    {
        oconn.Open();

        string SQL = "SELECT * FROM 
        dbo.function_string_to_table(@keys, '|')";
                    
        using (SqlCommand cmd = oconn.CreateCommand())
        {
            cmd.CommandText = sql;
            cmd.CommandType = CommandType.Text;
            cmd.Parameters.Add("@keys", SqlDbType.VarChar, -1);
            cmd.Parameters["@keys"].Value = string.Join("|", keys);

            //execute the command
            SqlDataReader rdr = cmd.ExecuteReader();
            rdr.Close();
        }
    }
}

But this functionality does not come without a price. T-SQL is not very efficient at string manipulation because it is largely a procedural operation, not a set-based operation. The above technique is not an expensive one, other than the cost in time due to the manual splitting of the string. So, it will not rob SQL server of its much-needed resources. However, if your process runs frequently, you may want to test and make sure this method has acceptable performance.

Using the CLR

When I first started writing this article, I debated whether to break this into two separate articles instead of one single article. Even though SQL Server 2005 is almost a year old now, it seems as though any article on the topic of the integrated CLR should provide some background first. I suspect that DBAs are still not comfortable with the idea and so it has not been enabled on most systems. What I found when researching the CLR for this article is that a good understanding of where the CLR can help and where it will hurt will go a long way towards reaping great benefits from this new feature. I can't speak for the security implications at this point in time, but as you will see, the CLR can provide huge benefits in performance over T-SQL operations like the one in the previous section. Just make sure that you understand when and where to use the CLR. For some background, check out this whitepaper: Using CLR Integration in SQL Server 2005.

First, you'll need to make sure the CLR has been enabled on your test server. If it has not already been enabled, run the following script:

SQL
EXEC sp_configure 'show advanced options' , '1';
go
reconfigure;
go
EXEC sp_configure 'clr enabled' , '1'
go
reconfigure;

Now to write our CLR table-valued function we'll use Visual Studio 2005. Follow these steps:

  1. Open VS 2005 and select File –> New –> Project…
  2. Expand the node ( [+] ) next to "Visual C#" and select "Database"
  3. Select "SQL Server Project" and name your project "SQLArray"
  4. Click "OK"
  5. If you created a database reference before, select the reference you want to use for debugging and deployment. If you have not, you will be prompted for this information. In that case, enter the server name and authentication credentials, and select the database you want to use.
  6. To create a user-defined function, select Project –> Add user-defined function…
  7. Name the file "SQLArray" and click "Add"

Now replace the contents of the file with the following code:

C#
using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Collections;

public partial class UserDefinedFunctions
{
    [SqlFunction(Name="clrfunction_string_to_table", 
    FillRowMethodName="FillRow", TableDefinition="ID NVARCHAR(50)")]
    public static IEnumerable SqlArray(SqlString str, SqlChars delimiter)
    {
        //return single element array if no delimiter is specified
        if(delimiter.Length == 0)
            return new string[1] { str.Value };


        //split the string and return a string array
        return str.Value.Split(delimiter[0]);
    }

    public static void FillRow(object row, out SqlString str)
    {
        //set the column value
        str = new SqlString( (string) row );
    }
}

First, to point out a few things: The method that will be used for the "main" function must return System.Collections.IEnumerable, so a using statement has been added for the System.Collections namespace. Also, the SqlFunction attribute must define the following values:

  • Name: This is the name of the table-valued function to be created on the specified instance of SQL Server.
  • TableDefinition: This is the definition for the temp table to be returned. NOTE: timestamp and non-unicode string types (char, varchar and text) are not allowed as part of the table definition. Your code will build, but it will not deploy. Instead, you will receive the following error message: "Cannot use 'varchar' column in the result table of a streaming user-defined function."
  • FillRowMethodName: The method to be used to populate the temp table that will be returned by the function. The CLR will use the IEnumerable reference and iterate over the result. For each item in the enumerator, the method defined by this property will be called by the CLR. The signature of the method must include an object that will be used to pass the current item and then an out parameter for each column in the defined temp table.
  • NOTE: Always use SqlType types because they support the INullable interface and this allows you to be consistent with SQL standards.

Now build the project. To deploy it, select Build –> Deploy SqlArray. Now you can call the table-valued function just like you would any other function:

SQL
--JUST RETURN THE KEYS
SELECT * FROM dbo.clrfunction_string_to_table
        ('key1|key2|key3|key4|key5', '|')

--USE THE FUNCTION AS A SUBQUERY
SELECT * FROM table1 WHERE sID IN 
    (SELECT * FROM dbo.clrfunction_string_to_table
        ('key1|key2|key3|key4|key5', '|'))

The result is much faster, as we'll see next.

Test project

The test project is just a simple "load test" application that connects to the database and repeatedly calls the function. It does not use the function to return results from any table, however. The reason for this is to demonstrate the raw difference in performance between the T-SQL and the CLR versions of the function. My test sample was 1000 iterations and below are the results:

SQL
Iterations: 10,000
- T-SQL (avg): 10.3543 sec
- CLR (avg): 7.2886

The CLR version is, on average, several seconds faster. Both versions are certainly viable solutions, though. This means that if you're using SQL 2000 or if the CLR hasn't (or won't) be enabled for your SQL 2005 servers, you can of course use the T-SQL version with little overall difference. As a side note, I accidentally discovered something interesting. Claudius Ceteras pointed out that in my original load test application I had forgotten to reset the StringBuilder object. So, with each iteration of the load test, the string of keys was increasing by as many as 25 new keys. Check out these results below:

SQL
Test Project (1000 Iterations)
- T-SQL 17 min 20 sec 765 ms
- CLR 0 min 3 sec 453 ms

As you can see, the difference is drastic. The comparison uses an unlikely scenario of splitting a string of characters much larger than any application would ever submit to our function. However, it highlights the strength of the CLR over T-SQL when performing string manipulations. The CLR version is much faster than the T-SQL version. There are two reasons for this. First, the CLR is much faster at string manipulation. The second reason for the performance improvement is that SQL Server will generate a temp table for the T-SQL version that involves I/O to tempdb. This means that the function will not return any results until the function has parsed the entire string and the temp table is completely materialized. In contrast, the CLR version will stream the results as they become available. This makes it even more efficient for large result sets because SQL Server doesn't have to wait for the entire result and it doesn't need to keep the entire result in memory. The stream can be discarded as it is read.

Conclusion

While the CLR is obviously the better choice here, you may not have the option of using the CLR on your instance of SQL Server due to the policies of your team. However, this doesn't mean you that can't benefit from the T-SQL version. Either method can conserve both CPU cycles and memory by reducing compiles/recompiles due to differences in values passed to the IN statement. Before I wrote this article, I didn't really see the benefit of using the integrated CLR on SQL Server. Now I intend to further investigate the types of processes that would most benefit from use of the CLR, as well as the implications of using the CLR on SQL Server.

Resources

Updates

5/23/2007

  • Fixed T-SQL bug: last item in delimited list not returned
  • Fixed Test program bug: StringBuilder not reset after each iteration

6/04/2007

  • Updated Test Project section with results and observations after fixing bug in load test bug

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Web Developer STG Utah
United States United States
Mark is a jock turned geek who has been programming since early 2000 where he stumbled on excel macros and dug down to discover VBA. Since 2001 he has been working for MaxPreps.com where he can be a geek who writes web applications for jocks. He has been using C# and ASP.Net since 2002.

Check out Mark's blog: www.developMENTALmadness.com

View Mark Miller's profile on LinkedIn

Comments and Discussions

 
Generalfunction_string_to_table Pin
Corporal Agarn3-Sep-10 3:13
professionalCorporal Agarn3-Sep-10 3:13 
GeneralIs it possible in reverse Pin
Arun kumar patro10-Jul-08 1:02
Arun kumar patro10-Jul-08 1:02 
GeneralRe: Is it possible in reverse Pin
Mark J. Miller10-Jul-08 4:20
Mark J. Miller10-Jul-08 4:20 
GeneralRe: Is it possible in reverse Pin
Arun kumar patro10-Jul-08 19:13
Arun kumar patro10-Jul-08 19:13 
GeneralRe: Is it possible in reverse Pin
Mark J. Miller11-Jul-08 4:46
Mark J. Miller11-Jul-08 4:46 
GeneralRe: Is it possible in reverse Pin
Arun kumar patro12-Jul-08 23:01
Arun kumar patro12-Jul-08 23:01 
GeneralQuestion of curiosity Pin
C. L. Phillip18-Mar-08 3:45
C. L. Phillip18-Mar-08 3:45 
GeneralRe: Question of curiosity Pin
Mark J. Miller18-Mar-08 4:57
Mark J. Miller18-Mar-08 4:57 
GeneralArrays and Lists in SQL Server Pin
cfederl6-Jun-07 16:06
cfederl6-Jun-07 16:06 
GeneralRe: Arrays and Lists in SQL Server Pin
Mark J. Miller7-Jun-07 4:00
Mark J. Miller7-Jun-07 4:00 
GeneralGood example Pin
Håkan Nilsson (k)31-May-07 1:56
Håkan Nilsson (k)31-May-07 1:56 
GeneralRe: Good example Pin
Mark J. Miller4-Jun-07 7:12
Mark J. Miller4-Jun-07 7:12 
QuestionTemp table? Pin
Pete Appleton29-May-07 1:28
Pete Appleton29-May-07 1:28 
AnswerRe: Temp table? Pin
Mark J. Miller29-May-07 4:57
Mark J. Miller29-May-07 4:57 
GeneralDEPLOY fails Pin
Sean Ewington25-May-07 5:58
staffSean Ewington25-May-07 5:58 
GeneralRe: DEPLOY fails Pin
Sean Ewington25-May-07 5:59
staffSean Ewington25-May-07 5:59 
GeneralRe: DEPLOY fails Pin
Sean Ewington25-May-07 5:59
staffSean Ewington25-May-07 5:59 
GeneralRe: DEPLOY fails Pin
Sean Ewington25-May-07 6:00
staffSean Ewington25-May-07 6:00 
Questiona question from Buff Daddy Pin
Sean Ewington25-May-07 5:56
staffSean Ewington25-May-07 5:56 
AnswerRe: a question from Buff Daddy Pin
Sean Ewington25-May-07 5:57
staffSean Ewington25-May-07 5:57 
"Buff Daddy,

Instead of using "IN" use a "JOIN"... either a left outer or a right join depending on what you want returned.


select * from table1 RIGHT join (
select data from dbo.function_string_to_table('1|2|3', '|')
) list on data = sID

It will null rows for the unmatched ids.

HTH"
- HoyaSaxa93
AnswerRe: a question from Buff Daddy Pin
Sean Ewington25-May-07 5:58
staffSean Ewington25-May-07 5:58 
General"load test" application Pin
ClaudiusC12-Apr-07 11:15
ClaudiusC12-Apr-07 11:15 
GeneralRe: "load test" application Pin
Mark J. Miller12-Apr-07 11:37
Mark J. Miller12-Apr-07 11:37 
GeneralRe: "load test" application Pin
ClaudiusC12-Apr-07 12:34
ClaudiusC12-Apr-07 12:34 
GeneralRe: "load test" application Pin
Mark J. Miller16-Apr-07 12:14
Mark J. Miller16-Apr-07 12:14 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.