Click here to Skip to main content
Email Password   helpLost your password?

SQL using IN

The SQL IN statement allows returning a set of records based on a variable -- i.e. unknown at design time -- number of keys. However, IN doesn't allow parameters. This leaves us with either fixed lists or dynamic SQL statements. The problem with fixed lists is that they don't provide the flexibility needed at times. However, dynamic SQL requires the statement to be compiled and a new plan to be cached for each new set of keys. The result is that CPU and memory resources are required to compile and cache queries that will never be reused. See my article on parameterized SQL for more details.

It would be nice to be able to pass an array of values as a parameter to a stored procedure or ad hoc statement, but T-SQL doesn't support arrays because it is a set-based language. When I first learned about table variables in SQL 2000, I thought they were the answer. I quickly discovered, however, that you cannot define them as parameters. They only exist as local variables and their scope is limited to the batch where they are declared. However, I did eventually discover I could use table-valued functions to create and return a table variable.

Table-valued functions

Table valued functions have been available since SQL 2000. A table-valued function is a user-defined function that returns a result set as a table variable. The output can be the result of some SELECT statement or a temp table populated within the function. A user-defined function, i.e. table-valued or scalar, takes a set of parameters just like a stored procedure and returns a result without side-effects. DML operations are not allowed. This means we can pass a parameter to a table-valued function that returns a result set without resorting to dynamic SQL. The temp table returned by our function will act as an array of values that can be referenced by the IN clause. Also, because there's no dynamic SQL involved, the function itself will not cause our stored procedure to be recompiled or our ad hoc batch to be compiled every time it is submitted.

CREATE FUNCTION [dbo].[function_string_to_table]
(
    @string VARCHAR(MAX),
    @delimiter CHAR(1)
)
RETURNS @output TABLE(
    data VARCHAR(256)
)
BEGIN

    DECLARE @start INT, @end INT
    SELECT @start = 1, @end = CHARINDEX(@delimiter, @string)

    WHILE @start < LEN(@string) + 1 BEGIN
        IF @end = 0 
            SET @end = LEN(@string) + 1

        INSERT INTO @output (data) 
        VALUES(SUBSTRING(@string, @start, @end - @start))
        SET @start = @end + 1
        SET @end = CHARINDEX(@delimiter, @string, @start)
    END

    RETURN

END

The above function takes VARCHAR(MAX) and CHAR(1) as parameters. The VARCHAR is the list of keys, concatenated and sent as a parameter. The CHAR allows for the use of any desired character as a delimiter, i.e. comma, pipe, colon, semi-colon or whatever you may choose. The function uses the @delimiter parameter to parse the @string parameter and insert each delimited value into the defined temp table (@output). Now we can use the function as follows:

--JUST RETURN THE KEYS

SELECT * FROM dbo.function_string_to_table('key1|key2|key3|key4|key5', '|')

--USE THE FUNCTION AS A SUBQUERY

SELECT * FROM table1 WHERE sID IN (SELECT * 
    FROM dbo.function_string_to_table('key1|key2|key3|key4|key5', '|'))

As an additional benefit, you can now pass an "array" to an SQL server-stored procedure or batch. Take the following stored procedure as an example:

CREATE PROCEDURE procedure_takes_array
    @string VARCHAR(MAX),
    @delimiter CHAR(1)
AS

    SELECT * FROM table1 WHERE sID IN 
    (SELECT * FROM dbo.function_string_to_table(@string, @delimiter))

GO

This can then be called from a client application as follows:

void GetData(string[] keys)
{
    using (SqlConnection oconn = new SqlConnection
        ("Application Name=SQLArrayTest;Server=(local);
    Database=MaxPreps_v2;Trusted_Connection=Yes;"))
    {
        oconn.Open();

        string SQL = "SELECT * FROM 
        dbo.function_string_to_table(@keys, '|')";
                    
        using (SqlCommand cmd = oconn.CreateCommand())
        {
            cmd.CommandText = sql;
            cmd.CommandType = CommandType.Text;
            cmd.Parameters.Add("@keys", SqlDbType.VarChar, -1);
            cmd.Parameters["@keys"].Value = string.Join("|", keys);

            //execute the command

            SqlDataReader rdr = cmd.ExecuteReader();
            rdr.Close();
        }
    }
}

But this functionality does not come without a price. T-SQL is not very efficient at string manipulation because it is largely a procedural operation, not a set-based operation. The above technique is not an expensive one, other than the cost in time due to the manual splitting of the string. So, it will not rob SQL server of its much-needed resources. However, if your process runs frequently, you may want to test and make sure this method has acceptable performance.

Using the CLR

When I first started writing this article, I debated whether to break this into two separate articles instead of one single article. Even though SQL Server 2005 is almost a year old now, it seems as though any article on the topic of the integrated CLR should provide some background first. I suspect that DBAs are still not comfortable with the idea and so it has not been enabled on most systems. What I found when researching the CLR for this article is that a good understanding of where the CLR can help and where it will hurt will go a long way towards reaping great benefits from this new feature. I can't speak for the security implications at this point in time, but as you will see, the CLR can provide huge benefits in performance over T-SQL operations like the one in the previous section. Just make sure that you understand when and where to use the CLR. For some background, check out this whitepaper: Using CLR Integration in SQL Server 2005.

First, you'll need to make sure the CLR has been enabled on your test server. If it has not already been enabled, run the following script:

EXEC sp_configure 'show advanced options' , '1';
go
reconfigure;
go
EXEC sp_configure 'clr enabled' , '1'
go
reconfigure;

Now to write our CLR table-valued function we'll use Visual Studio 2005. Follow these steps:

  1. Open VS 2005 and select File �> New �> Project�
  2. Expand the node ( [+] ) next to "Visual C#" and select "Database"
  3. Select "SQL Server Project" and name your project "SQLArray"
  4. Click "OK"
  5. If you created a database reference before, select the reference you want to use for debugging and deployment. If you have not, you will be prompted for this information. In that case, enter the server name and authentication credentials, and select the database you want to use.
  6. To create a user-defined function, select Project �> Add user-defined function�
  7. Name the file "SQLArray" and click "Add"

Now replace the contents of the file with the following code:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Collections;

public partial class UserDefinedFunctions
{
    [SqlFunction(Name="clrfunction_string_to_table", 
    FillRowMethodName="FillRow", TableDefinition="ID NVARCHAR(50)")]
    public static IEnumerable SqlArray(SqlString str, SqlChars delimiter)
    {
        //return single element array if no delimiter is specified

        if(delimiter.Length == 0)
            return new string[1] { str.Value };


        //split the string and return a string array

        return str.Value.Split(delimiter[0]);
    }

    public static void FillRow(object row, out SqlString str)
    {
        //set the column value

        str = new SqlString( (string) row );
    }
}

First, to point out a few things: The method that will be used for the "main" function must return System.Collections.IEnumerable, so a using statement has been added for the System.Collections namespace. Also, the SqlFunction attribute must define the following values:

Now build the project. To deploy it, select Build �> Deploy SqlArray. Now you can call the table-valued function just like you would any other function:

--JUST RETURN THE KEYS

SELECT * FROM dbo.clrfunction_string_to_table
        ('key1|key2|key3|key4|key5', '|')

--USE THE FUNCTION AS A SUBQUERY

SELECT * FROM table1 WHERE sID IN 
    (SELECT * FROM dbo.clrfunction_string_to_table
        ('key1|key2|key3|key4|key5', '|'))

The result is much faster, as we'll see next.

Test project

The test project is just a simple "load test" application that connects to the database and repeatedly calls the function. It does not use the function to return results from any table, however. The reason for this is to demonstrate the raw difference in performance between the T-SQL and the CLR versions of the function. My test sample was 1000 iterations and below are the results:

Iterations: 10,000
- T-SQL (avg): 10.3543 sec
- CLR (avg): 7.2886

The CLR version is, on average, several seconds faster. Both versions are certainly viable solutions, though. This means that if you're using SQL 2000 or if the CLR hasn't (or won't) be enabled for your SQL 2005 servers, you can of course use the T-SQL version with little overall difference. As a side note, I accidentally discovered something interesting. Claudius Ceteras pointed out that in my original load test application I had forgotten to reset the StringBuilder object. So, with each iteration of the load test, the string of keys was increasing by as many as 25 new keys. Check out these results below:

Test Project (1000 Iterations)
- T-SQL 17 min 20 sec 765 ms
- CLR 0 min 3 sec 453 ms

As you can see, the difference is drastic. The comparison uses an unlikely scenario of splitting a string of characters much larger than any application would ever submit to our function. However, it highlights the strength of the CLR over T-SQL when performing string manipulations. The CLR version is much faster than the T-SQL version. There are two reasons for this. First, the CLR is much faster at string manipulation. The second reason for the performance improvement is that SQL Server will generate a temp table for the T-SQL version that involves I/O to tempdb. This means that the function will not return any results until the function has parsed the entire string and the temp table is completely materialized. In contrast, the CLR version will stream the results as they become available. This makes it even more efficient for large result sets because SQL Server doesn't have to wait for the entire result and it doesn't need to keep the entire result in memory. The stream can be discarded as it is read.

Conclusion

While the CLR is obviously the better choice here, you may not have the option of using the CLR on your instance of SQL Server due to the policies of your team. However, this doesn't mean you that can't benefit from the T-SQL version. Either method can conserve both CPU cycles and memory by reducing compiles/recompiles due to differences in values passed to the IN statement. Before I wrote this article, I didn't really see the benefit of using the integrated CLR on SQL Server. Now I intend to further investigate the types of processes that would most benefit from use of the CLR, as well as the implications of using the CLR on SQL Server.

Resources

Updates

5/23/2007

6/04/2007

You must Sign In to use this message board.
 
 
Per page   
 FirstPrevNext
GeneralIs it possible in reverse
Arun kumar patro
2:02 10 Jul '08  
Article is very good.
Another query
I pass a table and it reurns Comma seperated string.
is it possible ?

sdfsfs fvvvvvc

GeneralRe: Is it possible in reverse
Mark J. Miller
5:20 10 Jul '08  
Yes and no. You cannot pass a table or table variable to a stored procedure (unless you are using the 2008 SQL Server beta). However, you can return a comma separated string from a table.

DECLARE @csv VARCHAR(MAX)
SELECT @csv = ISNULL(@csv, '') + CASE WHEN @csv IS NULL THEN '' ELSE ',' END + column1 FROM table1 WHERE NULLIF(column1, '') is not null

If you wanted to be able to get the csv from any table, you could use dynamic sql to build a string using the column and table names as variables.

Code responsibly: OWASP.org Mark's blog: developMENTALmadness.blogspot.com

GeneralRe: Is it possible in reverse
Arun kumar patro
20:13 10 Jul '08  
Thanks It solve my problem. It is supper sol to my problem. Previously I try it using while loop. your query gives me good performance. Thanks.

Is it possible to write a function or SP to pass resultof a subquiery which return One column morethen one row.

Ex-

Select Orderid,Ordervalue,(Select itemid from orderdetail where orderid=M.Orderid) From ORDER M.

OID OValue ItemId

O1 123.23 I1,I2,I3
O2 456.12 I2,I3

Order No – O1,O2
Item No – I1,I2,I3


Directly i want to pass subquery into a function. Is it possible?
I want to write some businesslogic in function after that i want to represent in string.

sdfsfs fvvvvvc

GeneralRe: Is it possible in reverse
Mark J. Miller
5:46 11 Jul '08  
You want to return a set of orders, with an additional column which consists of a CSV of the order detail id values? I'm not sure what your goal is. You can, but you may not want to.

I would return two resultsets from the same proceedure.

SELECT * FROM Order WHERE ....

SELECT * FROM OrderDetail D
INNER JOIN Order O ON O.OrderID = D.OrderID
WHERE ... same as first where clause ...

Then, if you are using ADO.NET you have a number of options:
1) Fill a dataset using a DataAdapter. Then you can loop thru the Order records like this:
for(int i = 0; i < myDataSet.Tables[0].Rows.Count;i++)
{
myDataSet.Tables[0].Rows[i];
DataRow[] details = myDataSet.Tables[0].Rows[i].GetRows();
}

I can't remember if you have to define a datarelation or not before you can call GetRows(), but you'd be able to return all your data in one batch and you can navigate it heirarchically.

2) Using a DataReader you can loop thru the order records and store them in a local data structure (collection) and then call the DataReader's NextResult method (or is it NextRecordset?) to read the detail records from the same data reader.

However, if you're set on returning the resultset as a CSV value in a column you could create a scalar function (not a TVF) for the CSV value which returns a VARCHAR value and use it like this:

SELECT OrderId, OrderValue, dbo.GetOrderDetailIds(OrderId) AS ItemIds FROM Order WHERE ....

GetOrderDetailIds would contain the logic to create the CSV value for a single order.

Code responsibly: OWASP.org Mark's blog: developMENTALmadness.blogspot.com

GeneralRe: Is it possible in reverse
Arun kumar patro
0:01 13 Jul '08  
Thanks,
Following query is working for me.

SELECT OrderId, OrderValue, dbo.GetOrderDetailIds(OrderId) AS ItemIds FROM Order WHERE ....

Thanks for your time.
Arun Kumar Patro.

sdfsfs fvvvvvc

GeneralQuestion of curiosity
C. L. Phillip
4:45 18 Mar '08  
why did you choose to take a VArChar(Max) as input and return a VarChar(256) as output ? Why not keep they both as VarChar(Max) ?
GeneralRe: Question of curiosity
Mark J. Miller
5:57 18 Mar '08  
I wrote it to parse short character values. If the function returned columns of type VARCHAR(MAX) it would imply it supported columns of that type. The problem with that is that the function only supports a single character delimiter and doesn't support quoted text. The parsing would have to be more robust (and less efficient) if it actually supported output of VARCHAR(MAX). In truth, 256 characters might even be too many as it is because if data where passed in which were close to 256 characters in length, there is a high likelyhood that the delimiting character could be within a field causing incorrect output.

As for the input, varchar(max) is appropriate so as not to limit the number of "rows" which could be passed to the function.


GeneralArrays and Lists in SQL Server
cfederl
17:06 6 Jun '07  
For the topic of Arrays and Lists in SQL Server, the SQL Server MVPs have done extensive investigations, developed nine different solutions and benchmarked each of the solutions. See Erland Sommarskog's page http://www.sommarskog.se/arrays-in-sql.html


Carl Federl

GeneralRe: Arrays and Lists in SQL Server
Mark J. Miller
5:00 7 Jun '07  
Thanks!! I haven't had time to read the whole article, but I got about half-way through the 2005 article. Next time I update my article, I'll reference Erland's as a must read reference.
GeneralGood example
Hkan Nilsson
2:56 31 May '07  
Very good 5-point article! Easy to read and use of the sample.

I've also been a little confused about using the CLR inside SQL and I'm still a little bit suspicious, because of the CLR integration. Since SQL Server want's all resources by itself the CLR runs inside the SQL Server processes I think. When it's time to upgrade the CLR on none SQL Server machines for a system, then it sounds normal to ugrade it on the SQL Server as well. I think this will require a SQL Server service pack because the CLR for SQL Server should not be installed from Visual Studio.

Håkan Nilsson
-- Swedish developer, DBA, C#, SQL Server, soccer trainer, C64 player

GeneralRe: Good example
Mark J. Miller
8:12 4 Jun '07  
You're correct. SQL Server decides which version of the CLR to load. So it doesn't matter which versions are actually installed on the machine along with SQL Server, SQL Server will only use the version it was built for. So MS will decide which version it uses and it will require a patch to SQL Server to change the CLR version.
QuestionTemp table?
Pete Appleton
2:28 29 May '07  
Thanks for the article - nicely written, and an excellent use of CLR integration.

With SQL 2000, my approach has always been to create a temp table, insert each value into it, and then use that as the parameter list. My experience has been that this is faster than using the delimited-string parameter technique, despite the multiple round trips to DB. Have you done any tests on the temp. table approach? I'd be interested to see somebody else's results with that.

--
What's a signature?

AnswerRe: Temp table?
Mark J. Miller
5:57 29 May '07  
No, I haven't tried that approach. Both processes involve a temp table and inserting the values one by one. Basically, you're trading the overhead string parsing for network round trips. An insert statement is the classic example of a trivial plan which involves little or no envolvement from the SQL optimizer. So your network becomes the variable. If you're testing on your local machine (client on same macine as server) then the comparison will not be accurate unless your production environment actually envolves the same setup. By testing with the client local to the server the client will be using Shared Memory which will have little or no latency for your insert statements.

But I'll give it a shot and let you know how it turns out. I'm going to guess that the network is not going to perform as well as parsing a single string, but I've been wrong before.
GeneralDEPLOY fails
Sean Ewington
6:58 25 May '07  
I cannot seem to deploy the application. Any ideas?

"In quiet and silence, the truth is made clear."
- Mike DiRenzo
GeneralRe: DEPLOY fails
Sean Ewington
6:59 25 May '07  
The project is configured to deploy to AdventureWorks on the (local) default instance of SQL Server using Windows Authentication. You'll need to make sure that all these exist in order to deploy.

These are not requirements, just what I deemed to be the most common demoniator to allow for deployment. You can deploy the assembly to any SQL 2005 installation (not SQL 2000 - this is a requirement as 2000 does not support the CLR) and to any database for which you have adequate permissions.

If this does not solve the problem you'll have to provide an error message or more details than just "I can't deploy".
- Mark J. Miller
GeneralRe: DEPLOY fails
Sean Ewington
6:59 25 May '07  
I got the deploy to work but not from within VS 2005. I use SQL 2005 MMC to do it. choose a database, select Programmability, select Assemblies. Right click Assemblies, choose new, browse to your assembely (dll), select it.

Once added, right click the assembly, choose Script Assembly as, choose Create, I chose Clipboard and copied the code into my query session.

No how do I call the function? Do I create a stored proc and do the following?

Create Procedure SQLArray
as
EXTERNAL NAME SqlServerProject1.SQLArray.SqlArray

What about your parameter string "2323|232|ttrr" - how does it get passed in?


"In quiet and silence, the truth is made clear."
- Mike DiRenzo
GeneralRe: DEPLOY fails
Sean Ewington
7:00 25 May '07  
"ALTER FUNCTION for "clrfunction_string_to_table" failed because T-SQL and CLR types for parameter "@str" do not match."

To avoid the error above when attempting the DDL to create the function or stored proc (in my case, I used a udf), I had to modify the datatype in the parameter signature:

Where as before I was using "@str varchar(500), @delimiter char(1)", I changed it to "nvarchar" and "nchar" respectively and I was able to fire the DDL with no errors.

Here is how you create the function:

CREATE FUNCTION dbo.clrfunction_string_to_table (@str nvarchar(500), @delimiter nchar(1) )
RETURNS Table (id nvarchar(50))
AS
EXTERNAL NAME SqlServerProject1.UserDefinedFunctions.SQLArray


The caller:
SELECT * FROM dbo.clrfunction_string_to_table('key1|key2|key3|key4|key5', '|')


The Result:
id
--------------------------------------------------
key1
key2
key3
key4
key5





"In quiet and silence, the truth is made clear."
- Mike DiRenzo
Questiona question from Buff Daddy
Sean Ewington
6:56 25 May '07  
"First off, you've written an excellent article on a practically unknown and rarely used feature in SQL, and I'm sure will have a "I KNEW there was a way to do this!" response from many people who read it!

Now my question... Is there a way to return a row for each variable in the [In] statement, whether the record could be found or not? For example, let's say that only the first and second variable could be found in this statement when executed:

SELECT * FROM table1 WHERE sID IN ('1', '2','3')


What I would like to know is how do I return 3 rows even though only 2 were found? For example, in my app, when someone wants to search for multiple record numbers, the results page has to show the records found and also which records weren't, such as:

Record Date
1 1/1/06
2 3/5/05
3 Record Not Found

I know I could do it on the client side, but is there a better way of doing it?"
- Buff Daddy
AnswerRe: a question from Buff Daddy
Sean Ewington
6:57 25 May '07  
"Buff Daddy,

Instead of using "IN" use a "JOIN"... either a left outer or a right join depending on what you want returned.


select * from table1 RIGHT join (
select data from dbo.function_string_to_table('1|2|3', '|')
) list on data = sID

It will null rows for the unmatched ids.

HTH"
- HoyaSaxa93
AnswerRe: a question from Buff Daddy
Sean Ewington
6:58 25 May '07  
"The nice thing about TVF's is that they behave like tables so you can use them anywhere in the FROM clause, not just in the WHERE clause as a subquery. So you can do any of the following

SELECT * FROM dbo.function_string_to_table('1,2,3', ',') FN
LEFT JOIN table1 T1 ON T1.sID = FN.ID

-- or --

SELECT * FROM table1 T1
RIGHT JOIN dbo.function_string_to_table('1,2,3', ',') FN ON FN.ID = T1.sID

Or an inner join - or however else you need to use it like you would a table.

I hope that helps."

- Mark J. Miller
General"load test" application
Toastbrot
12:15 12 Apr '07  
btw, what does your load test application look like?
The strings need to be huge when t-sql needs over one second per iteration...
GeneralRe: "load test" application
Mark J. Miller
12:37 12 Apr '07  
Download the source code - the test project is included. The test project generates a string of 1 to 25 keys and builds them into a string. That string is the argument for the function.

Please, run the T-SQL test and let me know what your results are. My machine is by no means lacking in power, but I must admit I was surprised at the difference.
GeneralRe: "load test" application
Toastbrot
13:34 12 Apr '07  
I had a look at the code. you're not reseting the Stringbuilder, so the String gets longer and longer.

lets see...

outer loop: 1000
count: average 12
key: average length 3.889 (10/1, 90/2, 900/3, 9000/4) + 4 ("|Key") = 7.889

so in the average case you're filling the Stringbuilder with max=1000*12*7.889 > 90kB in the last iteration. and this means you're creating about 1000^2*0.5*(max/1000) = about 45 MB string data just by calling list.ToString()...

this means: depending on how often you run the test the memory could be used up and the garbage collector had to run all the time during the T-SQL-Test whicht made all worse...

please change your code like this:

DateTime start = DateTime.Now;

// CHANGES START HERE
Random rnd = new Random(1234); // seed, so every test generated same numbers!
for (int i = 0; i < 1000; i++)
{
list.Length = 0; // reset Stringbuilder
//create list of random number of keys (between 5 and 25)
int count = 5 + rnd.Next(21);
//CHANGES END HERE

for (int j = 0; j < count; j++)


and run the tests again and post the results, i can't do it at the moment - sorry.
i guess the results will change a lot, T-SQL may be even faster...

and now that i know what the garbage collector had to do... i guess 1000 iteration will not be enough to really see a difference. start with 10000 iterations and work upwards if necessary...


waiting for your results, with kind regards...
GeneralRe: "load test" application
Mark J. Miller
13:14 16 Apr '07  
Thanks, for pointing that out! Interesting that the CLR version was SO much faster even though so much data was being submitted.

However, the test was unrealistic. So I made the changes you recommended and here are the results:

Iterations: 10,000
T-SQL (avg): 10.3543 sec
CLR (avg): 7.2886

Each test was run 10 times and the result was taken. These results are the average of all 10 tests.

The conclusion: The T-SQL code performed much more like I expected in the first place, however the CLR is much more efficient at string manipulation. Which shows a good case for when the CLR should be used over T-SQL.


I'll be out of town for 2 weeks, but when I get back I'll post the changes.
GeneralRe: "load test" application
Hkan Nilsson
3:31 31 May '07  
Here is my results:

Iterations: 10,000
T-SQL (avg): 8.14 sec
CLR (avg): 2.65 sec

using the changed code and debug mode in VS



Håkan Nilsson
-- Swedish developer, DBA, C#, SQL Server, soccer trainer, C64 player


Last Updated 6 Jun 2007 | Advertise | Privacy | Terms of Use | Copyright © CodeProject, 1999-2010