This SQL split function is intended for users who want a
string split to function like a human wants it to, not how a computer thinks it should. It addresses several problems with split functions that have vexed me over the years, including:
- Trims the input for each returned item, so a comma separated list of
ints that looks like "
1, 2, 3" (note the spaces after the commas) will be properly parsed down to
- Returns the input as
int with low overhead so items are not bound by
varchar(8000) or forced to one
- Supports delimiters of a size greater than 1 character (at the cost of performance).
- Removes blank entries, so "
1,,,2" will just return
There are moderately rigorous unit tests included in the script that highlight use cases.
The code should compile and run without modification, however, review the TODO comments for additional details. Specifically:
- If you are on SQL2012, there is a better
int parse that can be used.
- The current code is case sensitive, if you want to make it insensitive, follow the steps in the TODO comments.
- The concept of using a non-persistent computed column to pre-parse the data can be extended to other types, I just wired it up for
ints since that was the most common use on our applications.
Using the Code
There are two functions,
[dbo].[FN_SPLIT](@TextToSplit, @Delimiter) and an overloaded version to make splitting
[dbo].[FN_SPLIT_INT] (@TextToSplit). Both return a table containing the split values.
Note in this example that items that can be parsed as an
int are returned in the computed column of data type
int [item_int]. Note that the length of things like "
x " are trimmed so the returned length will be
1, and that the blank entries between
,,,) are removed.
SELECT *, LEN(item) as [item_len]
FROM [dbo].[FN_SPLIT]('1, x ,-2, y, 3.0, 4,,,5',',');
|item ||item_int ||item_len |
|1 ||1 ||1 |
|x ||NULL ||1 |
|-2 ||-2 ||2 |
|y ||NULL ||1 |
|3.0 ||NULL ||3 |
|4 ||4 ||1 |
|5 ||5 ||1 |
Note in this example, using the overloaded
int version, we do not need to pass in the delimiter, comma is assumed, and only values that are valid integers are returned.
FROM [dbo].[FN_SPLIT_INT]('1, x ,-2, y, 3.0, 4,,,5');
In this example, we are splitting based on a multi-character delimiter.
select [item] FROM [dbo].[FN_SPLIT]('a[tag]b[tag][tag]d', '[tag]');
In this example, we are double splitting on two separate delimiters,
[/tag] (imagine a case for pseudo-XML). Note the white space between the tags is properly trimmed off. While this type of
string manipulation in SQL is not ideal, if you have to do it at the database layer, as opposed to the application or some other layer, this may be a practical solution.
select f2.item , len(f2.item) as item_len
FROM [dbo].[FN_SPLIT](' [tag]a[/tag] [tag]b[/tag] ', '[tag]') f1
CROSS APPLY [dbo].[FN_SPLIT](f1.item, '[/tag]') f2
Explanation of the Code
- There is a giant "
IF" block that parses using 1 of 2 methods based on delimiter size. If the delimiter is 1 character,
CHARINDEX is used, more than one and the slower
PATINDEX is used.
DECLARE @del_len int
SELECT @del_len = LEN(@Delimiter)
IF @del_len = 1
SELECT @ix = CHARINDEX(@Delimiter, @Text)
SELECT @ix = PATINDEX(@del_pat, @Text)
- The return type is always
nvarchar(max) for each item, so there is not a data type dependency for what is returned.
- In the returned table, there is a non-persistent computed column to cast the returned value to an
int. This is low overhead in that if you do not select this column in the results, as would be the case if you do not need it cast to an
int, that code is not executed. This is how we achieve the balance between flexibility, durability and convenience.
RETURNS @retTable TABLE
[item] nvarchar(max) COLLATE DATABASE_DEFAULT NOT NULL,
[item_int] as (
WHEN LEN(item) > 11 THEN NULL
WHEN ISNUMERIC([item]) = 1 AND [item] NOT LIKE '%.%' _
- For each returned item, the value is trimmed, so we can eliminate
null/blank/whitespace values that are usually not wanted after a split operation.
SET @item = LTRIM(RTRIM(SUBSTRING(@Text,@pix,@text_len)))
IF @item <> '' AND @item IS NOT NULL
INSERT INTO @retTable(item) VALUES (@item)
- When doing the
PATINDEX split, we escape the delimiter so if it contains [,% or _, those values will be treated as literals, not as wildcards.
SELECT @del_pat = '%' + REPLACE(REPLACE(REPLACE(@Delimiter
, '_', '[_]')
- We do a pre-check, so if the item has no delimiters, it is returned with very low overhead, or if the input parameters are invalid, we return quickly with no results.
IF RTRIM(ISNULL(@Text,'')) = '' OR RTRIM(ISNULL(@Delimiter,'')) = ''
- It may sound silly, but a big value-add is the unit tests at the bottom of the script. In all of the other split functions I ran through those tests, they would not return the results I was desiring, and in many cases, a flat-out error. Feel free to test your own functions against the cases outlined and see if the results are what your project would expect.
Points of Interest
There are quite a few unit tests included in the script. Some of the performance can be tested via unit tests as well, adjust
@BigTextItemCount to examine how the function scales on your system. As noted above, feel free to extend the return table to support other data types depending on your application needs.
- 10 October 2013: Initial public draft
- 10 October 2013: Updated to include download link, link was missing
- 11 October 2013: Added two additional examples to highlight other features of the function
- 11 October 2013: As requested, added more of an explanation of the code
- 01 February 2015: Default to SQL 2012 version of
- 25 August 2015: Convert input into
varchar(8000) to avoid a truncation error on
[item] values greater than
TRY_CONVERT's input max value.
- 12 November 2015: Bug Fix: Space delimiter is not honored correctly. Updated code and tests to support a delimiter of space and tab. Added tests for space, tab, new line and carriage return. Thanks to Jesse who found the bug!