Click here to Skip to main content
Click here to Skip to main content

T-SQL: Most Practical Split Function

, 11 Oct 2013 CPOL
Rate this:
Please Sign up or sign in to vote.
A SQL String Split function for the real world.

Introduction

This SQL split function is intended for users who want a string split to function like a human wants it to, not how a computer thinks it should. It addresses several problems with split functions that have vexed me over the years, including:

  1. Trims the input for each returned item, so a comma separated list of ints that looks like "1, 2, 3" (note the spaces after the commas) will be properly parsed down to ints.
  2. Returns the input as nvarchar(max) or int with low overhead so items are not bound by varchar(8000) or forced to one datatype.
  3. Supports delimiters of a size greater than 1 character (at the cost of performance).
  4. Removes blank entries, so "1,,,2" will just return 1 and 2.

There are moderately rigorous unit tests included in the script that highlight use cases.

Background

The code should compile and run without modification, however, review the TODO comments for additional details. Specifically:

  • If you are on SQL2012, there is a better int parse that can be used.
  • The current code is case sensitive, if you want to make it insensitive, follow the steps in the TODO comments.
  • The concept of using a non-persistent computed column to pre-parse the data can be extended to other types, I just wired it up for ints since that was the most common use on our applications.

Using the Code

There are two functions, [dbo].[FN_SPLIT](@TextToSplit, @Delimiter) and an overloaded version to make splitting ints easier, [dbo].[FN_SPLIT_INT] (@TextToSplit). Both return a table containing the split values.

Example 1

Note in this example that items that can be parsed as an int are returned in the computed column of data type int [item_int]. Note that the length of things like " x " are trimmed so the returned length will be 1, and that the blank entries between 4 and 5 (,,,) are removed.

SELECT *, LEN(item) as [item_len] 
FROM [dbo].[FN_SPLIT]('1,  x  ,-2, y, 3.0, 4,,,5',',');
item item_int item_len
1 1 1
x NULL 1
-2 -2 2
y NULL 1
3.0 NULL 3
4 4 1
5 5 1

Example 2

Note in this example, using the overloaded int version, we do not need to pass in the delimiter, comma is assumed, and only values that are valid integers are returned.

SELECT * 
FROM [dbo].[FN_SPLIT_INT]('1,  x  ,-2, y, 3.0, 4,,,5');

item_int
1
-2
4
5

Example 3

In this example, we are splitting based on a multi-character delimiter.

select [item] FROM [dbo].[FN_SPLIT]('a[tag]b[tag][tag]d', '[tag]');

item
a
b
d

Example 4

In this example, we are double splitting on two separate delimiters, [tag] and [/tag] (imagine a case for pseudo-XML). Note the white space between the tags is properly trimmed off. While this type of sting manipulation in SQL is not ideal, if you have to do it at the database layer, as opposed to the application or some other layer, this may be a practical solution.

select f2.item , len(f2.item) as item_len
FROM [dbo].[FN_SPLIT]('  [tag]a[/tag]  [tag]b[/tag]  ', '[tag]') f1
CROSS APPLY [dbo].[FN_SPLIT](f1.item, '[/tag]') f2
item item_len
a 1
b 1

Explanation of the Code

  • There is a giant "IF" block that parses using 1 of 2 methods based on delimiter size. If the delimiter is 1 character, CHARINDEX is used, more than one and the slower PATINDEX is used.
    DECLARE @del_len int -- Delimiter length
    SELECT @del_len = LEN(@Delimiter)
    IF @del_len = 1  -- CHARINDEX MODE (Much faster than PATINDEX mode)
        SELECT @ix = CHARINDEX(@Delimiter, @Text)
    ELSE -- PATINDEX Mode (SLOW!) Try to pass in text that uses single char delimiters when possible
        SELECT @ix = PATINDEX(@del_pat, @Text)
  • The return type is always nvarchar(max) for each item, so there is not a data type dependency for what is returned.
  • In the returned table, there is a non-persistent computed column to cast the returned value to an int. This is low overhead in that if you do not select this column in the results, as would be the case if you do not need it cast to an int, that code is not executed. This is how we achieve the balance between flexibility, durability and convenience.
    RETURNS @retTable TABLE 
    (
        -- Output table definition
        [item] nvarchar(max) COLLATE DATABASE_DEFAULT NOT NULL,
    
        -- Since most of the times we split, we are splitting lists of ints, this makes that process easier.
        -- Since the column is non persistent it should only be evaluated when requested (low overhead).
        [item_int] as (
            -- SQL 2012 version, better, use if possible
            -- TRY_CONVERT([int], NULLIF([item],'')) -- SQL 2012 Format, faster and safer, but pre-2012 code provided as well...
    
            -- Pre SQL 2012 syntax.  Not as reliable, so use 2012 when possible by commenting out this CAST and using the TRY_CONVERT above        
            CAST(
                CASE 
                    WHEN LEN(item) > 11 THEN NULL  -- LEN OF (-2147483648) is 11.  Simple out of bounds checking.
                    WHEN ISNUMERIC([item]) = 1 AND [item] NOT LIKE '%.%' THEN [item] -- Ensure value conforms to int
                    ELSE null
                END 
            as int)
        )
    ) 
  • For each returned item, the value is trimmed, so we can eliminate null/blank/whitespace values that are usually not wanted after a split operation.
    SET @item = LTRIM(RTRIM(SUBSTRING(@Text,@pix,@text_len)))
    IF @item <> '' AND @item IS NOT NULL  -- Only save non empty values
        INSERT INTO @retTable(item) VALUES (@item)
  • When doing the PATINDEX split, we escape the delimiter so if it contains [,% or _, those values will be treated as literals, not as wildcards.
    -- Escape characters that will mess up the like clause, and wrap in wild cards %
    SELECT @del_pat = '%' + REPLACE(REPLACE(REPLACE(@Delimiter
        , '[','[[]')
        , '%','[%]')
        , '_', '[_]') 
        + '%'
  • We do a pre-check, so if the item has no delimiters, it is returned with very low overhead, or if the input parameters are invalid, we return quickly with no results.
    IF RTRIM(ISNULL(@Text,'')) = '' OR RTRIM(ISNULL(@Delimiter,'')) = ''
        RETURN
  • It may sound silly, but a big value-add is the unit tests at the bottom of the script. In all of the other split functions I ran through those tests, they would not return the results I was desiring, and in many cases, a flat-out error. Feel free to test your own functions against the cases outlined and see if the results are what your project would expect.

Points of Interest

There are quite a few unit tests included in the script. Some of the performance can be tested via unit tests as well, adjust @BigTextItemCount to examine how the function scales on your system. As noted above, feel free to extend the return table to support other data types depending on your application needs.

History

  • 10 October 2013: Initial public draft
  • 10 October 2013: Updated to include download link, link was missing
  • 11 October 2013: Added two additional examples to highlight other features of the function
  • 11 October 2013: As requested, added more of an explanation of the code.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Brad Joss
Software Developer (Senior)
United States United States
No Biography provided

Comments and Discussions

 
QuestionThanks! Pinmembernirkki30-Dec-13 1:58 
GeneralMy vote of 5 PinprofessionalBrian A Stephens11-Oct-13 4:00 
SuggestionExplanation for the source code PinmvpMika Wendelius11-Oct-13 0:02 
GeneralRe: Explanation for the source code PinprofessionalBrad Joss11-Oct-13 8:59 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web03 | 2.8.1411023.1 | Last Updated 11 Oct 2013
Article Copyright 2013 by Brad Joss
Everything else Copyright © CodeProject, 1999-2014
Layout: fixed | fluid