Click here to Skip to main content
Click here to Skip to main content

String Aggregation in the World of SQL Server

, 4 Dec 2013 CPOL
Rate this:
Please Sign up or sign in to vote.
This article includes examples of T-SQL capabilities that allow to perform string concatenation flexibly and effectively through the use of different constructs.

Introduction

This articles includes examples of T-SQL capabilities that allow to perform string concatenation more flexibly and effectively through the use of other constructs.

Background

In practice, tasks within the scope of concatenating strings into one come across quite often. It is a pity that the T-SQL standard does not provide for the possibility of using string data inside the aggregating SUM function:

Msg 8117, Level 16, State 1, Line 1 
Operand data type char is invalid for sum operator.

As a solution for such tasks, there was added a GROUP_CONCAT function for MySQL, and LISTAGG – for Oracle. SQL Server, in turn, has no such built-in functionality.

However, this shouldn't be considered as a disadvantage, since T-SQL capabilities allow to perform string concatenation more flexibly and effectively through the use of other constructs that will be revealed below.

Using the code

Suppose we need to concatenate several strings separated by comma in one, using data from the following table:

IF OBJECT_ID('dbo.Chars', 'U') IS NOT NULL
    DROP TABLE dbo.Chars
GO
CREATE TABLE dbo.Chars ([Char] CHAR(1) PRIMARY KEY)
INSERT INTO dbo.Chars ([Char])
VALUES ('A'), ('B'), ('C'), ('F'), ('D')   

The most obvious solution for this task is using of a cursor:

DECLARE 
      @Chars VARCHAR(100)
    , @Char CHAR(1)
DECLARE cur CURSOR LOCAL READ_ONLY FAST_FORWARD FOR
    SELECT [Char]
    FROM dbo.Chars
OPEN cur
FETCH NEXT FROM cur INTO @Char
WHILE @@FETCH_STATUS = 0 BEGIN
    SET @Chars = ISNULL(@Chars + ', ' + @Char, @Char) 
    FETCH NEXT FROM cur INTO @Char
END
CLOSE cur
DEALLOCATE cur
SELECT @Chars  

Using of a cursor itself causes reduced efficiency of query execution and, at least, doesn't look very elegant.

To avoid the use of a cursor, the strings can be concatenated via variables assigning:

DECLARE @Chars VARCHAR(100)
SELECT @Chars = ISNULL(@Chars + ', ' +[Char], [Char])   
FROM dbo.Chars
SELECT @Chars   

On the one hand, the construct is rather simple; on the other hand, its performance on a large fetch leaves much to be desired.

Thus, this method will be insolvent in case when concatenation inside a SELECT block is required.

In such cases, XML features are resorted to, using the following construct:

SELECT Chars = STUFF((
	SELECT ', ' + [Char]
	FROM dbo.Chars
FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)'), 1, 2, '') 
If we look at the execution plan retrieved in dbForge Studio for SQL Server, – we can notice high query cost for the value method:

To avoid this operation, the query can be rewritten using XQuery properties:

SELECT Chars = STUFF(CAST((
	SELECT [text()] = ', ' + [Char]
	FROM dbo.Chars
	FOR XML PATH(''), TYPE) AS
VARCHAR(100)), 1, 2, '')   

As a result, we will get a very simple and fast execution plan:

In principle, string concatenation by a column does not cause special difficulties.

A more interesting situation is when concatenation of several columns is needed. For example, we have the following table:

IF OBJECT_ID('dbo.EntityValues', 'U') IS NOT NULL
    DROP TABLE dbo.EntityValues
GO
CREATE TABLE dbo.EntityValues (
      EntityID INT
    , Value1 CHAR(1)
    , Value2 CHAR(1) 
)
CREATE NONCLUSTERED INDEX IX_WorkOut_EntityID
	ON dbo.EntityValues (EntityID)
GO 
INSERT INTO dbo.EntityValues (EntityID, Value1, Value2)
VALUES 
    (1, 'A', 'X'),
    (2, 'B', 'Y'),
    (2, 'C', 'Z'),
    (2, 'F', 'H'),
    (1, 'D', 'R')  

in which data must be grouped in the following way:

Alternatively, we can copy XML calls, but then we get re-reading, which may significantly affect the query execution efficiency:

SELECT 
      ev.EntityID
    , Values1 = STUFF(CAST((
        SELECT [text()] = ', ' + ev2.Value1
        FROM dbo.EntityValues ev2
        WHERE ev2.EntityID = ev.EntityID
        FOR XML PATH(''), TYPE) AS VARCHAR(100)), 1, 2, '')
    , Values2 = STUFF(CAST((
        SELECT [text()] = ', ' + ev2.Value2
        FROM dbo.EntityValues ev2
        WHERE ev2.EntityID = ev.EntityID
        FOR XML PATH(''), TYPE) AS VARCHAR(100)), 1, 2, '')             
FROM ( 
    SELECT DISTINCT EntityID
    FROM dbo.EntityValues
) ev 

This can be easily convinced, if to look at the execution plan:

To reduce re-reading, a small XML hack can be used:

SELECT
      ev.EntityID
    , Values1 = STUFF(
        REPLACE(
            CAST([XML].query('for $a in /a return xs:string($a)') 
        AS VARCHAR(100)), ' ,', ','), 1, 1, '') 
    , Values2 = STUFF(
        REPLACE(
            CAST([XML].query('for $b in /b return xs:string($b)') 
        AS VARCHAR(100)), ' ,', ','), 1, 1, '') 
FROM (
    SELECT DISTINCT EntityID
    FROM dbo.EntityValues
) ev
CROSS APPLY (
    SELECT [XML] = CAST((
        SELECT 
              [a] = ', ' + ev2.Value1
            , [b] = ', ' + ev2.Value2
        FROM dbo.EntityValues ev2
        WHERE ev2.EntityID = ev.EntityID
        FOR XML PATH('')
    ) AS XML)
) t  

But this query won't be optimal as well due to multiple calls of the query method.

Alternative way – using a cursor:

IF
OBJECT_ID('tempdb.dbo.#EntityValues') IS NOT NULL
	DROP TABLE #EntityValues
GO
SELECT DISTINCT
	  EntityID
	, Values1 = CAST(NULL AS VARCHAR(100))
	, Values2 = CAST(NULL AS VARCHAR(100))
INTO #EntityValues
FROM dbo.EntityValues
DECLARE
	  @EntityID INT
	, @Value1 CHAR(1)
	, @Value2 CHAR(1)
DECLARE cur CURSOR LOCAL READ_ONLY FAST_FORWARD FOR
    SELECT
	      EntityID
	    , Value1
	    , Value2
    FROM dbo.EntityValues
OPEN cur
FETCH NEXT FROM cur INTO
	  @EntityID
	, @Value1
	, @Value2
WHILE @@FETCH_STATUS = 0 BEGIN
UPDATE #EntityValues
    SET 
          Values1 = ISNULL(Values1 + ', ' + @Value1, @Value1) 
        , Values2 = ISNULL(Values2 + ', ' + @Value2, @Value2)
    WHERE EntityID = @EntityID
FETCH NEXT FROM cur INTO
          @EntityID
        , @Value1
        , @Value2
END
CLOSE cur
DEALLOCATE cur
SELECT *
FROM #EntityValues 

However, as practice has shown, when working with large ETL packages, the most efficient solution is the ability to assign variables in the UPDATE statement:

IF
OBJECT_ID('tempdb.dbo.#EntityValues') IS NOT NULL
	DROP TABLE #EntityValues
GO
DECLARE
      @Values1 VARCHAR(100)
    , @Values2 VARCHAR(100)
SELECT
      EntityID
    , Value1
    , Value2
    , RowNum = ROW_NUMBER() OVER (PARTITION BY EntityID ORDER BY 1/0)
    , Values1 = CAST(NULL AS VARCHAR(100))
    , Values2 = CAST(NULL AS VARCHAR(100))
INTO #EntityValues
FROM dbo.EntityValues
UPDATE #EntityValues
SET 
      @Values1 = Values1 =
        CASE WHEN RowNum = 1 
            THEN Value1
            ELSE @Values1 + ', ' + Value1 
        END
    , @Values2 = Values2 = 
        CASE WHEN RowNum = 1 
            THEN Value2
            ELSE @Values2 + ', ' + Value2 
        END
SELECT
      EntityID
    , Values1 = MAX(Values1) 
    , Values2 = MAX(Values2)
FROM #EntityValues
GROUP BY EntityID 

SQL Server has no built-in alternative to GROUP_CONCAT and LISTAGG functions. Nevertheless, this doesn't prevent performing string concatenation tasks efficiently. The objective of this article is to show it clearly.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

No Biography provided

Comments and Discussions

 
QuestionDivision by Zero? PinmemberTom Pester3-Feb-14 4:15 
AnswerRe: Division by Zero? PinmemberSergey Syrovatchenko11-Feb-14 3:56 
QuestionGood PinmemberVishal_jj7-Jan-14 23:18 
QuestionFor XML - sample 1 performance Pinmembervl-sher15-Dec-13 21:57 
AnswerRe: For XML - sample 1 performance PinmemberSergey Syrovatchenko18-Dec-13 4:29 
SuggestionAnother possible approach - user-defined aggregates PinmemberMarek Grzenkowicz10-Dec-13 10:41 
GeneralRe: Another possible approach - user-defined aggregates PinmemberSergey Syrovatchenko12-Dec-13 3:37 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web01 | 2.8.141220.1 | Last Updated 4 Dec 2013
Article Copyright 2013 by Sergey Syrovatchenko
Everything else Copyright © CodeProject, 1999-2014
Layout: fixed | fluid