This small tips shows an example how to build a small SQL Server function utilizing a Common Table Expression (CTE) to replace undesired characters from an input string.
Introduction
This tip shows an example how to build a small SQL Server function utilizing a Common Table Expression (CTE) to replace undesired characters from an input string. While this function can be used in a SELECT
statement, it can also be used for example in UPDATE
statements to sanitize data or even in WHERE
clauses. When used as a part of a WHERE
clause, please bear in mind that the performance may degrade dramatically, just like with any function when used as a condition.
The same functionality could be achieved in many different ways, like with regular expression, T-SQL loops, etc, so this is just one way to do this.
Using the input data
So the goal is to create the function described above. Let's break the task into pieces. First thing we need to do, is to use the input string and the string of characters to replace, as data in a SELECT
statement. This is simple to do as a variable can be used as a column without any tables in a query. Consider the following example
BEGIN
DECLARE @stringToReplace nvarchar(MAX) = 'this% contains ! illegal/ chars',
@charsToReplace nvarchar(100) = '!"#¤%&/()=?',
@replacement nvarchar(1) = '';
SELECT @stringToReplace, @charsToReplace
END;
This returns our data nicely on a single row
(No column name) (No column name)
this% contains ! illegal/ chars !"#¤%&/()=?
For simplicity, I've used both variables in their own queries later in this tip.
Creating the CTE
This part is the essential content for this tip. Let's have a look at the whole CTE statement
BEGIN
DECLARE @stringToReplace nvarchar(MAX) = 'this% contains ! illegal/ chars',
@charsToReplace nvarchar(100) = '!"#¤%&/()=?',
@replacement nvarchar(1) = '';
WITH
CharsToReplace (Chars) AS (
SELECT @charsToReplace
),
InputData (InputString) AS (
SELECT @stringToReplace
),
ReplaceLoop (Position, SingleChar, OutputString) AS (
SELECT 1 AS Position,
SUBSTRING(ctr.Chars, 1, 1) AS SingleChar,
REPLACE(id.InputString,
SUBSTRING(ctr.Chars, 1, 1),
@replacement) AS OutputString
FROM CharsToReplace ctr
CROSS APPLY InputData id
UNION ALL
SELECT rl.Position + 1 AS Position,
SUBSTRING(ctr.chars, rl.position + 1, 1) AS SingleChar,
REPLACE(rl.OutputString,
SUBSTRING(ctr.chars, rl.Position + 1, 1),
@replacement) AS OutputString
FROM CharsToReplace ctr
CROSS APPLY ReplaceLoop rl
WHERE LEN(ctr.Chars) > rl.Position
)
SELECT *
FROM ReplaceLoop rl
END;
I take it that you're already familiar with common table expressions and especially recursive CTE's. If not, please have a look at WITH common_table_expression (Transact-SQL).
So the first two CTE tables are just to define separate tables for our inputs. The replace loop is a recursive query looping through all the characters that need to be replaced. The anchoring statement extracts the first character to replace and makes the replacement with the desired character. Result of the replacement will then be used by the subsequent iterations as each iteration extracts the next character to replace and produces it's own output for the next iteration.
So if this statement is executed, the output will be
Position SingleChar OutputString
1 ! this% contains illegal/ chars
2 " this% contains illegal/ chars
3 # this% contains illegal/ chars
4 ¤ this% contains illegal/ chars
5 % this contains illegal/ chars
6 & this contains illegal/ chars
7 / this contains illegal chars
8 ( this contains illegal chars
9 ) this contains illegal chars
10 = this contains illegal chars
11 ? this contains illegal chars
So all the hard work is done by the query.
Selecting the desired result
We definitely don't want all the data the query is returning, only the final result is interesting when all the replacements are done. To achieve this, position column can be utilized. If we simply order the result data descending based on the position and return the first row (highest position) we get the last result.
Consider the following
BEGIN
DECLARE @stringToReplace nvarchar(MAX) = 'this% contains ! illegal/ chars',
@charsToReplace nvarchar(100) = '!"#¤%&/()=?',
@replacement nvarchar(1) = '';
WITH
CharsToReplace (Chars) AS (
SELECT @charsToReplace
),
InputData (InputString) AS (
SELECT @stringToReplace
),
ReplaceLoop (Position, SingleChar, OutputString) AS (
SELECT 1 AS Position,
SUBSTRING(ctr.Chars, 1, 1) AS SingleChar,
REPLACE(id.InputString,
SUBSTRING(ctr.Chars, 1, 1),
@replacement) AS OutputString
FROM CharsToReplace ctr
CROSS APPLY InputData id
UNION ALL
SELECT rl.Position + 1 AS Position,
SUBSTRING(ctr.chars, rl.position + 1, 1) AS SingleChar,
REPLACE(rl.OutputString,
SUBSTRING(ctr.chars, rl.Position + 1, 1),
@replacement) AS OutputString
FROM CharsToReplace ctr
CROSS APPLY ReplaceLoop rl
WHERE LEN(ctr.Chars) > rl.Position
)
SELECT *
FROM ReplaceLoop rl
ORDER BY rl.position DESC
OFFSET 0 ROWS
FETCH FIRST 1 ROWS ONLY;
END;
With the ORDER BY
clause in place the result is
Position SingleChar OutputString
11 ? this contains illegal chars
Now the output on the single row is just what we wanted.
Implementation of the function
The last thing is to wrap this into a function. Basically the only things to do are
- Define the function declaration with parameters
- Query the desired data and return it
The whole function would look like this
CREATE FUNCTION ReplaceChars(@stringToReplace nvarchar(MAX),
@charsToReplace nvarchar(100),
@replacement nvarchar(1))
RETURNS nvarchar(MAX) AS
BEGIN
DECLARE @returnData nvarchar(MAX);
WITH
CharsToReplace (Chars) AS (
SELECT @charsToReplace
),
InputData (InputString) AS (
SELECT @stringToReplace
),
ReplaceLoop (Position, SingleChar, OutputString) AS (
SELECT 1 AS Position,
SUBSTRING(ctr.Chars, 1, 1) AS SingleChar,
REPLACE(id.InputString,
SUBSTRING(ctr.Chars, 1, 1),
@replacement) AS OutputString
FROM CharsToReplace ctr
CROSS APPLY InputData id
UNION ALL
SELECT rl.Position + 1 AS Position,
SUBSTRING(ctr.chars, rl.position + 1, 1) AS SingleChar,
REPLACE(rl.OutputString,
SUBSTRING(ctr.chars, rl.Position + 1, 1),
@replacement) AS OutputString
FROM CharsToReplace ctr
CROSS APPLY ReplaceLoop rl
WHERE LEN(ctr.Chars) > rl.Position
)
SELECT @returnData = rl.OutputString
FROM ReplaceLoop rl
ORDER BY rl.position DESC
OFFSET 0 ROWS
FETCH FIRST 1 ROWS ONLY;
RETURN (@returnData);
END;
Let's give it a try. If we use the same input as before, the query would like
SELECT dbo.ReplaceChars('this% contains ! illegal/ chars', '!"#¤%&/()=?', '')
and the result would be
(No column name)
this contains illegal chars
As said in the beginning, this function can be used in many places. For example if you want to use it in an UPDATE
statement sanitizing the data, the query could look something like
UPDATE MyTable
SET MyColumn = dbo.ReplaceChars(MyColumn, '!"#¤%&/()=?', '');
History