Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / SQL

Small function to replace characters with SQL Server

4.86/5 (5 votes)
13 May 2024CPOL3 min read 6.9K   88  
A small function utilizing CTE to replace undesired characters from a string
This small tips shows an example how to build a small SQL Server function utilizing a Common Table Expression (CTE) to replace undesired characters from an input string.

Introduction

This tip shows an example how to build a small SQL Server function utilizing a Common Table Expression (CTE) to replace undesired characters from an input string. While this function can be used in a SELECT statement, it can also be used for example in UPDATE statements to sanitize data or even in WHERE clauses. When used as a part of a WHERE clause, please bear in mind that the performance may degrade dramatically, just like with any function when used as a condition.

The same functionality could be achieved in many different ways, like with regular expression, T-SQL loops, etc, so this is just one way to do this.

Using the input data

So the goal is to create the function described above. Let's break the task into pieces. First thing we need to do, is to use the input string and the string of characters to replace, as data in a SELECT statement. This is simple to do as a variable can be used as a column without any tables in a query. Consider the following example

SQL
BEGIN
   DECLARE @stringToReplace nvarchar(MAX) = 'this% contains ! illegal/ chars',
           @charsToReplace nvarchar(100)  = '!"#¤%&/()=?',
           @replacement nvarchar(1)       = '';

         SELECT @stringToReplace, @charsToReplace
END;

This returns our data nicely on a single row

(No column name)                  (No column name)
this% contains ! illegal/ chars   !"#¤%&/()=?

For simplicity, I've used both variables in their own queries later in this tip.

Creating the  CTE

This part is the essential content for this tip. Let's have a look at the whole CTE statement

SQL
BEGIN
   DECLARE @stringToReplace nvarchar(MAX) = 'this% contains ! illegal/ chars',
           @charsToReplace nvarchar(100)  = '!"#¤%&/()=?',
           @replacement nvarchar(1)       = '';

   WITH
      CharsToReplace (Chars) AS (
         SELECT @charsToReplace
      ),
      InputData (InputString) AS (
         SELECT @stringToReplace
      ),
      ReplaceLoop (Position, SingleChar, OutputString) AS (
         SELECT 1                                AS Position,
                SUBSTRING(ctr.Chars, 1, 1)       AS SingleChar,
                REPLACE(id.InputString,
                     SUBSTRING(ctr.Chars, 1, 1),
                     @replacement)               AS OutputString
         FROM        CharsToReplace ctr
         CROSS APPLY InputData      id
         UNION ALL
         SELECT rl.Position + 1                                AS Position,
                SUBSTRING(ctr.chars, rl.position + 1, 1)       AS SingleChar,
                REPLACE(rl.OutputString,
                     SUBSTRING(ctr.chars, rl.Position + 1, 1),
                     @replacement)                             AS OutputString
         FROM        CharsToReplace ctr
      CROSS APPLY ReplaceLoop    rl
         WHERE LEN(ctr.Chars) > rl.Position
      )
   SELECT *
   FROM ReplaceLoop rl
END;

I take it that you're already familiar with common table expressions and especially recursive CTE's. If not, please have a look at WITH common_table_expression (Transact-SQL).

So the first two CTE tables are just to define separate tables for our inputs. The replace loop is a recursive query looping through all the characters that need to be replaced. The anchoring statement extracts the first character to replace and makes the replacement with the desired character. Result of the replacement will then be used by the subsequent iterations as each iteration extracts the next character to replace and produces it's own output for the next iteration.

So if this statement is executed, the output will be

Position   SingleChar   OutputString
1          !            this% contains  illegal/ chars
2          "            this% contains  illegal/ chars
3          #            this% contains  illegal/ chars
4          ¤            this% contains  illegal/ chars
5          %            this contains  illegal/ chars
6          &            this contains  illegal/ chars
7          /            this contains  illegal chars
8          (            this contains  illegal chars
9          )            this contains  illegal chars
10         =            this contains  illegal chars
11         ?            this contains  illegal chars

So all the hard work is done by the query.

Selecting the desired result

We definitely don't want all the data the query is returning, only the final result is interesting when all the replacements are done. To achieve this, position column can be utilized. If we simply order the result data descending based on the position and return the first row (highest position) we get the last result.

Consider the following

SQL
BEGIN
   DECLARE @stringToReplace nvarchar(MAX) = 'this% contains ! illegal/ chars',
           @charsToReplace nvarchar(100)  = '!"#¤%&/()=?',
           @replacement nvarchar(1)       = '';

   WITH
      CharsToReplace (Chars) AS (
         SELECT @charsToReplace
      ),
      InputData (InputString) AS (
         SELECT @stringToReplace
      ),
      ReplaceLoop (Position, SingleChar, OutputString) AS (
         SELECT 1                                AS Position,
                SUBSTRING(ctr.Chars, 1, 1)       AS SingleChar,
                REPLACE(id.InputString,
                     SUBSTRING(ctr.Chars, 1, 1),
                     @replacement)               AS OutputString
         FROM        CharsToReplace ctr
         CROSS APPLY InputData      id
         UNION ALL
         SELECT rl.Position + 1                                AS Position,
                SUBSTRING(ctr.chars, rl.position + 1, 1)       AS SingleChar,
                REPLACE(rl.OutputString,
                     SUBSTRING(ctr.chars, rl.Position + 1, 1),
                     @replacement)                             AS OutputString
         FROM        CharsToReplace ctr
      CROSS APPLY ReplaceLoop    rl
         WHERE LEN(ctr.Chars) > rl.Position
      )
   SELECT *
   FROM ReplaceLoop rl
   ORDER BY rl.position DESC
   OFFSET 0 ROWS
   FETCH FIRST 1 ROWS ONLY;
END;

With the ORDER BY clause in place the result is

Position   SingleChar   OutputString
11         ?            this contains  illegal chars

Now the output on the single row is just what we wanted.

Implementation of the function

The last thing is to wrap this into a function. Basically the only things to do are

  • Define the function declaration with parameters
  • Query the desired data and return it

The whole function would look like this

SQL
CREATE FUNCTION ReplaceChars(@stringToReplace nvarchar(MAX),
                             @charsToReplace nvarchar(100),
                             @replacement nvarchar(1))
RETURNS nvarchar(MAX) AS
BEGIN
   DECLARE @returnData  nvarchar(MAX);

   WITH
      CharsToReplace (Chars) AS (
         SELECT @charsToReplace
      ),
      InputData (InputString) AS (
         SELECT @stringToReplace
      ),
      ReplaceLoop (Position, SingleChar, OutputString) AS (
         SELECT 1                                AS Position,
                SUBSTRING(ctr.Chars, 1, 1)       AS SingleChar,
                REPLACE(id.InputString,
                     SUBSTRING(ctr.Chars, 1, 1),
                     @replacement)               AS OutputString
         FROM        CharsToReplace ctr
         CROSS APPLY InputData      id
         UNION ALL
         SELECT rl.Position + 1                                AS Position,
                SUBSTRING(ctr.chars, rl.position + 1, 1)       AS SingleChar,
                REPLACE(rl.OutputString,
                     SUBSTRING(ctr.chars, rl.Position + 1, 1),
                     @replacement)                             AS OutputString
         FROM        CharsToReplace ctr
         CROSS APPLY ReplaceLoop    rl
         WHERE LEN(ctr.Chars) > rl.Position
      )
   SELECT @returnData = rl.OutputString
   FROM ReplaceLoop rl
   ORDER BY rl.position DESC
   OFFSET 0 ROWS
   FETCH FIRST 1 ROWS ONLY;

   RETURN (@returnData);
END;

Let's give it a try. If we use the same input as before, the query would like

SQL
SELECT dbo.ReplaceChars('this% contains ! illegal/ chars', '!"#¤%&/()=?', '')

and the result would be

(No column name)
this contains  illegal chars

As said in the beginning, this function can be used in many places. For example if you want to use it in an UPDATE statement sanitizing the data, the query could look something like

SQL
UPDATE MyTable
SET MyColumn = dbo.ReplaceChars(MyColumn, '!"#¤%&/()=?', '');

History

  • 13th May, 2024: Created.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)