65.9K
CodeProject is changing. Read more.
Home

TSQL Function to Replace HTML Tags with Delimiter

Aug 6, 2013

CPOL
viewsIcon

19942

A custom function that will hunt down HTML tags and remove them

Introduction

It is often needed to remove all HTML tags for a column(s) from a database table. One way to do it is regular expression pattern matching or using a custom function that will hunt down HTML tags and remove them. In this small post, I will demonstrate this second method.

T-Sql Function

Here is a very simple function that replaces/removes HTML tags and returns clean texts without using any patterns. Additionally, you can pass a delimiter that will replace the HTML tags in-case you do not want to mix everything in one big chunk of text.

CREATE FUNCTION [dbo].[CleanHTMLTags] (@HTMLText VARCHAR(MAX),@ReplaceChar char(1))
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @Start INT
DECLARE @End INT
DECLARE @Length INT
SET @Start = CHARINDEX(‘<’,@HTMLText) SET @End = CHARINDEX(‘>’,@HTMLText,
    CHARINDEX(‘<’,@HTMLText)) SET @Length = (@End – @Start) + 1 
    WHILE @Start > 0 AND @End > 0 AND @Length > 0
BEGIN
IF (UPPER(SUBSTRING(@HTMLText, @Start, 4)) <> ‘
‘) AND (UPPER(SUBSTRING(@HTMLText, @Start, 5)) <> ”)
begin
SET @HTMLText = RTRIM(LTRIM(STUFF(@HTMLText,@Start,@Length,@ReplaceChar)));
end
ELSE
SET @Length = 0;
SET @Start = CHARINDEX(‘<’,@HTMLText, @End-@Length) 
    SET @End = CHARINDEX(‘>’,@HTMLText,CHARINDEX(‘<’,@HTMLText, @Start))
SET @Length = (@End – @Start) + 1
END
RETURN isnull(RTRIM(LTRIM(@HTMLText)) ,”)
END

And that is it. Say you have a table named “PRODUCTS” and that contains a column “ProductDetails” that contains HTML tags. In our scenario, we have to remove all HTML tags and replace them by comma(,). Utilizing the above function, we can make the call like this:

SELECT [dbo].[CleanHTMLTags](ProductDetails,’,') FROM PRODUCTS

Resultant data should be a big text respect to each records in “PRODUCTS” table with HTML tags removed.