Tip/Trick
Posted 21 Sep 2014

Convert RTF to Plain Text (Revised Again)

9 Apr 2016, CPOL
Handling for hex expressions and the trailing '}'

Introduction

Most pure T-SQL solutions for converting RTF to plain text mishandle German umlauts and other characters above ASCII 127, because RTF does not write these as literal characters or control words but as escaped hex values (for example, \'fc for ü in a Windows-1252 document). Most of these solutions also leave a trailing '}' at the end of the converted text. This revised function solves both problems.
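
To make the hex handling concrete, here is a minimal sketch of how a single escape can be decoded in T-SQL. It mirrors the CONVERT/CHAR combination used in the function; the literal 'fc' is only an illustrative value, and the character that comes back depends on the code page of your database collation (ü is what a Windows-1252 collation should give).

```sql
-- \'fc carries the two hex digits "fc"; prefix them with 0x so that
-- CONVERT with style 1 can parse the string as a binary value.
DECLARE @hex varchar(4) = '0x' + 'fc';   -- 'fc' is an example value

-- binary(1) -> int -> CHAR() yields the character for that byte value.
SELECT CHAR(CONVERT(int, CONVERT(binary(1), @hex, 1))) AS [DecodedChar];
```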

Background

Searching the web for a T-SQL routine that converts RTF-formatted text to plain text returns many matches, which mainly describe two methods. The first uses the RichtextCtrl control, which requires reconfiguring SQL Server to allow OLE/COM access; that can be a problem in environments with strict security guidelines (e.g. http://www.experts-exchange.com/Database/MS-SQL-Server/Q_27633014.html). The second exists in several slightly different versions, all of which produce results with the restrictions described above (e.g. http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=90034).

Using the Code

Add the following SQL function to your database:

USE [<YourDatabaseNameHere>]
GO

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[RTF2Text]
(
    @rtf nvarchar(max)
)
RETURNS nvarchar(max)
AS
BEGIN
    DECLARE @Pos1 int;
    DECLARE @Pos2 int;
    DECLARE @hex varchar(316);
    DECLARE @Stage table
    (
        [Char] char(1),
        [Pos] int
    );

    INSERT @Stage
        (
           [Char]
         , [Pos]
        )
    SELECT SUBSTRING(@rtf, [Number], 1)
         , [Number]
      FROM [master]..[spt_values]
     WHERE ([Type] = 'p')
       AND (SUBSTRING(@rtf, Number, 1) IN ('{', '}'));

    SELECT @Pos1 = MIN([Pos])
         , @Pos2 = MAX([Pos])
      FROM @Stage;

    DELETE
      FROM @Stage
     WHERE ([Pos] IN (@Pos1, @Pos2));

    WHILE (1 = 1)
        BEGIN
            SELECT TOP 1 @Pos1 = s1.[Pos]
                 , @Pos2 = s2.[Pos]
              FROM @Stage s1
                INNER JOIN @Stage s2 ON s2.[Pos] > s1.[Pos]
             WHERE (s1.[Char] = '{')
               AND (s2.[Char] = '}')
            ORDER BY s2.[Pos] - s1.[Pos];

            IF @@ROWCOUNT = 0
                BREAK

            DELETE
              FROM @Stage
             WHERE ([Pos] IN (@Pos1, @Pos2));

            UPDATE @Stage
               SET [Pos] = [Pos] - @Pos2 + @Pos1 - 1
             WHERE ([Pos] > @Pos2);

            SET @rtf = STUFF(@rtf, @Pos1, @Pos2 - @Pos1 + 1, '');
        END

    SET @rtf = REPLACE(@rtf, '\pard', '');
    SET @rtf = REPLACE(@rtf, '\par', '');
    SET @rtf = STUFF(@rtf, 1, CHARINDEX(' ', @rtf), '');

    WHILE (Right(@rtf, 1) IN (' ', CHAR(13), CHAR(10), '}'))
      BEGIN
        SELECT @rtf = SUBSTRING(@rtf, 1, (LEN(@rtf + 'x') - 2));
        IF LEN(@rtf) = 0 BREAK
      END
    
    SET @Pos1 = CHARINDEX('\''', @rtf);

    WHILE @Pos1 > 0
        BEGIN
            -- Replace every occurrence of this \'xx escape with the character its two hex digits encode.
            SET @hex = '0x' + SUBSTRING(@rtf, @Pos1 + 2, 2);
            SET @rtf = REPLACE(@rtf, SUBSTRING(@rtf, @Pos1, 4), CHAR(CONVERT(int, CONVERT(binary(1), @hex, 1))));
            SET @Pos1 = CHARINDEX('\''', @rtf);
        END

    SET @rtf = @rtf + ' ';

    SET @Pos1 = PATINDEX('%\%[0123456789][\ ]%', @rtf);

    WHILE @Pos1 > 0
        BEGIN
            SET @Pos2 = CHARINDEX(' ', @rtf, @Pos1 + 1);

            IF @Pos2 < @Pos1
                SET @Pos2 = CHARINDEX('\', @rtf, @Pos1 + 1);

            IF @Pos2 < @Pos1
                BEGIN
                    SET @rtf = SUBSTRING(@rtf, 1, @Pos1 - 1);
                    SET @Pos1 = 0;
                END
            ELSE
                BEGIN
                    SET @rtf = STUFF(@rtf, @Pos1, @Pos2 - @Pos1 + 1, '');
                    SET @Pos1 = PATINDEX('%\%[0123456789][\ ]%', @rtf);
                END
        END

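    -- LEN() ignores trailing spaces, so measure with a sentinel character appended;
    -- otherwise the last real character would be cut off together with the space.
    IF RIGHT(@rtf, 1) = ' '
        SET @rtf = SUBSTRING(@rtf, 1, LEN(@rtf + 'x') - 2);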

    RETURN @rtf;
END

When copying the code above into SQL Server, join any line that ends with an underscore to the line that follows it and remove the underscore. The underscore is not part of T-SQL; it is only used on CodeProject to break long lines.

To convert any RTF-formatted content, call the function above passing the RTF content as parameter of type nvarchar(max):

SELECT [<YourRTFColumnNameHere>]
     , [dbo].[RTF2Text]([<YourRTFColumnNameHere>]) AS [TextFromRTF]
  FROM [dbo].[<YourTableNameHere>]

The function returns the converted text as nvarchar(max) too.
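
Before running the function against a whole table, it can help to call it once with a literal. The sketch below assumes the function has been created as [dbo].[RTF2Text]; the sample RTF string is made up for illustration and is not taken from real data.

```sql
-- Hypothetical sample input for a quick manual check.
-- Inside a T-SQL string literal, the single quote of each \'xx escape must be doubled.
DECLARE @sample nvarchar(max) =
    N'{\rtf1\ansi\deff0{\fonttbl{\f0 Arial;}}\f0\fs20 Gr\''fc\''dfe\par}';

SELECT [dbo].[RTF2Text](@sample) AS [TextFromRTF];
```

The sample encodes the German word Grüße, so any difference in the returned value points at a step that needs a closer look for your data or collation.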

More improvements may be added. If you find any RTF part that isn't covered by the function above, please drop a line here.
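
One candidate for such an improvement concerns long documents. The function takes its character positions from [master]..[spt_values] with [Type] = 'p', and that number list only covers 0 to 2047, so braces beyond position 2047 are never staged and the groups they delimit stay in the text. The following is a sketch of an alternative number source, not part of the original function: the CTE names (E1, E2, E4, E8, Tally) and the sample string are assumptions chosen for illustration.

```sql
-- Hypothetical sample; in the function this would be the @rtf parameter.
DECLARE @rtf nvarchar(max) = N'{\rtf1\ansi {\b bold} text}';

-- Build a number list by cross-joining a ten-row seed with itself.
WITH E1(n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS v(n)),
     E2(n) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b),   -- up to 100 rows
     E4(n) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b),   -- up to 10,000 rows
     E8(n) AS (SELECT 1 FROM E4 a CROSS JOIN E4 b),   -- up to 100,000,000 rows
     Tally([Number]) AS
     (
         -- Only generate as many numbers as the string has characters.
         SELECT TOP (ISNULL(LEN(@rtf), 0))
                ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
           FROM E8
     )
SELECT SUBSTRING(@rtf, [Number], 1) AS [Char]
     , [Number] AS [Pos]
  FROM Tally
 WHERE SUBSTRING(@rtf, [Number], 1) IN ('{', '}');
```

Used inside the function, the same WITH clause would precede the INSERT into @Stage in place of the query against [master]..[spt_values]. The sketch covers strings of up to 100 million characters; longer values would need one more cross join.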

Thanks

Thanks to all the authors on the web who have posted their solutions so far; they deserve the applause. I simply enhanced these solutions to complete the basic conversion.

Thanks also to all users here posting their tips to make the procedure more robust.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

NightWizzard
Software Developer (Senior)
Germany
30+ years experience as developer with VB.NET, VB, VBA, VBScript, C#, WPF, WinForms, JavaScript, jQuery, PHP, Delphi, ADO, ADO.NET, ASP.NET, Silverlight, HTML, CSS, XAML, XML, T-SQL, MySQL, MariaDb, MS-ACCESS, dBase, OLE/COM, ActiveX, SEPA/DTAUS, ZUGFeRD, DATEV Format and DATEVconnect, DSGVO, TNT Web-API, MS-Office Addins, etc., including:
- 10+ years experience as developer and freelancer
- 10+ years experience as team leader
- 13+ years experience with CRM solutions

