Tip/Trick
Posted 21 Sep 2014

Convert RTF to Plain Text (Revised Again)

Updated 9 Apr 2016 (CPOL)
Adds handling for escaped hex characters and the trailing '}'

Introduction

Most solutions for converting RTF to plain text in pure T-SQL don't handle German umlauts and the other special characters with codes above 127, because RTF does not store these characters directly but writes them as escaped hex values (for example \'fc for ü). Most of these solutions also leave a trailing '}' at the end of the converted text. The revised function in this tip solves both problems.
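To make the escaped-hex problem concrete, here is a minimal sketch of the decoding step on its own, using the same CONVERT(binary(1), ..., 1) technique as the function below. The sample string is made up, and the character you get back depends on the code page of the database's default collation: with a code page 1252 collation (for example Latin1_General_CI_AS), 0xFC is ü.

```sql
-- Hypothetical sample: RTF stores "Müller" as M\'fcller, i.e. \' followed by
-- the hex code of the character in the document's ANSI code page (here 1252).
DECLARE @escaped nvarchar(100) = N'M\''fcller';
DECLARE @Pos int = CHARINDEX('\''', @escaped);

-- Build a '0x..' literal from the two hex digits that follow \'.
DECLARE @hex varchar(4) = '0x' + SUBSTRING(@escaped, @Pos + 2, 2);

-- Style 1 makes CONVERT read the string as a 0x-prefixed hex literal;
-- CHAR() then maps the byte through the code page of the default collation.
SELECT @hex AS [HexLiteral]
     , CONVERT(int, CONVERT(binary(1), @hex, 1)) AS [Code]
     , REPLACE(@escaped, SUBSTRING(@escaped, @Pos, 4),
               CHAR(CONVERT(int, CONVERT(binary(1), @hex, 1)))) AS [Decoded];
```

[Code] is 252; [Decoded] reads Müller under a code page 1252 collation.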

Background

If you search the web for a T-SQL procedure that converts RTF-formatted text to plain text, you'll find plenty of matches. They mainly describe two methods. The first uses the RichtextCtrl control, which requires reconfiguring SQL Server to allow OLE/COM access; that can be a problem in environments with strict security guidelines (e.g. http://www.experts-exchange.com/Database/MS-SQL-Server/Q_27633014.html). The second comes in several slightly different versions, all of which have the restrictions described above (e.g. http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=90034).
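For context: the OLE/COM route typically reaches the RichtextCtrl object through the sp_OACreate family of system procedures, which SQL Server disables by default. Turning them on is a server-wide configuration change like the one below. It is shown only to illustrate what the first method requires; the function in this tip does not need it.

```sql
-- Server-wide change needed by the OLE Automation approach (not by RTF2Text).
-- 'Ole Automation Procedures' is an advanced option, so advanced options
-- have to be made visible first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'Ole Automation Procedures', 1;
RECONFIGURE;
```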

Using the Code

Add the following SQL function to your database:

USE [<YourDatabaseNameHere>]
GO

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[RTF2Text]
(
    @rtf nvarchar(max)
)
RETURNS nvarchar(max)
AS
BEGIN
    DECLARE @Pos1 int;
    DECLARE @Pos2 int;
    DECLARE @hex varchar(316);
    DECLARE @Stage table
    (
        [Char] char(1),
        [Pos] int
    );

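    -- Record the position of every '{' and '}'. [master]..[spt_values] with
    -- type 'P' only holds the numbers 0-2047, so it is cross-joined with itself
    -- to cover RTF strings of up to 4,194,304 characters.
    INSERT @Stage
        (
           [Char]
         , [Pos]
        )
    SELECT SUBSTRING(@rtf, n.[Number], 1)
         , n.[Number]
      FROM (SELECT v1.[number] * 2048 + v2.[number] + 1 AS [Number]
              FROM [master]..[spt_values] v1
                CROSS JOIN [master]..[spt_values] v2
             WHERE (v1.[type] = 'P')
               AND (v2.[type] = 'P')
               AND (v1.[number] <= LEN(@rtf) / 2048)) n
     WHERE (n.[Number] <= LEN(@rtf))
       AND (SUBSTRING(@rtf, n.[Number], 1) IN ('{', '}'));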

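    -- The first and the last brace enclose the whole document; keep them out
    -- of the group-removal loop below.
    SELECT @Pos1 = MIN([Pos])
         , @Pos2 = MAX([Pos])
      FROM @Stage;

    DELETE
      FROM @Stage
     WHERE ([Pos] IN (@Pos1, @Pos2));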

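    -- Repeatedly pick the closest '{' ... '}' pair, which is always an innermost
    -- group, and cut it (including any text it contains) out of @rtf.
    WHILE (1 = 1)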
        BEGIN
            SELECT TOP 1 @Pos1 = s1.[Pos]
                 , @Pos2 = s2.[Pos]
              FROM @Stage s1
                INNER JOIN @Stage s2 ON s2.[Pos] > s1.[Pos]
             WHERE (s1.[Char] = '{')
               AND (s2.[Char] = '}')
            ORDER BY s2.[Pos] - s1.[Pos];

            IF @@ROWCOUNT = 0
                BREAK

            DELETE
              FROM @Stage
             WHERE ([Pos] IN (@Pos1, @Pos2));

            UPDATE @Stage
               SET [Pos] = [Pos] - @Pos2 + @Pos1 - 1
             WHERE ([Pos] > @Pos2);

            SET @rtf = STUFF(@rtf, @Pos1, @Pos2 - @Pos1 + 1, '');
        END

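    -- Drop the \pard and \par control words, then the RTF header up to and
    -- including the first space.
    SET @rtf = REPLACE(@rtf, '\pard', '');
    SET @rtf = REPLACE(@rtf, '\par', '');
    SET @rtf = STUFF(@rtf, 1, CHARINDEX(' ', @rtf), '');

    -- Trim trailing spaces, line breaks and the closing '}'. LEN() ignores
    -- trailing spaces, so LEN(@rtf + 'x') - 2 is the current length minus one.
    -- The DATALENGTH check stops the loop on an empty string.
    WHILE (DATALENGTH(@rtf) > 0)
      AND (RIGHT(@rtf, 1) IN (' ', CHAR(13), CHAR(10), '}'))
      BEGIN
        SET @rtf = SUBSTRING(@rtf, 1, LEN(@rtf + 'x') - 2);
      END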
    
    -- Decode escaped characters written as \'hh (hh = hex code in the
    -- document's ANSI code page) into the matching character.
    SET @Pos1 = CHARINDEX('\''', @rtf);

    WHILE @Pos1 > 0
        BEGIN
            SET @hex = '0x' + SUBSTRING(@rtf, @Pos1 + 2, 2);
            SET @rtf = REPLACE(@rtf, SUBSTRING(@rtf, @Pos1, 4),
                               CHAR(CONVERT(int, CONVERT(binary(1), @hex, 1))));
            SET @Pos1 = CHARINDEX('\''', @rtf);
        END

    SET @rtf = @rtf + ' ';

    -- Remove leftover control words such as \viewkind4\uc1\f0\fs17: from a
    -- backslash that is followed (somewhere) by a digit and a space or
    -- backslash, up to the next space.
    SET @Pos1 = PATINDEX('%\%[0123456789][\ ]%', @rtf);

    WHILE @Pos1 > 0
        BEGIN
            SET @Pos2 = CHARINDEX(' ', @rtf, @Pos1 + 1);

            -- No space left: fall back to the next backslash.
            IF @Pos2 < @Pos1
                SET @Pos2 = CHARINDEX('\', @rtf, @Pos1 + 1);

            -- Nothing to stop at: cut off the rest of the string.
            IF @Pos2 < @Pos1
                BEGIN
                    SET @rtf = SUBSTRING(@rtf, 1, @Pos1 - 1);
                    SET @Pos1 = 0;
                END
            ELSE
                BEGIN
                    SET @rtf = STUFF(@rtf, @Pos1, @Pos2 - @Pos1 + 1, '');
                    SET @Pos1 = PATINDEX('%\%[0123456789][\ ]%', @rtf);
                END
        END

    -- Remove the single space appended above. LEN() does not count trailing
    -- spaces, so LEN(@rtf) - 1 would also cut off the last real character.
    IF (DATALENGTH(@rtf) > 0) AND (RIGHT(@rtf, 1) = ' ')
        SET @rtf = SUBSTRING(@rtf, 1, LEN(@rtf + 'x') - 2);

    RETURN @rtf;
END

To convert RTF-formatted content, call the function and pass the RTF content as an nvarchar(max) parameter:

SELECT [<YourRTFColumnNameHere>]
     , [dbo].[RTF2Text]([<YourRTFColumnNameHere>]) AS [TextFromRTF]
  FROM [dbo].[<YourTableNameHere>]
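RTF2Text assumes its input really is RTF: it unconditionally removes everything up to the first space, which is where the RTF header normally ends, so a plain-text value loses its first word. If the column may also hold plain text, a guard like the following sketch converts only values that start with the {\rtf signature. The column and table names are placeholders; <YourTableNameHere> is hypothetical.

```sql
-- Convert only values that start with the RTF signature "{\rtf";
-- pass everything else (plain text, NULL) through unchanged.
-- Note: values with leading whitespace before "{\rtf" are not converted.
SELECT CASE
           WHEN [<YourRTFColumnNameHere>] LIKE N'{\rtf%'
               THEN [dbo].[RTF2Text]([<YourRTFColumnNameHere>])
           ELSE [<YourRTFColumnNameHere>]
       END AS [TextFromRTF]
  FROM [dbo].[<YourTableNameHere>];
```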

Like its parameter, the function's return value is of type nvarchar(max).
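For a quick test without a table, the function can also be called on a literal. The sample below is made up for illustration: a font table group, German text with \'hh escapes, and the \par, line break and closing brace that editors usually write at the end. The font table should disappear and, with a code page 1252 default collation, the escapes should come back as ü and ß.

```sql
-- Hypothetical sample RTF; '' inside the literals is an escaped quote,
-- so \''fc becomes \'fc in the actual string.
DECLARE @sample nvarchar(max) =
      N'{\rtf1\ansi\ansicpg1252\deff0{\fonttbl{\f0\fnil Arial;}}'
    + N'\viewkind4\uc1\pard\f0\fs20 Gr\''fc\''dfe aus M\''fcnchen\par'
    + NCHAR(13) + NCHAR(10) + N'}';

SELECT @sample                    AS [RTF]
     , [dbo].[RTF2Text](@sample)  AS [TextFromRTF];
```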

More improvements may follow. If you find any RTF construct that the function doesn't handle, please leave a comment below.
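One construct worth knowing about: the function deletes every nested {...} group together with its contents. That is what removes font and color tables, but it also removes text that an editor wraps in its own group, such as {\b bold}. A small probe like this one (the sample RTF is made up) shows the effect.

```sql
-- Hypothetical probe: the inner group {\b bold} is removed together with
-- its text, while the text outside nested groups is kept.
DECLARE @probe nvarchar(max) = N'{\rtf1\ansi {\b bold} plain}';

SELECT [dbo].[RTF2Text](@probe) AS [Result]
     , CASE WHEN [dbo].[RTF2Text](@probe) LIKE N'%bold%'
            THEN N'kept' ELSE N'removed' END AS [BoldText];
```

[BoldText] reports removed.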

Thanks

Thanks to all the authors on the web who have posted their solutions so far; they deserve the applause. I simply enhanced their solutions to complete the basic conversion.

Thanks also to all the users here who posted tips to make the function more robust.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

NightWizzard
Software Developer (Senior)
Germany
30+ years of experience as a developer with VB.NET, VB, VBA, VBScript, C#, WPF, WinForms, JavaScript, jQuery, PHP, Delphi, ADO, ADO.NET, ASP.NET, Silverlight, HTML, CSS, XAML, XML, T-SQL, MySQL, MariaDB, MS-ACCESS, dBase, OLE/COM, ActiveX, SEPA/DTAUS, ZUGFeRD, DATEV Format and DATEVconnect, DSGVO, TNT Web-API, MS-Office Addins, etc., including:
- 10+ years of experience as a developer and freelancer
- 10+ years of experience as a team leader
- 13+ years of experience with CRM solutions

Comments and Discussions

 
Question: Great code, but it fails when there is text between { and }, as in my example below. Could you help?
Question: Struggling with embedded URLs
Question: Strips first word on non-RTF values
Question: This is great - thanks very much [+small suggestion]
Question: A little help on this RTF
Suggestion: Very nice component
Question: The code does not process group {}
Question: Great, but problem with Unicode
Question: Output looks like Chinese
First, I have to apologise, since I'm far from an expert here.
I have an RTF example in the SQL database like this:
{\rtf1\ansi\ansicpg1252\deff0\deflang1043{\fonttbl{\f0\fnil\fcharset0 Microsoft Sans Serif;}} \viewkind4\uc1\pard\f0\fs17 [Alfred Postma]\par 14/11 Reminder\par \par \par [Alfred Postma]\par On Hold\par \par \par [Alfred Postma]\par Tussen reminder 5/2\par \par \par [Alfred Postma]\par Tussen reminder 6/2\par \par \par [Alfred Postma]\par Heien week 22\par \par \par [Alfred Postma]\par Achteruit\par \par \par [Alfred Postma]\par Boren week 26\par \par \par [Alfred Postma]\par 3/4 Reminder\par \par \par [Alfred Postma]\par Klaar\par \par } ýÿ

When I try to use your function, the result looks like Chinese(?). Could this be a database setting, or is this function just not suited for MS Project 2016?
The result:
屻瑲ㅦ慜獮屩湡楳灣ㅧ㔲尲敤晦尰敤汦湡ㅧ㐰笳晜湯瑴汢屻て晜楮屬捦慨獲瑥‰楍牣獯景⁴慓獮匠牥晩紻ൽ尊楶睥楫摮尴捵就慰摲晜尰獦㜱嬠汁牦摥倠獯浴嵡灜牡਍㐱ㄯ‱敒業摮牥灜牡਍灜牡਍灜牡਍䅛晬敲⁤潐瑳慭屝慰൲伊潈摬灜牡਍灜牡਍灜牡਍䅛晬敲⁤潐瑳慭屝慰൲吊獵敳敲業摮牥㔠㈯灜牡਍灜牡਍灜牡਍䅛晬敲⁤潐瑳慭屝慰൲吊獵敳敲業摮牥㘠㈯灜牡਍灜牡਍灜牡਍䅛晬敲⁤潐瑳慭屝慰൲䠊楥湥眠敥㈲灜牡਍灜牡਍灜牡਍䅛晬敲⁤潐瑳慭屝慰൲䄊档整畲瑩灜牡਍灜牡਍灜牡਍䅛晬敲⁤潐瑳慭屝慰൲䈊牯湥眠敥㘲灜牡਍灜牡਍灜牡਍䅛晬敲⁤潐瑳慭屝慰൲㌊㐯删浥湩敤屲慰൲尊慰൲尊慰൲嬊汁牦摥倠獯浴嵡灜牡਍汋慡屲慰൲尊慰൲
Question: Best approach is to use a .NET component such as System.Windows.Forms.RichTextBox
Answer: Re: Best approach is to use a .NET component such as System.Windows.Forms.RichTextBox
Question: Some parsing errors when using line breaks /line
Answer: Re: Some parsing errors when using line breaks /line
Question: Compatibility with SQL 11.0
Question: Some cases in which it fails
Answer: Re: Some cases in which it fails
Answer: Re: Some cases in which it fails
Answer: Re: Some cases in which it fails
Question: Bullets and table
Answer: Re: Bullets and table
General: Re: Bullets and table
General: Re: Bullets and table
Question: Big thanks
Answer: Re: Big thanks
