Click here to Skip to main content
15,305,754 members
Articles / Mobile Apps / Windows Mobile
Tip/Trick
Posted 21 Sep 2014

Stats

190.9K views
15 bookmarked

Convert RTF to Plain Text (Revised Again)

Rate me:
Please Sign up or sign in to vote.
4.91/5 (15 votes)
9 Apr 2016CPOL1 min read
Handling for hex expressions and the trailing '}'

Introduction

Most solutions to convert RTF to plain text with pure T-SQL don't handle special characters like German umlauts and all the other special characters above ASCII(128) because they are not embedded in RTF tags but noted as escaped hex values. Also most of these solutions leave a trailing '}' at the end of the converted text. This revised procedure will solve these problems.

Background

Searching the web for a T-SQL procedure to convert RTF-formatted text to plain text, you'll find a lot of matches. Mainly, there are 2 methods described: the first one uses the RichtextCtrl control with the need to reconfigure SQL server settings to allow access to OLE/COM which might be a problem in environments with high security guidelines (e.g. http://www.experts-exchange.com/Database/MS-SQL-Server/Q_27633014.html). The second one will be found in some slightly different versions which all produce results with restrictions as described above (e.g. http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=90034).

Using the Code

Add the following SQL function to your database:

USE [<YourDatabaseNameHere>]
GO

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[RTF2Text]
(
    @rtf nvarchar(max)
)
RETURNS nvarchar(max)
AS
BEGIN
    DECLARE @Pos1 int;
    DECLARE @Pos2 int;
    DECLARE @hex varchar(316);
    DECLARE @Stage table
    (
        [Char] char(1),
        [Pos] int
    );

    INSERT @Stage
        (
           [Char]
         , [Pos]
        )
    SELECT SUBSTRING(@rtf, [Number], 1)
         , [Number]
      FROM [master]..[spt_values]
     WHERE ([Type] = 'p')
       AND (SUBSTRING(@rtf, Number, 1) IN ('{', '}'));

    SELECT @Pos1 = MIN([Pos])
         , @Pos2 = MAX([Pos])
      FROM @Stage;

    DELETE
      FROM @Stage
     WHERE ([Pos] IN (@Pos1, @Pos2));

    WHILE (1 = 1)
        BEGIN
            SELECT TOP 1 @Pos1 = s1.[Pos]
                 , @Pos2 = s2.[Pos]
              FROM @Stage s1
                INNER JOIN @Stage s2 ON s2.[Pos] > s1.[Pos]
             WHERE (s1.[Char] = '{')
               AND (s2.[Char] = '}')
            ORDER BY s2.[Pos] - s1.[Pos];

            IF @@ROWCOUNT = 0
                BREAK

            DELETE
              FROM @Stage
             WHERE ([Pos] IN (@Pos1, @Pos2));

            UPDATE @Stage
               SET [Pos] = [Pos] - @Pos2 + @Pos1 - 1
             WHERE ([Pos] > @Pos2);

            SET @rtf = STUFF(@rtf, @Pos1, @Pos2 - @Pos1 + 1, '');
        END

    SET @rtf = REPLACE(@rtf, '\pard', '');
    SET @rtf = REPLACE(@rtf, '\par', '');
    SET @rtf = STUFF(@rtf, 1, CHARINDEX(' ', @rtf), '');

    WHILE (Right(@rtf, 1) IN (' ', CHAR(13), CHAR(10), '}'))
      BEGIN
        SELECT @rtf = SUBSTRING(@rtf, 1, (LEN(@rtf + 'x') - 2));
        IF LEN(@rtf) = 0 BREAK
      END
    
    SET @Pos1 = CHARINDEX('\''', @rtf);

    WHILE @Pos1 > 0
        BEGIN
            IF @Pos1 > 0
                BEGIN
                    SET @hex = '0x' + SUBSTRING(@rtf, @Pos1 + 2, 2);
                    SET @rtf = REPLACE(@rtf, SUBSTRING(@rtf, @Pos1, 4), _
CHAR(CONVERT(int, CONVERT (binary(1), @hex,1))));
                    SET @Pos1 = CHARINDEX('\''', @rtf);
                END
        END

    SET @rtf = @rtf + ' ';

    SET @Pos1 = PATINDEX('%\%[0123456789][\ ]%', @rtf);

    WHILE @Pos1 > 0
        BEGIN
            SET @Pos2 = CHARINDEX(' ', @rtf, @Pos1 + 1);

            IF @Pos2 < @Pos1
                SET @Pos2 = CHARINDEX('\', @rtf, @Pos1 + 1);

            IF @Pos2 < @Pos1
                BEGIN
                    SET @rtf = SUBSTRING(@rtf, 1, @Pos1 - 1);
                    SET @Pos1 = 0;
                END
            ELSE
                BEGIN
                    SET @rtf = STUFF(@rtf, @Pos1, @Pos2 - @Pos1 + 1, '');
                    SET @Pos1 = PATINDEX('%\%[0123456789][\ ]%', @rtf);
                END
        END

    IF RIGHT(@rtf, 1) = ' '
        SET @rtf = SUBSTRING(@rtf, 1, LEN(@rtf) -1);

    RETURN @rtf;
END

When copying the above code to SQL don't forget to remove the underscore (wich is only required in codeproject to break long lines)!

To convert any RTF-formatted content, call the function above passing the RTF content as parameter of type nvarchar(max):

SQL
SELECT [<YourRTFColumnNameHere>]
     , [dbo].[RTF2Text]([<YourRTFColumnNameHere>]) AS [TextFromRTF]
  FROM [dbo].[<YourDatabaseNameHere>]

The function returns the converted text as nvarchar(max) too.

More improvements may be added. If you find any RTF part that isn't covered by the function above, please drop a line here.

Thanks

Thanks to all the authors in the web that have posted their solutions until now and therefore deserve the applause. I simply enhanced these solutions to complete the basic conversion.

Thanks also to all users here posting their tips to make the procedure more robust.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

NightWizzard
Software Developer (Senior)
Germany Germany
30+ years experience as developer with VB.NET, VB, VBA, VBScript, C#, WPF, WinForms, JavaScript, jQuery, PHP, Delphi, ADO, ADO.NET, ASP.NET, Silverlight, HTML, CSS, XAML, XML, T-SQL, MySQL, MariaDb, MS-ACCESS, dBase, OLE/COM, ActiveX, SEPA/DTAUS, ZUGFeRD, DATEV Format and DATEVconnect, DSGVO, TNT Web-API, MS-Office Addins, etc., including:
- 10+ years experience as developer and freelancer
- 10+ years experience as team leader
- 13+ years experience with CRM solutions

Comments and Discussions

 
GeneralRe: Invalid length parameter passed to the LEFT or SUBSTRING function. Pin
Batsheva Greenhut3-Dec-14 22:54
MemberBatsheva Greenhut3-Dec-14 22:54 
GeneralRe: Invalid length parameter passed to the LEFT or SUBSTRING function. Pin
hebrahimi14-Dec-14 20:03
Memberhebrahimi14-Dec-14 20:03 
GeneralRe: Invalid length parameter passed to the LEFT or SUBSTRING function. Pin
NightWizzard15-Dec-14 6:37
MemberNightWizzard15-Dec-14 6:37 
AnswerRe: Invalid length parameter passed to the LEFT or SUBSTRING function. Pin
Member 103324005-May-15 2:43
MemberMember 103324005-May-15 2:43 
GeneralRe: Invalid length parameter passed to the LEFT or SUBSTRING function. Pin
NightWizzard5-May-15 8:05
MemberNightWizzard5-May-15 8:05 
GeneralRe: Invalid length parameter passed to the LEFT or SUBSTRING function. Pin
Tomas Granlund9-Nov-15 3:10
MemberTomas Granlund9-Nov-15 3:10 
GeneralRe: Invalid length parameter passed to the LEFT or SUBSTRING function. Pin
NightWizzard9-Nov-15 7:27
MemberNightWizzard9-Nov-15 7:27 
Questionrtf to plain... Pin
Member 1125557822-Nov-14 7:36
MemberMember 1125557822-Nov-14 7:36 
Thanks very much.

Now, I am facing a problem of additional tags remaining in the output. Here is a version of the content of 1 cell. If I pull out this text and save it as an rtf, it can be rendered in word, but if I convert it using the rtf2text function, I still end up with many tags. This particular rtf document is complex (has tables, cells with stuff that is struck out), but I am also facing problems with cells that have simpler content.

Any help would be appreciated.

{\rtf1\ansi\ansicpg1252\deff1\deflang1033{\fonttbl{\f0\fmodern\fprq1\fcharset0 Courier New;}{\f1\fswiss\fcharset0 Arial;}{\f2\froman\fprq2\fcharset2 Symbol;}}{\colortbl;\red0\green0\blue0;\red255\green255\blue255;\red255\green0\blue0;\red0\green255\blue0;\red0\green0\blue255;\red192\green192\blue192;}\plain\lang1033\hich\f1\dbch\f1\loch\f1\fs24 {\field{\*\fldinst {HYPERLINK 132502040,9000001591903060,9000001591903060 }}{\fldrslt {{\rtf1{\fonttbl{\f0\fmodern\fprq1\fcharset0 Courier New;}{\f1\fswiss\fcharset0 Arial;}{\f2\froman\fprq2\fcharset2 Symbol;}}{\colortbl;\red0\green0\blue255;}\cf1\f1\fs20 \b\ul \escp General:\b0\ul0 }}}}\par \fi-360\li720\f2\fs20 \'b7 \f1\tab \f1\fs20 {\*\bkmkstart bk9000001592003060}{\*\bkmkend bk9000001592003060}\escp Admission Date: {\*\bkmkstart bk9000001592103060}{\*\bkmkend bk9000001592103060}\escp 11-20-2014. \par\pard \fi-360\li720\f2\fs20 \'b7 \f1\tab \f1\fs20 {\*\bkmkstart bk9000001592203060}{\*\bkmkend bk9000001592203060}\escp Arrived From {\*\bkmkstart bk9000001592303060}{\*\bkmkend bk9000001592303060}\escp home .\par\pard \fi-360\li720\f2\fs20 \'b7 \f1\tab \f1\fs20 {\*\bkmkstart bk9000001592403060}{\*\bkmkend bk9000001592403060}\escp Source of Information {\*\bkmkstart bk9000001592503060}{\*\bkmkend bk9000001592503060}\escp patient.\par\pard \f1\fs10 \par {\*\bkmkstart bk9000001592603060}{\*\bkmkend bk9000001592603060}\f1\fs20 \b\ul \escp Care Providers:\b0\ul0 \par {\*\bkmkstart bk103600130}{\*\bkmkend bk103600130}\fi-360\li720\f2\fs20 \'b7 \f1\tab \f1\fs20 \b \escp CEMALETIN NEVBER,\b0 (Attending): Referring MD\par\pard \f1\fs10 \par {\field{\*\fldinst {HYPERLINK 132502040,9000001592703060,9000001592703060 }}{\fldrslt {{\rtf1{\fonttbl{\f0\fmodern\fprq1\fcharset0 Courier New;}{\f1\fswiss\fcharset0 Arial;}{\f2\froman\fprq2\fcharset2 Symbol;}}{\colortbl;\red0\green0\blue255;}\cf1\f1\fs20 \b\ul \escp Other Care Providers:\b0\ul0 }}}}\par \trowd\trgaph108\trpaddl108\trpaddr108\trpaddfl3\trpaddfr3\trkeep\trleft0\clbrdrl\brdrs \brdrw20\brdrcf2\clbrdrt\brdrs \brdrw20\brdrcf2\clbrdrr\brdrs \brdrw20\brdrcf2\clbrdrb\brdrs \brdrw20\brdrcf2\cellx10080\pard\intbl \ql {\*\bkmkstart bk9000001592803060}{\*\bkmkend bk9000001592803060}\pard\intbl\fi-468\li720\f2\fs20 \'b7 \f1 \f1\fs20 \b \escp Primary Care Provider: \b0 \f1\fs20 \escp warren licht\cell \row \pard \trowd\trgaph108\trpaddl108\trpaddr108\trpaddfl3\trpaddfr3\trkeep\trleft0\clbrdrl\brdrs \brdrw20\brdrcf2\clbrdrt\brdrs \brdrw20\brdrcf2\clbrdrr\brdrs \brdrw20\brdrcf2\clbrdrb\brdrs \brdrw20\brdrcf2\cellx10080\pard\intbl \ql {\*\bkmkstart bk9000001592903060}{\*\bkmkend bk9000001592903060}\pard\intbl\fi-468\li720\f2\fs20 \'b7 \f1 \f1\fs20 \b \escp Care providers for Follow up (PCP/Outpatient Provider): \b0 \f1\fs20 \escp not certain yet\cell \row \pard \f1\fs10 \par {\field{\*\fldinst {HYPERLINK 132502040,9000001593103060,9000001593103060 }}{\fldrslt {{\rtf1{\fonttbl{\f0\fmodern\fprq1\fcharset0 Courier New;}{\f1\fswiss\fcharset0 Arial;}{\f2\froman\fprq2\fcharset2 Symbol;}}{\colortbl;\red0\green0\blue255;}\cf1\f1\fs20 \b\ul \escp Chief Complaint/Reason for Visit:\b0\ul0 }}}}\par \trowd\trgaph108\trpaddl108\trpaddr108\trpaddfl3\trpaddfr3\trkeep\trleft0\clbrdrl\brdrs \brdrw20\brdrcf2\clbrdrt\brdrs \brdrw20\brdrcf2\clbrdrr\brdrs \brdrw20\brdrcf2\clbrdrb\brdrs \brdrw20\brdrcf2\cellx10080\pard\intbl \ql {\*\bkmkstart bk9000001593203060}{\*\bkmkend bk9000001593203060}\pard\intbl\fi-468\li720\f1 \f1\fs20 \b \escp Chief Complaint/Reason for Admission: \b0 \f1\fs20 \escp fell, injured left ankle\cell \row \pard \f1\fs10 \par {\field{\*\fldinst {HYPERLINK 132502040,9000001593303060,9000001593303060 }}{\fldrslt {{\rtf1{\fonttbl{\f0\fmodern\fprq1\fcharset0 Courier New;}{\f1\fswiss\fcharset0 Arial;}{\f2\froman\fprq2\fcharset2 Symbol;}}{\colortbl;\red0\green0\blue255;}\cf1\f1\fs20 \b\ul \escp History of Present Illness:\b0\ul0 }}}}\par \trowd\trgaph108\trpaddl108\trpaddr108\trpaddfl3\trpaddfr3\trkeep\trleft0\clbrdrl\brdrs \brdrw20\brdrcf2\clbrdrt\brdrs \brdrw20\brdrcf2\clbrdrr\brdrs \brdrw20\brdrcf2\clbrdrb\brdrs \brdrw20\brdrcf2\cellx10080\pard\intbl \ql {\*\bkmkstart bk9000001593403060}{\*\bkmkend bk9000001593403060}\pard\intbl\fi-468\li720\f2\fs20 \'b7 \f1 \f1\fs20 \b \escp HPI: \b0 \cell \row \pard {\rtf1\sste16000\ansi\deflang1033\ftnbj\uc1\deff0
{\fonttbl{\f0 \fnil Arial;}{\f1 \fnil \fcharset0 Arial;}}
{\colortbl ;\red255\green255\blue255 ;\red0\green0\blue0 ;}
{\stylesheet{\f1\fs20 Normal;}{\cs1 Default Paragraph Font;}}
{\*\revtbl{Unknown;}}
\paperw12240\paperh15840\margl1800\margr1800\margt1440\margb1440\headery720\footery720\nogrowautofit\deftab720\formshade\fet4\aendnotes\aftnnrlc\pgbrdrhead\pgbrdrfoot
\sectd\pgwsxn12240\pghsxn15840\marglsxn1800\margrsxn1800\margtsxn1440\margbsxn1440\headery720\footery720\sbkpage\pgncont\pgndec
\plain\plain\f1\fs24\ql\plain\f1\fs24\plain\f0\fs20\lang1033\hich\f0\dbch\f0\loch\f0\fs20 test stuff from gregg\par
}
\f1\fs10 \par {\*\bkmkstart bk37300060}{\*\bkmkend bk37300060} \f1\fs20 \b \escp Allergies:\b0 \par \fi-360\li720\f2\fs20 \'b7 \f1\tab \f1\fs20 \b \escp A-Spas S/L\b0 : Drug, Unknown, Active\par\pard\f1\fs10 \par \f1\fs20 \i \pard\f1\fs20 \b \escp * Outpatient Medication Status not yet specified\b0 \par \i0 \f1\fs10 \par {\commentSL}\pard\f1\fs20 \b \escp Electronic Signatures:\b0 \par \f1\fs20 \b\ul \escp Husk, Gregg A (MD)\b0\ul0 \f1\fs18 \escp (Signed 11-21-2014 22:50)\par \fi-360\li720\f1\tab \f1\fs20 \b\i \escp Authored: \b0\i0 \f1\fs20 \i \escp General, Chief Complaint/Reason for Admission/HPI, Allergies/Medications\i0 \par\pard \par \par \f1\fs20 \b\i \escp Last Updated: \b0\i0 \f1\fs20 \i \escp 11-21-2014 22:50\i0 \f1\fs20 \i \escp by Husk, Gregg A (MD)\i0 \par {\commentSL}}
AnswerRe: rtf to plain... Pin
NightWizzard22-Nov-14 8:26
MemberNightWizzard22-Nov-14 8:26 
QuestionRTF to Plain Text Pin
Member 1125557822-Nov-14 6:11
MemberMember 1125557822-Nov-14 6:11 
AnswerRe: RTF to Plain Text Pin
NightWizzard22-Nov-14 6:36
MemberNightWizzard22-Nov-14 6:36 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.