
Clean RTF Merge Fields

By rafaelpb, 10 Feb 2008

Introduction

RTF has always been a pain for me. It is so tempting because it is all plain text and thus 'human' readable, but hold your horses: try to change the font of one piece of text on line x by hand and, well, let's be honest... don't even try.

Still, RTF remains tempting for me, especially when it comes to users, documents, and yes: mail/document merging!

Alright, let's get to the point. This piece of code helps me when users create an RTF document, post it to a website, and expect their custom fields to be filled in with data from some source (addresses, personal names, or ... strange flower names).

This particular class expects the RTF as a string (handled internally as a StringBuilder), plus a start and an end char. It searches the input for pairs of these chars, removes all the RTF coding between them, and returns a cleaned-up version of the full RTF string.

An array of the changes (each original field next to its cleaned version) is available as well.

Background

The problem mostly lies in how Word (whatever the version) creates, modifies, and rebuilds RTF documents.

(This is not praise for WordPad or any other RTF editor, but they definitely do a better job than MS Word of writing simple, straightforward RTF.)

Suppose you would like a user to put merge fields like [this_is_a_merge_field] in a document, so that you can replace them with your own database fields. Word causes a problem as soon as the user touches the formatting somewhere inside the merge field text, even by accidentally changing the font and then changing it back.

You would expect the field to show up in the RTF code as plain and simple: [this_is_a_merge_field].

But instead, it becomes something like this: [this_is_a_}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid15991329\charrsid3492762 merge}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid15991329 _field].

Well, there is no way a simple String.Replace is going to find our 'this_is_a_merge_field' database field within that string. So... we made something that deals with this problem!
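To make the failure concrete, here is a minimal console sketch. The module name, the surrounding "Dear ...," text, and the replacement value "John" are made up for illustration; the mangled string is the one shown above.

```vbnet
Imports System

Module ReplaceFailsDemo

    Sub Main()
        Dim field As String = "[this_is_a_merge_field]"

        ' What we hope to find in the RTF.
        Dim clean As String = "Dear [this_is_a_merge_field],"

        ' What Word actually wrote after the formatting inside the field was touched.
        Dim mangled As String = "Dear [this_is_a_}{\rtlch\fcs1 \af0 \ltrch\fcs0 " & _
            "\insrsid15991329\charrsid3492762 merge}{\rtlch\fcs1 \af0 \ltrch\fcs0 " & _
            "\insrsid15991329 _field],"

        ' Works on the clean text: prints "Dear John,"
        Console.WriteLine(clean.Replace(field, "John"))

        ' The field text is not present as one piece: prints "False"
        Console.WriteLine(mangled.Contains(field))

        ' Replace silently does nothing: prints "True"
        Console.WriteLine(mangled.Replace(field, "John") = mangled)
    End Sub

End Module
```

Note that String.Replace does not report a miss, so the unmerged field simply ends up in the output document.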

Using the code

The main function takes the RTF as a string. If you want to, pass your own start and end chars such as [ ], < >, or even # % ([ and ] are the defaults). Don't use { }, a backslash \, or a pipe |: the first three are RTF syntax, and the pipe is used internally as a placeholder, so all of them are stripped from the field.

Here is the main function. Put a string in and get a cleaned-up string out:

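' Class-level members used by the functions below. They are not part of the
' original listing; these declarations are an assumption so the code compiles.
' The listings also need "Imports System.Text" at the top of the file.
Private _ArrayOfFields(,) As String           ' (0, n) = raw field, (1, n) = cleaned field
Private bcounter As Integer                   ' scan position shared between calls
Private processedinmillisecconds As Integer   ' duration of the last run

Public Function CleanDocument(ByVal rtfString As String, _
       Optional ByVal detectStartChar As Char = "["c, _
       Optional ByVal detectEndChar As Char = "]"c) As String

    ' A Stopwatch measures elapsed time; Date.Now.TimeOfDay.Milliseconds is
    ' only the 0-999 millisecond component of the current time.
    Dim timer As System.Diagnostics.Stopwatch = System.Diagnostics.Stopwatch.StartNew()

    Dim sb As New StringBuilder(rtfString)
    Dim sbclean As New StringBuilder(rtfString)
    Dim tempstr() As String
    Dim stepper As Integer = 0

    ' Reset the shared state so the function can be called more than once.
    bcounter = 0
    _ArrayOfFields = Nothing

    Do
        tempstr = ReturnNextRtfString(sb, detectStartChar, detectEndChar, True)

        ' No more fields in the document.
        If tempstr(0) Is Nothing Then Exit Do

        ' Swap the raw field for its cleaned version in the output.
        sbclean.Replace(tempstr(0), tempstr(1))

        ' Record the change: raw field and cleaned field.
        ReDim Preserve _ArrayOfFields(1, stepper)
        _ArrayOfFields(0, stepper) = tempstr(0)
        _ArrayOfFields(1, stepper) = tempstr(1)
        stepper += 1
    Loop

    processedinmillisecconds = CInt(timer.ElapsedMilliseconds)

    Return sbclean.ToString

End Function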

Now, the helper function that finds the next substring enclosed by the start and end char:

Private Function ReturnNextRtfString(ByRef sb As StringBuilder, _
        ByVal startchar As Char, ByVal endchar As Char, _
        Optional ByVal autoclean As Boolean = False) As String()

    Dim startcounter As Integer = -1   ' index of the most recent start char, -1 = none yet
    Dim returnstring(1) As String      ' (0) = raw field, (1) = cleaned field

    ' bcounter is a class-level Integer that remembers where the previous call stopped.
    For acounter As Integer = bcounter To sb.Length - 1

        If sb.Chars(acounter) = startchar Then
            startcounter = acounter

        ElseIf sb.Chars(acounter) = endchar Then
            ' Set the new start for the next call of the function.
            bcounter = acounter + 1

            ' Only a start char followed by an end char makes a field.
            If startcounter >= 0 Then
                returnstring(0) = sb.ToString(startcounter, acounter + 1 - startcounter)
                If autoclean Then
                    returnstring(1) = CleanRtfString(returnstring(0))
                End If
                Return returnstring
            End If
        End If
    Next

    ' No further field found: returnstring(0) is still Nothing.
    Return returnstring
End Function

And finally, the clean up function:

Private Function CleanRtfString(ByVal rtfstring As String) As String
    Dim sb As New StringBuilder(rtfstring)
    Dim cleansb As New StringBuilder

    For ccounter As Integer = 0 To sb.Length - 1

        ' Keep printable characters that are neither RTF syntax nor the pipe placeholder.
        If Asc(sb.Chars(ccounter)) > 32 AndAlso sb.Chars(ccounter) <> "|"c _
           AndAlso sb.Chars(ccounter) <> "\"c AndAlso sb.Chars(ccounter) <> "{"c _
           AndAlso sb.Chars(ccounter) <> "}"c Then
            cleansb.Append(sb.Chars(ccounter))
        End If

        If ccounter + 1 >= sb.Length Then Exit For

        ' If RTF code starts at the next char, overwrite it with pipes up to
        ' the next space so that the outer loop skips it.
        If sb.Chars(ccounter + 1) = "\"c OrElse sb.Chars(ccounter + 1) = "{"c _
           OrElse sb.Chars(ccounter + 1) = "}"c Then
            For dcounter As Integer = ccounter + 1 To sb.Length - 1
                If sb.Chars(dcounter) = " "c Then Exit For
                sb.Chars(dcounter) = "|"c
            Next
        End If
    Next

    ' Pipes are never appended to cleansb, so no final clean-up pass is needed.
    Return cleansb.ToString
End Function
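Once the fields are clean, the actual merge is back to plain string replacement. Below is a minimal sketch of that last step, not part of the class itself. The module, the function names MergeFields and EscapeRtf, the field names, and the sample values are all made up for illustration. It assumes the default [ ] chars, and it escapes only the three characters RTF treats as syntax (non-ASCII text would need more care).

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module MergeSketch

    ' Escapes the characters that have a special meaning in RTF.
    ' The backslash goes first so the escapes added afterwards are not doubled.
    Function EscapeRtf(ByVal value As String) As String
        Return value.Replace("\", "\\").Replace("{", "\{").Replace("}", "\}")
    End Function

    ' Replaces every [key] in the cleaned RTF with its (escaped) value.
    Function MergeFields(ByVal cleanedRtf As String, _
                         ByVal values As Dictionary(Of String, String)) As String
        Dim result As New StringBuilder(cleanedRtf)
        For Each pair As KeyValuePair(Of String, String) In values
            result.Replace("[" & pair.Key & "]", EscapeRtf(pair.Value))
        Next
        Return result.ToString()
    End Function

    Sub Main()
        ' Stands in for the string returned by CleanDocument.
        Dim cleanedRtf As String = _
            "{\rtf1 Dear [first_name], your order of [flower] has shipped.}"

        Dim values As New Dictionary(Of String, String)
        values.Add("first_name", "Anna")
        values.Add("flower", "Tulipa {Queen of Night}")

        ' Prints: {\rtf1 Dear Anna, your order of Tulipa \{Queen of Night\} has shipped.}
        Console.WriteLine(MergeFields(cleanedRtf, values))
    End Sub

End Module
```

Escaping matters because an unescaped brace in the data would open or close an RTF group and could break the document.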

Problems to work on

  • When users insert an image between the tags of a merge field, the cleaner cannot clean it up correctly.
  • A Word hyperlink between the tags also mangles the output.
  • Because the pipe char is used as a placeholder, it cannot be used in a field name.
  • A space is the only delimiter the cleaner looks for after an RTF code, so as a simple solution, I remove all spaces; this means field names cannot contain spaces, and it could be done more neatly.
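One possible way to address the last two points is a regular expression that strips only RTF control words (with the single space that ends them) and group braces. The sketch below is an alternative under stated assumptions, not the method used in this article. The module and function names are hypothetical, the [ ] chars are hard-coded, and it handles plain control words and braces only (no escaped characters, images, or hyperlinks).

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexFieldCleanerSketch

    ' A control word: backslash, letters, optional numeric parameter, and at most
    ' one delimiter space. The alternative matches a group brace.
    Private ReadOnly RtfCode As New Regex("\\[A-Za-z]+-?[0-9]* ?|[{}]")

    ' One merge field: [ ... ] with no nested brackets.
    Private ReadOnly Field As New Regex("\[[^\[\]]*\]")

    ' Called for every field found; removes the RTF codes inside it.
    Private Function StripRtfCodes(ByVal m As Match) As String
        Return RtfCode.Replace(m.Value, "")
    End Function

    Function CleanFields(ByVal rtf As String) As String
        Return Field.Replace(rtf, New MatchEvaluator(AddressOf StripRtfCodes))
    End Function

    Sub Main()
        Dim mangled As String = "Dear [this_is_a_}{\rtlch\fcs1 \af0 \ltrch\fcs0 " & _
            "\insrsid15991329\charrsid3492762 merge}{\rtlch\fcs1 \af0 \ltrch\fcs0 " & _
            "\insrsid15991329 _field],"

        ' Prints: Dear [this_is_a_merge_field],
        Console.WriteLine(CleanFields(mangled))

        ' Spaces that belong to the field name survive.
        ' Prints: [first name]
        Console.WriteLine(CleanFields("[first }{\b name]"))
    End Sub

End Module
```

Because no placeholder char is involved, a pipe inside a field name is left alone as well.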

History

This is my first article on The Code Project!

  • 10-02-2008: The first version.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

rafaelpb
Software Developer
Netherlands

Last Updated 10 Feb 2008
Article Copyright 2008 by rafaelpb