Clean RTF Merge Fields

10 Feb 2008, CPOL, 3 min read
A class that strips the RTF markup from custom merge fields in RTF documents.

Introduction

RTF has always been a pain to me. It is so tempting because it is all plain text and thus 'human' readable, but hold your horses: try to change the font of one piece of text on line x and, well, let's be honest... don't even try.

Well, RTF is still tempting for me, especially when it comes to users, documents, and yes: mail/document merging!

Alright, let's get to the point. This piece of code helps me when users write an RTF document, upload it to a website, and expect their custom fields to be filled in with data from any source (addresses, personal names, or ... strange flower names).

This particular class takes the RTF as a string, plus a start and an end character. It searches the input for each pair of these characters, removes all the RTF markup between them, and returns a cleaned-up version of the full RTF string.

The class also keeps an array of the changes it made: each field as it was found in the RTF, paired with its cleaned version.

Background

The problem mostly lies in how Word (whatever the version) creates, modifies, and rebuilds RTF documents.

(This is not meant as praise for WordPad or any other RTF editor, but they definitely do a better job than MS Word at writing simpler, more straightforward RTF.)

Say you would like users to put merge fields like [this_is_a_merge_field] in their documents, and you want to replace them with your own database fields. The trouble starts in Word as soon as a user changes the font somewhere inside the merge field text, even accidentally, and even if they change it back afterwards.

You would expect it to show in the RTF code as plain simple: [this_is_a_merge_field].

But instead, it becomes something like this: [this_is_a_}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid15991329\charrsid3492762 merge}{\rtlch\fcs1 \af0 \ltrch\fcs0 \insrsid15991329 _field].

Well, there is no way a simple String.Replace is going to find our '[this_is_a_merge_field]' database field within that string. So... we made something that deals with this problem!
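To see the failure concretely, here is a minimal sketch that uses the two strings shown above. The module name and the replacement value "Rose" are made up for illustration only.

```vb
Imports System

Module MergeFieldDemo
    Sub Main()
        Dim field As String = "[this_is_a_merge_field]"

        ' What you would hope to find in the RTF
        Dim expected As String = "Dear [this_is_a_merge_field],"

        ' What Word can write after a font change inside the field
        Dim mangled As String = "Dear [this_is_a_}{\rtlch\fcs1 \af0 \ltrch\fcs0 " & _
            "\insrsid15991329\charrsid3492762 merge}{\rtlch\fcs1 \af0 \ltrch\fcs0 " & _
            "\insrsid15991329 _field],"

        Console.WriteLine(expected.Replace(field, "Rose"))            ' prints: Dear Rose,
        Console.WriteLine(mangled.Contains(field))                    ' prints: False
        Console.WriteLine(mangled.Replace(field, "Rose") = mangled)   ' prints: True
    End Sub
End Module
```

The last line shows that Replace silently leaves the mangled text untouched, so the user ends up with an unmerged field and no error.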

Using the code

The basic call passes the RTF in as a string. If you want to, supply your own start and end characters such as [ ], < >, or even # % ([ and ] are the defaults). Don't use { }, a backslash, or a pipe |: the braces and the backslash are RTF syntax, and the cleaner uses the pipe internally as a placeholder.

Here is the main function. Pass the RTF in as a string and get the cleaned-up RTF back as a string:

VB
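Public Function CleanDocument(ByVal rtfString As String, _
       Optional ByVal detectStartChar As Char = "["c, _
       Optional ByVal detectEndChar As Char = "]"c) As String

    ' _ArrayOfFields, bcounter and processedinmillisecconds are class-level
    ' fields declared elsewhere in the class.
    Dim timer As System.Diagnostics.Stopwatch = System.Diagnostics.Stopwatch.StartNew()

    Dim sb As New StringBuilder(rtfString)        ' scanned, never modified
    Dim sbclean As New StringBuilder(rtfString)   ' receives the cleaned fields
    Dim tempstr(1) As String
    Dim stepper As Integer = 0

    bcounter = 0   ' restart the scan so the same instance can be reused
    Do
        ' tempstr(0) = field as found in the RTF, tempstr(1) = cleaned field
        tempstr = ReturnNextRtfString(sb, detectStartChar, detectEndChar, True)

        If tempstr(0) Is Nothing Then Exit Do
        sbclean.Replace(tempstr(0), tempstr(1))

        ' Record the change: row 0 = original, row 1 = cleaned
        ReDim Preserve _ArrayOfFields(1, stepper)
        _ArrayOfFields(0, stepper) = tempstr(0)
        _ArrayOfFields(1, stepper) = tempstr(1)
        stepper += 1
    Loop

    ' TimeOfDay.Milliseconds only holds the 0-999 component, so use a Stopwatch
    processedinmillisecconds = CInt(timer.ElapsedMilliseconds)

    Return sbclean.ToString

End Function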

Next, the helper function finds the next substring enclosed by the start and end characters:

VB
Private Function ReturnNextRtfString(ByVal sb As StringBuilder, _
        ByVal startchar As Char, ByVal endchar As Char, _
        Optional ByVal autoclean As Boolean = False) As String()

    Dim startcounter As Integer = -1   ' -1 = no start char seen yet
    Dim endcounter As Integer = 0
    Dim returnstring(1) As String

    ' bcounter is a class-level field: the position where the scan resumes
    For acounter As Integer = bcounter To sb.Length - 1

        If sb.Chars(acounter) = startchar Then
            startcounter = acounter
        End If

        If sb.Chars(acounter) = endchar Then
            endcounter = acounter + 1
            ' set the new start position for the next call of this function
            bcounter = acounter + 1
        End If

        If startcounter >= 0 AndAlso endcounter > startcounter Then
            ' returnstring(0) = the field as found, returnstring(1) = cleaned version
            returnstring(0) = sb.ToString(startcounter, endcounter - startcounter)
            If autoclean Then
                returnstring(1) = CleanRtfString(returnstring(0))
            End If
            Return returnstring
        End If
    Next

    ' Nothing found: both elements are still Nothing
    Return returnstring
End Function

And finally, the clean up function:

VB
Private Function CleanRtfString(ByVal rtfstring As String) As String
    Dim sb As New StringBuilder(rtfstring)
    Dim cleansb As New StringBuilder

    For ccounter As Integer = 0 To sb.Length - 1

        ' Keep printable characters that are neither RTF syntax nor the placeholder
        If Asc(sb.Chars(ccounter)) > 32 AndAlso sb.Chars(ccounter) <> "|"c _
           AndAlso sb.Chars(ccounter) <> "\"c AndAlso sb.Chars(ccounter) <> "{"c _
           AndAlso sb.Chars(ccounter) <> "}"c Then
            cleansb.Append(sb.Chars(ccounter))
        End If

        If ccounter + 1 >= sb.Length Then Exit For

        ' If RTF markup starts at the next character, overwrite it with the
        ' placeholder up to (but not including) the next space
        If sb.Chars(ccounter + 1) = "\"c OrElse sb.Chars(ccounter + 1) = "{"c _
           OrElse sb.Chars(ccounter + 1) = "}"c Then
            For dcounter As Integer = ccounter + 1 To sb.Length - 1
                If sb.Chars(dcounter) = " "c Then Exit For
                sb.Chars(dcounter) = "|"c
            Next
        End If
    Next

    cleansb.Replace("|", "")   ' defensive; placeholders are never appended
    Return cleansb.ToString
End Function
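Once the fields are clean, the merge itself comes down to a plain replace again. The following is only a sketch of that last step, not part of the class: the module, the function names MergeFields and EscapeRtf, and the sample data are all hypothetical. It assumes the default [ ] delimiters and escapes the three characters that have a special meaning in RTF, so a database value cannot break the document.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text

Module MergeSketch
    ' Hypothetical helper: escape the characters that are RTF syntax.
    ' The backslash must be handled first so the added backslashes are not doubled.
    Function EscapeRtf(ByVal value As String) As String
        Return value.Replace("\", "\\").Replace("{", "\{").Replace("}", "\}")
    End Function

    ' Hypothetical helper: replace every [key] in the cleaned RTF with its value.
    Function MergeFields(ByVal cleanedRtf As String, _
                         ByVal values As IDictionary(Of String, String)) As String
        Dim sb As New StringBuilder(cleanedRtf)
        For Each pair As KeyValuePair(Of String, String) In values
            sb.Replace("[" & pair.Key & "]", EscapeRtf(pair.Value))
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Dim values As New Dictionary(Of String, String)
        values.Add("first_name", "Ann")
        values.Add("flower", "Rosa {canina}")

        Dim cleaned As String = "{\rtf1 Dear [first_name], your [flower] is ready.}"

        ' prints: {\rtf1 Dear Ann, your Rosa \{canina\} is ready.}
        Console.WriteLine(MergeFields(cleaned, values))
    End Sub
End Module
```

In practice, the string passed to MergeFields would be the return value of CleanDocument.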

Problems to work on

  • When users insert an image in between the tags of the merge field, the cleaner will not be able to clean it up correctly.
  • When a Word hyperlink is in between the tags, this will also mangle the output.
  • Because the pipe char is used as a placeholder, it cannot be used in the field.
  • The cleaner treats a space as the only terminator of an RTF control word, and as a simple solution it removes every space inside a field; this could be done more neatly.
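The last two points could be tackled by stepping over control words explicitly instead of overwriting them with a placeholder. The sketch below is one possible approach, not the code from this article: StripRtfMarkup is a hypothetical name, and it assumes a control word is a backslash, a run of letters, an optional numeric parameter, and at most one delimiting space. It skips control symbols such as \'hh and escaped braces entirely, and it does not address images or hyperlinks.

```vb
Imports System
Imports System.Text

Module RtfFieldSketch
    ' Hypothetical helper: remove groups and control words from one field,
    ' keeping spaces and pipes that belong to the field text itself.
    Function StripRtfMarkup(ByVal fragment As String) As String
        Dim result As New StringBuilder()
        Dim i As Integer = 0

        While i < fragment.Length
            Dim c As Char = fragment(i)

            If c = "{"c OrElse c = "}"c OrElse c = ControlChars.Cr OrElse c = ControlChars.Lf Then
                i += 1                                   ' group braces and line breaks: skip
            ElseIf c = "\"c Then
                i += 1                                   ' skip the backslash
                If i < fragment.Length AndAlso Char.IsLetter(fragment(i)) Then
                    ' control word: letters, optional "-", optional digits
                    While i < fragment.Length AndAlso Char.IsLetter(fragment(i))
                        i += 1
                    End While
                    If i < fragment.Length AndAlso fragment(i) = "-"c Then i += 1
                    While i < fragment.Length AndAlso Char.IsDigit(fragment(i))
                        i += 1
                    End While
                    ' a single space after a control word is its delimiter, not text
                    If i < fragment.Length AndAlso fragment(i) = " "c Then i += 1
                Else
                    i += 1                               ' control symbol: skip one char (simplification)
                End If
            Else
                result.Append(c)                         ' ordinary text, spaces included
                i += 1
            End If
        End While

        Return result.ToString()
    End Function

    Sub Main()
        Dim mangled As String = "[this_is_a_}{\rtlch\fcs1 \af0 \ltrch\fcs0 " & _
            "\insrsid15991329\charrsid3492762 merge}{\rtlch\fcs1 \af0 \ltrch\fcs0 " & _
            "\insrsid15991329 _field]"

        Console.WriteLine(StripRtfMarkup(mangled))              ' prints: [this_is_a_merge_field]
        Console.WriteLine(StripRtfMarkup("[first }{\b name]"))  ' prints: [first name]
    End Sub
End Module
```

Because only the one space that ends a control word is dropped, a field such as [first name] keeps its own space, and no placeholder character has to be reserved.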

History

This is my first article on The Code Project!

  • 10-02-2008: The first version.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer
Netherlands