Converting RTF to HTML in VB.NET the Easy Way

14 Jan 2010, CPOL
A quick and easy solution to produce excellent HTML from RTF without parsing

Introduction

This article will explain an easy, robust way to convert rich text to HTML using VB.NET and Microsoft Office Automation.

Background

This all started because I needed to take the contents of a RichTextBox in an application I had developed and insert it into the body of an email. We're a Microsoft shop all around, so I could depend on Outlook 2007 being the email client for all users, and I assumed (wrongly) that I would be able to insert rich text into an Outlook email with little or no problem. Silly me.

Once I figured out that Outlook did not support rich text, even though it was using Word as its editor, I set about trying to convert my RTF to HTML. I assumed (again) that there must be some simple, straightforward way to do it without parsing all the RTF and accounting for each and every formatting tag myself. An exhaustive search of the internet turned up several third-party apps. Some of them were free, most of them parsed the RTF and seemed a little incomplete, and none of them really fit the bill when it came to simplicity.

I started fooling around with Office automation, thinking that if Microsoft didn't supply direct access to their RTF-to-HTML conversion process, perhaps they would supply indirect access. Sure enough, after fiddling around with Word for a while, I figured out how to use Word as a translator and convert RTF directly to HTML in one short function. So here, for the assistance of all the other wage slaves out there struggling with a similar problem, is how I did it. Nothing earth-shattering here, but a very handy function to have in your back pocket.

Using the Code

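To use it, just add this function to your VB.NET project. You'll need a reference to the Microsoft Word 12.0 Object Library (a COM reference; 12.0 is the version that ships with Office 2007). Other versions of the Word library may do just as well, but this is the one I've used. The function also relies on the clipboard classes in System.Windows.Forms.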

VB.NET
Public Function sRTF_To_HTML(ByVal sRTF As String) As String
    'Declare a Word Application object and a Word WdSaveOptions object
    Dim MyWord As Microsoft.Office.Interop.Word.Application = Nothing
    Dim oDoNotSaveChanges As Object = _
         Microsoft.Office.Interop.Word.WdSaveOptions.wdDoNotSaveChanges
    'Declare two strings to handle the data
    Dim sReturnString As String = ""
    Dim sConvertedString As String = ""
    Try
        'Instantiate the Word application,
        'set visible to false and create a document
        MyWord = CreateObject("Word.Application")
        MyWord.Visible = False
        MyWord.Documents.Add()
        'Create a DataObject to hold the rich text
        'and copy it to the clipboard
        '(note that this overwrites whatever the user had on the clipboard)
        Dim doRTF As New System.Windows.Forms.DataObject
        doRTF.SetData("Rich Text Format", sRTF)
        System.Windows.Forms.Clipboard.SetDataObject(doRTF)
        'Paste the contents of the clipboard into the empty,
        'hidden Word document
        MyWord.Windows(1).Selection.Paste()
        '...then select the entire contents of the document
        'and copy it back to the clipboard
        MyWord.Windows(1).Selection.WholeStory()
        MyWord.Windows(1).Selection.Copy()
        'Now retrieve the HTML version of the data
        'stored on the clipboard
        sConvertedString = _
             System.Windows.Forms.Clipboard.GetData(System.Windows.Forms.DataFormats.Html)
        'Fail early (into the Catch below) if no HTML came back
        If sConvertedString Is Nothing Then
            Throw New InvalidOperationException("No HTML found on the clipboard")
        End If
        'Remove the leading header text that precedes the <html tag
        '(it shows up in some instances, like when you insert the
        'result into an email in Outlook)
        Dim iHtmlStart As Integer = sConvertedString.IndexOf("<html")
        If iHtmlStart >= 0 Then
            sConvertedString = sConvertedString.Substring(iHtmlStart)
        End If
        'Also remove the stray "Â" characters that somehow end up in there
        sConvertedString = sConvertedString.Replace("Â", "")
        '...and you're done.
        sReturnString = sConvertedString
    Catch ex As Exception
        MsgBox("Error converting Rich Text to HTML")
    Finally
        'Always shut down the hidden Word instance, on success or failure
        If Not MyWord Is Nothing Then
            MyWord.Quit(oDoNotSaveChanges)
            MyWord = Nothing
        End If
    End Try
    Return sReturnString
End Function

'
'That does it. If you need to insert your HTML into an
'Outlook mail message (as I did) here's how to do it using the function above.
'
'(this part needs a reference to the Microsoft Outlook Object Library)
Dim myotl As Microsoft.Office.Interop.Outlook.Application
Dim myMItem As Microsoft.Office.Interop.Outlook.MailItem
myotl = CreateObject("Outlook.Application")
myMItem = myotl.CreateItem(Microsoft.Office.Interop.Outlook.OlItemType.olMailItem)
myMItem.Subject = _
    "This email was converted from rich text to HTML using a simple function in VB.net"
myMItem.Display(False)
myMItem.BodyFormat = Microsoft.Office.Interop.Outlook.OlBodyFormat.olFormatHTML
'sRTF holds your rich text, for example the Rtf property of a RichTextBox
myMItem.HTMLBody = sRTF_To_HTML(sRTF)
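
The "leading text" that the function trims off is the header of the Windows HTML clipboard format, which puts lines such as Version and StartHTML in front of the markup and wraps the copied content in StartFragment and EndFragment comments. If you only want the copied content rather than the whole page, something like the following sketch might help. The module and function names are my own invention for illustration, not part of the original function, and the sketch assumes the markers are present (falling back to the "<html" approach when they are not).

```vbnet
Public Module ClipboardHtmlHelper
    'Comment markers that the HTML clipboard format places around the copied content
    Private Const StartMarker As String = "<!--StartFragment-->"
    Private Const EndMarker As String = "<!--EndFragment-->"

    'Hypothetical helper: returns only the markup between the fragment markers.
    'Falls back to everything from "<html" onward, or the input unchanged,
    'when the markers are missing.
    Public Function ExtractHtmlFragment(ByVal rawHtml As String) As String
        If String.IsNullOrEmpty(rawHtml) Then Return ""

        Dim startPos As Integer = _
            rawHtml.IndexOf(StartMarker, StringComparison.OrdinalIgnoreCase)
        Dim endPos As Integer = _
            rawHtml.LastIndexOf(EndMarker, StringComparison.OrdinalIgnoreCase)

        If startPos < 0 OrElse endPos < startPos Then
            Dim htmlPos As Integer = _
                rawHtml.IndexOf("<html", StringComparison.OrdinalIgnoreCase)
            If htmlPos >= 0 Then Return rawHtml.Substring(htmlPos)
            Return rawHtml
        End If

        'Skip past the start marker itself, then take everything up to the end marker
        startPos += StartMarker.Length
        Return rawHtml.Substring(startPos, endPos - startPos)
    End Function
End Module
```

Inside sRTF_To_HTML, the Substring step could be swapped for ExtractHtmlFragment(sConvertedString). Bear in mind that the fragment leaves out the head section, so any formatting that depends on style definitions there may be lost. Test it against your own documents before relying on it.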

Points of Interest

One word of warning: the HTML produced by this conversion process is very verbose. It produces a lot of lines of HTML for some very basic formatting, but it has performed error-free conversion on thousands of pages of data thus far here where I work.
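
If the verbosity is a problem, some of the Word-specific markup can be trimmed after the conversion. The sketch below is one possible approach, not something the original function does: StripWordMarkup is a hypothetical helper, and the three patterns (conditional comments, Office namespace tags such as o:p, and Mso class attributes) are assumptions about what Word typically emits. Regular expressions are a blunt tool for HTML, so check the results on your own data.

```vbnet
Imports System.Text.RegularExpressions

Public Module WordHtmlCleaner
    'Hypothetical helper: trims some common Word-specific markup from HTML
    Public Function StripWordMarkup(ByVal html As String) As String
        If String.IsNullOrEmpty(html) Then Return ""

        Dim opts As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline

        'Drop conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
        html = Regex.Replace(html, "<!--\[if[^\]]*\]>.*?<!\[endif\]-->", "", opts)

        'Drop Office namespace tags such as <o:p> and </o:p>, keeping any inner text
        html = Regex.Replace(html, "</?[ovw]:[^>]*>", "", opts)

        'Drop class attributes that point at Word's built-in Mso styles
        '(quoted or unquoted)
        html = Regex.Replace(html, "\s+class=(""Mso[^""]*""|'Mso[^']*'|Mso\w*)", "", opts)

        Return html
    End Function
End Module
```

A call such as StripWordMarkup(sRTF_To_HTML(sRTF)) would apply it to the converted output. Inline style attributes are left alone, since they carry the actual formatting.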

I am still surprised that Microsoft does not simply have RTF to HTML conversion functionality readily available in its development libraries. It seems like a logical and intuitive function to provide. Still, at least, there's a workaround.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer
United States
Hanley Loller. Ex-professional kayaker went back to school at 30 to learn computer programming. Earned my BS in computer science from East Tennessee State University in 2001. Worked for a couple of different companies before landing in the Office of Computing and Information Technology at the Kentucky State Legislature where I mostly write applications using SQL and VB.net. I love my job, but it's still not as good as kayaking for a living.
