
Converting RTF to HTML in VB.NET the Easy Way

Hanleyk1, 14 Jan 2010
A quick and easy solution to produce excellent HTML from RTF without parsing

Introduction

This article will explain an easy, robust way to convert rich text to HTML using VB.NET and Microsoft Office Automation.

Background

This all started because I needed to take the contents of a RichTextBox in an application I had developed and insert it into the body of an email. We're a Microsoft shop all around, so I could depend on Outlook 2007 being the email client for all users, and I assumed (wrongly) that I would be able to insert rich text into an Outlook email with little or no problem. Silly me.

Once I figured out that Outlook would not simply accept my rich text, even though it was using Word as its editor, I set about trying to convert my RTF to HTML, and I assumed (again) that there must be some simple, straightforward way to do it without parsing all the RTF and accounting for each and every formatting tag myself. An exhaustive search of the internet turned up several third-party apps; some of them were free, most of them parsed the RTF and seemed a little incomplete, and none of them really fit the bill when it came to simplicity.

I started fooling around with Office automation, thinking that if Microsoft didn't supply direct access to their RTF-to-HTML conversion process, perhaps they would supply indirect access. Sure enough, after fiddling around with Word for a while, I was able to figure out how to use Word as a translator and convert RTF directly to HTML in one short function. So here, for the assistance of all the other wage slaves out there struggling with a similar problem, is how I did it. Nothing earth-shattering here, but a very handy function to have in your back pocket.

Using the Code

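Basically, just throw this function into your VB.NET project. You'll need to include a reference to the Microsoft Word 12.0 Object Library (a COM reference). Other versions of the Word library may do just as well, but this is the one I've used. Because the function goes through the clipboard, it has to run on an STA thread (the default for a Windows Forms UI thread), and it overwrites whatever the user had on the clipboard.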

Public Function sRTF_To_HTML(ByVal sRTF As String) As String
    'Declare a Word Application object and a Word WdSaveOptions value.
    'MyWord starts as Nothing so the Finally block can test it safely.
    Dim MyWord As Microsoft.Office.Interop.Word.Application = Nothing
    Dim oDoNotSaveChanges As Object = _
         Microsoft.Office.Interop.Word.WdSaveOptions.wdDoNotSaveChanges
    'Declare two strings to handle the data
    Dim sReturnString As String = ""
    Dim sConvertedString As String = ""
    Try
        'Instantiate the Word application,
        'set visible to false and create a document
        MyWord = CreateObject("Word.Application")
        MyWord.Visible = False
        MyWord.Documents.Add()
        'Create a DataObject to hold the Rich Text
        'and copy it to the clipboard
        '(Clipboard lives in System.Windows.Forms)
        Dim doRTF As New System.Windows.Forms.DataObject
        doRTF.SetData(System.Windows.Forms.DataFormats.Rtf, sRTF)
        Clipboard.SetDataObject(doRTF)
        'Paste the contents of the clipboard to the empty,
        'hidden Word Document
        MyWord.Windows(1).Selection.Paste()
        '...then, select the entire contents of the document
        'and copy back to the clipboard
        MyWord.Windows(1).Selection.WholeStory()
        MyWord.Windows(1).Selection.Copy()
        'Now retrieve the HTML property of the DataObject
        'stored on the clipboard
        sConvertedString = _
             CStr(Clipboard.GetData(System.Windows.Forms.DataFormats.Html))
        'Remove the leading header text that precedes the <html tag
        '(it shows up when you insert the result into an email in Outlook).
        'Skip the trim if no <html tag is found, so Substring cannot fail.
        Dim iHtmlStart As Integer = sConvertedString.IndexOf("<html")
        If iHtmlStart >= 0 Then
            sConvertedString = sConvertedString.Substring(iHtmlStart)
        End If
        'Also remove multiple Â characters that somehow end up in there
        sConvertedString = sConvertedString.Replace("Â", "")
        '...and you're done.
        sReturnString = sConvertedString
    Catch ex As Exception
        MsgBox("Error converting Rich Text to HTML")
    Finally
        'Always shut down the hidden Word instance, on success or failure
        If MyWord IsNot Nothing Then
            MyWord.Quit(oDoNotSaveChanges)
            MyWord = Nothing
        End If
    End Try
    Return sReturnString
End Function
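
The stray Â characters stripped out above are most likely an encoding artifact rather than real content: the HTML on the clipboard is UTF-8, and when it gets decoded with a single-byte code page, each non-breaking space (bytes C2 A0) comes out as Â followed by a non-breaking space. A minimal sketch of an alternative to the Replace call, assuming that diagnosis holds on your machine, is to turn the string back into bytes with the wrong code page and decode it again as UTF-8. FixUtf8Mojibake is a hypothetical helper name, not something from the article or from any library.

```vbnet
Imports System.Text

Public Module ClipboardHtmlEncoding
    ' Hypothetical helper: re-interprets text that was decoded with the
    ' wrong single-byte code page as UTF-8.
    ' wrongEncoding is the code page that was (incorrectly) used to decode
    ' the clipboard text; this is an assumption the caller has to supply.
    Public Function FixUtf8Mojibake(ByVal garbled As String, _
                                    ByVal wrongEncoding As Encoding) As String
        If String.IsNullOrEmpty(garbled) Then Return garbled
        ' Recover the original bytes...
        Dim rawBytes As Byte() = wrongEncoding.GetBytes(garbled)
        ' ...and decode them the way they were meant to be decoded.
        Return Encoding.UTF8.GetString(rawBytes)
    End Function
End Module
```

On the .NET Framework you would probably call it as FixUtf8Mojibake(sConvertedString, System.Text.Encoding.Default), since Encoding.Default is the ANSI code page there. Only apply it to text you know was mis-decoded: a correctly decoded string that contains accented characters would be damaged by the round trip.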

'
'That does it. If you need to insert your HTML into an
'Outlook mail message (as I did), here's how to do it using the function above.
'You'll also need a reference to the Microsoft Outlook Object Library.
'RichTextBox1 is a placeholder name for your own RichTextBox control.
'
Dim myotl As Microsoft.Office.Interop.Outlook.Application
Dim myMItem As Microsoft.Office.Interop.Outlook.MailItem
myotl = CreateObject("Outlook.Application")
myMItem = myotl.CreateItem(Microsoft.Office.Interop.Outlook.OlItemType.olMailItem)
myMItem.Subject = _
    "This email was converted from rich text to HTML using a simple function in VB.net"
myMItem.Display(False)
myMItem.BodyFormat = Microsoft.Office.Interop.Outlook.OlBodyFormat.olFormatHTML
'Convert the RichTextBox contents and use the result as the HTML body
myMItem.HTMLBody = sRTF_To_HTML(RichTextBox1.Rtf)
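
Both snippets assume they run on the Windows Forms UI thread. The Clipboard class only works on a thread in single-threaded apartment (STA) mode, so calling the conversion from a worker thread will fail. Here is a minimal sketch of one way around that, assuming VB 2010 or later (for the multi-line lambda) and Windows. RunOnStaThread is a hypothetical helper name, not part of the original code.

```vbnet
Imports System
Imports System.Threading

Public Module StaRunner
    ' Hypothetical helper: runs a piece of work on a dedicated STA thread,
    ' waits for it to finish, and returns its result to the caller.
    Public Function RunOnStaThread(Of T)(ByVal work As Func(Of T)) As T
        Dim result As T = Nothing
        Dim failure As Exception = Nothing
        Dim worker As New Thread(
            Sub()
                Try
                    result = work()
                Catch ex As Exception
                    ' Remember the error so it can be rethrown on the caller's thread
                    failure = ex
                End Try
            End Sub)
        ' The apartment state must be set before the thread is started
        worker.SetApartmentState(ApartmentState.STA)
        worker.Start()
        worker.Join()
        If failure IsNot Nothing Then
            Throw New InvalidOperationException("STA work item failed.", failure)
        End If
        Return result
    End Function
End Module
```

From a background thread, the call would then look something like RunOnStaThread(Function() sRTF_To_HTML(myRtf)), where myRtf stands in for your own RTF string. The calling thread blocks until Word has finished, so this does nothing for responsiveness; it only satisfies the clipboard's threading requirement.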

Points of Interest

One word of warning: the HTML produced by this conversion process is very verbose. It produces many lines of HTML for even very basic formatting, but it has performed error-free conversion on thousands of pages of data so far where I work.
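
If the verbosity bothers you, one thing to try is keeping only the part of the clipboard HTML that sits between the `<!--StartFragment-->` and `<!--EndFragment-->` comments, which the HTML clipboard format uses to mark the content that was actually copied. The sketch below assumes those markers are present in what Word puts on the clipboard and falls back to the unchanged input if they are not. ExtractFragment is a hypothetical helper name, not part of the original function.

```vbnet
Imports System

Public Module ClipboardHtmlFragment
    Private Const StartMarker As String = "<!--StartFragment-->"
    Private Const EndMarker As String = "<!--EndFragment-->"

    ' Hypothetical helper: returns only the text between the fragment
    ' markers, or the input unchanged when the markers cannot be found.
    Public Function ExtractFragment(ByVal clipboardHtml As String) As String
        If String.IsNullOrEmpty(clipboardHtml) Then Return String.Empty
        Dim startIndex As Integer = _
            clipboardHtml.IndexOf(StartMarker, StringComparison.OrdinalIgnoreCase)
        Dim endIndex As Integer = _
            clipboardHtml.LastIndexOf(EndMarker, StringComparison.OrdinalIgnoreCase)
        If startIndex < 0 OrElse endIndex < 0 Then Return clipboardHtml
        ' Skip past the start marker itself
        startIndex += StartMarker.Length
        If endIndex < startIndex Then Return clipboardHtml
        Return clipboardHtml.Substring(startIndex, endIndex - startIndex)
    End Function
End Module
```

The trade-off is that the fragment leaves out the head of the document, so any formatting that depends on style rules defined there may be lost. Compare the result against the full output before relying on it.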

I am still surprised that Microsoft does not simply have RTF to HTML conversion functionality readily available in its development libraries. It seems like a logical and intuitive function to provide. Still, at least, there's a workaround.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Hanleyk1
Software Developer
United States
Hanley Loller. Ex-professional kayaker went back to school at 30 to learn computer programming. Earned my BS in computer science from East Tennessee State University in 2001. Worked for a couple of different companies before landing in the Office of Computing and Information Technology at the Kentucky State Legislature where I mostly write applications using SQL and VB.net. I love my job, but it's still not as good as kayaking for a living.

Comments and Discussions

 
Great code. (problem with images) | Croody | 14-Sep-10 4:26

Hi,

Thank you for the great code. It really helped me.

When I insert images into the RichTextBox, they show up all right but do not appear in the recipient's mailbox.

Does anybody know why or what can be done to correct this?

Thank you in advance.

croody

Article Copyright 2010 by Hanleyk1