Converting RTF to HTML in VB.NET the Easy Way

Hanleyk1, 14 Jan 2010 CPOL
A quick and easy solution to produce excellent HTML from RTF without parsing


This article will explain an easy, robust way to convert rich text to HTML using VB.NET and Microsoft Office Automation.


This all started because I needed to take the contents of a RichTextBox in an application I had developed and insert it into the body of an email. We're a Microsoft shop all around, so I could depend on Outlook 2007 being the email client for all users, and I assumed (wrongly) that I would be able to insert rich text into an Outlook email with little or no trouble. Silly me.

Once I figured out that Outlook did not support rich text, even though it was using Word as its editor, I set about trying to convert my RTF to HTML. I assumed (again) that there must be some simple, straightforward way to do it without parsing all the RTF and accounting for each and every formatting tag myself. An exhaustive search of the internet turned up several third-party apps. Some of them were free, most of them parsed the RTF and seemed a little incomplete, and none of them really fit the bill when it came to simplicity.

I started fooling around with Office automation, thinking that if Microsoft didn't supply direct access to their RTF-to-HTML conversion process, perhaps they would supply indirect access. Sure enough, after fiddling with Word for a while, I figured out how to use Word as a translator and convert RTF directly to HTML in one short function. So here, for the assistance of all the other wage slaves out there struggling with a similar problem, is how I did it. Nothing earth-shattering here, but a very handy function to have in your back pocket.

Using the Code

Basically, just throw this function into your VB.NET project. You'll need to include a reference to the Microsoft Word 12.0 Object Library (COM object). Other Word libraries may do just as well, but this is how I've used it.

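Public Function sRTF_To_HTML(ByVal sRTF As String) As String
    'Declare a Word Application object and a Word WdSaveOptions value
    Dim MyWord As Microsoft.Office.Interop.Word.Application = Nothing
    Dim oDoNotSaveChanges As Object = _
        Microsoft.Office.Interop.Word.WdSaveOptions.wdDoNotSaveChanges 'assumed value, per the comment above
    'Declare two strings to handle the data
    Dim sReturnString As String = ""
    Dim sConvertedString As String = ""
    Try
        'Instantiate the Word application,
        'set visible to false and create a document
        MyWord = CreateObject("Word.Application")
        MyWord.Visible = False
        MyWord.Documents.Add() 'assumed call: creates the empty document
        'Create a DataObject to hold the Rich Text
        'and copy it to the clipboard
        Dim doRTF As New System.Windows.Forms.DataObject
        doRTF.SetData("Rich Text Format", sRTF)
        System.Windows.Forms.Clipboard.SetDataObject(doRTF) 'assumed call
        'Paste the contents of the clipboard to the empty,
        'hidden Word Document
        MyWord.Selection.Paste() 'assumed call
        '…then, select the entire contents of the document
        'and copy back to the clipboard
        MyWord.Selection.WholeStory() 'assumed call
        MyWord.Selection.Copy() 'assumed call
        'Now retrieve the HTML property of the DataObject
        'stored on the clipboard
        sConvertedString = _
            CStr(System.Windows.Forms.Clipboard.GetData(System.Windows.Forms.DataFormats.Html)) 'assumed call
        'Remove some leading text that shows up in some instances
        '(like when you insert it into an email in Outlook)
        sConvertedString = _
            sConvertedString.Substring(sConvertedString.IndexOf("<html")) 'assumed: keep everything from the <html tag on
        'Also remove stray "Â" characters that somehow end up in there
        sConvertedString = sConvertedString.Replace("Â", "")
        '…and you're done.
        sReturnString = sConvertedString
        'Close the hidden Word instance without saving, then release it
        If Not MyWord Is Nothing Then
            MyWord.Quit(oDoNotSaveChanges) 'assumed call
            MyWord = Nothing
        End If
    Catch ex As Exception
        If Not MyWord Is Nothing Then
            MyWord.Quit(oDoNotSaveChanges) 'assumed call
            MyWord = Nothing
        End If
        MsgBox("Error converting Rich Text to HTML")
    End Try
    Return sReturnString
End Function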

'That does it. If you need to insert your HTML into an
'Outlook mail message (as I did), here's how to do it using the function above.
'RichTextBox1 is a placeholder name for your own RichTextBox control.
Dim myotl As Microsoft.Office.Interop.Outlook.Application
Dim myMItem As Microsoft.Office.Interop.Outlook.MailItem
myotl = CreateObject("Outlook.application")
myMItem = myotl.CreateItem(Microsoft.Office.Interop.Outlook.OlItemType.olMailItem)
myMItem.Subject = _
    "This email was converted from rich text to HTML using a simple function in"
myMItem.BodyFormat = Microsoft.Office.Interop.Outlook.OlBodyFormat.olFormatHTML
myMItem.HTMLBody = sRTF_To_HTML(RichTextBox1.Rtf)
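
The "leading text" that the function removes is the descriptive header (Version, StartHTML, EndHTML and so on) that the Windows clipboard's "HTML Format" places in front of the markup. If you want that step to tolerate input that has no <html tag, a small helper along these lines might do. This is only a sketch: the module name ClipboardHtmlHelpers and the function name StripClipboardHeader are made up for illustration and are not part of the original function.

```vbnet
Imports System

Module ClipboardHtmlHelpers
    'Hypothetical helper (not from the article): returns everything from
    'the first "<html" tag onward. If no such tag is found, the input is
    'returned unchanged instead of throwing an exception.
    Public Function StripClipboardHeader(ByVal sClipboardHtml As String) As String
        If String.IsNullOrEmpty(sClipboardHtml) Then
            Return ""
        End If
        Dim iStart As Integer = _
            sClipboardHtml.IndexOf("<html", StringComparison.OrdinalIgnoreCase)
        If iStart < 0 Then
            Return sClipboardHtml
        End If
        Return sClipboardHtml.Substring(iStart)
    End Function
End Module
```

Inside the conversion function, the header-removal statement could then read sConvertedString = StripClipboardHeader(sConvertedString).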

Points of Interest

One word of warning: the HTML produced by this conversion process is very verbose. It generates many lines of HTML for even very basic formatting. That said, it has performed error-free conversion on thousands of pages of data so far here where I work.
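
If the verbosity becomes a problem, one possible way to trim it is to drop the Office-specific conditional comment blocks (the ones that look like <!--[if gte mso 9]> … <![endif]-->) that Word typically embeds in its HTML. The sketch below assumes those blocks are not needed by whatever will render the result. That may not hold when the target is Outlook itself, so test before relying on it. The names WordHtmlCleanup and RemoveConditionalComments are hypothetical and not part of the original article.

```vbnet
Imports System.Text.RegularExpressions

Module WordHtmlCleanup
    'Hypothetical helper (not from the article): removes downlevel-hidden
    'conditional comments of the form <!--[if ...]> ... <![endif]-->.
    'Singleline lets "." match line breaks so multi-line blocks are removed.
    Public Function RemoveConditionalComments(ByVal sHtml As String) As String
        If String.IsNullOrEmpty(sHtml) Then
            Return ""
        End If
        Return Regex.Replace(sHtml, _
            "<!--\[if[^\]]*\]>.*?<!\[endif\]-->", _
            "", _
            RegexOptions.Singleline Or RegexOptions.IgnoreCase)
    End Function
End Module
```

It could be applied to the string returned by sRTF_To_HTML before the HTML is stored or displayed.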

I am still surprised that Microsoft does not simply have RTF to HTML conversion functionality readily available in its development libraries. It seems like a logical and intuitive function to provide. Still, at least, there's a workaround.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Software Developer
United States
Hanley Loller. Ex-professional kayaker who went back to school at 30 to learn computer programming. Earned my BS in computer science from East Tennessee State University in 2001. Worked for a couple of different companies before landing in the Office of Computing and Information Technology at the Kentucky State Legislature, where I mostly write applications using SQL. I love my job, but it's still not as good as kayaking for a living.

Comments and Discussions

Re: Great code. (problem with images)
Hanleyk1, 12-Aug-11 7:39
I'm definitely getting a different result than scotchy. The HTML created from rich text in my app definitely contains image tags.

When I debug and intercept the HTML string before it is handed over to Outlook, it contains image tags, and the images are stored in a temp folder under "Local Settings", something like this: "C:\Documents and Settings\Username\Local Settings\Temp\msohtmlclip1\01\clip_image002.gif". The image shows up in the body of the email just fine, although the email body has an image resizing/positioning tool built into it and the actual image data is more complex than just the gif file I am listing here. I doubt that simply attaching this gif file would give you the results you are looking for. If you want the image to show up properly, you need the automated process to handle it for you.

In the example I am looking at, the HTML conversion creates the following six files, which together seem to contain the image data.

clip_colorschememapping.xml 1 KB
clip_image001.wmz 368 KB
clip_image002.gif 18 KB
clip_image003.wmz 1 KB
clip_image004.gif 1 KB
clip_themedata.thmx 4 KB

I would suggest looking for a configuration solution to this problem rather than trying to code around it at a lower level. What versions of Visual Studio, Word and Outlook are you using? I'm currently using VS 2008 and Outlook 2007, although we were using VS and Outlook 2003 when this code was written. I'm also referencing the "Microsoft Word 12.0 Object Library" and running in a Windows XP environment with Office 2007 installed. Does your setup differ significantly from this configuration in any way that might change the outcome?
Last Updated 14 Jan 2010
Article Copyright 2010 by Hanleyk1