Converting RTF to HTML in VB.NET the Easy Way

Hanleyk1, 14 Jan 2010, CPOL
A quick and easy solution to produce excellent HTML from RTF without parsing

Introduction

This article will explain an easy, robust way to convert rich text to HTML using VB.NET and Microsoft Office Automation.

Background

This all started because I needed to take the contents of a RichTextBox in an application I had developed and insert it into the body of an email. We're a Microsoft shop all around, so I could depend on Outlook 2007 being the email client for all users, and I assumed (wrongly) that I would be able to insert rich text into an Outlook email with little or no problem. Silly me.

Once I figured out that Outlook would not take my rich text as-is, even though it was using Word as its editor, I set about converting my RTF to HTML. I assumed (again) that there must be some simple, straightforward way to do it without parsing all the RTF and accounting for each and every formatting tag myself. An exhaustive search of the internet turned up several third-party apps: some were free, most parsed the RTF and seemed a little incomplete, and none really fit the bill when it came to simplicity.

I started fooling around with Office automation, thinking that if Microsoft didn't supply direct access to their RTF-to-HTML conversion process, perhaps they would supply indirect access. Sure enough, after fiddling around with Word for a while, I figured out how to use Word as a translator and convert RTF directly to HTML in one short function. So here, for the assistance of all the other wage slaves out there struggling with a similar problem, is how I did it. Nothing earth-shattering, but a very handy function to have in your back pocket.

Using the Code

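Basically, just throw this function into your VB.NET project. You'll need to add a reference to the Microsoft Word 12.0 Object Library (a COM reference). Other versions of the Word library may do just as well, but this is the one I've used. Because the function goes through the Windows Forms Clipboard class, the project also needs a reference to System.Windows.Forms, the call has to be made from an STA thread (the UI thread of a Windows Forms application qualifies), and whatever the user had on the clipboard gets overwritten.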

Public Function sRTF_To_HTML(ByVal sRTF As String) As String
    'Declare a Word Application Object and a Word WdSaveOptions object
    Dim MyWord As Microsoft.Office.Interop.Word.Application = Nothing
    Dim oDoNotSaveChanges As Object = _
         Microsoft.Office.Interop.Word.WdSaveOptions.wdDoNotSaveChanges
    'Declare two strings to handle the data
    Dim sReturnString As String = ""
    Dim sConvertedString As String = ""
    Try
        'Instantiate the Word application,
        'set visible to false and create a document
        MyWord = CreateObject("Word.application")
        MyWord.Visible = False
        MyWord.Documents.Add()
        'Create a DataObject to hold the Rich Text
        'and copy it to the clipboard
        Dim doRTF As New System.Windows.Forms.DataObject
        doRTF.SetData("Rich Text Format", sRTF)
        Clipboard.SetDataObject(doRTF)
        'Paste the contents of the clipboard to the empty,
        'hidden Word Document
        MyWord.Windows(1).Selection.Paste()
        '…then, select the entire contents of the document
        'and copy back to the clipboard
        MyWord.Windows(1).Selection.WholeStory()
        MyWord.Windows(1).Selection.Copy()
        'Now retrieve the HTML property of the DataObject
        'stored on the clipboard
        sConvertedString = _
             Clipboard.GetData(System.Windows.Forms.DataFormats.Html)
        'Remove the clipboard format header (Version, StartHTML, etc.) that
        'precedes the <html> tag and would otherwise show up as text
        '(for example, when you insert the result into an email in Outlook)
        sConvertedString = _
             sConvertedString.Substring(sConvertedString.IndexOf("<html"))
        'Also remove the stray "Â" characters that somehow end up in there
        '(most likely UTF-8 non-breaking spaces decoded with the wrong encoding)
        sConvertedString = sConvertedString.Replace("Â", "")
        '…and you're done.
        sReturnString = sConvertedString
    Catch ex As Exception
        MsgBox("Error converting Rich Text to HTML")
    Finally
        'Always shut down the hidden Word instance,
        'whether or not the conversion succeeded
        If Not MyWord Is Nothing Then
            MyWord.Quit(oDoNotSaveChanges)
            MyWord = Nothing
        End If
    End Try
    Return sReturnString
End Function

'
'That does it. If you need to insert your HTML into an
'Outlook mail message (as I did) here's how to do it using the function above.
'
Dim myotl As Microsoft.Office.Interop.Outlook.Application
Dim myMItem As Microsoft.Office.Interop.Outlook.MailItem
myotl = CreateObject("Outlook.application")
myMItem = myotl.CreateItem(Microsoft.Office.Interop.Outlook.OlItemType.olMailItem)
myMItem.Subject = _
    "This email was converted from rich text to HTML using a simple function in VB.net"
myMItem.Display(False)
myMItem.BodyFormat = Microsoft.Office.Interop.Outlook.OlBodyFormat.olFormatHTML
'sRTF holds your rich text, for example the Rtf property of a RichTextBox
myMItem.HTMLBody = sRTF_To_HTML(sRTF)
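
One practical caveat with the clipboard approach: the Windows Forms Clipboard class only works on a thread in single-threaded apartment (STA) mode. That is the case on a Windows Forms UI thread, but not necessarily when the function is called from a background thread or a class library hosted elsewhere. The following is a minimal sketch of one way to handle that, assuming VB 2010 or later (for the multi-line lambda). StaHelper and RunOnStaThread are hypothetical names used for illustration and are not part of the code above.

```vbnet
Imports System
Imports System.Threading

Public Module StaHelper
    'Runs the supplied function on a dedicated STA thread and returns
    'its result. Any exception raised by the function is rethrown on
    'the calling thread, wrapped in an InvalidOperationException.
    Public Function RunOnStaThread(Of T)(ByVal work As Func(Of T)) As T
        Dim result As T = Nothing
        Dim failure As Exception = Nothing

        Dim worker As New Thread(
            Sub()
                Try
                    result = work()
                Catch ex As Exception
                    'Remember the error so the caller can see it
                    failure = ex
                End Try
            End Sub)

        'The apartment state must be set before the thread starts
        worker.SetApartmentState(ApartmentState.STA)
        worker.Start()
        worker.Join()

        If failure IsNot Nothing Then
            Throw New InvalidOperationException("The STA worker thread failed.", failure)
        End If
        Return result
    End Function
End Module
```

With that in place, a caller that is not already on an STA thread could write something like `Dim sHtml As String = StaHelper.RunOnStaThread(Function() sRTF_To_HTML(sRTF))`. The calling thread simply blocks until the conversion finishes, which seems acceptable here since starting Word takes a noticeable moment anyway.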

Points of Interest

One word of warning: the HTML produced by this conversion is very verbose. Word emits many lines of markup for even very basic formatting, but the function has converted thousands of pages of data without error so far where I work.
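
If the verbosity is a problem, one option is to keep only the part of the document that was actually copied. The sketch below assumes the HTML placed on the clipboard carries the `<!--StartFragment-->` and `<!--EndFragment-->` comments that the clipboard HTML format uses to bracket the copied content. HtmlFragmentHelper and ExtractFragment are hypothetical names, and this is an illustration rather than something taken from the function above. Be aware that cutting away the `<head>` section may also drop style definitions the markup relies on, so check the result against your own data.

```vbnet
Imports System

Public Module HtmlFragmentHelper
    Private Const StartMarker As String = "<!--StartFragment-->"
    Private Const EndMarker As String = "<!--EndFragment-->"

    'Returns the text between the fragment markers, or the input
    'unchanged when the markers are missing or out of order.
    Public Function ExtractFragment(ByVal html As String) As String
        If String.IsNullOrEmpty(html) Then Return ""

        Dim startIdx As Integer = _
            html.IndexOf(StartMarker, StringComparison.OrdinalIgnoreCase)
        Dim endIdx As Integer = _
            html.LastIndexOf(EndMarker, StringComparison.OrdinalIgnoreCase)

        'No usable markers: hand back the original HTML
        If startIdx < 0 OrElse endIdx < startIdx Then Return html

        'Skip past the start marker itself
        startIdx += StartMarker.Length
        Return html.Substring(startIdx, endIdx - startIdx)
    End Function
End Module
```

A caller would pass the string returned by the conversion function, for example `HtmlFragmentHelper.ExtractFragment(sRTF_To_HTML(sRTF))`, and wrap the result in whatever surrounding HTML the target needs.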

I am still surprised that Microsoft does not simply have RTF to HTML conversion functionality readily available in its development libraries. It seems like a logical and intuitive function to provide. Still, at least, there's a workaround.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Hanleyk1
Software Developer
United States
Hanley Loller. An ex-professional kayaker, I went back to school at 30 to learn computer programming and earned my BS in computer science from East Tennessee State University in 2001. I worked for a couple of different companies before landing in the Office of Computing and Information Technology at the Kentucky State Legislature, where I mostly write applications using SQL and VB.NET. I love my job, but it's still not as good as kayaking for a living.

Last Updated 14 Jan 2010
Article Copyright 2010 by Hanleyk1