Writing Your Own RTF Converter

By Jani Giannoudis, 1 Aug 2013
An article on how to write a custom RTF parser and converter.
rtfconverter_sourcecode.zip
RtfConverter_SourceCode
bin
Debug
Release
Itenso.Rtf.Converter.Html.dll
Itenso.Rtf.Converter.Xml.dll
Itenso.Rtf.Interpreter.dll
Itenso.Rtf.Parser.dll
Itenso.Sys.dll
log4net.dll
nunit-console-runner.dll
nunit.core.dll
nunit.framework.dll
nunit.util.dll
Rtf2Html.exe
Rtf2Raw.exe
Rtf2Xml.exe
RtfInterpreterTests.exe
RtfParserTests.exe
RtfWindows.exe
RtfWinForms.exe
Converter
Html
Properties
Xml
Properties
docu
Word2007RTFSpec9.pdf
ext
log4net.dll
nunit-console-runner.dll
nunit.core.dll
nunit.framework.dll
nunit.util.dll
Interpreter
Converter
Image
Text
Interpreter
Model
Properties
Support
InterpreterTests
Properties
RtfInterpreterTest
RtfInterpreterTest_0.rtf
RtfInterpreterTest_1.rtf
RtfInterpreterTest_10.rtf
RtfInterpreterTest_11.rtf
RtfInterpreterTest_12.rtf
RtfInterpreterTest_13.rtf
RtfInterpreterTest_14.rtf
RtfInterpreterTest_15.rtf
RtfInterpreterTest_16.rtf
RtfInterpreterTest_17.rtf
RtfInterpreterTest_18.rtf
RtfInterpreterTest_19.rtf
RtfInterpreterTest_2.rtf
RtfInterpreterTest_20.rtf
RtfInterpreterTest_21.rtf
RtfInterpreterTest_22.rtf
RtfInterpreterTest_23.rtf
RtfInterpreterTest_3.rtf
RtfInterpreterTest_4.rtf
RtfInterpreterTest_4.rtf.jpg
RtfInterpreterTest_4.rtf.jpg.hex
RtfInterpreterTest_5.rtf
RtfInterpreterTest_5.rtf.png
RtfInterpreterTest_5.rtf.png.hex
RtfInterpreterTest_6.rtf
RtfInterpreterTest_6.rtf.wmf
RtfInterpreterTest_6.rtf.wmf.hex
RtfInterpreterTest_7.rtf
RtfInterpreterTest_7.rtf.emf
RtfInterpreterTest_7.rtf.emf.hex
RtfInterpreterTest_8.rtf
RtfInterpreterTest_8.rtf.wmf
RtfInterpreterTest_8.rtf.wmf.hex
RtfInterpreterTest_9.rtf
RtfInterpreterTest_fail_0.rtf
RtfInterpreterTest_fail_1.rtf
RtfInterpreterTest_fail_2.rtf
RtfInterpreterTest_fail_3.rtf
RtfInterpreterTest_fail_4.rtf
keys
ItensoRtfConverter.snk
Parser
Model
Parser
Properties
Support
ParserTests
Properties
RtfParserTest
minimal.rtf
RtfParserTest_0.rtf
RtfParserTest_1.rtf
RtfParserTest_2.rtf
RtfParserTest_3.rtf
RtfParserTest_4.rtf
RtfParserTest_5.rtf
RtfParserTest_6.rtf
RtfParserTest_7.rtf
RtfParserTest_8.rtf
RtfParserTest_fail_0.rtf
RtfParserTest_fail_1.rtf
RtfParserTest_fail_2.rtf
RtfParserTest_fail_3.rtf
RtfParserTest_fail_4.rtf
RtfParserTest_fail_5.rtf
RtfParserTest_fail_6.rtf
Rtf2Html
Properties
Rtf2Raw
Properties
Rtf2Xml
Properties
RtfWindows
Properties
RtfWinForms
DefaultText.rtf
Properties
Sys
Application
Collection
Logging
Properties
Test
RtfConverter_v1.7.0.zip
History

3rd March, 2012 - v1.4.0.0
-------------------------------------------------------------------------------
- RtfTextFormat: Fixed reset of superscript in DeriveWithSuperScript

26th April, 2011 - v1.3.0.0
-------------------------------------------------------------------------------
- RtfVisualImageAdapter: Relaxed handling for non-hex image data

8th April, 2011 - v1.2.0.0
-------------------------------------------------------------------------------
- RtfFontBuilder: Relaxed handling for missing font names, generating font name 'UnnamedFont_{font-id}'

14th February, 2011
-------------------------------------------------------------------------------
- Replaced RtfHtmlSpecialCharConverter with RtfHtmlSpecialCharCollection
- RtfHtmlConverter: New property SpecialCharacters

25th January, 2011
-------------------------------------------------------------------------------
- Rtf Interpreter: Fixed retroactive paragraph changes

1st December, 2010
-------------------------------------------------------------------------------
- RtfHtmlSpecialCharConverter: New class to handle special character conversion
- RtfHtmlConverterSettings: New property SpecialCharsRepresentation
- RtfHtmlConverter: Added support for special character conversion
- Rtf2Html: New command line argument /SC to control the special character conversion
- Removed projects and solutions for Visual Studio 2005
- Added projects and solutions for Visual Studio 2010

20th August, 2009
-------------------------------------------------------------------------------
- RtfHtmlConverterSettings: New property UseNonBreakingSpaces
- Rtf2Html: New command line argument /NBS to replace spaces with non-breaking spaces (default=off)

18th August, 2009
-------------------------------------------------------------------------------
- signed assemblies
- RtfImageConverterSettings: New property BackgroundColor
- Rtf2Raw, Rtf2Html: New command line argument /BC for the image background color

3rd August, 2009
-------------------------------------------------------------------------------
- RtfHtmlConverter: Replacing text spaces with non-breaking-spaces
- RtfImageConverter: Fixed missing converted image info in case of undefined target format

20th May, 2009
-------------------------------------------------------------------------------
- RtfHtmlConverter: Added support for Justify alignment
- RtfHtmlConverter: Fixed missing closing tag </ul> for bulleted lists when ConvertScope is set to Content

5th May, 2009
-------------------------------------------------------------------------------
- RtfSpec: New tag highlight for highlighted text
- RtfInterpreter: Added support for text highlighting
- Rtf2Raw, Rtf2Xml, Rtf2Html: Enumeration ProgramExitCode contains all program exit codes

20th February, 2009
-------------------------------------------------------------------------------
- RtfParser: Various new specialized exceptions based on RtfParserException
- RtfInterpreter: Various new specialized exceptions based on RtfInterpreterException
- Projects Sys, Parser and Interpreter: Extracted localizable strings to Strings.cs and Strings.resx
- Rtf2Html: New command line argument /DS to control the conversion scope

18th February, 2009
-------------------------------------------------------------------------------
- RtfSpec: New tag 'nonshppict' for alternative images
- RtfInterpreter: Ignoring alternative images
- RtfInterpreterTest: New unit-test for alternative images

16th February, 2009
-------------------------------------------------------------------------------
- RtfImageConvertSettings: New properties ScaleOffset and ScaleExtension to control image scaling
- Rtf2Raw and Rtf2Html: New command line argument /XS to fix the BorderBug
- Changed binaries from debug to release (slightly better performance)

5th February, 2009
-------------------------------------------------------------------------------
- RtfInterpreter: Extended group handling to recognize state transition from header to document in case no other mechanism detects it and the content starts with a group with such a 'destination'

3rd February, 2009
-------------------------------------------------------------------------------
- Rtf2Html: Added support to convert visual hyperlinks
  - New command line argument /CH to convert visual hyperlinks (default is off)
  - New command line argument /HP:pattern with the regular expression pattern to recognize visual hyperlinks (optional)
  - Programmatic control with RtfHtmlConvertSettings.ConvertVisualHyperlinks and RtfHtmlConvertSettings.VisualHyperlinkPattern
  - Only visible hyperlinks will be converted. Does not support hyperlinks which are represented by a display text and the actual hyperlink stored in a (hidden) field content.
- Refactored code - or rather 'ReSharped' :)
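
The visual hyperlink conversion described in this entry can be illustrated with a minimal sketch. It is written in Python rather than the library's own .NET code, and the function name and the default pattern are assumptions made for illustration only; the changelog does not state the pattern that Rtf2Html actually uses.

```python
import html
import re

# Hypothetical default pattern; the pattern actually used by the library is not
# given in this changelog.
DEFAULT_PATTERN = r"https?://[^\s<>\"]+"


def convert_visual_hyperlinks(text: str, pattern: str = DEFAULT_PATTERN) -> str:
    """HTML-escape the text and wrap every pattern match in an anchor element."""
    parts = []
    last = 0
    for match in re.finditer(pattern, text):
        # Text between two matches is escaped and passed through unchanged.
        parts.append(html.escape(text[last:match.start()]))
        url = html.escape(match.group(0), quote=True)
        parts.append(f'<a href="{url}">{url}</a>')
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


print(convert_visual_hyperlinks("Docs: http://example.com/help (see there)"))
```

As in the entry above, only URLs that are visible in the text are recognized by such an approach; a link whose target is stored in hidden field content would not be found.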

22nd October, 2008
-------------------------------------------------------------------------------
- RtfParser: Fixed skipping of the Unicode alternative representation when it is written in hex-encoded form
- RtfHtmlConverter: New property DocumentImages which provides information about the converted images using IRtfConvertedImageInfo 
- added ChangeHistory.txt (this file)
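
The RtfParser fix in this entry concerns the fallback characters that follow a \uN tag. A rough Python sketch of the skipping rule, under simplifying assumptions: group scoping of \ucN is ignored, surrogate pairs are not joined, and stray \'hh escapes are decoded as cp1252. The names are hypothetical and this is not the parser's actual code.

```python
import re

# One token per match: \ucN, \uN, a \'hh hex escape, or any single character.
TOKEN = re.compile(
    r"\\uc(?P<uc>\d+) ?"
    r"|\\u(?P<u>-?\d+) ?"
    r"|\\'(?P<hex>[0-9a-fA-F]{2})"
    r"|(?P<char>.)",
    re.S,
)


def decode_unicode_escapes(rtf_text: str) -> str:
    out = []
    skip_count = 1  # \ucN: fallback characters following each \uN (default 1)
    to_skip = 0
    for match in TOKEN.finditer(rtf_text):
        uc, code, hex_byte, char = match.group("uc", "u", "hex", "char")
        if uc is not None:
            skip_count = int(uc)
        elif code is not None:
            value = int(code)
            if value < 0:
                value += 65536  # \uN carries a signed 16-bit value
            out.append(chr(value))
            to_skip = skip_count
        elif to_skip > 0:
            # A hex-encoded fallback such as \'80 counts as ONE skipped character,
            # not as the four source characters it is written with.
            to_skip -= 1
        elif hex_byte is not None:
            out.append(bytes.fromhex(hex_byte).decode("cp1252", errors="replace"))
        else:
            out.append(char)
    return "".join(out)


print(decode_unicode_escapes(r"\u8364\'80 5"))
```

Treating the whole \'hh escape as a single fallback unit is the point of the sketch: counting its raw characters instead would leave part of the escape in the output.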

15th October, 2008 
-------------------------------------------------------------------------------
- Added support for tags 
   \sub: changes font size to 2/3 of the current font size and moves 'down' by half the current font size. 
   \super: changes font size to 2/3 of the current font size and moves 'up' by half the current font size. 
   \nosupersub: resets the 'up'/'down' baseline alignment to zero.
     ATTENTION: this leaves the font size unchanged, because the current implementation does not know what the 'previous' font size was.
     Hence, depending on the RTF writer, content might be displayed with a smaller font size than intended.
   \v*: toggles the new IsHidden property of the IRtfTextFormat.
   \v and \v1 turn it on while \v0 turns it off (according to the behavior of 'boolean tags').
   \viewkind: triggers the transition from interpreter state InHeader to InDocument (but only if the font table is already defined).
     This supports documents without color table and prevents formatting or content at the beginning from being ignored. 
- Extended/fixed support for tags 
    \dn and \up: will use the specified default value of '6' if none is given in the RTF (instead of resetting to zero). 
- RtfTextConverterSettings/RtfXmlConverterSettings/RtfHtmlConverterSettings: have a new flag IsShowHiddenText which defaults to false. 
- RtfTextConverter/RtfXmlConverter/RtfHtmlConverter: will only append found text to the plain text buffer if it is not marked hidden in its text format or if the new setting IsShowHiddenText is explicitly set to true. 
- Rtf2Raw/Rtf2Xml/Rtf2Html: New command line argument /HT to convert hidden text
- RtfWinForms/RtfWindows: Conversion is considering the text selection 
- Added projects and solutions for Visual Studio 2005 
- Download now contains the binaries 
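
The tag rules listed in this entry can be summarized in a small Python sketch. TextFormat and apply_tag are hypothetical stand-ins, not the library's IRtfTextFormat, and the sketch assumes that "half the current font size" refers to the size before it is reduced to 2/3.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TextFormat:
    """Hypothetical stand-in for a text format (not the library's IRtfTextFormat)."""
    font_size: int          # half-points, the unit used by \fs
    super_script: int = 0   # baseline offset in half-points: positive = up, negative = down
    is_hidden: bool = False


def apply_tag(fmt: TextFormat, name: str, value=None) -> TextFormat:
    """Return a new format after one tag; value=None means the tag had no parameter."""
    if name in ("sub", "super"):
        # Assumption: the shift is half of the font size BEFORE the reduction.
        shift = fmt.font_size // 2
        if name == "sub":
            shift = -shift
        return replace(fmt, font_size=fmt.font_size * 2 // 3, super_script=shift)
    if name == "nosupersub":
        # Only the baseline offset is reset; the previous font size is not known.
        return replace(fmt, super_script=0)
    if name in ("up", "dn"):
        amount = 6 if value is None else value  # default of 6 when no value is given
        return replace(fmt, super_script=amount if name == "up" else -amount)
    if name == "v":
        # Boolean tag: no parameter or a non-zero parameter turns it on, 0 turns it off.
        return replace(fmt, is_hidden=(value is None or value != 0))
    return fmt


fmt = apply_tag(TextFormat(font_size=24), "super")
print(fmt)                            # font_size=16, super_script=12
print(apply_tag(fmt, "nosupersub"))   # font_size stays 16, super_script=0
```

The second print shows the caveat from the ATTENTION note: after \nosupersub the text is back on the baseline but keeps the reduced font size.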

13th October, 2008 
-------------------------------------------------------------------------------
- RtfImageConverter: New property ConvertedImages which provides information about the converted images using IRtfConvertedImageInfo 

3rd October, 2008 
-------------------------------------------------------------------------------
- RtfHtmlConverter: Fixed encoding of image file names 

26th September, 2008 
-------------------------------------------------------------------------------
- Added support for \pict tags wrapped in a \*\shppict group 
- RtfGroup: Extended debugging visualization 

23rd September, 2008 
-------------------------------------------------------------------------------
- RtfParser: Fixed local group fonts 

18th September, 2008 
-------------------------------------------------------------------------------
- RtfSpec: New tag constants for the theme fonts and stylesheet 
- RtfParser: Added support for dealing with theme fonts during decoding (ugly but necessary when such fonts are used for hex encoded content) 
- RtfFontBuilder: Added support for theme fonts 
- RtfFontBuilder: Added support for font names with an alternative representation 
- RtfFontTableBuilder: Added support for theme fonts 
- RtfInterpreter: Added special support for stylesheets by ignoring their contents (to prevent the style names from appearing in the document content) 
- Added the test document (in two variants) as unit test input 

31st July, 2008 
-------------------------------------------------------------------------------
- New Windows Forms sample application which demonstrates a simple conversion from RTF to Text/XML/HTML 
- New WPF sample application which demonstrates a simple conversion from RTF to Text/XML/HTML 
- IRtfInterpreterContext: Replaced int DefaultFontIndex by string DefaultFontId to support WPF RichTextBox font indexing 

14th July, 2008 
-------------------------------------------------------------------------------
- RtfHtmlConverter: New class IRtfHtmlStyleConverter which provides external style conversion 
- RtfHtmlConverter: Changed default image type from GIF to JPEG 
- RtfHtmlConverter: Fixed alignment of images 

11th July, 2008 
-------------------------------------------------------------------------------
- Support for character set decoding per font 
- Support for decoding multi-byte hex-encoded characters (this handles East Asian fonts commonly encoded in this way instead of using Unicode) 
- Special treatment for the Windows legacy pseudo codepage 42 (mapped to the ANSI codepage) 
- All command line applications: new command line parameter /CE for specifying the output encoding (such as UTF-8 or UTF-16 a.k.a. Unicode) 
- Rtf2Html: Fixed HTML-encoded string 
- Command line application Rtf2Html: New command line parameter /CS for the HTML character set 
- RtfParserTest and RtfInterpreterTest: New unit-tests for multi-byte character set decoding 
- Minor bug fixes 
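
To illustrate the multi-byte decoding mentioned in this entry, here is a minimal Python sketch. The function name, the ANSI default of 1252 and the use of Python's cpNNN codecs are assumptions for illustration; the library's own decoding code is not shown in this changelog.

```python
import re

# A run of consecutive \'hh escapes, e.g. \'82\'a0
HEX_RUN = re.compile(r"(?:\\'[0-9a-fA-F]{2})+")


def decode_hex_runs(rtf_text: str, codepage: int, ansi_codepage: int = 1252) -> str:
    """Decode \\'hh escapes using the code page of the current font."""
    if codepage == 42:
        # Legacy pseudo code page 42 is mapped to the ANSI code page.
        codepage = ansi_codepage
    encoding = f"cp{codepage}"

    def decode_run(match):
        # Consecutive escapes are decoded together, so that the lead and trail
        # bytes of a multi-byte character end up in the same byte sequence.
        raw = bytes.fromhex(match.group(0).replace("\\'", ""))
        return raw.decode(encoding, errors="replace")

    return HEX_RUN.sub(decode_run, rtf_text)


print(decode_hex_runs(r"\'82\'a0", 932))   # Shift-JIS bytes 82 A0, one character
print(decode_hex_runs(r"caf\'e9", 42))     # code page 42 falls back to ANSI
```

Decoding each \'hh escape on its own would split a two-byte character into two invalid single bytes, which is why the sketch collects the whole run first.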

3rd July, 2008 
-------------------------------------------------------------------------------
- Command line applications: Fixed exception when the log parser is not used 

1st July, 2008 
-------------------------------------------------------------------------------
- Initial public release 

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Jani Giannoudis
Software Developer (Senior)
Switzerland
Jani is co-founder of Meerazo.com, a free service for sharing resources such as locations, things, persons and their services within a cooperating group of people.

Article Copyright 2008 by Jani Giannoudis