Introduction
In 1992, Microsoft introduced the Rich Text Format for specifying simple formatted text with embedded graphics. Initially intended to transfer such data between different applications on different operating systems (MS-DOS, Windows, OS/2, and Apple Macintosh), today this format is commonly used in Windows for enhanced editing capabilities. However, as soon as you have to work with data in RTF format, additional wishes arise:
The component introduced in this article has been designed with the following goals in mind:
Please keep the following shortcomings in mind:
In general, this should not pose a big problem for many areas of use. A conforming RTF writer should always write content with readers in mind that do not know about tags and features introduced later in the standard's history. As a consequence, much of the content in an RTF document is stored several times (at least if the writer cares about other applications). The interpreter takes advantage of this by focusing on the visual content only. Some writers in common use, however, support this alternate representation improperly, which results in differences in the output. Thanks to its open architecture, the RTF parser is a solid base for developing an RTF converter which focuses on layout.
RTF Parser
The actual parsing of the data is being done by the class
The actual parsing process can be monitored by a parser listener. In the following example, a file logger listens to the parser while a visitor walks the resulting structure model:
// ----------------------------------------------------------------------
public class MyVisitor : IRtfElementVisitor
{
    void RtfWriteStructureModel()
    {
        RtfParserListenerFileLogger logger =
            new RtfParserListenerFileLogger( @"c:\temp\RtfParser.log" );
        IRtfGroup structureRoot =
            RtfParserTool.Parse( @"{\rtf1foobar}", logger );
        structureRoot.Visit( this );
    } // RtfWriteStructureModel
    // ----------------------------------------------------------------------
    void IRtfElementVisitor.VisitTag( IRtfTag tag )
    {
        Console.WriteLine( "Tag: " + tag.FullName );
    } // IRtfElementVisitor.VisitTag
    // ----------------------------------------------------------------------
    void IRtfElementVisitor.VisitGroup( IRtfGroup group )
    {
        Console.WriteLine( "Group: " + group.Destination );
        foreach ( IRtfElement child in group.Contents )
        {
            child.Visit( this ); // recursive
        }
    } // IRtfElementVisitor.VisitGroup
    // ----------------------------------------------------------------------
    void IRtfElementVisitor.VisitText( IRtfText text )
    {
        Console.WriteLine( "Text: " + text.Text );
    } // IRtfElementVisitor.VisitText
} // MyVisitor
Note, however, that the same result for such simple functionality could be achieved by writing a custom parser listener:
// ------------------------------------------------------------------------
public class MyParserListener : RtfParserListenerBase
{
    // ----------------------------------------------------------------------
    protected override void DoParseBegin()
    {
        Console.WriteLine( "parse begin" );
    } // DoParseBegin
    // ----------------------------------------------------------------------
    protected override void DoGroupBegin()
    {
        Console.WriteLine( "group begin - level " + Level.ToString() );
    } // DoGroupBegin
    // ----------------------------------------------------------------------
    protected override void DoTagFound( IRtfTag tag )
    {
        Console.WriteLine( "tag " + tag.FullName );
    } // DoTagFound
    // ----------------------------------------------------------------------
    protected override void DoTextFound( IRtfText text )
    {
        Console.WriteLine( "text " + text.Text );
    } // DoTextFound
    // ----------------------------------------------------------------------
    protected override void DoGroupEnd()
    {
        Console.WriteLine( "group end - level " + Level.ToString() );
    } // DoGroupEnd
    // ----------------------------------------------------------------------
    protected override void DoParseSuccess()
    {
        Console.WriteLine( "parse success" );
    } // DoParseSuccess
    // ----------------------------------------------------------------------
    protected override void DoParseFail( RtfException reason )
    {
        Console.WriteLine( "parse failed: " + reason.Message );
    } // DoParseFail
    // ----------------------------------------------------------------------
    protected override void DoParseEnd()
    {
        Console.WriteLine( "parse end" );
    } // DoParseEnd
} // MyParserListener
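To make the sequence of group, tag, and text notifications more tangible, here is a minimal, self-contained sketch of a scanner that reports such events for a small subset of RTF. It is not the component's parser: the class MiniRtfScanner, its Scan method, and the meaning of the reported level (the depth of the group being opened or closed) are assumptions made for illustration, and hex escapes, Unicode escapes, and binary data are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch, not the component's parser: reports group, tag and
// text events for a small subset of RTF.
public static class MiniRtfScanner
{
    public static List<string> Scan( string rtf )
    {
        List<string> events = new List<string>();
        StringBuilder text = new StringBuilder();
        int level = 0;
        int i = 0;
        while ( i < rtf.Length )
        {
            char c = rtf[ i ];
            if ( c == '{' )
            {
                FlushText( text, events );
                level++;
                events.Add( "group begin - level " + level );
                i++;
            }
            else if ( c == '}' )
            {
                FlushText( text, events );
                if ( level == 0 ) throw new FormatException( "unbalanced '}' at " + i );
                events.Add( "group end - level " + level );
                level--;
                i++;
            }
            else if ( c == '\\' && i + 1 < rtf.Length && IsLetter( rtf[ i + 1 ] ) )
            {
                // control word: letters, then an optional (possibly negative) number
                FlushText( text, events );
                int start = ++i;
                while ( i < rtf.Length && IsLetter( rtf[ i ] ) ) i++;
                if ( i + 1 < rtf.Length && rtf[ i ] == '-' && IsDigit( rtf[ i + 1 ] ) ) i++;
                while ( i < rtf.Length && IsDigit( rtf[ i ] ) ) i++;
                events.Add( "tag " + rtf.Substring( start, i - start ) );
                if ( i < rtf.Length && rtf[ i ] == ' ' ) i++; // one space only delimits the tag
            }
            else if ( c == '\\' && i + 1 < rtf.Length )
            {
                char symbol = rtf[ i + 1 ];
                if ( symbol == '\\' || symbol == '{' || symbol == '}' )
                {
                    text.Append( symbol ); // escaped literal character
                }
                else
                {
                    FlushText( text, events );
                    events.Add( "tag " + symbol ); // other control symbols, parameters ignored
                }
                i += 2;
            }
            else
            {
                if ( c != '\r' && c != '\n' ) text.Append( c ); // line breaks carry no content
                i++;
            }
        }
        FlushText( text, events );
        if ( level != 0 ) throw new FormatException( "unbalanced '{'" );
        return events;
    }

    private static void FlushText( StringBuilder text, List<string> events )
    {
        if ( text.Length == 0 ) return;
        events.Add( "text " + text );
        text.Length = 0;
    }

    private static bool IsLetter( char c )
    {
        return ( c >= 'a' && c <= 'z' ) || ( c >= 'A' && c <= 'Z' );
    }

    private static bool IsDigit( char c )
    {
        return c >= '0' && c <= '9';
    }

    public static void Main()
    {
        foreach ( string line in Scan( @"{\rtf1foobar}" ) )
        {
            Console.WriteLine( line );
        }
        // prints:
        // group begin - level 1
        // tag rtf1
        // text foobar
        // group end - level 1
    }
}
```

A real parser additionally has to deal with character encodings, hex and Unicode escapes, and embedded binary data, which is why this sketch only serves to illustrate the event order.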
Note that the base class RtfParserListenerBase already provides (empty) implementations for all the interface methods, so only the ones required for a specific purpose need to be overridden.
RTF Interpreter
Once an RTF document has been parsed into a structure model, it is subject to interpretation through the RTF interpreter. One obvious way to interpret the structure is to build a Document Model which provides high-level access to the meaning of the document's contents. A very simple document model is part of this component, and consists of the following building blocks:
The various Visuals represent the recognized visible RTF elements and can be examined with a visitor. Analogous to the RTF parser, the interpretation can be monitored through interpreter listeners; the construction of the document model is also achieved through such an interpreter listener. The following example shows how to make use of the high-level API of the document model:
// ----------------------------------------------------------------------
void RtfWriteDocumentModel( Stream rtfStream )
{
    RtfInterpreterListenerFileLogger logger =
        new RtfInterpreterListenerFileLogger( @"c:\temp\RtfInterpreter.log" );
    IRtfDocument document = RtfInterpreterTool.BuildDoc( rtfStream, logger );
    RtfWriteDocument( document );
} // RtfWriteDocumentModel
// ----------------------------------------------------------------------
void RtfWriteDocument( IRtfDocument document )
{
    Console.WriteLine( "RTF Version: " + document.RtfVersion.ToString() );
    // document info
    Console.WriteLine( "Title: " + document.DocumentInfo.Title );
    Console.WriteLine( "Subject: " + document.DocumentInfo.Subject );
    Console.WriteLine( "Author: " + document.DocumentInfo.Author );
    // ...
    // fonts
    foreach ( IRtfFont font in document.FontTable )
    {
        Console.WriteLine( "Font: " + font.Name );
    }
    // colors
    foreach ( IRtfColor color in document.ColorTable )
    {
        Console.WriteLine( "Color: " + color.AsDrawingColor.ToString() );
    }
    // user properties
    foreach ( IRtfDocumentProperty documentProperty in document.UserProperties )
    {
        Console.WriteLine( "User property: " + documentProperty.Name );
    }
    // visuals (preferably handled through a corresponding visitor)
    foreach ( IRtfVisual visual in document.VisualContent )
    {
        switch ( visual.Kind )
        {
            case RtfVisualKind.Text:
                Console.WriteLine( "Text: " + ((IRtfVisualText)visual).Text );
                break;
            case RtfVisualKind.Break:
                Console.WriteLine( "Tag: " +
                    ((IRtfVisualBreak)visual).BreakKind.ToString() );
                break;
            case RtfVisualKind.Special:
                Console.WriteLine( "Text: " +
                    ((IRtfVisualSpecialChar)visual).CharKind.ToString() );
                break;
            case RtfVisualKind.Image:
                IRtfVisualImage image = (IRtfVisualImage)visual;
                Console.WriteLine( "Image: " + image.Format.ToString() +
                    " " + image.Width.ToString() + "x" + image.Height.ToString() );
                break;
        }
    }
} // RtfWriteDocument
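The comment in the listing above recommends handling visuals through a visitor rather than a switch with casts. The following self-contained sketch shows the general shape of that pattern. All types in it (Visual, TextVisual, BreakVisual, ImageVisual, IVisualVisitor, PlainTextVisitor) are hypothetical stand-ins and not the component's own interfaces; the sketch only illustrates the double dispatch.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// All types below are hypothetical stand-ins, not the component's interfaces.
public interface IVisualVisitor
{
    void VisitText( TextVisual visual );
    void VisitBreak( BreakVisual visual );
    void VisitImage( ImageVisual visual );
}

public abstract class Visual
{
    // each concrete visual calls back the visitor method matching its own type
    public abstract void Accept( IVisualVisitor visitor );
}

public sealed class TextVisual : Visual
{
    public readonly string Text;
    public TextVisual( string text ) { Text = text; }
    public override void Accept( IVisualVisitor visitor ) { visitor.VisitText( this ); }
}

public sealed class BreakVisual : Visual
{
    public override void Accept( IVisualVisitor visitor ) { visitor.VisitBreak( this ); }
}

public sealed class ImageVisual : Visual
{
    public readonly int Width;
    public readonly int Height;
    public ImageVisual( int width, int height ) { Width = width; Height = height; }
    public override void Accept( IVisualVisitor visitor ) { visitor.VisitImage( this ); }
}

// Collects a plain-text rendering; every visual kind has its own method,
// so neither a switch statement nor a cast is needed.
public sealed class PlainTextVisitor : IVisualVisitor
{
    private readonly StringBuilder buffer = new StringBuilder();

    public string Result { get { return buffer.ToString(); } }

    public void VisitText( TextVisual visual ) { buffer.Append( visual.Text ); }

    public void VisitBreak( BreakVisual visual ) { buffer.Append( '\n' ); }

    public void VisitImage( ImageVisual visual )
    {
        buffer.Append( "[image " + visual.Width + "x" + visual.Height + "]" );
    }
}

public static class VisitorDemo
{
    public static string Render( IEnumerable<Visual> visuals )
    {
        PlainTextVisitor visitor = new PlainTextVisitor();
        foreach ( Visual visual in visuals )
        {
            visual.Accept( visitor );
        }
        return visitor.Result;
    }

    public static void Main()
    {
        Visual[] visuals =
        {
            new TextVisual( "Hello" ), new BreakVisual(), new ImageVisual( 32, 16 )
        };
        Console.WriteLine( Render( visuals ) );
    }
}
```

The benefit of this design is that adding a new kind of output only requires a new visitor class, while a forgotten visual kind shows up as a compile error instead of a silently skipped switch case.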
As with the parser, a listener base class is available for the interpreter, so a custom interpreter listener only overrides the methods it needs:
// ------------------------------------------------------------------------
public class MyInterpreterListener : RtfInterpreterListenerBase
{
    // ----------------------------------------------------------------------
    protected override void DoBeginDocument( IRtfInterpreterContext context )
    {
        // custom action
    } // DoBeginDocument
    // ----------------------------------------------------------------------
    protected override void DoInsertText( IRtfInterpreterContext context, string text )
    {
        // custom action
    } // DoInsertText
    // ----------------------------------------------------------------------
    protected override void DoInsertSpecialChar( IRtfInterpreterContext context,
        RtfVisualSpecialCharKind kind )
    {
        // custom action
    } // DoInsertSpecialChar
    // ----------------------------------------------------------------------
    protected override void DoInsertBreak( IRtfInterpreterContext context,
        RtfVisualBreakKind kind )
    {
        // custom action
    } // DoInsertBreak
    // ----------------------------------------------------------------------
    protected override void DoInsertImage( IRtfInterpreterContext context,
        RtfVisualImageFormat format,
        int width, int height, int desiredWidth, int desiredHeight,
        int scaleWidthPercent, int scaleHeightPercent,
        string imageDataHex
    )
    {
        // custom action
    } // DoInsertImage
    // ----------------------------------------------------------------------
    protected override void DoEndDocument( IRtfInterpreterContext context )
    {
        // custom action
    } // DoEndDocument
} // MyInterpreterListener
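The DoInsertImage callback above receives the picture as a hexadecimal string (imageDataHex). A custom listener that wants the raw bytes has to decode that string first. The following is a minimal sketch of such a decoder; the class HexImageData and its ToBytes method are hypothetical helpers, not part of the component, and the sketch assumes that whitespace inside the hex data may simply be skipped.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the component: decodes hexadecimal
// picture data into raw bytes.
public static class HexImageData
{
    public static byte[] ToBytes( string hex )
    {
        if ( hex == null ) throw new ArgumentNullException( "hex" );

        List<byte> bytes = new List<byte>( hex.Length / 2 );
        int highNibble = -1; // -1 means: no half-finished byte pending
        foreach ( char c in hex )
        {
            if ( char.IsWhiteSpace( c ) ) continue; // assumption: line breaks may be skipped
            int nibble = Nibble( c );
            if ( highNibble < 0 )
            {
                highNibble = nibble;
            }
            else
            {
                bytes.Add( (byte)( ( highNibble << 4 ) | nibble ) );
                highNibble = -1;
            }
        }
        if ( highNibble >= 0 ) throw new FormatException( "odd number of hex digits" );
        return bytes.ToArray();
    }

    private static int Nibble( char c )
    {
        if ( c >= '0' && c <= '9' ) return c - '0';
        if ( c >= 'a' && c <= 'f' ) return c - 'a' + 10;
        if ( c >= 'A' && c <= 'F' ) return c - 'A' + 10;
        throw new FormatException( "invalid hex digit '" + c + "'" );
    }

    public static void Main()
    {
        // the first four bytes of a PNG file, split across two lines
        byte[] data = ToBytes( "89 50\r\n4e47" );
        Console.WriteLine( BitConverter.ToString( data ) ); // prints 89-50-4E-47
    }
}
```

The resulting byte array can then be wrapped in a MemoryStream and handed to whatever image handling the listener needs.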
RTF Base Converters
As a foundation for the development of more complex converters, four base converters are available: for text, images, XML, and HTML. They are designed to be extended by inheritance.
Text Converter
The text converter extracts the plain text of a document; the bullet text can be changed through its settings:
// ----------------------------------------------------------------------
void ConvertRtf2Text( Stream rtfStream )
{
    // logger
    RtfInterpreterListenerFileLogger logger =
        new RtfInterpreterListenerFileLogger( @"c:\temp\RtfInterpreter.log" );
    // text converter
    RtfTextConvertSettings textConvertSettings = new RtfTextConvertSettings();
    textConvertSettings.BulletText = "-"; // replace default bullet text '°'
    RtfTextConverter textConverter = new RtfTextConverter( textConvertSettings );
    // interpreter
    RtfInterpreterTool.Interpret( rtfStream, logger, textConverter );
    Console.WriteLine( textConverter.PlainText );
} // ConvertRtf2Text
Image Converter
The image converter saves the images of a document to a directory, optionally converting them to a target format and scaling them:
// ----------------------------------------------------------------------
void ConvertRtf2Image( Stream rtfStream )
{
    // logger
    RtfInterpreterListenerFileLogger logger =
        new RtfInterpreterListenerFileLogger( @"c:\temp\RtfInterpreter.log" );
    // image converter
    // convert all images to JPG
    RtfVisualImageAdapter imageAdapter = new RtfVisualImageAdapter( ImageFormat.Jpeg );
    RtfImageConvertSettings imageConvertSettings =
        new RtfImageConvertSettings( imageAdapter );
    imageConvertSettings.ImagesPath = @"c:\temp\images\";
    imageConvertSettings.ScaleImage = true; // scale images
    RtfImageConverter imageConverter = new RtfImageConverter( imageConvertSettings );
    // interpreter
    RtfInterpreterTool.Interpret( rtfStream, logger, imageConverter );
    // all images are saved to the path 'c:\temp\images\'
} // ConvertRtf2Image
XML Converter
The XML converter writes the document model to an XmlWriter:
// ----------------------------------------------------------------------
void ConvertRtf2Xml( Stream rtfStream )
{
    // logger
    RtfInterpreterListenerFileLogger logger =
        new RtfInterpreterListenerFileLogger( @"c:\temp\RtfInterpreter.log" );
    // interpreter
    IRtfDocument rtfDocument = RtfInterpreterTool.BuildDoc( rtfStream, logger );
    // XML convert
    XmlWriterSettings xmlWriterSettings = new XmlWriterSettings();
    xmlWriterSettings.Indent = true;
    xmlWriterSettings.IndentChars = ( " " );
    string fileName = @"c:\temp\Rtf.xml";
    using ( XmlWriter writer = XmlWriter.Create( fileName, xmlWriterSettings ) )
    {
        RtfXmlConverter xmlConverter = new RtfXmlConverter( rtfDocument, writer );
        xmlConverter.Convert();
        writer.Flush();
    }
} // ConvertRtf2Xml
HTML Converter
The HTML converter turns the document model into HTML. Images are handled by an image converter, and style sheets can be linked through the settings:
// ----------------------------------------------------------------------
void ConvertRtf2Html( Stream rtfStream )
{
    // logger
    RtfInterpreterListenerFileLogger logger =
        new RtfInterpreterListenerFileLogger( @"c:\temp\RtfInterpreter.log" );
    // image converter
    // convert all images to JPG
    RtfVisualImageAdapter imageAdapter =
        new RtfVisualImageAdapter( ImageFormat.Jpeg );
    RtfImageConvertSettings imageConvertSettings =
        new RtfImageConvertSettings( imageAdapter );
    imageConvertSettings.ScaleImage = true; // scale images
    RtfImageConverter imageConverter =
        new RtfImageConverter( imageConvertSettings );
    // interpreter: build the document model, the image converter listens along
    IRtfDocument rtfDocument = RtfInterpreterTool.BuildDoc( rtfStream,
        logger, imageConverter );
    // html converter
    RtfHtmlConvertSettings htmlConvertSettings =
        new RtfHtmlConvertSettings( imageAdapter );
    htmlConvertSettings.StyleSheetLinks.Add( "default.css" );
    RtfHtmlConverter htmlConverter = new RtfHtmlConverter( rtfDocument,
        htmlConvertSettings );
    Console.WriteLine( htmlConverter.Convert() );
} // ConvertRtf2Html
RTF Converter Applications
The following console applications are provided: Rtf2Raw, Rtf2Xml, and Rtf2Html.
Rtf2Raw
The command line application Rtf2Raw saves the text and images of an RTF document:
Rtf2Raw source-file [destination] [/IT:format] [/CE:encoding]
        [/IS+] [/ST-] [/SI-] [/LD:path] [/LP] [/LI] [/D] [/O] [/?]
   source-file       source rtf file
   destination       destination directory (default=source-file directory)
   /IT:format        images type format: bmp, emf, exif, gif, icon, jpg,
                     png, tiff or wmf (default=original)
   /CE:encoding      character encoding: ASCII, UTF7, UTF8, Unicode,
                     BigEndianUnicode, UTF32, OperatingSystem (default=UTF8)
   /IS+              image scale (default=off)
   /ST-              don't save text to the destination (default=on)
   /SI-              don't save images to the destination (default=on)
   /LD:path          log file directory (default=destination directory)
   /LP               write rtf parser log (default=off)
   /LI               write rtf interpreter log (default=off)
   /D                write text to screen (default=off)
   /O                open text in associated application (default=off)
   /?                this help
Samples:
   Rtf2Raw MyText.rtf
   Rtf2Raw MyText.rtf c:\temp
   Rtf2Raw MyText.rtf c:\temp /IT:png
   Rtf2Raw MyText.rtf c:\temp /IT:png /LD:log /LP /LI
Rtf2Xml
The command line application Rtf2Xml converts an RTF document to XML:
Rtf2Xml source-file [destination] [/CE:encoding] [/P:prefix]
        [/NS:namespace] [/LD:path] [/LP] [/LI] [/?]
   source-file       source rtf file
   destination       destination directory (default=source-file directory)
   /CE:encoding      character encoding: ASCII, UTF7, UTF8, Unicode,
                     BigEndianUnicode, UTF32, OperatingSystem (default=UTF8)
   /P:prefix         xml prefix (default=none)
   /NS:namespace     xml namespace (default=none)
   /LD:path          log file directory (default=destination directory)
   /LP               write rtf parser log (default=off)
   /LI               write rtf interpreter log (default=off)
   /?                this help
Samples:
   Rtf2Xml MyText.rtf
   Rtf2Xml MyText.rtf c:\temp
   Rtf2Xml MyText.rtf c:\temp /NS:MyNs
   Rtf2Xml MyText.rtf c:\temp /LD:log /LP /LI
Rtf2Html
The command line application Rtf2Html converts an RTF document to HTML:
Rtf2Html source-file [destination] [/CSS:names] [/ID:path] [/IT:format] [/CE:encoding]
         [/CS:charset] [/SH-] [/SI-] [/LD:path] [/LP] [/LI] [/D] [/O] [/?]
   source-file       source rtf file
   destination       destination directory (default=source-file directory)
   /CSS:name1,name2  style sheet names (default=none)
   /ID:path          images directory (default=destination directory)
   /IT:format        images type format: jpg, gif or png (default=jpg)
   /CE:encoding      character encoding: ASCII, UTF7, UTF8, Unicode,
                     BigEndianUnicode, UTF32, OperatingSystem (default=UTF8)
   /CS:charset       document character set used for the HTML header meta-tag
                     'content' (default=UTF-8)
   /SH-              don't save HTML to the destination (default=on)
   /SI-              don't save images to the destination (default=on)
   /LD:path          log file directory (default=destination directory)
   /LP               write rtf parser log file (default=off)
   /LI               write rtf interpreter log file (default=off)
   /D                display HTML text on screen (default=off)
   /O                open HTML in associated application (default=off)
   /?                this help
Samples:
   Rtf2Html MyText.rtf
   Rtf2Html MyText.rtf c:\temp
   Rtf2Html MyText.rtf c:\temp /CSS:MyCompany.css
   Rtf2Html MyText.rtf c:\temp /CSS:MyCompany.css,ThisProject.css
   Rtf2Html MyText.rtf c:\temp /CSS:MyCompany.css,ThisProject.css
            /ID:images /IT:png
   Rtf2Html MyText.rtf c:\temp /CSS:MyCompany.css,ThisProject.css
            /ID:images /IT:png /LD:log /LP /LI
Sample RTF to HTML Converter - RTF Input
Sample RTF to HTML Converter - HTML Output
Projects
The following projects are provided in the RTF converter component:
System Functions
The project
Acknowledgement
Special thanks to Leon Poyyayil for the design and his support and contribution in the development of this component.
History