Introduction
NRtfTree Library (LGPL) is a set of classes written entirely in C# that may be used to manage RTF documents in your own applications. NRtfTree will help you:
- Open and parse RTF files.
- Analyze the content of RTF files.
- Add, modify and remove document elements (i.e. text, control words, control symbols).
- Create new RTF documents.
Background
RTF (Rich Text Format) is a method of encoding formatted text and graphics for easy transfer between applications. An RTF document can contain text, images, tables, lists, hyperlinks and many other text and graphic elements. In addition, RTF is the format used internally by the RichTextBox control included as part of the .NET Framework. Nevertheless, the functionality of RichTextBox is not enough to cover all aspects of RTF file management.
Using the Code
NRtfTree has two modes of operation:
- DOM-like mode: RTF documents are loaded into a tree structure, and several methods are provided to traverse it, access tag contents, and modify or create nodes. This implementation requires the entire content of a document to be parsed and stored in memory.
In this mode, the main classes are RtfTree and RtfTreeNode:

- SAX-like mode: The RTF file parser is implemented as an event-driven model in which the programmer provides callback methods that the parser invokes as it traverses the RTF document.
In this mode, the main classes are RtfReader and SARParser:

Examples
The following lines show how you can use the class library in your own code.
- DOM-like mode
This code loads an RTF document into an RtfTree object and inspects all the child nodes:
public void doSomething()
{
    RtfTree tree = new RtfTree();
    // Verbatim string: in a regular literal, "\r" would be read as a carriage return.
    tree.LoadRtfFile(@"c:\rtfdoc.rtf");
    RtfTreeNode root = tree.RootNode;
    RtfTreeNode node = new RtfTreeNode();
    for(int i = 0; i < root.ChildNodes.Count; i++)
    {
        node = root.ChildNodes[i];
        if(node.NodeType == RTF_NODE_TYPE.GROUP)
        {
            // Handle a group node here.
        }
        else if(node.NodeType == RTF_NODE_TYPE.CONTROL)
        {
            // Handle a control symbol here.
        }
        else if(node.NodeType == RTF_NODE_TYPE.KEYWORD)
        {
            switch(node.NodeKey)
            {
                case "f":   // font
                    break;
                case "cf":  // foreground color
                    break;
                case "fs":  // font size
                    break;
            }
        }
        else if(node.NodeType == RTF_NODE_TYPE.TEXT)
        {
            // Handle a text node here.
        }
    }
}
- SAX-like mode
This is an example of the implementation of a simple SAX-like RTF parser:
public class MyParser : SARParser
{
    // Output buffer (the original listing uses doc without declaring it).
    private string doc = "";

    public override void StartRtfDocument()
    {
        doc += "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\r\n";
        doc += "<DOCUMENT>\r\n";
    }
    public override void EndRtfDocument()
    {
        doc += "\r\n</DOCUMENT>";
    }
    public override void StartRtfGroup()
    {
    }
    public override void EndRtfGroup()
    {
    }
    public override void RtfControl(string key, bool hasParam, int param)
    {
    }
    public override void RtfKeyword(string key, bool hasParam, int param)
    {
        switch(key)
        {
            case "b":  // bold
                break;
            case "i":  // italic
                break;
        }
    }
    public override void RtfText(string text)
    {
        doc += text;
    }
}
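Once you have completed the parser, you can start parsing the RTF document by calling the RtfReader.Parse() method. The handlers for the configured events are then called automatically, as many times as necessary:
// res: constructor argument kept from the original sample; the MyParser listing above declares no such constructor.
MyParser parser = new MyParser(res);
RtfReader reader = new RtfReader(parser);
reader.LoadRtfFile(rutaRTF);  // rutaRTF: path of the RTF file to parse
reader.Parse();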
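To make the callback order concrete, here is a minimal, self-contained sketch of the event-driven idea. It does not use NRtfTree at all: ToyRtfHandler, ToyRtfReader and EventLogger are hypothetical names invented for this illustration, and the scanner handles only groups, control words with an optional numeric parameter, and plain text.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical callback base class, loosely modelled on the handlers shown above.
public abstract class ToyRtfHandler
{
    public virtual void StartGroup() { }
    public virtual void EndGroup() { }
    public virtual void Keyword(string key, bool hasParam, int param) { }
    public virtual void Text(string text) { }
}

// Hypothetical reader: scans the input once and fires callbacks in document order.
public static class ToyRtfReader
{
    private static bool IsAsciiLetter(char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void Parse(string rtf, ToyRtfHandler handler)
    {
        int i = 0;
        while (i < rtf.Length)
        {
            char c = rtf[i];
            if (c == '{') { handler.StartGroup(); i++; }
            else if (c == '}') { handler.EndGroup(); i++; }
            else if (c == '\\' && i + 1 < rtf.Length && IsAsciiLetter(rtf[i + 1]))
            {
                // Control word: a run of letters after the backslash.
                int start = ++i;
                while (i < rtf.Length && IsAsciiLetter(rtf[i])) i++;
                string key = rtf.Substring(start, i - start);

                // Optional numeric parameter, possibly negative.
                int pStart = i;
                if (i + 1 < rtf.Length && rtf[i] == '-' && char.IsDigit(rtf[i + 1])) i++;
                while (i < rtf.Length && char.IsDigit(rtf[i])) i++;
                bool hasParam = i > pStart;
                int param = hasParam
                    ? int.Parse(rtf.Substring(pStart, i - pStart), CultureInfo.InvariantCulture)
                    : 0;

                // A single space after a control word is a delimiter, not text.
                if (i < rtf.Length && rtf[i] == ' ') i++;
                handler.Keyword(key, hasParam, param);
            }
            else if (c == '\\')
            {
                // Control symbols are not interpreted in this toy:
                // the backslash and the following character are skipped.
                i += 2;
            }
            else
            {
                // Plain text runs until the next brace or backslash.
                int start = i;
                while (i < rtf.Length && rtf[i] != '{' && rtf[i] != '}' && rtf[i] != '\\') i++;
                handler.Text(rtf.Substring(start, i - start));
            }
        }
    }
}

// Example handler that records every event it receives.
public class EventLogger : ToyRtfHandler
{
    public readonly List<string> Events = new List<string>();
    public override void StartGroup() { Events.Add("StartGroup"); }
    public override void EndGroup() { Events.Add("EndGroup"); }
    public override void Keyword(string key, bool hasParam, int param)
    {
        Events.Add(hasParam ? "Keyword " + key + " " + param : "Keyword " + key);
    }
    public override void Text(string text) { Events.Add("Text " + text); }
}
```

Calling ToyRtfReader.Parse(@"{\b Hi}", logger) with an EventLogger leaves four entries in logger.Events: StartGroup, Keyword b, Text Hi, EndGroup. The handler never asks for data; it only reacts, which is why this style does not need the whole document in memory.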
- RtfDocument class
You can create new RTF documents using the new class RtfDocument (beta):
RtfDocument doc = new RtfDocument("testdoc.rtf");
RtfTextFormat format = new RtfTextFormat();
format.size = 20;
format.bold = true;
format.underline = true;
doc.AddText("Title", format);
doc.AddNewLine();
doc.AddNewLine();
format.size = 12;
format.bold = false;
format.underline = false;
doc.AddText("This is a test.", format);
doc.AddText("This is a text.");
doc.AddNewLine();
doc.AddImage("test.png", 50, 50);
doc.Close();
Software License
NRtfTree Library is licensed under the GNU LGPL license.
More Information
You can find up-to-date information on my personal home page (Spanish) or NRtfTree SourceForge Project (English).
History
- 2007/09/02 - v0.3.0 beta 1
  - New license: LGPL.
  - New classes to create RTF documents (basic support in beta): RtfDocument, RtfColorTable, RtfFontTable and RtfTextFormat.
  - RtfTree class:
    - New property MergeSpecialCharacters. When it is set to true, a special character ('\') found in the document is converted to a Text node and eventually merged with adjacent text nodes.
    - New property Text. Returns plain text from the RTF document.
    - New method GetEncoding(). Returns the document encoding.
  - RtfTreeNode class:
    - New property Tree. Returns a reference to the owner RTF tree.
    - New method ToString().
    - New method InsertChild(). Inserts a new node at the specified location.
    - Methods SelectXXXByType() have been replaced by SelectXXX() overloads.
    - New methods SelectSibling() (3 overloads).
  - RtfNodeCollection class:
    - New method Insert(). Inserts a new node at the specified location.
    - New method RemoveRange(). Removes a range of nodes from the list.
  - InfoGroup class:
  - Fixed bugs:
    - Group and Root node types initialization with "ROOT" and "GROUP".
    - NRtfTree.Rtf property didn't include the last '}' in a group node's RTF code.
    - NRtfTree did not correctly treat the special characters '\', '{' and '}' as part of the text.
    - Methods RtfTreeNode.AppendChild() and InsertChild() should update Root and Tree properties recursively.
- 2006/12/10 - v0.2.1
  - Fixed: bug in NRtfTree.SaveRtf() with special character hex codes of one digit.
- 2005/12/17 - v0.2.0
  - New namespaces: Net.Sgoliver.NRtfTree.Core and Net.Sgoliver.NRtfTree.Util
  - New classes: ImageNode, ObjectNode, InfoGroup.
  - RtfTreeNode class:
    - New properties: LastChild, NextSibling, PreviousSibling, Rtf.
    - New methods: CloneNode(), HasChildNodes(), SelectSingleNode(), SelectSingleChildNode(), SelectChildNodes(), SelectNodes(), SelectSingleChildNodeType(), SelectChildNodesByType(), SelectNodesByType(), SelectSingleNodeByType().
    - New indexer [equivalent to SelectSingleChildNode()].
    - Some optimization changes.
  - RtfTree class:
    - New methods: ToStringEx(), SaveRtf(), GetColorTable(), GetFontTable() and GetInfoGroup()
    - Some optimization changes.
    - Some bugs fixed.
  - RtfNodeCollection class:
    - New methods: IndexOf(), AddRange()
  - RtfLex class:
    - parseText() now ignores new line, tabs and null characters.
    - Some optimization changes.
- 2005/08/13 - v0.1
Hi, is it possible to merge RTF documents into one document with this library without losing styles, tables etc. from the original documents? An example would be nice. Thank you in advance.
Olli
Found issue with other character sets (eg. Chinese/Japanese etc).
Sample rtf string. "{\rtf1\ansi\ansicpg1252\deff0\deflang2057{\fonttbl{\f0\froman\fprq1\fcharset128 Arial Unicode MS;}{\f1\fswiss\fcharset0 Arial;}}{\*\generator Msftedit 5.41.15.1507;}\viewkind4\uc1\pard\sb100\sa100\lang1033\f0\fs24\'82\'d0\'82\'e7\'82\'aa\'82\'c8\lang2057\f1\fs20\par}";
where \'82\'d0\'82\'e7\'82\'aa\'82\'c8 is the text. But this is identified as Nodetype = control.
Thanks Sathish
Hi, I would like to find and replace text in an RTF document, but I don't know how. Could you help me?
RtfTree tree = new RtfTree();
tree.LoadRtfFile(sourceFileName);
//
// Here, I need to find and replace a string (for example Find: |<CustomerName>| and Replace: Intel)
//
tree.SaveRtf(destinationFileName);
Lada
Hi, I have been looking at the help file and for the life of me I can't see how to convert RTF to HTML. Can someone give me a VB code example? Thanks
PQSIK
NRtfTree is NOT a library to convert RTF to HTML. NRtfTree can help you to process RTF documents in a general way, so you won't find any reference to HTML conversion in the help files.
However, I have included an example of the use of NRtfTree in the zip (RTF to HTML conversion), but it's ONLY an example, and it's a very basic application. Have a look at the code of TraductorRtf.cs
-- modified at 15:17 Monday 26th November, 2007
Hi. Good work, thanks for it.
I have one note:
In the RtfLex class, the parseText function uses this construction
StringBuilder Texto = new StringBuilder(((char)c).ToString(),3000000);
In my case it throws an "Out of memory" exception. I changed it to
StringBuilder Texto = new StringBuilder(((char)c).ToString());
and all works fine, and parsing became 6 times faster
sorry for bad English
Hi Orsol,
your modification is ok:
StringBuilder Texto = new StringBuilder(((char)c).ToString());
It will be corrected in final release.
Thanks.
That's not enough. Remove the separate StringBuilder from each token (plus the key and parameter handling ones), put one into the lexer itself, initialize it once when the lexer starts, and reuse the same StringBuilder every time you start a new token by simply setting its Length to zero. Processing speed will be more than a hundred times faster (not an exaggeration, I measured it). Also, you should rewrite the lexer because it is far from optimal (you Peek and Read the same characters repeatedly, this is not required, a lexer like this works all right with one character lookahead, consult a compiler textbook for details). Also, change the token class into a struct, that way you won't need to instantiate one for each token, this will give another boost of speed, about a factor of two or three.
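The buffer-reuse idea described above can be sketched roughly as follows. This is not the RtfLex class: ReusingLexer is a hypothetical whitespace-delimited word lexer, written only to show one StringBuilder owned by the lexer, cleared by setting Length to zero, with every character read exactly once.

```csharp
using System.IO;
using System.Text;

// Hypothetical word lexer; it only illustrates the buffer-reuse pattern.
public sealed class ReusingLexer
{
    private readonly TextReader reader;

    // One buffer for the whole lifetime of the lexer, instead of one per token.
    private readonly StringBuilder buffer = new StringBuilder();

    public ReusingLexer(TextReader reader)
    {
        this.reader = reader;
    }

    // Returns the next whitespace-delimited token, or null at end of input.
    public string NextToken()
    {
        // Skip leading whitespace; each character is read once, never peeked.
        int c = reader.Read();
        while (c != -1 && char.IsWhiteSpace((char)c)) c = reader.Read();
        if (c == -1) return null;

        buffer.Length = 0; // reset the shared buffer without reallocating it
        while (c != -1 && !char.IsWhiteSpace((char)c))
        {
            buffer.Append((char)c);
            c = reader.Read();
        }
        return buffer.ToString();
    }
}
```

With new ReusingLexer(new StringReader("  ab  cd ")), successive NextToken() calls return "ab", "cd" and then null. Any speed-up from this pattern depends on the input and the runtime, so it is worth measuring on your own documents.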
Hi. The library is very good. I have used it to convert an RTF file to HTML, but the tables in the RTF file do not get parsed, so the tables in my RTF files appear as plain text in the generated HTML.
Ananta
-- modified at 7:26 Friday 25th May, 2007
Hi I was wondering if you have any plan to port to C/C++. I saw your plan for Java. I am ignorant about C#, is it hard to translate to C++?
alam
Hi Salvador, the font table in an RTF document isn't accessed by index (as the color table is), but by name. So your tFuentes[] doesn't work in many of the cases I've tested with Word 2003 RTF files.
It looks like the parseRtfTree() method is missing some code:
***
switch (tok.Type)
{
    case RtfTokenType.GroupStart:
        newNode = new RtfTreeNode(RtfNodeType.Group);
        curNode.AppendChild(newNode);
        curNode = newNode;
        level++;
        break;
    case RtfTokenType.GroupEnd:
        curNode = curNode.ParentNode;
        level--;
        break;
    case RtfTokenType.Keyword: ?????
    case RtfTokenType.Control: ?????
    case RtfTokenType.Text:
        newNode = new RtfTreeNode(tok);
        curNode.AppendChild(newNode);
        break;
    default:
        res = -1;
        break;
}
***
The code is correct. This is a C# feature (and C++, and Java...). In a 'switch' structure you can group several 'case' statements together to execute the same block of code in the three 'cases'. Example:
case RtfTokenType.Keyword:
case RtfTokenType.Control:
case RtfTokenType.Text:
    newNode = new RtfTreeNode(tok);
    curNode.AppendChild(newNode);
    break;
When there is an image in the doc, I can see "{\pict\wmetafile8\picw7303\pich2910\picwgoal4140\pichgoal1650 (...image data...)" and am able to pick up the \pict RtfControl, and the RtfText using the SAX model.
I just wonder if anyone has a clue about the format of the image data. I did a test with a gif file of 8558 bytes. Then cut and paste it into a Rtf text box, the image data blows up to 63048 bytes. It is a wmetafile 8 format (as indicated inside the RtfGroup) but I can't figure out how to manipulate this text, e.g. format conversion etc.
tsy
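One likely contributor to the size increase: unless the \bin keyword is used, RTF stores \pict data as hexadecimal text, so every byte takes at least two characters (and pasting may also have converted the GIF to a metafile). A minimal sketch of turning that hex text back into bytes is below; PictHex is a hypothetical helper written for this note, not part of NRtfTree, and it does nothing to interpret the resulting metafile.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: decodes the hex text found inside a \pict group.
public static class PictHex
{
    // Returns the value of one hex digit, or throws if the character is not one.
    private static int Nibble(char ch)
    {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        throw new FormatException("Not a hex digit: " + ch);
    }

    // Converts hex text to bytes, ignoring spaces and line breaks between digits.
    public static byte[] Decode(string hex)
    {
        var bytes = new List<byte>(hex.Length / 2);
        int high = -1; // pending high nibble, or -1 if none
        foreach (char ch in hex)
        {
            if (char.IsWhiteSpace(ch)) continue;
            int value = Nibble(ch);
            if (high < 0)
            {
                high = value;
            }
            else
            {
                bytes.Add((byte)((high << 4) | value));
                high = -1;
            }
        }
        if (high >= 0) throw new FormatException("Odd number of hex digits.");
        return bytes.ToArray();
    }
}
```

For example, PictHex.Decode("47 49\r\n46") yields the three bytes 0x47, 0x49, 0x46. The decoded bytes can then be written to a file and opened with a tool that understands the format named in the group (here, a Windows metafile).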
Hello tmak,
you can check out a new version of NRtfTree library on my web page:
www50.brinkster.com/sgolivernet
There is a new class named ImageNode, which parses \pict keywords and saves the contents to an image file.
Sgoliver.
-- modified at 15:13 Sunday 28th May, 2006
I haven't done extensive testing of the HTML translator but it seems to be well done.
However, it doesn't appear to parse URLs. Would be nice if it could do that.
HTML generation is not so good. The first bug I found is that text alignment is not processed at all.