Building table of contents directly out of the CHM file and using a class
library internal user control to display it.
Building the help keyword index from text-based hhk-index files or internally
stored binary index formats. The class library also contains a user control for
displaying the keyword index.
Implementing a full-text search engine which uses the CHMs' internally stored
full-text index. As above, the class library contains a user control for easy
full-text search integration.
Support for CHMs compiled with international languages like Russian, Hebrew,
etc.
Contents
-
Contents
-
Introduction
-
Reading CHM-Files
-
Files of the HtmlHelp system
-
The internal "file system" and its files
-
Using the class library
-
Useful links
-
Conclusion
-
History
2. Introduction
First of all, sorry for my bad English. It's not my native language :)
This article is about a class library for reading CHM (Microsoft Compiled HTML
Help 1.0/1.1) files. With this library, you can easily integrate a help system
into your application without using the Microsoft UI. The demo project shows
how to embed the Microsoft Web Browser control and how to interact with the
library.
I created this library during my last project, where I had to build a Windows
application with a fully integrated help system (help browser window embedded
into the application, with table of contents, index and search panes like in
Visual Studio). Since the default help tools shipped with VS.NET only work with
the Microsoft UI (the default Microsoft HTMLHelp viewer), I had to find another
way to offer the users of my application a help system they already know.
Searching the net was kind of frustrating, because there was no managed library
or sample code available for handling CHM files and their contents. But I found
other resources and articles which made it possible for me to implement my own
managed library (see the useful links at the end of this article).
Note: The class library is not performance-optimized. I'm using a lot of
.NET classes which are not very efficient but easy to use (ArrayList,
Hashtable, HybridDictionary, Regex, ...). You shouldn't run into trouble with
small help files (small in terms of table of contents and/or index size).
3. Reading CHM-Files
Basically, a CHM file contains its own file system. You can read from and write
to it using the IStorage interface, which supports the creation and management
of structured storage objects. I won't go deeper into the usage of IStorage and
the wrapper creation, because there is already an article called "Decompiling
CHM (help) files with C#" available. Have a look at it if you need more
information about IStorage and how to use it.
4. Files of the HtmlHelp system
In some cases, especially in bigger help systems, CHM is not the only file
extension we are interested in. If your HtmlHelp file tends to become too big,
the HTML Help Workshop (usually your help creation tool) lets you split the
help system into multiple files with different kinds of contents, mainly:
file extension | file contents
CHM | The main help file. It contains all help content files (HTML files, images, etc.) and can also contain all the contents of the other files.
CHI | This file contains internal system files. The CHM help system contains system files (e.g. a string table file, a URL table file, ...). Those files may be stored separately in a file with the extension CHI (e.g. when the help system contains a table of contents in binary format, which needs a lot of space).
CHQ | This file contains the full-text search index of the help system. If this file doesn't exist, the full-text search index can be found in the CHM file, or full-text searching is disabled.
CHW | This file contains the help index (see the index pane of your HtmlHelp viewer) in binary format. If this file doesn't exist, the binary index can be found in the CHM file, if enabled. The help system allows a second format for storing the index besides the binary one: a small help index may also be stored in sitemap format in a file with the extension .hhk. Such a text-based index will always be stored as an internal content file in the corresponding CHM file storage. The help index always consists of two different types of links: KLinks (keyword links) and ALinks (associative links).
All these different files contain their own "file system" which can be
read/written using the IStorage interface.
5. The internal "file system" and its files
As I already mentioned, the internal "file system" contains content files and
system files. We are interested in the system files. I'll only talk about the
necessary ones, because there are some system files which the library doesn't
decode during data extraction (no interesting content).
The following list of internal files gives a short overview of their names,
formats, where they can be found and what they contain. The class library
implements classes with names similar to the system files to make things more
transparent. These classes are responsible for decoding the binary files or
parsing text-based contents.
For a more detailed description of the different files, their contents and how
to decode them, see Pabs' unofficial HTML Help specification.
So here we go:
5.1 The #SYSTEM file
Internal file name: #SYSTEM
Format: binary
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.CHMSystem
This file contains the main information about the help system, such as: the
name of the sitemap contents file, the name of the sitemap index file, the
default help topic, a flag indicating whether full-text searching is supported,
flags indicating whether the help system contains ALinks and/or KLinks, flags
indicating whether the system has a binary table of contents and/or a binary
index, the compiler version, and many more. All of this information is stored
in binary format and must be decoded directly from a binary stream.
5.2 The #IDXHDR file
Internal file name: #IDXHDR
Format: binary
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.CHMIdxhdr
This file mainly contains links (offsets) to the #STRINGS file
such as: offset to the frame name, offset to the window name, offset to the
image list, offset to merged files, number of topic nodes including the
contents and index files, and some other flags.
This file has a fixed size of 4096 bytes and must be read as binary data.
5.3 The #STRINGS file
Internal file name: #STRINGS
Format: binary
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.CHMStrings
This file is a list of NIL-terminated ANSI/UTF-8 strings. It contains all
topic names, window names and other strings. The very first entry is just a NIL
character, which allows the help system to specify a zero offset and still get
a valid (empty) string. The internal strings are sliced up into blocks of 4096
bytes.
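To illustrate how an offset (for example one taken from #IDXHDR or #TOPICS) resolves to a string, here is a minimal sketch. It is not the code of the CHMStrings class; the names StringsTableSketch and ReadStringAt are made up for this example, the byte array stands in for the raw #STRINGS stream, and the 4096-byte block structure is ignored.

```csharp
using System;
using System.Text;

public static class StringsTableSketch
{
    // Returns the NIL-terminated string that starts at 'offset' in 'data'.
    // 'data' stands in for the raw bytes of a #STRINGS stream (assumption).
    public static string ReadStringAt(byte[] data, int offset, Encoding encoding)
    {
        if (data == null) throw new ArgumentNullException("data");
        if (offset < 0 || offset >= data.Length)
            throw new ArgumentOutOfRangeException("offset");

        // Scan forward to the terminating NIL byte (or the end of the data).
        int end = offset;
        while (end < data.Length && data[end] != 0)
            end++;

        return encoding.GetString(data, offset, end - offset);
    }

    public static void Main()
    {
        // Offset 0 holds a single NIL byte, so it yields an empty string.
        byte[] table = Encoding.ASCII.GetBytes("\0Main window\0Introduction\0");

        Console.WriteLine("[" + ReadStringAt(table, 0, Encoding.ASCII) + "]");
        Console.WriteLine(ReadStringAt(table, 1, Encoding.ASCII));
        Console.WriteLine(ReadStringAt(table, 13, Encoding.ASCII));
    }
}
```

For a real help file you would pass the text encoding that matches the language of the CHM instead of ASCII.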
5.4 The #TOCIDX file
Internal file name: #TOCIDX
Format: binary
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.CHMTocidx
This file only exists in files with a non-empty contents file, binary table of
contents = true and compatibility = 1.1. It contains the binary table of
contents for the help file in a tree format.
5.5 The #TOPICS file
Internal file name: #TOPICS
Format: binary
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.CHMTopics, HtmlHelp.CHMDecoding.TopicEntry
This file contains information on the topics present. It mainly stores offsets
into the #TOCIDX (if binary table of contents is enabled),
the #STRINGS and the #URLTBL files.
5.6 The #URLSTR file
Internal file name: #URLSTR
Format: binary
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.CHMUrlstr
This file contains URL strings and frame names. The URL string is always
relative to the storage root.
5.7 The #URLTBL file
Internal file name: #URLTBL
Format: binary
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.CHMUrltable, UrlTableEntry
This file contains a URL table mapping topics to URLs. It mainly stores offsets
into the #URLSTR file and an index into the #TOPICS
file.
5.8 The $FIftiMain file
Internal file name: $FIftiMain
Format: binary
Found in: CHM, CHQ
Library class: HtmlHelp.CHMDecoding.FullTextEngine
This file stores the information for the full-text search, so you do not have
to search all content files and topics for words and phrases on your own. This
speeds up searching considerably, since the index in this file records which
word occurs in which files and at which locations.
The file starts with a header. This is followed by index nodes, leaf nodes and
word location codes (WLCs). The index and leaf nodes have a fixed size (set in
the header) and the WLC entries have a variable size (set in the leaf nodes).
The class library implements a FullTextEngine class which uses the following
algorithm for searching:
Read the header, seek to the root index node, search the root index node for a
word greater than or equal to the desired one, descend to the next index level,
repeat the previous two steps as many times as the tree is deep, then search
the resulting leaf node until the desired word is found, read the correct part
of the WLCs for that leaf node and extract the topic numbers for that word.
See Pabs' unofficial HTML Help specification for detailed information about
the full-text index file.
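The navigation part of this algorithm can be sketched on a simplified in-memory tree. This is not the FullTextEngine code: the real file stores compressed words and encoded integers that have to be decoded first. The types FtsNodeSketch and FtsSearchSketch are hypothetical, and the sketch assumes that each entry of an index node holds the largest word of the child it points to.

```csharp
using System;
using System.Collections.Generic;

// Simplified in-memory stand-in for one node of the full-text index tree.
public class FtsNodeSketch
{
    // Words in ascending (ordinal) order.
    public List<string> Words = new List<string>();
    // Index nodes only: Children[i] holds every word <= Words[i] (assumption).
    public List<FtsNodeSketch> Children = new List<FtsNodeSketch>();
    // Leaf nodes only: Topics[i] lists the topic numbers for Words[i].
    public List<int[]> Topics = new List<int[]>();
}

public static class FtsSearchSketch
{
    public static int[] FindTopics(FtsNodeSketch root, string word)
    {
        FtsNodeSketch node = root;

        // Descend through the index levels until a leaf node is reached.
        while (node.Children.Count > 0)
        {
            int next = -1;
            for (int i = 0; i < node.Words.Count; i++)
            {
                // First word greater than or equal to the desired one.
                if (string.CompareOrdinal(node.Words[i], word) >= 0)
                {
                    next = i;
                    break;
                }
            }
            if (next < 0)
                return new int[0]; // the word sorts after everything indexed
            node = node.Children[next];
        }

        // Scan the leaf node for an exact match.
        for (int i = 0; i < node.Words.Count; i++)
        {
            if (string.CompareOrdinal(node.Words[i], word) == 0)
                return node.Topics[i];
        }
        return new int[0];
    }

    private static FtsNodeSketch Leaf(string[] words, int[][] topics)
    {
        FtsNodeSketch leaf = new FtsNodeSketch();
        leaf.Words.AddRange(words);
        leaf.Topics.AddRange(topics);
        return leaf;
    }

    // Builds a tiny two-level tree with made-up words and topic numbers.
    public static FtsNodeSketch BuildSampleTree()
    {
        FtsNodeSketch root = new FtsNodeSketch();
        root.Words.AddRange(new string[] { "dialog", "replace" });
        root.Children.Add(Leaf(new string[] { "about", "dialog" },
            new int[][] { new int[] { 1 }, new int[] { 2, 5 } }));
        root.Children.Add(Leaf(new string[] { "find", "replace" },
            new int[][] { new int[] { 3 }, new int[] { 4, 5 } }));
        return root;
    }

    public static void Main()
    {
        FtsNodeSketch root = BuildSampleTree();
        Console.WriteLine(string.Join(",", FindTopics(root, "find")));
        Console.WriteLine(FindTopics(root, "zoom").Length);
    }
}
```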
5.9 The $WWAssociativeLinks\BTree and $WWKeywordLinks\BTree file
Internal file name: BTree
(located in sub-storages $WWAssociativeLinks and/or $WWKeywordLinks)
Format: binary
Found in: CHM, CHI, CHW
Library class: HtmlHelp.CHMDecoding.CHMBtree
This file stores the binary index of the help system. Depending on the
sub-storage, it contains the ALinks or KLinks of the help system. The file
format is the same for both index types. The file contains two different types
of entries (besides a header): Listing blocks and Index blocks. Decoding the
listing blocks forms an index tree where sub-keywords are ", " separated (e.g.
main item keyword "Dialog", sub items keywords "Dialog, About" or "Dialog, Find
and Replace"). Each listing block has at least one Index block entry. More than
one index block entry means this keyword can be found in multiple topics. "See
Also" keywords are also stored in this file.
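As a small illustration of the ", " convention, the following sketch splits a keyword into its levels. KeywordSketch and its methods are hypothetical names and not part of the library. Note that a keyword which itself contains ", " would be split incorrectly by this simple approach.

```csharp
using System;

public static class KeywordSketch
{
    // Splits "Dialog, Find and Replace" into { "Dialog", "Find and Replace" }.
    public static string[] SplitLevels(string keyword)
    {
        return keyword.Split(new string[] { ", " }, StringSplitOptions.None);
    }

    // Level in the index tree: 0 for a main keyword, 1 for a sub-keyword, ...
    public static int LevelOf(string keyword)
    {
        return SplitLevels(keyword).Length - 1;
    }

    public static void Main()
    {
        string[] keywords = { "Dialog", "Dialog, About", "Dialog, Find and Replace" };
        foreach (string keyword in keywords)
        {
            string[] levels = SplitLevels(keyword);
            // Print the last level, indented by its depth.
            Console.WriteLine(new string(' ', 2 * LevelOf(keyword)) + levels[levels.Length - 1]);
        }
    }
}
```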
5.10 The Table of Contents file
Internal file name: specified in #SYSTEM
(if not available, search for "Table of contents.hhc" or "<chmname>.hhc" in
the storage's content files)
Format: text-based sitemap
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.HHCParser
This file contains the text-based table of contents of the help system. The
library class uses regular expression parsing to build the table of contents
tree.
A sample HHC file:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<HTML>
<HEAD>
<meta name="GENERATOR" content="Microsoft® HTML Help Workshop 4.1">
<!---->
</HEAD><BODY>
<UL>
    <LI> <OBJECT type="text/sitemap">
        <param name="Name" value="Introduction to GraphEdit">
        <param name="Local" value="graphedit_help.htm">
        </OBJECT>
    <LI> <OBJECT type="text/sitemap">
        <param name="Name" value="Building Filter Graphs">
        </OBJECT>
    <UL>
        <LI> <OBJECT type="text/sitemap">
            <param name="Name" value="Build a File Playback Graph">
            <param name="Local" value="build_graph.htm">
            </OBJECT>
    </UL>
</UL>
</BODY></HTML>
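To give an idea of what regular expression parsing of such a sitemap can look like, here is a minimal sketch that extracts the name/value pairs of one sitemap object. The pattern and the names SitemapSketch and ReadParams are my own for this example and are not the expressions used by HHCParser; the sketch also ignores the nesting of the UL elements.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SitemapSketch
{
    // Matches <param name="..." value="..."> (pattern is an assumption).
    private static readonly Regex ParamRegex = new Regex(
        "<param\\s+name=\"(?<name>[^\"]*)\"\\s+value=\"(?<value>[^\"]*)\"\\s*>",
        RegexOptions.IgnoreCase);

    // Returns all name/value pairs found in the given sitemap fragment.
    public static List<KeyValuePair<string, string>> ReadParams(string html)
    {
        List<KeyValuePair<string, string>> result =
            new List<KeyValuePair<string, string>>();
        foreach (Match m in ParamRegex.Matches(html))
        {
            result.Add(new KeyValuePair<string, string>(
                m.Groups["name"].Value, m.Groups["value"].Value));
        }
        return result;
    }

    public static void Main()
    {
        string item =
            "<LI> <OBJECT type=\"text/sitemap\">\n" +
            "<param name=\"Name\" value=\"Build a File Playback Graph\">\n" +
            "<param name=\"Local\" value=\"build_graph.htm\">\n" +
            "</OBJECT>";

        foreach (KeyValuePair<string, string> p in ReadParams(item))
            Console.WriteLine(p.Key + " = " + p.Value);
    }
}
```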
5.11 The Index file
Internal file name: specified in #SYSTEM
(if not available, search for "Index.hhk" or "<chmname>.hhk" in the storage's
content files)
Format: text-based sitemap
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.HHKParser
This file contains the text-based index of the help system. The format is the
same as in HHC files except that there is only one level allowed in the site
map and the first Name entry specifies the keyword which is followed by Name,
Local pairs.
5.12 Information types and categories
Information types and categories are only supported for CHMs using a text-based
TOC or index (by HtmlHelp Workshop). You can define information types in
HtmlHelp Workshop and assign them to table of contents nodes and index entries.
This allows a viewer to filter the contents which are displayed to the user.
For example, you can define the information types "SDK Reference", "FAQ",
"HOWTOs" etc., and the viewer can filter help contents depending on the user's
selection. You can also define categories and assign one or more information
types to each category. For example, define the categories "Beginner",
"Intermediate" and "Advanced", assign the previously created information types
to the categories, and the viewer can filter help contents depending on the
user's skill level.
Information types and categories are stored in the .hhc and .hhk files.
For details on how the information types and categories are stored in
HHC/HHK files, see Table 5.56/5.59 in Pabs' unofficial HTML Help specification.
6. Using the class library
About 90% of the classes in the HtmlHelp class library are marked as internal,
so it takes only a little effort to get started.
6.1 HtmlHelp library/viewer UML
HtmlHelp namespace UML
This is the main namespace for you to work with. It contains the main class for
all operations, which is HtmlHelp.HtmlHelpSystem.
Use this class to load files, merge files, access the table of contents, access
the index and perform full-text searches.
This UML diagram visualizes the HtmlHelp Viewer demo application and how it uses
the library.
In the center of this diagram, you can see the Viewer class. This class
represents the main Viewer form class.
It instantiates the three library-internal user controls
(HtmlHelp.UIComponents.helpIndex, HtmlHelp.UIComponents.helpSearch,
HtmlHelp.UIComponents.TocTree) and the main class HtmlHelp.HtmlHelpSystem.
6.2 The data dumping
I introduced this feature in version 0.3. Without it, the library takes a huge
amount of time to open CHM files with very large text-based table of contents
files. The reason is the regular expression parsing of big .hhc files.
Using the dumping feature will save about 90%+ of the loading time in such
scenarios. The first time you load such a CHM, the internal CHMFile class
creates a data dump according to the instance of the DumpingInfo class you have
provided to the OpenFile() or MergeFile() methods.
If you load the same file a second time, the data dump will be used to load some
of the CHM data (depending on your preferences).
Using the DumpingInfo class you can specify:
-
The output/input directory of the dump file,
-
The compression level of the dump file (uses
ic#code's zip library),
-
And what kind of data should be dumped using the flags:
-
DumpingFlags.DumpTextTOC ... text based tocs should be dumped
-
DumpingFlags.DumpBinaryTOC ... binary tocs should be dumped
-
DumpingFlags.DumpTextIndex ... text based index should be dumped
-
DumpingFlags.DumpBinaryIndex ... binary index should be dumped
-
DumpingFlags.DumpStrings ... the contents of the #STRINGS file should be dumped
-
DumpingFlags.DumpUrlStr ... the contents of the #URLSTR file should be dumped
-
DumpingFlags.DumpUrlTbl ... the contents of the #URLTBL file should be dumped
-
DumpingFlags.DumpTopics ... the contents of the #TOPICS file should be dumped
-
DumpingFlags.DumpFullText ... the full-text index should be dumped
See the code snippets later in this chapter.
To see how efficient the dumping feature is, I've done a test with the following
results:
Test done on: Intel Xeon 3.06GHz HT, 1GB RAM, WinXP, SCSI HDs
DumpingFlags set: DumpingFlags.DumpBinaryTOC | DumpingFlags.DumpTextTOC | DumpingFlags.DumpTextIndex | DumpingFlags.DumpBinaryIndex | DumpingFlags.DumpUrlStr | DumpingFlags.DumpStrings
Dump compression: DumpCompression.Medium
DirectX9 SDK CHM (binary index and binary TOC):
Read time without dump: --- HtmlHelp file read in 00:00:02.1874784
Write time of dump data*: --- Dump written in 00:00:01.0937360 (dump file size: ~780KB)
Read time with dump: --- HtmlHelp file read in 00:00:00.7499904 (dump file size: ~780KB)
Net read time of dump: --- Dump read in 00:00:00.5781176 (dump file size: ~780KB)
CHM with a ~900KB sitemap TOC (binary index and text-based TOC):
Read time without dump: --- HtmlHelp file read in 00:00:05.7811760 (slow RegEx parsing :( )
Write time of dump data*: --- Dump written in 00:00:00.5156184 (dump file size: ~300KB)
Read time with dump: --- HtmlHelp file read in 00:00:00.3437456 (dump file size: ~300KB)
Net read time of dump: --- Dump read in 00:00:00.2187472 (dump file size: ~300KB)
* The write time of the dump is not included in the "Read time without dump"
timespan.
I think the two examples above show how data dumping can speed up the loading
process.
Another advantage of using dump files is that you save a few MB of memory,
because only the necessary fields are stored in and loaded from the dump file.
Also, the size of each string is known when reading the dump, so the .NET
Framework can instantiate the string instances with an initial size.
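To put the measured times into relation, the relative saving can be worked out as 1 - (read time with dump / read time without dump). The following small sketch does this for the two test runs; DumpSavingsSketch and SavedPercent are names made up for this example.

```csharp
using System;
using System.Globalization;

public static class DumpSavingsSketch
{
    // Percentage of loading time saved when reading with a dump,
    // rounded to one decimal place.
    public static double SavedPercent(string withoutDump, string withDump)
    {
        TimeSpan slow = TimeSpan.ParseExact(withoutDump, "c", CultureInfo.InvariantCulture);
        TimeSpan fast = TimeSpan.ParseExact(withDump, "c", CultureInfo.InvariantCulture);
        return Math.Round((1.0 - fast.TotalSeconds / slow.TotalSeconds) * 100.0, 1);
    }

    public static void Main()
    {
        // DirectX9 SDK CHM (binary index and binary TOC)
        Console.WriteLine(SavedPercent("00:00:02.1874784", "00:00:00.7499904")
            .ToString(CultureInfo.InvariantCulture));
        // CHM with a ~900KB sitemap TOC (text-based TOC)
        Console.WriteLine(SavedPercent("00:00:05.7811760", "00:00:00.3437456")
            .ToString(CultureInfo.InvariantCulture));
    }
}
```

With the numbers above, this gives a saving of roughly 66% for the binary TOC case and roughly 94% for the large text-based TOC, which is where the 90%+ figure applies.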
6.3 Initializing the library
The main class for using this library is HtmlHelp.HtmlHelpSystem.
Use this class to open or merge CHM files, get the table of contents or index,
etc.
First of all, you need to create an instance of the HtmlHelp.HtmlHelpSystem
class. The provided sample viewer application instantiates the class in the
constructor of the main form and saves the instance in a class variable (see
Viewer.cs, line 110 (constructor)).
// Requires: using Microsoft.Win32; (Registry, RegistryKey)
private string LM_Key = @"Software\Klaus Weisser\HtmlHelpViewer\";
HtmlHelpSystem _reader = null;
DumpingInfo _dmpInfo = null;
InfoTypeCategoryFilter _filter = new InfoTypeCategoryFilter();
// Default preferences, overridden by LoadRegistryPreferences()
string _prefDumpOutput = "";
DumpCompression _prefDumpCompression = DumpCompression.Medium;
DumpingFlags _prefDumpFlags = DumpingFlags.DumpBinaryTOC |
    DumpingFlags.DumpTextTOC |
    DumpingFlags.DumpTextIndex |
    DumpingFlags.DumpBinaryIndex |
    DumpingFlags.DumpUrlStr | DumpingFlags.DumpStrings;
string _prefURLPrefix = "mk:@MSITStore:";
bool _prefUseHH2TreePics = false;

public Viewer()
{
    _reader = new HtmlHelpSystem();
    HtmlHelpSystem.UrlPrefix = "mk:@MSITStore:";

    // Use the temp directory as the default dump location.
    // GetEnvironmentVariable() returns null if the variable is not set.
    string sTemp = System.Environment.GetEnvironmentVariable("TEMP");
    if(sTemp == null || sTemp.Length == 0)
        sTemp = System.Environment.GetEnvironmentVariable("TMP");
    _prefDumpOutput = sTemp;

    _dmpInfo =
        new DumpingInfo(DumpingFlags.DumpBinaryTOC |
        DumpingFlags.DumpTextTOC |
        DumpingFlags.DumpTextIndex | DumpingFlags.DumpBinaryIndex |
        DumpingFlags.DumpUrlStr | DumpingFlags.DumpStrings,
        sTemp, DumpCompression.Medium);

    // Stored preferences replace the defaults set above
    LoadRegistryPreferences();
    HtmlHelpSystem.UrlPrefix = _prefURLPrefix;
    HtmlHelpSystem.UseHH2TreePics = _prefUseHH2TreePics;
    InitializeComponent();
}

private void LoadRegistryPreferences()
{
    RegistryKey regKey = Registry.LocalMachine.CreateSubKey(LM_Key);
    bool bEnable = bool.Parse(regKey.GetValue("EnableDumping",true).ToString());
    _prefDumpOutput = (string) regKey.GetValue("DumpOutputDir",_prefDumpOutput);
    _prefDumpCompression = (DumpCompression)
        ((int)regKey.GetValue("CompressionLevel", _prefDumpCompression));
    _prefDumpFlags = (DumpingFlags) ((int)regKey.GetValue("DumpingFlags",
        _prefDumpFlags));

    // Dumping can be switched off completely by passing null as DumpingInfo
    if(bEnable)
        _dmpInfo = new DumpingInfo(_prefDumpFlags, _prefDumpOutput,
            _prefDumpCompression);
    else
        _dmpInfo = null;

    _prefURLPrefix = (string) regKey.GetValue("ITSUrlPrefix", _prefURLPrefix);
    _prefUseHH2TreePics = bool.Parse(regKey.GetValue("UseHH2TreePics",
        _prefUseHH2TreePics).ToString());
}
6.4 Open a CHM file
To open a file, use the following lines of code (see Viewer.cs line
1088):
tocTree1.ClearContents();
helpIndex1.ClearContents();
helpSearch2.ClearContents();
_reader.OpenFile( openFileDialog1.FileName, _dmpInfo );
tocTree1.Enabled = _reader.HasTableOfContents;
helpIndex1.Enabled = _reader.HasIndex;
helpSearch2.Enabled = _reader.FullTextSearch;
tocTree1.BuildTOC( _reader.TableOfContents, _filter );
if( _reader.HasKLinks )
    helpIndex1.BuildIndex( _reader.Index, IndexType.KeywordLink, _filter );
else if( _reader.HasALinks )
    helpIndex1.BuildIndex( _reader.Index, IndexType.AssiciativeLinks, _filter);
NavigateBrowser( _reader.DefaultTopic );
this.Text = _reader.FileList[0].FileInfo.HelpWindowTitle +
    " - HtmlHelp - Viewer";
miCustomize.Enabled = ( _reader.HasInformationTypes || _reader.HasCategories);
GC.Collect();
The variables tocTree1, helpIndex1 and helpSearch2 are instances of the user
controls provided in the class library. The line
_reader.OpenFile( openFileDialog1.FileName, _dmpInfo ) does all the internal
file decoding. If you are running your application in Debug mode, watch the
Output window of VS.NET to see what the library is currently doing.
After these few lines of code, and some additional UI updates, the
HtmlHelp.HtmlHelpSystem class has loaded the table of contents, index,
full-text search and all the other internal system files.
6.5 Merging additional CHM files
Once you have opened a CHM file, the HtmlHelp.HtmlHelpSystem class lets you
merge additional files. This results in one single table of contents tree and
one index tree (see Viewer.cs line 1168).
tocTree1.ClearContents();
helpIndex1.ClearContents();
helpSearch2.ClearContents();
_reader.MergeFile( openFileDialog1.FileName, _dmpInfo );
tocTree1.Enabled = _reader.HasTableOfContents;
helpIndex1.Enabled = _reader.HasIndex;
helpSearch2.Enabled = _reader.FullTextSearch;
tocTree1.BuildTOC( _reader.TableOfContents, _filter );
if( _reader.HasKLinks )
    helpIndex1.BuildIndex( _reader.Index, IndexType.KeywordLink, _filter );
else if( _reader.HasALinks )
    helpIndex1.BuildIndex( _reader.Index, IndexType.AssiciativeLinks, _filter);
NavigateBrowser( _reader.DefaultTopic );
miCustomize.Enabled = ( _reader.HasInformationTypes || _reader.HasCategories);
GC.Collect();
Notes: Don't forget to rebuild the TOC and index after a merge, to get the new
entries into the user controls' UI. If you use the _reader.OpenFile() method,
the internal data of HtmlHelp.HtmlHelpSystem will be deleted and recreated
using the new file. So calling _reader.OpenFile() always resets the help
system.
6.6 Accessing the table of contents programmatically
If you use the provided user controls of the library, you don't have to care
about the table of contents structure, since the tree view will be filled by the
control itself. In some cases, e.g. when implementing your own HelpProvider for
the class library, you may want to access the table of contents tree
programmatically.
The table of contents tree can be accessed using the TableOfContents property
of HtmlHelp.HtmlHelpSystem. The property returns an instance of the class
HtmlHelp.TableOfContents. This class offers some search methods and a property
to access the tree itself. Use the TOC property of the HtmlHelp.TableOfContents
class to get an ArrayList of HtmlHelp.TOCItem instances. Each of these items
stores the topic name, the location, the URL, and an ArrayList of
HtmlHelp.TOCItem instances which represent the children of the item.
The following sample code demonstrates how the TreeNodeCollection of a TreeView
control can be filled:
private void Main()
{
    TableOfContents currentToc = _reader.TableOfContents;
    tocTreeView.Nodes.Clear();
    BuildTOC(currentToc.TOC, tocTreeView.Nodes);
    tocTreeView.Update();
}
private void BuildTOC(ArrayList tocItems, TreeNodeCollection col)
{
    foreach( TOCItem curItem in tocItems )
    {
        TreeNode newNode = new TreeNode( curItem.Name,
            curItem.ImageIndex, curItem.ImageIndex );
        // Keep the TOC item so the topic can be resolved on selection
        newNode.Tag = curItem;
        if(curItem.Children.Count > 0)
        {
            // Recursively add the child items below the new node
            BuildTOC(curItem.Children, newNode.Nodes);
        }
        col.Add(newNode);
    }
}
6.7 Accessing the index programmatically
The index can be accessed using the Index property of HtmlHelp.HtmlHelpSystem.
The property returns an instance of the class HtmlHelp.Index.
This index class stores two ArrayLists with entries of type HtmlHelp.IndexItem.
One represents the KLinks and one the ALinks.
The index also forms a tree, but differently from the table of contents, because
the keyword text always contains the parent keyword (", " separated list).
Each HtmlHelp.IndexItem contains an ArrayList with the associated help topics.
This ArrayList can be accessed using the Topics property. Each item is of type
HtmlHelp.IndexTopic.
If an HtmlHelp.IndexItem contains more than one HtmlHelp.IndexTopic entry, the
user has to choose the topic to display (see 2nd screenshot).
The following sample code demonstrates how to fill a ListBox control with the
KeywordLinks index:
private void Main()
{
    Index currentIndex = _reader.Index;
    BuildIndex( currentIndex, IndexType.KeywordLink );
}
public void BuildIndex(Index index, IndexType typeOfIndex)
{
    ArrayList _arrIndex = null;
    switch(typeOfIndex)
    {
        case IndexType.AssiciativeLinks: _arrIndex = index.ALinks; break;
        case IndexType.KeywordLink: _arrIndex = index.KLinks; break;
    }
    lbIndex.Items.Clear();
    // Nothing to show for an unknown index type
    if(_arrIndex == null)
        return;
    _arrIndex.Sort();
    foreach(IndexItem curItem in _arrIndex)
    {
        // Indent sub-keywords according to their level in the index tree
        lbIndex.Items.Add( GetIndent(curItem.Indent) + curItem.KeyWord );
    }
}
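The GetIndent() helper called in the snippet above is not shown in this article. A minimal sketch of such a helper, assuming it simply returns a run of spaces for the given indent level (two spaces per level is an arbitrary choice here), could look like this:

```csharp
using System;

public static class IndexIndentSketch
{
    // Returns the leading whitespace for an index entry at the given level.
    // Two spaces per level is an assumption made for this sketch.
    public static string GetIndent(int level)
    {
        return new string(' ', Math.Max(0, level) * 2);
    }

    public static void Main()
    {
        Console.WriteLine(GetIndent(0) + "Dialog");
        Console.WriteLine(GetIndent(1) + "About");
        Console.WriteLine(GetIndent(1) + "Find and Replace");
    }
}
```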
6.8 Accessing file contents programmatically
With version 0.4 of the library, I introduced two new properties called
ContentFile and FileContents in the following classes:
HtmlHelp.TOCItem, HtmlHelp.IndexTopic, HtmlHelp.CHMDecoding.TopicEntry
The property FileContents directly returns the file content as a string. It
automatically applies the correct file encoding.
// works the same way for an IndexTopic or TopicEntry instance
TopicEntry curEntry = <any instance of TopicEntry>;
string sContent = curEntry.FileContents;
Using the property FileContents doesn't allow you to implement error tracking.
If the content file can't be opened or read, you simply get an empty string
without any exception, which is the same as reading an empty file!
The property ContentFile returns an instance of HtmlHelp.Storage.FileObject, or
null if the file is not accessible.
You can use this property to access the native file contents of the associated
content files.
Note: If you read native contents from a text file like htm, hhc, hhk etc., you
have to make sure to use the appropriate text encoding!
Always make sure you close the returned FileObject instance when it is no
longer needed.
The following code snippet will show you, how to read the text-content of a
topic file.
TopicEntry curEntry = <any instance of TopicEntry>;
FileObject fo = curEntry.ContentFile;
if(fo != null)
{
    if(fo.CanRead)
    {
        // Read the raw bytes and decode them with the topic's encoding
        byte[] fileData = new byte [fo.Length];
        fo.Read(fileData, 0, (int)fo.Length);
        fo.Close();
        string sContent = curEntry.TextEncoding.GetString(fileData);
    }
    else
    {
        fo.Close();
        MessageBox.Show("File " + curEntry.Locale + " not readable!");
    }
}
else
{
    MessageBox.Show("Couldn't get file object for " + curEntry.Locale);
}
This snippet doesn't implement any file-type checking. It assumes that the
content file is a text file!
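If you need such a check, one simple option is to look at the file extension of the topic's locale string before decoding the bytes as text. The following is only a sketch: ContentTypeSketch and LooksLikeTextFile are hypothetical names, and the list of extensions beyond htm, hhc and hhk is my own assumption.

```csharp
using System;
using System.IO;

public static class ContentTypeSketch
{
    // Extensions treated as text; this list is an assumption of the sketch.
    private static readonly string[] TextExtensions =
        { ".htm", ".html", ".hhc", ".hhk", ".css", ".js", ".txt" };

    public static bool LooksLikeTextFile(string local)
    {
        if (string.IsNullOrEmpty(local))
            return false;

        // Drop an anchor or query part such as "topic.htm#section".
        int cut = local.IndexOfAny(new char[] { '#', '?' });
        if (cut >= 0)
            local = local.Substring(0, cut);

        string ext = Path.GetExtension(local).ToLowerInvariant();
        return Array.IndexOf(TextExtensions, ext) >= 0;
    }

    public static void Main()
    {
        Console.WriteLine(LooksLikeTextFile("build_graph.htm"));
        Console.WriteLine(LooksLikeTextFile("images/logo.gif"));
    }
}
```

A check like this only guards against obviously binary files; it says nothing about which text encoding to use.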
6.9 Use the class library's table of contents user control
As I've already mentioned, the class library has some built-in user controls for
displaying the 3 main help panes: table of contents, index and search. Using
these controls allows you to integrate the HtmlHelp system with a few
drag-and-drop operations and some lines of code in minutes.
The user control for the table of contents is implemented in the class
HtmlHelp.UIComponents.TocTree.
You can use the VS.NET Forms designer to place the control on your
application's form and adjust its properties.
The most important method is BuildTOC( HtmlHelp.TableOfContents tocInstance )
(see 6.4). If you call this method, the internal TreeView control will be
filled with the table of contents items (see 6.6).
For interacting with your UI, the control implements an event called
TocSelected.
It is raised whenever the user selects a new topic in the table of contents,
notifying the main UI that a new topic should be displayed.
this.tocTree1.Dock = System.Windows.Forms.DockStyle.Fill;
this.tocTree1.DockPadding.All = 2;
this.tocTree1.Location = new System.Drawing.Point(0, 0);
this.tocTree1.Name = "tocTree1";
this.tocTree1.Size = new System.Drawing.Size(292, 484);
this.tocTree1.TabIndex = 0;
this.tocTree1.TocSelected +=
new TocSelectedEventHandler(this.tocTree1_TocSelected);
private void tocTree1_TocSelected(object sender, TocEventArgs e)
{
    // Only navigate if the TOC item points to a content file
    if( e.Item.Local.Length > 0)
    {
        NavigateBrowser(e.Item.Url);
    }
}
6.10 Use the class library's index user control
The user control for displaying the index pane is located in the class
HtmlHelp.UIComponents.helpIndex.
You can use the VS.NET Forms designer to place the control and adjust its
properties.
The most important method is BuildIndex( HtmlHelp.Index indexInstance,
IndexType typeOfIndex ) (see 6.4). If you call this method, the internal
ListBox will be filled with the index tree (see 6.7).
For interacting with your UI, this control implements two events. The first one
is named IndexSelected. This event is raised if the user selects a topic
related to an index entry, notifying the main UI that a new topic should be
displayed.
The second event is named TopicsFound. This event is raised if the user selects
an index entry with multiple related topics. If you do not handle this event,
the class library will pop up a built-in dialog (see second screenshot). Handle
the event if you want to create a different UI than the default one.
this.helpIndex1.Dock = System.Windows.Forms.DockStyle.Fill;
this.helpIndex1.Location = new System.Drawing.Point(0, 0);
this.helpIndex1.Name = "helpIndex1";
this.helpIndex1.Size = new System.Drawing.Size(292, 470);
this.helpIndex1.TabIndex = 0;
this.helpIndex1.IndexSelected +=
new IndexSelectedEventHandler(this.helpIndex1_IndexSelected);
this.helpIndex1.TopicsFound +=
new TopicsFoundEventHandler(this.helpIndex1_TopicsFound);
private void helpIndex1_IndexSelected(object sender, IndexEventArgs e)
{
    if(e.URL.Length > 0)
        NavigateBrowser(e.URL);
}
private void helpIndex1_TopicsFound(object sender, TopicsFoundEventArgs e)
{
}
6.11 Use the class library's full-text search user control
The user control for displaying the full-text search pane can be found in the
class HtmlHelp.UIComponents.helpSearch. As with the two other controls, you can
use the VS.NET Forms designer to place the control and adjust its properties.
The difference from the other controls is that you do not have to initialize the
content of this control, since the user has to enter a search string before
searching can be done.
For this behavior, the control offers two events. First, the FTSearch event is
raised, notifying the main UI that the user wants to perform a full-text
search. You can access the search parameters using the event arguments. Then
you have to initiate the full-text search by calling the PerformSearch() method
of the HtmlHelp.HtmlHelpSystem instance (variable _reader in the examples
above). This method returns a DataTable instance with the found entries stored
in DataRows.
The DataTable results contain the following fields:
-
Rating - a calculated rating of the topic
-
Title - the title of the topic
-
Locale - the locale string of the topic (virtual link of the content file in
the CHM store)
-
Location - the location of the topic (useful if searching in a merged
environment)
-
URL - the URL which can be used by the web browser control
Once you have received the results, you have to "send" them back to the user
control using the method SetResults().
If the user selects a topic from the search results, the event HitSelected
will be raised, notifying the main UI that a new topic should be displayed.
// Designer-generated setup: dock the search control and hook up its events.
this.helpSearch2.Dock = System.Windows.Forms.DockStyle.Fill;
this.helpSearch2.Location = new System.Drawing.Point(0, 0);
this.helpSearch2.Name = "helpSearch2";
this.helpSearch2.Size = new System.Drawing.Size(292, 470);
this.helpSearch2.TabIndex = 0;
this.helpSearch2.HitSelected +=
    new HitSelectedEventHandler(this.helpSearch2_HitSelected);
this.helpSearch2.FTSearch +=
    new FTSearchEventHandler(this.helpSearch2_FTSearch);

// Raised when the user starts a full-text search in the control.
private void helpSearch2_FTSearch(object sender, SearchEventArgs e)
{
    this.Cursor = Cursors.WaitCursor;
    try
    {
        // Run the search with the parameters from the event arguments
        // and hand the results back to the control.
        DataTable dtResults = _reader.PerformSearch( e.Words,
            500, e.PartialWords, e.TitlesOnly);
        helpSearch2.SetResults(dtResults);
    }
    finally
    {
        // Restore the normal cursor even if the search throws.
        this.Cursor = Cursors.Default;
    }
}

// Raised when the user picks a topic from the result list.
private void helpSearch2_HitSelected(object sender, HitEventArgs e)
{
    if( e.URL.Length > 0)
    {
        NavigateBrowser(e.URL);
    }
}
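The result table described above can be read like any other DataTable. The following is only a minimal sketch: it builds a stand-in table with the five documented column names instead of calling PerformSearch(), and the column types, the sample rows and the names SearchResultDump, BuildSampleResults and Describe are assumptions made for illustration, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

static class SearchResultDump
{
    // Stand-in for the table returned by PerformSearch(). Only the column
    // names come from the article; types and row values are assumptions.
    public static DataTable BuildSampleResults()
    {
        DataTable dt = new DataTable("SearchResults");
        dt.Columns.Add("Rating", typeof(double));
        dt.Columns.Add("Title", typeof(string));
        dt.Columns.Add("Locale", typeof(string));
        dt.Columns.Add("Location", typeof(string));
        dt.Columns.Add("URL", typeof(string));
        dt.Rows.Add(12.5, "Getting started", "/html/start.htm",
            "sample.chm", "sample-url-1");
        dt.Rows.Add(3.0, "Installing", "/html/install.htm",
            "sample.chm", "sample-url-2");
        return dt;
    }

    // Turns every result row into one display line.
    public static List<string> Describe(DataTable results)
    {
        List<string> lines = new List<string>();
        foreach (DataRow row in results.Rows)
        {
            lines.Add(string.Format("{0} [{1}] -> {2}",
                row["Title"], row["Location"], row["URL"]));
        }
        return lines;
    }

    static void Main()
    {
        foreach (string line in Describe(BuildSampleResults()))
        {
            Console.WriteLine(line);
        }
    }
}
```

In a real handler you would pass the table returned by _reader.PerformSearch() to Describe() instead of the sample table.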
6.12 The HelpProviderEx component
New in version 0.3 is an extended HelpProvider component.
If you create an application with an integrated help system (viewer, table of
contents, index, etc.), you can use the HelpProviderEx class to provide
standard help functionality to your dialogs or views.
The HelpProviderEx component should work like the standard .NET HelpProvider
component if you are not working with the HtmlHelpSystem class and have not
initialized HelpProviderEx with a viewer ("should" because I haven't fully
tested the component yet).
Since you will use HelpProviderEx to get rid of the Microsoft CHM viewer UI,
you have to initialize the HelpProviderEx class with a viewer application.
This viewer has to implement the interface HtmlHelp.UIComponents.IHelpViewer:
public interface IHelpViewer
{
    void NavigateTo(string url);
    void ShowHelp(string namespaceFilter, HelpNavigator hlpNavigator,
        string keyword);
    void ShowHelp(string namespaceFilter, HelpNavigator hlpNavigator,
        string keyword, string url);
    void ShowHelpIndex(string url);
    void ShowPopup(Control parent, string text, Point location);
}
The provided sample viewer application implements this interface, so you can
look there to see how it can be done.
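To give a rough idea of what an implementation involves, here is a minimal sketch of a viewer that only records the calls it receives. The interface is repeated from the listing above so that the sketch compiles on its own; the class LoggingHelpViewer and its Log field are hypothetical and do not exist in the library. A real viewer would drive the browser control, the index pane and a popup window instead of writing to a list.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Windows.Forms;

// Repeated from the article so the sketch is self-contained; in a real
// project this interface comes from HtmlHelp.UIComponents.
public interface IHelpViewer
{
    void NavigateTo(string url);
    void ShowHelp(string namespaceFilter, HelpNavigator hlpNavigator,
        string keyword);
    void ShowHelp(string namespaceFilter, HelpNavigator hlpNavigator,
        string keyword, string url);
    void ShowHelpIndex(string url);
    void ShowPopup(Control parent, string text, Point location);
}

// Hypothetical viewer: it only logs what it was asked to do.
public class LoggingHelpViewer : IHelpViewer
{
    public readonly List<string> Log = new List<string>();

    public void NavigateTo(string url)
    {
        Log.Add("navigate:" + url);
    }

    public void ShowHelp(string namespaceFilter, HelpNavigator hlpNavigator,
        string keyword)
    {
        // Delegate to the four-argument overload without a target URL.
        ShowHelp(namespaceFilter, hlpNavigator, keyword, "");
    }

    public void ShowHelp(string namespaceFilter, HelpNavigator hlpNavigator,
        string keyword, string url)
    {
        Log.Add("help:" + hlpNavigator + ":" + keyword);
        if (!string.IsNullOrEmpty(url))
        {
            NavigateTo(url);
        }
    }

    public void ShowHelpIndex(string url)
    {
        Log.Add("index:" + url);
    }

    public void ShowPopup(Control parent, string text, Point location)
    {
        Log.Add("popup:" + text);
    }
}
```

An instance of such a class is what you would assign to the Viewer property of HelpProviderEx.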
If you want to use the HelpProviderEx component like the standard
HelpProvider, you have to
-
Set the HelpNamespace property of the HelpProviderEx to a valid value
-
Set the Viewer property to null
If you want to use the HelpProviderEx component in an integrated help
environment, you have to
-
Set the Viewer property to an object instance implementing the IHelpViewer
interface (the HelpNamespace property will then be ignored)
Since HelpProviderEx implements the IExtenderProvider interface, it extends
your controls with additional properties:
-
HelpNamespace (only HelpProviderEx)
-
HelpKeyword (standard HelpProvider)
-
HelpNavigator (standard HelpProvider)
-
HelpString (standard HelpProvider)
-
ShowHelp (standard HelpProvider)
You can use the HelpNamespace property in merged CHM environments to reduce
the amount of data to search (specify the name of the CHM where the searched
keyword/topic/etc. is located).
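In code, extender properties are set through Set... methods on the provider rather than on the control itself. The sketch below uses the standard System.Windows.Forms.HelpProvider, whose methods are documented. That HelpProviderEx exposes the same methods is an assumption based on the property list above, and the file name "sample.chm" and the keyword are made up.

```csharp
using System;
using System.Windows.Forms;

static class ExtenderDemo
{
    // Attaches help information to a control using the standard HelpProvider.
    public static HelpProvider Attach(Control target)
    {
        HelpProvider provider = new HelpProvider();
        provider.HelpNamespace = "sample.chm";   // hypothetical help file

        // These calls correspond to the ShowHelp, HelpNavigator,
        // HelpKeyword and HelpString extender properties.
        provider.SetShowHelp(target, true);
        provider.SetHelpNavigator(target, HelpNavigator.KeywordIndex);
        provider.SetHelpKeyword(target, "printing");
        provider.SetHelpString(target, "Prints the current topic.");
        return provider;
    }
}
```

The per-control HelpNamespace property exists only on HelpProviderEx, so it is not shown here.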
There are a few tutorials online on how to use the HelpProvider component, so
I won't go deeper here.
7. Useful links
8. Conclusion
The class library keeps all extracted data in memory. This may speed up some
tasks like searching, but it costs memory (e.g. ~10 MB for the DirectX 9.0 SDK
help file). Since some of the internal file formats are not completely clear
(garbage spaces, unknown values, ...), a binary merge of CHM files would be
difficult. That's the reason why I'm loading the TOC and index into memory and
merging them using internal classes.
I haven't had enough time to fully test the newly introduced HelpProviderEx
component, so bug reports are welcome.
Special thanks to Nick Butler, who provided a lot of bug reports and feature
suggestions for version 0.4 and helped me improve the library ;)
9. History
Version 0.4 |
2004 Aug. 06 |
-
Fixed an issue with reading hhc/hhk files containing brackets ().
-
Fixed an issue with identifying the master hhc in CHMs containing multiple hhc
files. In some circumstances this requires a storage enumeration, which is slow
on CHMs with a lot of content files.
-
Fixed an issue with reading see-also items from text-based hhk-index
files
-
Fixed an issue with "See Also" index links. The shipped index user control now
handles these special index items (IndexEventArgs now holds additional see also
information)
-
Fixed an issue with decoding the urlstr file.
-
Fixed an issue with parsing text-based index files. Items now have the correct
indent, and deeper levels are supported!
-
Changed the accessibility of some internal CHM-decoding classes. This allows
users of the library to access more native CHM information/data structures.
-
Added the properties ContentFile and FileContents to the classes TOCItem,
IndexTopic and TopicEntry. The ContentFile property opens the associated
content file and returns a FileObject instance on success. You can use this
instance to programmatically access the native file contents! The FileContents
property directly reads the contents of the associated file and returns them as
a string. This property automatically applies the correct CHM encoding to
support multiple languages!
|
Version 0.3 |
2004 May 17 |
-
Fixed the number of hits displayed in the search pane. When working with merged
files, the maximum number of hits was applied per file, not for the whole system.
-
Fixed an issue during data decoding of the binary table of contents. The system
now reads binary TOCs much faster and no longer has problems building the
tree in deeper TOC levels.
-
Optimized index merging. Index merging is no longer in complexity class
n². It's now log(n) (using a binary search and insertion algorithm)
-
Optimized memory usage, especially the memory usage of table of contents and
index items. The title and locale strings are no longer stored as strings in
every item (only in CHMs with binary TOC and/or index). The item just holds
offsets into the loaded system file data. After opening/merging a file into the
system and updating the UI, you should call GC.Collect() to force a garbage
collection (this will free a few MB, depending on the amount of CHM data).
-
Added imagelist of standard CHM-Viewer
-
Added the method ClearContents() to the three internal user controls (toc-tree,
index and search) which allows the user to reset the control's contents if
opening new files.
-
Added support for compressed dumping of data (speeds up CHM reloading).
-
Added support for the CHM merged-file list (see the #IDXHDR file). The CHM with
the master TOC MUST be opened first! The MS standard CHM viewer can also create
the TOC correctly if you open one of the slave CHMs. This is not supported by
the library (the TOC tree will contain all topics, but not in a correct tree)!
-
Added support for merged TOC-Items in hhc files ("Merge" parameter)
-
Added support for information types and categories.
-
Added an extended HelpProvider component for interacting with the HtmlHelpSystem
-
Added native HelpToolTipWindow for adding Popup-Help support for applications
using HelpProviderEx class.
|
Version 0.2 |
2004 April 25 |
-
Fixed an issue which prevented the class library from loading the correct
table of contents file (text-based hhc file). This happened in CHMs where the
"Contents file" option of the HHP file (HtmlHelp project) is not set and the
HHC file name is not the default one (<chmfile>.hhc or table of
contents.hhc). The same applies to text-based index files. (Found in
MSDN Magazine CHMs.)
-
Added international support. The library detects the LCID (language code id)
and the code page used, and adjusts the encoding used for converting binary
arrays to strings.
-
Fixed the full-text search algorithm. It is still not 100% working for
international languages (I haven't found the error yet)
-
Added a class library CHM to the source zip.
-
Added a ChmFileInfo class for easily getting system information of CHM-Files
(see loaded files in about dialog of the example viewer).
-
Fixed a URL issue which prevented IE (with the newest IE patch) from displaying
linked content files correctly.
|
Version 0.1 |
2004 April 20 |
Article release |