Building the table of contents directly from the CHM file and displaying it with a user control included in the class library.
Building the help keyword index from text-based hhk index files or from internally stored binary index formats. The class library also contains a user control for displaying the keyword index.
Implementing a full-text search engine which uses the CHM's internally stored full-text index. As above, the class library contains a user control for easy full-text search integration.
Support for CHMs compiled in international languages such as Russian, Hebrew, etc.
2. Introduction
First of all, sorry for my bad English. It's not my native language :)
This article is about a class library for reading CHM (Microsoft compiled HTML help 1.0/1.1) files. With this library, you can easily integrate a help system into your application without using the Microsoft UI. The demo project shows how to embed the Microsoft Web-browser control and how to interact with the library.
I created this library during my last project, where I had to build a Windows application with a fully integrated help system (help browser window embedded into the application, with table of contents, index and search panes like in Visual Studio). Since the default help tools shipped with VS.NET only work with the Microsoft UI (the default Microsoft HTMLHelp viewer), I had to find some other way to offer a familiar help system to the users of my application.
Searching the net was kind of frustrating, because there was no managed library or sample code available for handling CHM files and their contents. But I found other resources and articles which made it possible for me to implement my own managed library (see useful links at the end of this article).
Note: The class library is not performance optimized. I'm using a lot of .NET classes which are not very efficient but easy to use.
3. Reading CHM-Files
Basically, a CHM file contains its own file system. You can handle read/write streaming using the IStorage interface, which supports the creation and management of structured storage objects. I won't go deeper into the usage of IStorage and the wrapper creation, because there is already an article called Decompiling CHM (help) files with C# available. Have a look at it if you need more information about IStorage and how to use it.
4. Files of the HtmlHelp system
In some cases, especially in bigger help systems, CHM is not the only file extension we are interested in. If your HtmlHelp file tends to become too big, the HTML Help Workshop (usually your help creation tool) offers possibilities to split the help system into multiple files with different kinds of contents.
All these different files contain their own "file system", which can be read and written using the IStorage interface.
5. The internal "file system" and its files
As already mentioned, the internal "file system" contains content files and system files. The system files are the ones of interest here. I'll only cover the necessary system files, because the library does not decode some of them during data extraction (they have no interesting content). The following list of internal files gives a short overview of their names, formats, where they can be found, and their contents. The class library implements classes with names similar to the system files to make the mapping more transparent. These classes are responsible for decoding the binary files or parsing text-based contents. For a more detailed description of the different files, their contents and how to decode them, see Pabs' unofficial HTML help specification. So here we go:
5.1 The #SYSTEM file
Internal file name: #SYSTEM
Format: binary
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.CHMSystem
This file contains the main information about the help system, such as: the name of the sitemap contents file, the name of the sitemap index file, the default help topic, a flag indicating whether full-text searching is supported, flags indicating whether the help system contains ALinks and/or KLinks, flags indicating whether the system has a binary table of contents and/or a binary index, the compiler version, and more. All of this information is stored in binary format and must be decoded directly from a binary stream.
5.2 The #IDXHDR file
Internal file name: #IDXHDR
Format: binary
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.CHMIdxhdr
This file mainly contains links (offsets) into the #STRINGS file, such as: offset to the frame name, offset to the window name, offset to the image list, offset to merged files, the number of topic nodes including the contents and index files, and some other flags. This file has a fixed size of 4096 bytes and must be read as binary data.
5.3 The #STRINGS file
Internal file name: #STRINGS
Format: binary
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.CHMStrings
This file is a list of NUL-terminated ANSI/UTF-8 strings. It contains all topic names, window names and other strings. The very first entry is just a NUL character, which allows the help system to specify a zero offset and still get a valid (empty) string. The strings are stored in blocks of 4096 bytes.
5.4 The #TOCIDX file
Internal file name: #TOCIDX
Format: binary
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.CHMTocidx
This file only exists in files with a non-empty contents file, binary table of contents = true and compatibility = 1.1. It contains the binary table of contents for the help file in a tree format.
5.5 The #TOPICS file
Internal file name: #TOPICS
Format: binary
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.CHMTopics, HtmlHelp.CHMDecoding.TopicEntry
This file contains information on the topics present. It mainly stores offsets into the #TOCIDX (if binary table of contents is enabled), the #STRINGS and the #URLTBL files.
5.6 The #URLSTR file
Internal file name: #URLSTR
Format: binary
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.CHMUrlstr
This file contains URL strings and frame names. The URL string is always relative to the storage root.
5.7 The #URLTBL file
Internal file name: #URLTBL
Format: binary
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.CHMUrltable, UrlTableEntry
This file contains a URL table mapping topics to URLs. It mainly stores offsets into the #URLSTR file and an index into the #TOPICS file.
5.8 The $FIftiMain file
Internal file name: $FIftiMain
Format: binary
Found in: CHM, CHQ
Library class: HtmlHelp.CHMDecoding.FullTextEngine
This file stores information for the full-text search, so you do not have to search all content files and topics for words and phrases on your own. This speeds up searching considerably, since the index in this file records which word occurs in which files and at which locations. The file starts with a header, followed by index nodes, leaf nodes and word location codes (WLCs). The index and leaf nodes have a fixed size (set in the header), and the WLC entries have a variable size (set in the leaf nodes).
The search implemented in the class library works as follows: read the header, seek to the root index node, search the root index node for a word greater than or equal to the desired one, descend to the next index level, repeat the previous two steps as many times as the tree is deep, then search the resulting leaf node until the desired word is found, read the correct part of the WLCs for that leaf node and extract the topic numbers for that word. See Pabs' unofficial HTML help specification for detailed information about the full-text index file.
5.9 The $WWAssociativeLinks\BTree and $WWKeywordLinks\BTree files
Internal file name: BTree
(located in sub-storages $WWAssociativeLinks and/or $WWKeywordLinks)
Format: binary
Found in: CHM, CHI, CHW
Library class: HtmlHelp.CHMDecoding.CHMBtree
This file stores the binary index of the help system. Depending on the sub-storage, it contains the ALinks or KLinks of the help system. The file format is the same for both index types. Besides a header, the file contains two types of entries: listing blocks and index blocks. Decoding the listing blocks yields an index tree in which sub-keywords are separated by ", " (e.g. main keyword "Dialog", sub-keywords "Dialog, About" or "Dialog, Find and Replace"). Each listing block has at least one index block entry. More than one index block entry means the keyword can be found in multiple topics. "See Also" keywords are also stored in this file.
5.10 The Table of Contents file
Internal file name: specified in #SYSTEM
(if not available, search for "Table of contents.hhc" or "<chmname>.hhc" in
the storage's content files)
Format: text-based sitemap
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.HHCParser
This file contains the text-based table of contents of the help system. The library class uses regular expression parsing to build the table of contents tree. A sample HHC file:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<HTML>
<HEAD>
<meta name="GENERATOR" content="Microsoft® HTML Help Workshop 4.1">
<!-- Sitemap 1.0 -->
</HEAD><BODY>
<UL>
<LI> <OBJECT type="text/sitemap">
<param name="Name" value="Introduction to GraphEdit">
<param name="Local" value="graphedit_help.htm">
</OBJECT>
<LI> <OBJECT type="text/sitemap">
<param name="Name" value="Building Filter Graphs">
</OBJECT>
<UL>
<LI> <OBJECT type="text/sitemap">
<param name="Name" value="Build a File Playback Graph">
<param name="Local" value="build_graph.htm">
</OBJECT>
</UL>
</UL>
</BODY></HTML>
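To illustrate the idea of regular expression parsing on a sitemap like the sample above, here is a minimal sketch. It is not the library's HHCParser: the names SitemapEntry and HhcSketch, and both regular expressions, are assumptions made for this example. It only tracks the nesting depth via UL tags and reads the Name and Local parameters; HTML entities in values are not decoded.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-in for a table of contents entry (not the library's TOCItem).
public sealed class SitemapEntry
{
    public string Name = "";
    public string Local = "";
    public int Level;   // nesting depth, 1 = top level
}

public static class HhcSketch
{
    // One token is either <UL>, </UL> or a complete text/sitemap OBJECT block.
    static readonly Regex Token = new Regex(
        @"<UL>|</UL>|<OBJECT\s+type=""text/sitemap"">(?<body>.*?)</OBJECT>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Extracts name/value pairs from <param name="..." value="..."> tags.
    static readonly Regex Param = new Regex(
        @"<param\s+name=""(?<n>[^""]*)""\s+value=""(?<v>[^""]*)""\s*/?>",
        RegexOptions.IgnoreCase);

    // Returns the entries in document order; Level reflects the <UL> nesting.
    public static List<SitemapEntry> Parse(string hhc)
    {
        var result = new List<SitemapEntry>();
        int level = 0;
        foreach (Match m in Token.Matches(hhc))
        {
            if (m.Value.Equals("<UL>", StringComparison.OrdinalIgnoreCase)) { level++; continue; }
            if (m.Value.Equals("</UL>", StringComparison.OrdinalIgnoreCase)) { level--; continue; }

            var entry = new SitemapEntry { Level = level };
            foreach (Match p in Param.Matches(m.Groups["body"].Value))
            {
                string name = p.Groups["n"].Value;
                string value = p.Groups["v"].Value;
                if (name.Equals("Name", StringComparison.OrdinalIgnoreCase) && entry.Name.Length == 0)
                    entry.Name = value;       // first Name wins
                else if (name.Equals("Local", StringComparison.OrdinalIgnoreCase))
                    entry.Local = value;      // relative URL of the topic, if any
            }
            result.Add(entry);
        }
        return result;
    }
}
```

Calling HhcSketch.Parse on the sample file yields three entries; the folder entry "Building Filter Graphs" has an empty Local value, and its child sits one level deeper. A flat list with levels keeps the sketch short; building a real tree from it is a matter of tracking the last entry seen on each level.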
5.11 The Index file
Internal file name: specified in #SYSTEM
(if not available search for "Index.hhk" or "<chmname>.hhk" in the storage's
content files)
Format: text-based sitemap
Found in: CHM, CHI
Library class: HtmlHelp.CHMDecoding.HHKParser
This file contains the text-based index of the help system. The format is the same as in HHC files, except that only one level is allowed in the sitemap, and the first Name entry specifies the keyword, which is followed by Name/Local pairs.
5.12 Information types and categories
Information types and categories are only supported for CHMs using a text-based
TOC or Index (by HtmlHelp Workshop). You can define information types in
HtmlHelp Workshop and assign them to table of contents nodes and index entries.
This allows a viewer to filter the contents which are displayed to the user.
Information types and categories are stored in the .hhc and .hhk files. For information on how they are stored in HHC/HHK files, see Tables 5.56/5.59 in Pabs' unofficial HTML help specification.
6. Using the class library
About 90% of the HtmlHelp class library classes are marked as
6.1 HtmlHelp library/viewer UML
This is the main namespace for you to work with. It contains the main class for all operations, HtmlHelpSystem. Use this class to load files, merge files, access the table of contents, access the index and perform full-text searches.
This UML diagram visualizes the HtmlHelp Viewer demo application and how it uses the library. The center of the diagram shows the class that instantiates the three library-internal user controls.
6.2 The data dumping
I introduced this feature in version 0.3. Without it, the library takes a huge amount of time to open CHM files with very large text-based table of contents files. The reason is the regular expression parsing of big .hhc files. Using the dumping feature saves 90% or more of the loading time in such scenarios.
The first time you load such a CHM, a data dump is written. If you load the same file a second time, the data dump is used to load some of the CHM data (depending on your preferences). See the code snippets in this chapter.
To see how efficient the dumping feature is, I ran a test with the following results:
Test done on: Intel Xeon 3.06GHz HT, 1GB RAM, WinXP, SCSI HDs
DirectX9 SDK CHM (binary index and binary TOC):
I think the two examples above show how data dumping can speed up the loading process.
6.3 Initializing the library
The main class for using this library is HtmlHelpSystem. First of all, you need to create an instance of this class:
private string LM_Key = @"Software\Klaus Weisser\HtmlHelpViewer\";
// The main HtmlHelpSystem class used for handling CHMs
HtmlHelpSystem _reader = null;
// a dumping info class managed by the viewer to specify the data-dumping
6.4 Open a CHM file
To open a file, use the following lines of code (see Viewer.cs line 1088):
// clear current items
tocTree1.ClearContents();
helpIndex1.ClearContents();
helpSearch2.ClearContents();
// open the chm-file selected in the OpenFileDialog
// if _dmpInfo == null, the dumping feature will be disabled
_reader.OpenFile( openFileDialog1.FileName, _dmpInfo );
// Enable the toc-tree pane if the opened file has a table of contents
tocTree1.Enabled = _reader.HasTableOfContents;
// Enable the index pane if the opened file has an index
helpIndex1.Enabled = _reader.HasIndex;
// Enable the full-text search pane if the
// opened file supports full-text searching
helpSearch2.Enabled = _reader.FullTextSearch;
// ...
// Build the table of contents tree view in the classlibrary control
// _filter is the information-type/category filter instantiated in the
The variables
After these few lines of code, and some additional UI updates, the
6.5 Merging additional CHM files
Once you have opened a CHM file, you can merge additional files into it using the MergeFile method:
// clear current items
tocTree1.ClearContents();
helpIndex1.ClearContents();
helpSearch2.ClearContents();
// merge the chm file selected in the OpenFileDialog to the existing one
// in the HtmlHelpSystem class
// if _dmpInfo == null, the dumping feature will be disabled
_reader.MergeFile( openFileDialog1.FileName, _dmpInfo );
// Enable the toc-tree pane if the opened file has a table of contents
tocTree1.Enabled = _reader.HasTableOfContents;
// Enable the index pane if the opened file has an index
helpIndex1.Enabled = _reader.HasIndex;
// Enable the full-text search pane if the
// opened file supports full-text searching
helpSearch2.Enabled = _reader.FullTextSearch;
// ...
// Rebuild the table of contents tree view in the classlibrary control
// using the new merged table of contents
// _filter is the information-type/category filter instantiated in the
Notes: Don't forget to rebuild the TOC and index after a merge, so that the new entries appear in the user controls. If you use the
6.6 Accessing the table of contents programmatically
If you use the user controls provided by the library, you don't have to care about the table of contents structure, since the control fills the tree view itself. In some cases, e.g. when implementing your own HelpProvider for the class library, you may want to access the table of contents tree programmatically. The table of contents tree can be accessed using the TableOfContents property of the HtmlHelpSystem instance. The following sample code demonstrates how a tree view can be filled from it:
// non existent method in the classlibrary
// only for demonstration
private void Main()
{
//Get the current table of contents
TableOfContents currentToc = _reader.TableOfContents;
// clear the tree nodes of an existing tree view
tocTreeView.Nodes.Clear();
// recursively build the tree nodes
BuildTOC(currentToc.TOC, tocTreeView.Nodes);
// update the control
tocTreeView.Update();
}
/// <summary>
/// Recursively builds the toc tree and fills the treeview
/// </summary>
/// <param name="tocItems">list of toc-items</param>
/// <param name="col">treenode collection of the current level</param>
private void BuildTOC(ArrayList tocItems, TreeNodeCollection col)
{
foreach( TOCItem curItem in tocItems )
{
TreeNode newNode = new TreeNode( curItem.Name,
curItem.ImageIndex, curItem.ImageIndex );
newNode.Tag = curItem;
if(curItem.Children.Count > 0)
{
BuildTOC(curItem.Children, newNode.Nodes);
}
col.Add(newNode);
}
}
6.7 Accessing the index programmatically
The index can be accessed using the Index property of the HtmlHelpSystem instance. The following sample code demonstrates how to fill a list box with the index entries:
// non existent method in the classlibrary
// only for demonstration
private void Main()
{
    // Get the current index
    Index currentIndex = _reader.Index;
    // fill the listbox with the index items
    BuildIndex( currentIndex, IndexType.KeywordLinks);
}
/// <summary>
/// Call this method to build the help-index and fill the internal list box
/// </summary>
/// <param name="index">Index instance extracted from the chm file(s)</param>
/// <param name="typeOfIndex">type of index to display</param>
public void BuildIndex(Index index, IndexType typeOfIndex)
{
    ArrayList _arrIndex = null;
    // get the ArrayList of the requested index type
    switch(typeOfIndex)
    {
        case IndexType.AssiciativeLinks: _arrIndex = index.ALinks; break;
        case IndexType.KeywordLinks: _arrIndex = index.KLinks; break;
    }
    // clear the current items in the list box
    lbIndex.Items.Clear();
    // nothing to display if no index list was selected
    if(_arrIndex == null)
        return;
    // sort the index
    _arrIndex.Sort();
    foreach(IndexItem curItem in _arrIndex)
    {
        // Add the index entry to the listbox
        lbIndex.Items.Add( GetIndent(curItem.Indent) + curItem.KeyWord );
    }
}
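BuildIndex calls a GetIndent helper whose body is not shown in the article. The following is a minimal sketch of what such a helper could look like; the class name IndexListSketch, the spacesPerLevel parameter and the LevelFromKeyword method are assumptions for illustration, not part of the class library. LevelFromKeyword relies on the ", " separator for sub-keywords described in section 5.9.

```csharp
using System;

public static class IndexListSketch
{
    // Hypothetical helper: returns the leading whitespace for a list box entry.
    // 'indent' is the nesting level of the index item (0 = main keyword).
    public static string GetIndent(int indent, int spacesPerLevel = 2)
    {
        if (indent <= 0 || spacesPerLevel <= 0)
            return string.Empty;
        return new string(' ', indent * spacesPerLevel);
    }

    // Hypothetical helper: derives the nesting level from a full keyword,
    // assuming sub-keywords are separated by ", " as in "Dialog, About".
    public static int LevelFromKeyword(string keyword)
    {
        if (string.IsNullOrEmpty(keyword))
            return 0;
        return keyword.Split(new[] { ", " }, StringSplitOptions.None).Length - 1;
    }
}
```

With these assumptions, "Dialog" sits on level 0 and "Dialog, Find and Replace" on level 1. Note that a keyword which itself contains ", " would be counted one level too deep, so a value stored with the item, such as the Indent property used above, is the safer source when it is available.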
6.8 Accessing file contents programmatically
With version 0.4 of the library, I've introduced two new properties, FileContents and ContentFile. The FileContents property returns the contents of the file as a string:
// curEntry is any instance of TopicEntry, TOCItem or IndexTopic !!
TopicEntry curEntry = <any instance of TopicEntry>;
// returns the contents of the file as string
string sContent = curEntry.FileContents;
The ContentFile property returns a FileObject instance, which you can use to read the native contents of the file.
Note: If you read native contents from a text file like htm, hhc, hhk etc., you have to make sure to use the appropriate text encoding! The following code snippet shows how to read the text content of a topic file.
// curEntry is any instance of TopicEntry, TOCItem or IndexTopic !!
TopicEntry curEntry = <any instance of TopicEntry>;
// Get the FileObject instance
FileObject fo = curEntry.ContentFile;
// Check if you've got an instance
if(fo != null)
{
// Check if you can read from the file
if(fo.CanRead)
{
// read the file contents
byte[] fileData = new byte [fo.Length];
fo.Read(fileData, 0, (int)fo.Length);
// CLOSE !! the file (important!)
fo.Close();
// Get the content as string using the correct text-encoding
string sContent = curEntry.TextEncoding.GetString(fileData);
}
else
{
// if not, CLOSE! the file object
fo.Close();
MessageBox.Show("File " + curEntry.Locale + " not readable!");
}
}
else
{
// if not, the content of this file can not be accessed
MessageBox.Show("Couldn't get file object for " + curEntry.Locale);
}
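The two caveats around reading native contents (the right text encoding, and knowing whether a file is text at all) can be sketched independently of the library. The class name TopicTextSketch, both method names and the list of extensions below are assumptions for illustration; the library itself may handle this differently.

```csharp
using System;
using System.IO;
using System.Text;

public static class TopicTextSketch
{
    // Assumed list of text-like extensions; adjust it to your help project.
    static readonly string[] TextExtensions =
        { ".htm", ".html", ".hhc", ".hhk", ".txt", ".css", ".js" };

    // Rough file-type check based on the extension of the local URL.
    // A URL with an anchor suffix (e.g. "a.htm#top") is not recognized.
    public static bool LooksLikeTextFile(string local)
    {
        string ext = Path.GetExtension(local);
        return Array.Exists(TextExtensions,
            e => e.Equals(ext, StringComparison.OrdinalIgnoreCase));
    }

    // Decodes raw file bytes with the encoding the help file was compiled for.
    public static string Decode(byte[] data, Encoding encoding)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (encoding == null) throw new ArgumentNullException(nameof(encoding));
        return encoding.GetString(data);
    }
}
```

As an example of why the encoding matters: the bytes 47 72 FC DF 65 are the word "Grüße" in ISO-8859-1, but decoding the same bytes as UTF-8 produces replacement characters (U+FFFD) in place of the two non-ASCII letters.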
The snippet that reads the file through ContentFile does not check the file type. It assumes that the content file is a text file!
6.9 Use the class library's table of contents user control
As mentioned before, the class library has built-in user controls for displaying the three main help panes: table of contents, index and search. Using these controls allows you to integrate the HtmlHelp system with a few drag-and-drop operations and some lines of code in minutes.
The user control for the table of contents is implemented in the class
The most important method is
For interacting with your UI, the control implements an event called TocSelected:
//
// tocTree1
//
this.tocTree1.Dock = System.Windows.Forms.DockStyle.Fill;
this.tocTree1.DockPadding.All = 2;
this.tocTree1.Location = new System.Drawing.Point(0, 0);
this.tocTree1.Name = "tocTree1";
this.tocTree1.Size = new System.Drawing.Size(292, 484);
this.tocTree1.TabIndex = 0;
// subscribe to the event
this.tocTree1.TocSelected +=
new TocSelectedEventHandler(this.tocTree1_TocSelected);
// ...
/// <summary>
/// Called if the user selects a new table of contents item
/// </summary>
/// <param name="sender">sender of the event</param>
/// <param name="e">event parameters</param>
private void tocTree1_TocSelected(object sender, TocEventArgs e)
{
// if the selected item contains an url
if( e.Item.Local.Length > 0)
{
// navigate to the url
NavigateBrowser(e.Item.Url);
}
}
6.10 Use the class library's index user control
The user control for displaying the index pane is located in the class
The most important method is
For interacting with your UI, this control implements two events, IndexSelected and TopicsFound. Both are subscribed to in the following code:
//
// helpIndex1
//
this.helpIndex1.Dock = System.Windows.Forms.DockStyle.Fill;
this.helpIndex1.Location = new System.Drawing.Point(0, 0);
this.helpIndex1.Name = "helpIndex1";
this.helpIndex1.Size = new System.Drawing.Size(292, 470);
this.helpIndex1.TabIndex = 0;
this.helpIndex1.IndexSelected +=
new IndexSelectedEventHandler(this.helpIndex1_IndexSelected);
this.helpIndex1.TopicsFound +=
new TopicsFoundEventHandler(this.helpIndex1_TopicsFound);
// ...
/// <summary>
/// Called if the user selects an index topic
/// </summary>
/// <param name="sender">sender of the event</param>
/// <param name="e">event parameters</param>
private void helpIndex1_IndexSelected(object sender, IndexEventArgs e)
{
if(e.URL.Length > 0)
NavigateBrowser(e.URL);
}
/// <summary>
/// Called if the user selects an index with more than one related topics.
/// If you do not handle this event,
/// the HtmlHelp library will show a standard dialog.
/// </summary>
/// <param name="sender">sender of the event</param>
/// <param name="e">event parameters</param>
private void helpIndex1_TopicsFound(object sender, TopicsFoundEventArgs e)
{
// display a UI to the user and let him select one of the found topics
// you can get the list of topics found using
//
// e.Topics which is an ArrayList of HtmlHelp.IndexTopic instances
}
6.11 Use the class library's full-text search user control
The user control for displaying the full-text search pane can be found in the class
The difference from the other controls is that you do not have to initialize the content of this control, since the user has to enter a search string before a search can be done. For this behavior, the control offers two events. First, the FTSearch event is raised when the user hits the "Search" button. Once you have received the results, you have to "send" them back to the user control using the method SetResults. If the user selects a topic from the search results, the event HitSelected is raised.
//
// helpSearch2
//
this.helpSearch2.Dock = System.Windows.Forms.DockStyle.Fill;
this.helpSearch2.Location = new System.Drawing.Point(0, 0);
this.helpSearch2.Name = "helpSearch2";
this.helpSearch2.Size = new System.Drawing.Size(292, 470);
this.helpSearch2.TabIndex = 0;
this.helpSearch2.HitSelected +=
new HitSelectedEventHandler(this.helpSearch2_HitSelected);
this.helpSearch2.FTSearch +=
new FTSearchEventHandler(this.helpSearch2_FTSearch);
// ...
/// <summary>
/// Called if the user hits the "Search" button on the full-text search pane
/// </summary>
/// <param name="sender">sender of the event</param>
/// <param name="e">event parameters</param>
private void helpSearch2_FTSearch(object sender, SearchEventArgs e)
{
// display a wait cursor
this.Cursor = Cursors.WaitCursor;
try
{
// initiate the full-text search ( 500 = maximum hits )
DataTable dtResults = _reader.PerformSearch( e.Words,
500, e.PartialWords, e.TitlesOnly);
// "send" the results back to the full-text search pane
// and display them in the listview
helpSearch2.SetResults(dtResults);
}
finally
{
// display the arrow cursor
this.Cursor = Cursors.Arrow;
}
}
/// <summary>
/// Called if the user selects an entry from the search results.
/// </summary>
/// <param name="sender">sender of the event</param>
/// <param name="e">event parameters</param>
private void helpSearch2_HitSelected(object sender, HitEventArgs e)
{
// if the selected topic has an URL
if( e.URL.Length > 0)
{
// Navigate the browser to this URL
NavigateBrowser(e.URL);
}
}
6.12 The HelpProviderEx component
New in version 0.3 is an extended
The
Since you will use
/// <summary>
/// The interface <c>IHelpViewer</c> defines methods/properties for a
The provided sample viewer application implements this interface, so you can have a look there to see how it can be done. If you want to use the
If you want to use the
Since
You can use the
There are a few tutorials online on how to use the
7. Useful links
8. Conclusion
The class library holds all extracted data in memory. This may speed up some tasks like searching, but it costs memory (e.g. ~10MB for the DirectX 9.0 SDK help file). Since some of the internal file formats are not completely clear (garbage spaces, unknown values, ...), a binary merge of CHM files would be difficult. That's the reason why I'm loading the TOC and index into memory and merging them using internal classes. I haven't had enough time to fully test the newly introduced
Special thanks to Nick Butler, who contributed a lot of bug reports and feature ideas for version 0.4 and helped me improve the library ;)
9. History