Check Help Links Tool

6 Oct 2005 · CPOL · 6 min read
A tool to check links across merged help (CHM) files.

Note: You will need the MSHTML library if you have only installed the .NET redistributable, not the SDK (or VS).

Application image

Introduction

This article presents a tool for checking links in help (CHM) files, including merged files. The app checks links in indexes, tables of contents, and in the topic files themselves, and presents the results in a simple tree view.

Background

I have a library which is documented using XML comments. I use NDoc to generate a CHM for the library reference help, but I also have a hand-written "master" CHM file which contains higher-level help. I merge these two help files, and they both contain links to each other. I couldn't find a tool to check these links, so I wrote one :)

This was actually quite easy to write, as it is really just a front end for the HtmlHelp library and example viewer by Klaus Weisser. All the details of reading CHM files are handled by the library, which means all I had to do was write the UI and the checking algorithms.

Basically, if you need this tool, it will be invaluable; if you don't, well, thanks for looking.

Using the application

The application is very easy to use; at least, I tried to make it that way.

The first thing to do is to open a CHM file. This loads the file and any merged files into the library, and starts to fill out the results tree. You can refresh the file at any time to get back to this state. At this stage, the index, table of contents, and topic files are present, but the links from the topic files are not.

It turns out that getting the links from an HTML file is not that easy. The only article I could find on the subject was on MSDN: "Walkthrough: Accessing the DHTML DOM from C#". This basically says to open an IE window, load the file, and then get the links from the DOM. This is slow, and if your CHM file has thousands of pages, it is very slow. However, it is the most accurate method, and is therefore the default.

You can speed things up by orders of magnitude by selecting the "fast parsing" option on the toolbar. This uses regular expressions to search the raw HTML. It works well with simple HTML but is not perfect, although it now skips commented-out blocks. It is useful as a first pass, but I suggest checking with the slow method before you ship your help files.

Once you have selected your parsing method, you can start checking by clicking the "Work" toolbar button or pressing Ctrl+W. This loads and parses all the topic files and checks every link found. It works on the subtree selected in the tree view: by default, the root node is selected, so the whole help collection is checked, but you can select a portion of the tree if you are only interested in that branch. The results are shown by setting the icon of each tree view item.
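Before a link can be checked, it has to be split into the CHM it points to, the topic path, and any anchor. The sketch below shows one way to do that, assuming the usual `ms-its:File.chm::/path/topic.htm#anchor` form, the older `mk:@MSITStore:` prefix, and the prefix-less `File.chm::/topic.htm` form that version 2 added support for. The `HelpLink` class and its property names are hypothetical, not taken from the tool or the HtmlHelp library, and relative paths such as `../a.htm` are not resolved here.

```csharp
using System;

// Hypothetical link model: HelpLink, Kind, ChmFile, Topic and Anchor are
// illustrative names only, not the tool's or the HtmlHelp library's API.
public sealed class HelpLink
{
    public string Kind { get; private set; }     // "http", "script" or "topic"
    public string ChmFile { get; private set; }  // null when the link stays in the current CHM
    public string Topic { get; private set; }    // path inside the CHM, e.g. "html/intro.htm"
    public string Anchor { get; private set; }   // bookmark after '#', or null

    // Assumed prefixes; a link to another CHM may also have no prefix at all.
    private static readonly string[] ChmPrefixes = { "ms-its:", "mk:@MSITStore:" };

    public static HelpLink Parse(string href)
    {
        if (href == null) throw new ArgumentNullException(nameof(href));

        // Web links and script links are classified, not resolved.
        if (href.StartsWith("http:", StringComparison.OrdinalIgnoreCase) ||
            href.StartsWith("https:", StringComparison.OrdinalIgnoreCase))
            return new HelpLink { Kind = "http", Topic = href };
        if (href.StartsWith("javascript:", StringComparison.OrdinalIgnoreCase))
            return new HelpLink { Kind = "script", Topic = href };

        string rest = href;
        foreach (string prefix in ChmPrefixes)
        {
            if (rest.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            {
                rest = rest.Substring(prefix.Length);
                break;
            }
        }

        // "File.chm::/path/topic.htm" names a topic in another (merged or external) CHM.
        string chm = null;
        int sep = rest.IndexOf("::", StringComparison.Ordinal);
        if (sep >= 0)
        {
            chm = rest.Substring(0, sep);
            rest = rest.Substring(sep + 2).TrimStart('/');
        }

        // Split off the anchor (bookmark), if any.
        string anchor = null;
        int hash = rest.IndexOf('#');
        if (hash >= 0)
        {
            anchor = rest.Substring(hash + 1);
            rest = rest.Substring(0, hash);
        }

        return new HelpLink { Kind = "topic", ChmFile = chm, Topic = rest, Anchor = anchor };
    }
}
```

For example, `HelpLink.Parse("ms-its:Reference.chm::/html/intro.htm#usage")` yields the CHM `Reference.chm`, the topic `html/intro.htm` and the anchor `usage`; a checker would then look the topic up in that file and the anchor up in the topic.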

Each item in the results has a state, which can be "unknown", "good", "http", "script", or "broken". The state of each parent item is set according to the states of its child items, so a parent item is only marked as good if none of its children are marked as broken. This gives you a quick idea of the correctness of your links, and detailed metrics are displayed in brackets after each item.
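The parent/child state rule can be sketched as a post-order roll-up over the results tree. The article only says that a parent is good when none of its children are broken; the precedence used here (broken, then unknown, then good, with http and script links counted but not treated as failures) is an assumption, and `LinkState` and `ResultNode` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The five states from the article; the enum itself is a hypothetical model.
public enum LinkState { Unknown, Good, Http, Script, Broken }

public sealed class ResultNode
{
    public string Name { get; }
    public LinkState State { get; set; }
    public List<ResultNode> Children { get; } = new List<ResultNode>();

    public ResultNode(string name, LinkState state = LinkState.Unknown)
    {
        Name = name;
        State = state;
    }

    // Post-order roll-up. Assumed rule: any broken child makes the parent broken;
    // otherwise any unchecked child leaves it unknown; otherwise it is good.
    public LinkState RollUp()
    {
        if (Children.Count == 0) return State;
        List<LinkState> childStates = Children.Select(c => c.RollUp()).ToList(); // visits every child
        if (childStates.Contains(LinkState.Broken)) State = LinkState.Broken;
        else if (childStates.Contains(LinkState.Unknown)) State = LinkState.Unknown;
        else State = LinkState.Good;
        return State;
    }

    // Per-state leaf counts for this subtree, e.g. for the bracketed metrics label.
    public Dictionary<LinkState, int> CountLeaves()
    {
        var counts = Enum.GetValues(typeof(LinkState)).Cast<LinkState>().ToDictionary(s => s, s => 0);
        Accumulate(this, counts);
        return counts;
    }

    private static void Accumulate(ResultNode node, Dictionary<LinkState, int> counts)
    {
        if (node.Children.Count == 0) { counts[node.State]++; return; }
        foreach (ResultNode child in node.Children) Accumulate(child, counts);
    }
}
```

Rolling up from the selected node after each check keeps the icons consistent for just that subtree, which matches checking only the selected branch.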

You can then navigate the results using the comprehensive Find feature, which supports regular expressions:

Find dialog image

You can also export the results to CSV files. This produces one overall summary file and one details file for each CHM.
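Topic paths and URLs can contain commas or quotes, so each CSV field needs escaping. Here is a minimal sketch of writing a details file with RFC 4180-style quoting; the `CsvExport` class and the `Topic`/`Link`/`State` column layout are hypothetical, not the tool's actual file format.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CsvExport
{
    private static readonly char[] SpecialChars = { ',', '"', '\r', '\n' };

    // RFC 4180-style escaping: quote a field only if it contains a comma, quote or
    // line break, and double any embedded quotes.
    public static string Escape(string field)
    {
        if (string.IsNullOrEmpty(field)) return string.Empty;
        return field.IndexOfAny(SpecialChars) >= 0
            ? "\"" + field.Replace("\"", "\"\"") + "\""
            : field;
    }

    public static string Line(params string[] fields) => string.Join(",", fields.Select(Escape));

    // Hypothetical details layout: one row per checked link in a single CHM.
    public static void WriteDetails(TextWriter writer,
                                    IEnumerable<(string Topic, string Link, string State)> rows)
    {
        writer.WriteLine(Line("Topic", "Link", "State"));
        foreach (var row in rows)
            writer.WriteLine(Line(row.Topic, row.Link, row.State));
    }
}
```

Taking a `TextWriter` rather than a file name lets the same code write to a `StreamWriter` for each CHM's details file, or to a `StringWriter` in tests.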

All this can be automated using command line options. These are:

  • -f / -fast: select fast mode
  • -s / -slow: select slow mode
  • -o / -open: open most recently used CHM
  • -o:"C:\path\xxx.chm": open specified CHM
  • -w / -work: check file
  • -x / -export: export reports to default folder
  • -x:"C:\path\folder": export to specified folder
  • -c / -close: close app

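For example, CheckHelpLinks.exe -f -o:"C:\test\test.chm" -w -x -c opens C:\test\test.chm in fast mode, checks it, exports the reports to the default folder, and then closes the app.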

The exit code of the application is zero if there are no errors, and non-zero if there are broken links.
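The switches above map naturally onto a small options object plus an exit-code rule. This is a sketch only: `CommandLineOptions`, its properties, and the choice of 1 as the non-zero exit code are assumptions, not the tool's source. By the time `Main` receives the arguments, Windows has already removed the quotes in `-o:"C:\test\test.chm"`, so the value simply follows the first colon.

```csharp
using System;

// Hypothetical parser for the documented switches; names are illustrative only.
public sealed class CommandLineOptions
{
    public bool? Fast { get; private set; }          // true: -f/-fast, false: -s/-slow, null: unchanged
    public bool Open { get; private set; }           // -o/-open
    public string OpenPath { get; private set; }     // null means "most recently used CHM"
    public bool Work { get; private set; }           // -w/-work
    public bool Export { get; private set; }         // -x/-export
    public string ExportFolder { get; private set; } // null means "default folder"
    public bool Close { get; private set; }          // -c/-close

    public static CommandLineOptions Parse(string[] args)
    {
        var options = new CommandLineOptions();
        foreach (string arg in args)
        {
            // "-o:C:\test\test.chm" splits at the first colon into "-o" and the path.
            int colon = arg.IndexOf(':');
            string name = (colon < 0 ? arg : arg.Substring(0, colon)).ToLowerInvariant();
            string value = colon < 0 ? null : arg.Substring(colon + 1).Trim('"');

            switch (name)
            {
                case "-f": case "-fast": options.Fast = true; break;
                case "-s": case "-slow": options.Fast = false; break;
                case "-o": case "-open": options.Open = true; options.OpenPath = value; break;
                case "-w": case "-work": options.Work = true; break;
                case "-x": case "-export": options.Export = true; options.ExportFolder = value; break;
                case "-c": case "-close": options.Close = true; break;
                default: throw new ArgumentException("Unknown option: " + arg);
            }
        }
        return options;
    }

    // Zero when nothing is broken, non-zero otherwise (the value 1 is an assumption).
    public static int ExitCode(int brokenLinkCount) => brokenLinkCount == 0 ? 0 : 1;
}
```

In a `static int Main(string[] args)`, returning `CommandLineOptions.ExitCode(brokenCount)` is what lets a build script detect broken links, e.g. via `%ERRORLEVEL%` in a batch file.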

And that's about it. Go and fix your broken links and then check again!

Points of Interest

Reflection Magic

I needed access to a property declared as internal in the library. Obviously, this would normally not be available outside the library, but you can get access to it by using reflection. This is a very powerful practice, but it is not ideal: the library author would be quite right to change his internal implementation in future releases, which would break the client code. So, use at your own risk!

This code comes from the IndexItem.Load method in the ClassesIndex.cs file.

What I wanted to write was this:

C#
CHMFile chmFile = indexItem.ChmFile;

What I actually wrote was this:

C#
using System;
using System.Reflection;

Type t = indexItem.GetType();
PropertyInfo p = t.GetProperty( "ChmFile",
    BindingFlags.Instance | BindingFlags.NonPublic );
CHMFile chmFile = ( CHMFile ) p.GetValue( indexItem, null );
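Because the internal property may disappear in a later library release, it can help to wrap the lookup so that it fails with a clear message instead of a `NullReferenceException` from `p.GetValue`. A minimal sketch follows; `ReflectionHelper`, `GetNonPublicProperty` and the `StandInIndexItem` test class are hypothetical. Note that `BindingFlags.NonPublic` finds `internal` properties as well as `private` and `protected` ones.

```csharp
using System;
using System.Reflection;

public static class ReflectionHelper
{
    // Reads a non-public (e.g. internal) instance property by name.
    // Throws MissingMemberException, rather than a NullReferenceException,
    // if a newer library version renames or removes the property.
    public static T GetNonPublicProperty<T>(object target, string propertyName)
    {
        if (target == null) throw new ArgumentNullException(nameof(target));

        Type type = target.GetType();
        PropertyInfo property = type.GetProperty(
            propertyName, BindingFlags.Instance | BindingFlags.NonPublic);
        if (property == null)
            throw new MissingMemberException(type.FullName, propertyName);

        return (T)property.GetValue(target, null);
    }
}

// Hypothetical stand-in for a library class that exposes an internal property.
public class StandInIndexItem
{
    internal string ChmFile { get; } = "Reference.chm";
}
```

With the real library, the call would read `CHMFile chmFile = ReflectionHelper.GetNonPublicProperty<CHMFile>( indexItem, "ChmFile" );`.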

Accessing the DHTML DOM

The only way I could find of accessing the DOM was by opening a web browser control, loading the HTML, and then getting an IHTMLDocument2 object from the Document property of the control. Apart from being slow, this is a bit tricky, since the loading process is asynchronous. I solved this by setting a flag before beginning the navigation, and waiting until the DocumentComplete event handler cleared the flag. During the wait, I call Application.DoEvents to keep the UI fresh. I don't know how good this is, as it all happens in a dialog box; what I really wanted was to run the dialog box's message loop, but I couldn't find a way of doing this :(. The code is in the DlgBrowser.cs file, if you want to have a look.
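The set-a-flag-then-pump pattern described above can be factored into a small helper with a timeout, so that a page that never finishes loading cannot hang the check. This is a sketch under assumptions: `PumpingWait` and `WaitWhileBusy` are hypothetical names, and the wiring in the trailing comment uses the .NET 2.0+ Windows Forms `WebBrowser` control rather than the COM control and `IHTMLDocument2` that DlgBrowser.cs uses.

```csharp
using System;
using System.Diagnostics;

public static class PumpingWait
{
    // Loops while isBusy() is true, calling pump() each time round so that queued
    // events (for example the browser's "document complete" event) get a chance to
    // run and clear the flag. Returns false if the timeout expires first.
    public static bool WaitWhileBusy(Func<bool> isBusy, Action pump, TimeSpan timeout)
    {
        Stopwatch clock = Stopwatch.StartNew();
        while (isBusy())
        {
            if (clock.Elapsed > timeout) return false;
            pump();
        }
        return true;
    }
}

// Hypothetical wiring with the WinForms WebBrowser control:
//
//   bool loading = true;                               // set before navigating
//   browser.DocumentCompleted += (s, e) => loading = false;
//   browser.Navigate(url);
//   if (PumpingWait.WaitWhileBusy(() => loading, Application.DoEvents, TimeSpan.FromSeconds(30)))
//       foreach (HtmlElement link in browser.Document.Links)
//           Console.WriteLine(link.GetAttribute("href"));
```

Passing the pump as a delegate keeps the helper testable without a UI. Be aware that `DocumentCompleted` can fire once per frame, so pages with frames may need a stricter completion test than a single flag.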

The Regular Expressions

These are the RegExes used in fast mode. Apart from the comment RegEx, there are two each for anchors and links: the first matches unquoted attribute values, and the second matches quoted ones:

  • Comment: ^(?<before>.*)(?<comment><!--.*?-->)(?<after>.*)$
  • Anchor 1: <\s*A\s[^>]*name\s*=\s*(?<anchor>[^'"].*?)[\s>]
  • Anchor 2: <\s*A\s[^>]*name\s*=\s*(?<quote>['"])(?<anchor>.*?)\k<quote>
  • Link 1: <\s*(?:A|AREA)\s[^>]*href\s*=\s*(?<url>[^'"].*?)[\s>]
  • Link 2: <\s*(?:A|AREA)\s[^>]*href\s*=\s*(?<quote>['"])(?<url>.*?)\k<quote>
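Here is a sketch of how the link and anchor patterns above might be applied to a topic's raw HTML. Two assumptions to flag: the patterns are compiled with `RegexOptions.IgnoreCase` so that `<a href>` and `<A HREF>` both match, and comments are stripped with a simpler single-pass `<!--.*?-->` replacement rather than the Comment pattern listed above. `FastLinkScanner` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FastLinkScanner
{
    // Assumption: case-insensitive matching of tag and attribute names.
    private const RegexOptions Opts = RegexOptions.IgnoreCase;

    // Simplification: remove all comments in one pass (Singleline lets '.' cross lines).
    private static readonly Regex Comment = new Regex(@"<!--.*?-->", RegexOptions.Singleline);

    // The four patterns from the list above; "" is a literal quote in a verbatim string.
    private static readonly Regex Anchor1 = new Regex(@"<\s*A\s[^>]*name\s*=\s*(?<anchor>[^'""].*?)[\s>]", Opts);
    private static readonly Regex Anchor2 = new Regex(@"<\s*A\s[^>]*name\s*=\s*(?<quote>['""])(?<anchor>.*?)\k<quote>", Opts);
    private static readonly Regex Link1 = new Regex(@"<\s*(?:A|AREA)\s[^>]*href\s*=\s*(?<url>[^'""].*?)[\s>]", Opts);
    private static readonly Regex Link2 = new Regex(@"<\s*(?:A|AREA)\s[^>]*href\s*=\s*(?<quote>['""])(?<url>.*?)\k<quote>", Opts);

    public static List<string> GetLinks(string html) => Collect(StripComments(html), "url", Link1, Link2);

    public static List<string> GetAnchors(string html) => Collect(StripComments(html), "anchor", Anchor1, Anchor2);

    private static string StripComments(string html) => Comment.Replace(html, string.Empty);

    // Runs both variants (unquoted and quoted) and returns the captured values in document order.
    private static List<string> Collect(string html, string group, params Regex[] patterns)
    {
        return patterns
            .SelectMany(r => r.Matches(html).Cast<Match>())
            .OrderBy(m => m.Index)
            .Select(m => m.Groups[group].Value)
            .ToList();
    }
}
```

Because each unquoted pattern requires a non-quote first character and each quoted pattern requires a quote, a given attribute is matched by at most one of the pair, so no de-duplication is needed.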

Acknowledgements

Special thanks to Klaus Weisser for his excellent library and for his help while writing this.

Also, thanks to Ryan Pollack for his help with version 2.

More thanks to Karen Story and Dmitri Posudin for their help with version 3.

History

  • Version 3: 2005 September 30
    • advanced Find feature
    • anchors (bookmarks) now handled
    • improved RegExes for Fast mode
    • strips comment blocks in Fast mode
    • results export to CSV file
    • view source of any link directly
    • JavaScript links handled separately
    • command line options added
    • now handles many more formats of links correctly
    • new icon
  • Version 2: 2005 August 20
    • links to CHM files with no 'ms-its' prefix now handled
    • links to external (not merged) CHM files now handled
    • HTTP links in CHM files now counted separately
    • a couple of bug fixes
  • Version 1: 2004 August 16
    • First release

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
United Kingdom
I discovered C# and .NET 1.0 Beta 1 in late 2000 and loved them immediately. I have been writing software professionally in C# ever since.

In real life, I have spent 3 years travelling abroad, I have held a UK Private Pilot's Licence for 20 years, and I am a PADI Divemaster.

I now live near idyllic Bournemouth in England.

I can work 'virtually' anywhere!

Comments and Discussions

 
Question: App crashed when I tried to open the chm. (fahad daryaee, 15-Jan-15 20:15)
General: My vote of 5 (sinnerinc, 17-Sep-14 0:37)
General: My vote of 2 (bessettenic, 30-Jul-14 3:15)
Question: Slow mode always ends in success, even when there are legitimate failures. (steharmon, 5-Oct-12 7:07)
Question: Blank broken links caused by xml? (Karen Story, 23-Apr-09 13:45)
General: unhandled exception (JannisDre, 11-Jul-08 1:33)
Question: Where to place the Microsoft.mshtml.dll ? (frankefm, 17-Sep-07 5:04)
Question: help (V.Sen..., 23-Aug-07 2:52)
Question: help in program (Farzaneh_t, 19-Aug-07 20:25)
Answer: Re: help in program (V.Sen..., 23-Aug-07 2:45)
General: Unknown links (573508, 2-Jul-07 5:45)
Question: Search Didn't Go Deep Enough? (jmcconda, 30-Oct-06 9:28)
Perhaps I'm misunderstanding how the tool works, but we recently had a production issue with a broken link in one of our help files. I ran our main Help CHM document through this tool, but it failed to go deep enough into the tree to find the problem. The parent page of the broken link shows up Green, but has no child links and it was one of these child links that was broken (user gets IE "Page Cannot be Displayed" when clicking this link.)

I would love to be able to use this tool instead of manually clicking through all our help files, but I won't be able to trust it if it can't find our already known error.

Thanks,
John
Answer: Re: Search Didn't Go Deep Enough? (jmcconda, 2-Nov-06 7:30)
General: Awesome - got my 5 (Nish Nishant, 21-Nov-05 8:22)
General: Re: Awesome - got my 5 (Nicholas Butler, 21-Nov-05 9:37)
General: MSHTML library (André Ziegler, 6-Oct-05 10:55)
General: Fixed (Nicholas Butler, 7-Oct-05 6:06)
General: small bug and divers (helpsoft.ru, 4-Sep-05 19:12)
General: Re: small bug and divers (Charlie Williams, 6-Oct-05 8:06)
