Decompiling CHM (help) files with C#

Yuriy Maksymenko, 11 Nov 2003
Introduction to IStorage interface and MS Help file format including sample C# decompilation DLL for CHM files.

Introduction

This article demonstrates the use of the IStorage interface in managed C# code, based on a simple CHM (MS Help 1.0) decompiler.

Decompiling CHM (help) files with C#

Recently I came across a very interesting problem related to the IStorage interface: I had to manipulate an IStorage container from managed code. Since my knowledge of COM is limited, I thought I would simply find a snippet of code using Google, hoping that the example would let me read and write files in compound storage structures. I discovered a couple of incomplete snippets posted in newsgroups, but that was not enough for me to be lazy and enjoy my favorite programming technique of CTRL+C, CTRL+V. I had to do some actual work on my own. I needed a wrapper that would let me easily access the internal structure of any compound storage object. By the way, for the latest version of this wrapper don't forget to visit here.

According to Microsoft, the IStorage interface supports the creation and management of structured storage objects. Structured storage allows hierarchical storage of information within a single file and is often referred to as "a file system within a file". Yes, it does sound interesting, but a bit complicated. How about a real-life example?

Well, the simplest and most powerful example of a compound storage object would be the good old CHM file. The compound file implementation of IStorage allows you to create and manage sub-storages and streams within a storage object residing in a compound file object. You can pack your entire collection of help documents, HTML files, images etc. into a single IStorage object to save space and to give your users a standard file that can be viewed with your trusted help viewer. Typically you would use tools provided by Microsoft, such as HTML Help Workshop, to manipulate a collection of help-related information. You can also take a look at the Microsoft HTML Help 1.4 SDK to get a complete picture of what a CHM help file is and why you need it.

How about the reverse process? HTML Help Workshop supports decompiling as well, but it is a standalone application, so you can’t use it if you want to automate certain processes. Let's say you want to access content from a CHM file in managed code without the “hassle” of the Microsoft UI. Naturally, we need some kind of wrapper that will simplify read/write operations for us.

Let’s try to access the content of an IStorage structure using the standard COM interfaces provided to us by Microsoft. Microsoft did not provide us with managed classes to manipulate IStorage objects directly, so we’ll have to rely on System.Runtime.InteropServices to import the interfaces that we’ll need in our managed code.

Just to give you an idea of what elements are present in our IStorage solution, take a look at the snapshot of the Visual Studio project above.

We’ll organize our collection of classes into a single RelatedObjects.Storage namespace. We’ll create an IBaseStorageWrapper class that will help us enumerate all objects packed into an IStorage file.

Ideally we would like to be able to access any stream in a file separately, so we’ll create IStorageWrapper and ITStorageWrapper classes that inherit from IBaseStorageWrapper. These classes will help us handle the different types of objects stored in the main compound storage object.

The only difference between ITStorageWrapper and IStorageWrapper is the way they open the underlying storage.

IStorageWrapper uses the StgOpenStorage function exported by Ole32.dll.

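using System;
using System.Runtime.InteropServices;

public class Ole32
{
    // Opens an existing compound file; the return value is an HRESULT (0 = S_OK).
    [DllImport("Ole32.dll")]
    public static extern int StgOpenStorage(
        [MarshalAs(UnmanagedType.LPWStr)] string wcsName, // path of the compound file
        IStorage pstgPriority,   // previous opening of the storage, usually null
        int grfMode,             // access mode (STGM_* flags)
        IntPtr snbExclude,       // must be NULL (IntPtr.Zero)
        int reserved,            // reserved, must be 0
        out IStorage storage     // returned root storage
        );
}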
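The grfMode argument and the integer return value are the two parts of this declaration that are easiest to get wrong. The following is a minimal sketch, not part of the original library: `StorageMode` and its method names are hypothetical, while the numeric values are the STGM_* constants from the Windows SDK. It assumes the documented rule that a compound file opened in direct mode for reading must deny writers, and one opened for read/write must be exclusive.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper class; names are illustrative only.
// The values are the STGM_* constants defined by the Windows SDK.
public static class StorageMode
{
    public const int Read           = 0x00000000; // STGM_READ
    public const int ReadWrite      = 0x00000002; // STGM_READWRITE
    public const int ShareExclusive = 0x00000010; // STGM_SHARE_EXCLUSIVE
    public const int ShareDenyWrite = 0x00000020; // STGM_SHARE_DENY_WRITE

    // Read-only access that still lets other callers read the file.
    public static int ForReading()
    {
        return Read | ShareDenyWrite;
    }

    // Read/write access with exclusive sharing.
    public static int ForWriting()
    {
        return ReadWrite | ShareExclusive;
    }

    // StgOpenStorage returns an HRESULT; negative values are failures.
    public static void ThrowIfFailed(int hresult)
    {
        if (hresult < 0)
        {
            Marshal.ThrowExceptionForHR(hresult);
        }
    }
}
```

With the Ole32 class above, a call could then be written as `StorageMode.ThrowIfFailed(Ole32.StgOpenStorage(path, null, StorageMode.ForReading(), IntPtr.Zero, 0, out storage));`, where `path` and `storage` are the caller's own variables.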

ITStorageWrapper uses the StgOpenStorage method exposed by the ITStorage COM interface.

/// <summary>
/// .NET interface wrapper for InfoTech interface for ITStorage COM object
/// </summary>
[ComImport,
Guid("88CC31DE-27AB-11D0-9DF9-00A0C922E6EC"),
InterfaceType(ComInterfaceType.InterfaceIsIUnknown),
SuppressUnmanagedCodeSecurity]
public interface ITStorage 
{    
    [return:MarshalAs(UnmanagedType.Interface)]
    IStorage StgCreateDocfile([In, 
            MarshalAs(UnmanagedType.BStr)] string pwcsName, 
            int    grfMode, 
            int reserved);

    [return:MarshalAs(UnmanagedType.Interface)]
    IStorage StgCreateDocfileOnILockBytes(ILockBytes plkbyt, 
                                int grfMode, int reserved);

    int StgIsStorageFile([In, 
         MarshalAs(UnmanagedType.BStr)] string pwcsName);
    int StgIsStorageILockBytes(ILockBytes plkbyt);

    [return:MarshalAs(UnmanagedType.Interface)]
    IStorage StgOpenStorage([In, 
                MarshalAs(UnmanagedType.BStr)] string pwcsName, 
                IntPtr pstgPriority,
                [In, MarshalAs(UnmanagedType.U4)] int grfMode, 
                IntPtr snbExclude, 
                [In, MarshalAs(UnmanagedType.U4)] int reserved);

    [return:MarshalAs(UnmanagedType.Interface)]
    IStorage StgOpenStorageOnILockBytes(ILockBytes plkbyt, 
                IStorage pStgPriority, 
                int grfMode, 
                IntPtr snbExclude, 
                int reserved);

    int StgSetTimes([In, MarshalAs(UnmanagedType.BStr)] string lpszName, 
                    FILETIME pctime, 
                    FILETIME patime, 
                    FILETIME pmtime);
    int SetControlData(ITS_Control_Data pControlData);
    int DefaultControlData(ITS_Control_Data ppControlData);
    int Compact([In, MarshalAs(UnmanagedType.BStr)] string pwcsName, 
                ECompactionLev iLev);
}
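ITStorage is only an interface declaration, so a COM object implementing it has to be created before StgOpenStorage can be called on it. The article does not show that step. A minimal sketch follows, assuming the InfoTech storage class is registered on the machine under the CLSID below; that value is the one commonly cited for CLSID_ITStorage, not one taken from this article, and `ITStorageFactory` is a hypothetical name.

```csharp
using System;

// Hypothetical factory; not part of RelatedObjects.Storage.
public static class ITStorageFactory
{
    // Assumption: CLSID commonly cited for the InfoTech storage COM class.
    public static readonly Guid ClsidITStorage =
        new Guid("5D02926A-212E-11D0-9DF9-00A0C922E6EC");

    // Creates the COM object (Windows only). The caller casts the result
    // to the ITStorage interface declared above.
    public static object Create()
    {
        Type comType = Type.GetTypeFromCLSID(ClsidITStorage);
        return Activator.CreateInstance(comType);
    }
}
```

On Windows the result would be used as `ITStorage its = (ITStorage)ITStorageFactory.Create();`, after which `its.StgOpenStorage(...)` returns the root IStorage of the CHM file.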

Let's look at the flow of our cute little wrapper by walking through the TEST application included in this project. Our goal in this example is to access content elements stored inside a CHM file. As I mentioned above, a CHM file is a compound storage archive that contains a collection of separate files. We will use our RelatedObjects.Storage DLL (compiled separately) to demonstrate how easy it is to access stream objects stored in an IStorage archive from managed code.

// Requires: using System; using System.IO; using RelatedObjects.Storage;
[STAThread]
static void Main(string[] args)
{
    // Create an instance of ITStorageWrapper.
    // During initialization the constructor processes the CHM file
    // and builds a collection of the file objects stored inside it.
    ITStorageWrapper iw = new ITStorageWrapper(@"I:\apps\abslog.chm");

    // Loop through the collection of objects stored inside the IStorage
    foreach (IBaseStorageWrapper.FileObjects.FileObject fileObject in iw.foCollection)
    {
        // Make sure we can read the stream of this individual file object
        if (fileObject.CanRead)
        {
            // We only want to extract HTM files in this example.
            // fileObject is our representation of an internal file stored in the IStorage.
            if (fileObject.FileName.EndsWith(".htm"))
            {
                Console.WriteLine("Path: " + fileObject.FilePath);
                Console.WriteLine("File: " + fileObject.FileName);

                // FileUrl is an external reference to the internal object.
                // It allows you to display the content of a single file
                // in Internet Explorer without extracting it from the archive.
                Console.WriteLine("Url: " + fileObject.FileUrl);

                string fileString = fileObject.ReadFromFile();
                Console.WriteLine("Text: " + fileString);

                // Direct extraction sample
                fileObject.Save(@"i:\apps\test1\" + fileObject.FileName);

                // Read first and then save later sample.
                // The using block closes the writer even if WriteLine throws.
                using (StreamWriter sw = File.CreateText(@"i:\apps\" + fileObject.FileName))
                {
                    sw.WriteLine(fileString);
                }

                // Pause after each extracted file
                Console.ReadLine();
            }
        }
    }
    Console.ReadLine();
}
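The FileUrl value printed above is produced inside the library, and its exact format is not shown in this article. As an illustration only, one widely used way to address a file inside a CHM archive from Internet Explorer is the ms-its protocol, which has the form `ms-its:<path to chm>::/<internal path>`. The sketch below builds such a URL; `ChmUrl` and `Build` are hypothetical names, and the library may well format its URLs differently.

```csharp
using System;

// Hypothetical helper; not part of RelatedObjects.Storage.
public static class ChmUrl
{
    // Builds an ms-its URL pointing at a file stored inside a CHM archive.
    public static string Build(string chmPath, string internalPath)
    {
        if (chmPath == null) throw new ArgumentNullException("chmPath");
        if (internalPath == null) throw new ArgumentNullException("internalPath");

        // Internal paths use forward slashes; drop any leading slash
        // because the "::/" separator already supplies one.
        string inner = internalPath.Replace('\\', '/').TrimStart('/');
        return "ms-its:" + chmPath + "::/" + inner;
    }
}
```

For example, `ChmUrl.Build(@"I:\apps\abslog.chm", "html/intro.htm")` yields `ms-its:I:\apps\abslog.chm::/html/intro.htm`, where `html/intro.htm` is a made-up internal path.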

Conclusion

As you can see, this example demonstrates several useful methods for manipulating the files stored inside an archive: reading a file into a string, saving it directly to disk, and obtaining a URL to it. I'm sure that you could use other methods as well. Our company will continue enhancing the functionality to make it even easier to manage IStorage objects. You can visit our site to get the latest and greatest version and download updated documentation here.

If you have any questions or need the latest source code, e-mail me at support@asprelated.com or click here.

Have a wonderful day!

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

About the Author

Yuriy Maksymenko

United States

Last Updated 12 Nov 2003
Article Copyright 2003 by Yuriy Maksymenko
Everything else Copyright © CodeProject, 1999-2014