Decompiling CHM (help) files with C#

11 Nov 2003, 4 min read
An introduction to the IStorage interface and the MS Help file format, including a sample C# decompilation DLL for CHM files.

Introduction

This article demonstrates the use of the IStorage interface in managed C# code, based on a simple CHM (MSHelp 1.0) decompiler.

Decompiling CHM (help) files with C#

Recently I came across an interesting problem involving the IStorage interface: I had to manipulate an IStorage container from managed code. Since my knowledge of COM is limited, I hoped to find a snippet through Google that would let me read and write files in compound storage structures. I found a couple of incomplete snippets posted in newsgroups, but not enough for me to be lazy and enjoy my favorite programming technique of CTRL+C, CTRL+V. I had to do some actual work on my own. I needed a wrapper that would give me easy access to the internal structure of any compound storage object. By the way, for the latest version of this wrapper, don't forget to visit here.

According to Microsoft, the IStorage interface supports the creation and management of structured storage objects. Structured storage allows hierarchical storage of information within a single file, and is often referred to as "a file system within a file". Yes, it does sound interesting, but a bit complicated. How about a real-life example?

The simplest yet most powerful example of a compound storage object is the good old CHM file. The compound file implementation of IStorage lets you create and manage sub-storages and streams within a storage object that resides in a compound file. You can pack your entire collection of help documents, HTML files, images, etc. into a single IStorage object to save space and to give your users a standard file that can be viewed with your trusted help viewer. Typically you would use Microsoft tools such as HTML Help Workshop to manipulate a collection of help-related information. You can also take a look at the Microsoft HTML Help 1.4 SDK to get a complete picture of what a CHM help file is and why you need it.

How about the reverse process? HTML Help Workshop supports decompiling as well, but it is a standalone application, so you can't use it to automate the process. Let's say you want to access content from a CHM file in managed code without the "hassle" of the Microsoft UI. Naturally, we need some kind of wrapper that simplifies read/write operations for us.

Let's try to access the content of an IStorage structure using the standard COM interfaces provided by Microsoft. Microsoft did not provide managed classes for manipulating IStorage objects directly, so we'll have to rely on System.Runtime.InteropServices to import the interfaces we need into our managed code.

Just to give you an idea of what elements are present in our IStorage solution, take a look at the snapshot of Visual Studio Project, above.

We'll organize our collection of classes into a single RelatedObjects.Storage namespace. We'll create an IBaseStorageWrapper class that will help us enumerate all objects packed into an IStorage file.

Ideally we would like to access any stream in a file separately, so we'll create IStorageWrapper and ITStorageWrapper classes that inherit from IBaseStorageWrapper. These classes will help us handle the different types of objects stored in the main compound storage object.

The only difference between ITStorageWrapper and IStorageWrapper is the way they access internally stored objects.
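To make the shared-base idea concrete, here is a minimal sketch of how such a hierarchy could be laid out. It is not the RelatedObjects.Storage source: the type and member names (StorageWrapperSketch, EnumerateEntries, InMemoryStorageSketch) are hypothetical, and the in-memory subclass only stands in for the two real ways of opening a storage.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical base class: owns the entry collection and leaves
// "how the container is opened" to the subclasses.
public abstract class StorageWrapperSketch
{
    private readonly List<string> entries = new List<string>();

    // Names of the objects found inside the container.
    public IReadOnlyList<string> Entries { get { return entries; } }

    // A real subclass would call Ole32.StgOpenStorage or
    // ITStorage.StgOpenStorage here and walk the returned IStorage.
    protected abstract IEnumerable<string> EnumerateEntries(string path);

    // Shared logic: rebuild the collection from whatever the subclass yields.
    public void Load(string path)
    {
        if (path == null) throw new ArgumentNullException("path");
        entries.Clear();
        entries.AddRange(EnumerateEntries(path));
    }
}

// Stand-in subclass for illustration only; it returns fixed names
// instead of opening a file.
public sealed class InMemoryStorageSketch : StorageWrapperSketch
{
    private readonly string[] names;

    public InMemoryStorageSketch(params string[] names)
    {
        this.names = names;
    }

    protected override IEnumerable<string> EnumerateEntries(string path)
    {
        return names;
    }
}
```

With this layout, everything that consumes the collection is written once in the base class, and each subclass differs only in how it opens the container.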

IStorageWrapper uses the StgOpenStorage function exported by Ole32.dll.

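using System;
using System.Runtime.InteropServices;

public class Ole32
{
    // Opens an existing root storage object in the file system.
    // The return value is an HRESULT; 0 (S_OK) means success.
    [DllImport("Ole32.dll")]
    public static extern int StgOpenStorage (
        [MarshalAs(UnmanagedType.LPWStr)] string wcsName, // path of the file
        IStorage pstgPriority,    // usually null
        int grfMode,              // access mode (STGM flags)
        IntPtr snbExclude,        // must be NULL (IntPtr.Zero)
        int reserved,             // reserved, must be 0
        out IStorage storage      // returned storage
        );
}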
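The grfMode argument is a combination of STGM flags from the Windows SDK. The sketch below names the most common ones so the call site does not need magic numbers. The numeric values are the documented STGM constants; the enum and helper names (Stgm, StgmModes, ReadOnlyShared) are my own and are not part of the article's DLL.

```csharp
using System;

// Subset of the Windows SDK STGM_* access and sharing flags.
[Flags]
public enum Stgm
{
    Read           = 0x00000000, // STGM_READ
    Write          = 0x00000001, // STGM_WRITE
    ReadWrite      = 0x00000002, // STGM_READWRITE
    ShareExclusive = 0x00000010, // STGM_SHARE_EXCLUSIVE
    ShareDenyWrite = 0x00000020, // STGM_SHARE_DENY_WRITE
    ShareDenyRead  = 0x00000030, // STGM_SHARE_DENY_READ
    ShareDenyNone  = 0x00000040  // STGM_SHARE_DENY_NONE
}

public static class StgmModes
{
    // Read-only access while preventing other writers: a typical mode
    // for extracting content without modifying the file.
    public static int ReadOnlyShared()
    {
        return (int)(Stgm.Read | Stgm.ShareDenyWrite);
    }
}
```

A call could then look like Ole32.StgOpenStorage(path, null, StgmModes.ReadOnlyShared(), IntPtr.Zero, 0, out storage), followed by Marshal.ThrowExceptionForHR on the returned HRESULT so that failures surface as exceptions.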

ITStorageWrapper calls the StgOpenStorage method exposed by the ITStorage COM interface.

/// <summary>
/// .NET interface wrapper for InfoTech interface for ITStorage COM object
/// </summary>
[ComImport,
Guid("88CC31DE-27AB-11D0-9DF9-00A0C922E6EC"),
InterfaceType(ComInterfaceType.InterfaceIsIUnknown),
SuppressUnmanagedCodeSecurity]
public interface ITStorage 
{    
    [return:MarshalAs(UnmanagedType.Interface)]
    IStorage StgCreateDocfile([In, 
            MarshalAs(UnmanagedType.BStr)] string pwcsName, 
            int    grfMode, 
            int reserved);

    [return:MarshalAs(UnmanagedType.Interface)]
    IStorage StgCreateDocfileOnILockBytes(ILockBytes plkbyt, 
                                int grfMode, int reserved);

    int StgIsStorageFile([In, 
         MarshalAs(UnmanagedType.BStr)] string pwcsName);
    int StgIsStorageILockBytes(ILockBytes plkbyt);

    [return:MarshalAs(UnmanagedType.Interface)]
    IStorage StgOpenStorage([In, 
                MarshalAs(UnmanagedType.BStr)] string pwcsName, 
                IntPtr pstgPriority,
                [In, MarshalAs(UnmanagedType.U4)] int grfMode, 
                IntPtr snbExclude, 
                [In, MarshalAs(UnmanagedType.U4)] int reserved);

    [return:MarshalAs(UnmanagedType.Interface)]
    IStorage StgOpenStorageOnILockBytes(ILockBytes plkbyt, 
                IStorage pStgPriority, 
                int grfMode, 
                IntPtr snbExclude, 
                int reserved);

    int StgSetTimes([In, MarshalAs(UnmanagedType.BStr)] string lpszName, 
                    FILETIME pctime, 
                    FILETIME patime, 
                    FILETIME pmtime);
    int    SetControlData(ITS_Control_Data pControlData);
    int    DefaultControlData(ITS_Control_Data ppControlData);
    int    Compact([In, MarshalAs(UnmanagedType.BStr)] string pwcsName, 
                        ECompactionLev iLev);
}
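An interface declaration alone does not give you an object to call. One way to obtain one is to activate the InfoTech storage class by its CLSID and cast the result to ITStorage. The sketch below assumes the CLSID commonly documented for the ITStorage class in itss.dll; it does not appear in this article, so verify it against your system before relying on it. The class name ItssActivation is hypothetical.

```csharp
using System;

public static class ItssActivation
{
    // Assumption: CLSID commonly documented for the ITStorage class (itss.dll).
    public static readonly Guid ClsidITStorage =
        new Guid("5D02926A-212E-11D0-9DF9-00A0C922E6EC");

    // Creates the COM object (Windows only). The caller casts the result
    // to the ITStorage interface declared above.
    public static object CreateITStorageObject()
    {
        Type comType = Type.GetTypeFromCLSID(ClsidITStorage, true);
        return Activator.CreateInstance(comType);
    }
}
```

Usage would be along the lines of ITStorage its = (ITStorage)ItssActivation.CreateITStorageObject(); when you are finished with the object and any IStorage it returned, Marshal.ReleaseComObject releases the underlying COM references instead of waiting for the garbage collector.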

Let's look at the flow of our cute little wrapper by walking through the TEST application included in this project. Our goal in this example is to access content elements stored inside a CHM file. As I mentioned above, a CHM file is a compound storage archive that contains a collection of separate files. We will use our RelatedObjects.Storage DLL (compiled separately) to demonstrate how easy it is to access stream objects stored in an IStorage archive from managed code.

// Requires: using System; using System.IO; using RelatedObjects.Storage;
[STAThread]
static void Main(string[] args)
{
    // Create an instance of ITStorageWrapper.
    // During initialization the constructor processes the CHM file
    // and builds a collection of the file objects stored inside it.
    ITStorageWrapper iw = new ITStorageWrapper(@"I:\apps\abslog.chm");

    // Loop through the collection of objects stored inside the IStorage
    foreach (IBaseStorageWrapper.FileObjects.FileObject
                                          fileObject in iw.foCollection)
    {
        // Make sure we can read the stream of this individual file object
        if (fileObject.CanRead)
        {
            // We only want to extract HTM files in this example.
            // fileObject is our representation of an internal file
            // stored in the IStorage.
            if (fileObject.FileName.EndsWith(".htm"))
            {
                Console.WriteLine("Path: " + fileObject.FilePath);
                Console.WriteLine("File: " + fileObject.FileName);

                // FileUrl is an external reference to the internal object.
                // It allows you to display the content of a single file
                // in Internet Explorer without extracting it from the archive.
                Console.WriteLine("Url: " + fileObject.FileUrl);

                string fileString = fileObject.ReadFromFile();
                Console.WriteLine("Text: " + fileString);

                // Direct extraction sample
                fileObject.Save(@"i:\apps\test1\" + fileObject.FileName);

                // Read first, save later sample. The using block closes
                // the writer even if WriteLine throws.
                using (StreamWriter sw = File.CreateText(@"i:\apps\" +
                                         fileObject.FileName))
                {
                    sw.WriteLine(fileString);
                }

                // Pause after each extracted file
                Console.ReadLine();
            }
        }
    }
    Console.ReadLine();
}
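The test program filters with a case-sensitive EndsWith(".htm") and concatenates output paths by hand. The helpers below sketch slightly more defensive versions of those two steps, plus a builder for the standard HTML Help ms-its: URL form. The article does not show what FileUrl actually returns, so treat BuildItsUrl as an illustration of the URL scheme rather than the wrapper's implementation. All names here (ChmExtractHelpers and its methods) are hypothetical.

```csharp
using System;
using System.IO;

public static class ChmExtractHelpers
{
    // True for names ending in .htm or .html, ignoring case.
    public static bool IsHtmlTopic(string fileName)
    {
        if (string.IsNullOrEmpty(fileName)) return false;
        return fileName.EndsWith(".htm", StringComparison.OrdinalIgnoreCase)
            || fileName.EndsWith(".html", StringComparison.OrdinalIgnoreCase);
    }

    // Uses only the last segment of the internal name, so an entry such as
    // "/docs/../index.htm" cannot write outside the output directory.
    public static string BuildOutputPath(string outputDir, string internalName)
    {
        if (outputDir == null) throw new ArgumentNullException("outputDir");
        if (internalName == null) throw new ArgumentNullException("internalName");

        string normalized = internalName.Replace('\\', '/');
        string leaf = normalized.Substring(normalized.LastIndexOf('/') + 1);
        if (leaf.Length == 0 || leaf == "." || leaf == "..")
            throw new ArgumentException("Entry has no usable file name.", "internalName");

        return Path.Combine(outputDir, leaf);
    }

    // Builds an URL of the form ms-its:<chm path>::/<internal path>.
    public static string BuildItsUrl(string chmPath, string internalPath)
    {
        if (chmPath == null) throw new ArgumentNullException("chmPath");
        if (internalPath == null) throw new ArgumentNullException("internalPath");

        string topic = internalPath.Replace('\\', '/').TrimStart('/');
        return "ms-its:" + chmPath + "::/" + topic;
    }
}
```

In the loop above, IsHtmlTopic(fileObject.FileName) could replace the EndsWith test, and BuildOutputPath could supply the argument to Save.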

Conclusion

As you can see, this example demonstrates several useful methods for manipulating the files inside a storage: enumerating them, reading them as text, and saving them to disk. I'm sure you could use other methods as well. Our company will continue enhancing the functionality to make it even easier to manage IStorage objects. You can visit our site to get the latest and greatest version and download updated documentation here.

If you have any questions or need the latest source code, e-mail me at support@asprelated.com or click here.

Have a wonderful day!

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


Comments and Discussions

 
Answer: StgOpenStorage throws exception if the file name contains unicode string
kevinku1983, 5-Oct-08 22:05

It is called like ((ITStorage)Obj).StgOpenStorage(workPath, IntPtr.Zero, 32, IntPtr.Zero, 0).
If workPath contains a Unicode string like "Ěščasdf", this method throws an exception.

Does anyone know how to fix it?

Best Regards,
Kevin