
Automated IE SaveAs MHTML

By Stephane Rodriguez, 4 Sep 2002
 

Introduction

The purpose of this article is to show how to automate the fully fledged Save As HTML feature of Internet Explorer, which is normally out of reach for code using the Internet Explorer API. Saving the current document in MHTML format is just one of the available options, which include:

  • Save As MHTML (whole web page, images, ... in a single file)
  • Save As Full HTML (additional folder for images, ...)
  • Save HTML code only
  • Save As Text

Saving Silently as HTML Using the Internet Explorer API

In fact, saving the current web page to disk without showing a single dialog box is already possible in C++ with the following code, subject to one important restriction:

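// m_ctrl is an instance of the Web Browser control
LPDISPATCH lpDispatch = m_ctrl.get_Document();   // NULL if no document is loaded
if (lpDispatch != NULL)
{
    IPersistFile *lpPersistFile = NULL;
    HRESULT hr = lpDispatch->QueryInterface(IID_IPersistFile, (void**)&lpPersistFile);
    if (SUCCEEDED(hr) && lpPersistFile != NULL)
    {
        // fRemember = FALSE: write a copy; the document keeps its current file name
        hr = lpPersistFile->Save(L"c:\\htmlpage.html", FALSE);
        lpPersistFile->Release();
    }
    lpDispatch->Release();
}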

Saving HTML code only, without dialog boxes

The restriction is that this saves only the HTML code, not the complete web page. What we really want, of course, is access to full HTML archives, with images and so on.

Because there is no public or documented way to request this feature without Internet Explorer showing one or more dialog boxes, we install a Windows hook on the current thread that is notified of every window creation, including dialog boxes. Then we ask Internet Explorer for the feature and override the file path in the dialog box without it ever being seen. Finally, we mimic the user clicking the Save button to validate the dialog box, and unhook ourselves. That's it!
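The workflow above is a fixed sequence of stages, and the trick only works if they happen in that order. The following is a portable toy model, not Win32 code: the `Stage` and `Event` names are hypothetical and simply encode the ordering of the real steps (install the hook, the dialog is created, the dialog is activated, fill it in and click Save, unhook).

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the hook-based Save As workflow. The names are
// illustrative only and do not correspond to Win32 identifiers.
enum class Stage { Idle, Hooked, DialogCreated, DialogActivated, Filled, Done };
enum class Event { InstallHook, DialogCreate, DialogActivate, FillAndClickSave, Unhook };

// Returns the next stage; an event that arrives out of order is ignored.
Stage Advance(Stage s, Event e)
{
    switch (s)
    {
        case Stage::Idle:            return e == Event::InstallHook      ? Stage::Hooked          : s;
        case Stage::Hooked:          return e == Event::DialogCreate     ? Stage::DialogCreated   : s;
        case Stage::DialogCreated:   return e == Event::DialogActivate   ? Stage::DialogActivated : s;
        case Stage::DialogActivated: return e == Event::FillAndClickSave ? Stage::Filled          : s;
        case Stage::Filled:          return e == Event::Unhook           ? Stage::Done            : s;
        case Stage::Done:            return s;
    }
    return s;
}

// Feeds a sequence of events through the model, starting from Idle.
Stage Run(const std::vector<Event>& events)
{
    Stage s = Stage::Idle;
    for (Event e : events)
        s = Advance(s, e);
    return s;
}
```

Only the full, ordered sequence reaches `Stage::Done`; for example, an activation seen before the dialog's creation leaves the model stuck at `Stage::Hooked`, which mirrors why the hook must track the dialog from `HCBT_CREATEWND` onward.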

Hooking Internet Explorer to Save As HTML without popping the dialog boxes

That was the short workflow, but there are a few tricks along the way, and this article is an opportunity to go into them in detail. The code is based on a Microsoft article about customizing Internet Explorer printing by hooking the Print dialog boxes; see here or here. In our app, we have our own Save As feature:

m_wbSaveAs.Config( CString("c:\\htmlpage.mhtml"), SAVETYPE_ARCHIVE );
m_wbSaveAs.SaveAs();

// where the second parameter is the type of HTML needed :
typedef enum _SaveType
{
    SAVETYPE_HTMLPAGE = 0,
    SAVETYPE_ARCHIVE,
    SAVETYPE_HTMLONLY,
    SAVETYPE_TXTONLY
} SaveType;
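`UpdateSaveAs()`, shown further down, appears to pass the stored save type straight to `CB_SETCURSEL` as a combo box index, so the enum values must follow the order of the entries in the dialog's file type combo box (complete page; archive; html only; txt). The sketch below checks that at compile time. It also adds hypothetical helpers, `DefaultExtension()` and `WithExtension()`, that pick a conventional extension for the output path; these extensions are common conventions rather than something the technique requires (the article's own example uses `.mhtml`).

```cpp
#include <cassert>
#include <string>

// Same values as the article's SaveType enum.
enum SaveType
{
    SAVETYPE_HTMLPAGE = 0,
    SAVETYPE_ARCHIVE,
    SAVETYPE_HTMLONLY,
    SAVETYPE_TXTONLY
};

// The enum value is used directly as the combo box index, so it must
// follow the combo order: complete page; archive; html only; txt.
static_assert(SAVETYPE_HTMLPAGE == 0 && SAVETYPE_ARCHIVE == 1 &&
              SAVETYPE_HTMLONLY == 2 && SAVETYPE_TXTONLY == 3,
              "SaveType must match the Save As combo box order");

// Hypothetical helper: a conventional extension for each save type.
const char* DefaultExtension(SaveType t)
{
    switch (t)
    {
        case SAVETYPE_HTMLPAGE: return ".htm";
        case SAVETYPE_ARCHIVE:  return ".mht";
        case SAVETYPE_HTMLONLY: return ".htm";
        case SAVETYPE_TXTONLY:  return ".txt";
    }
    return "";
}

// Hypothetical helper: replace the extension of the file name part of
// 'path', or append one if the file name has none.
std::string WithExtension(const std::string& path, SaveType t)
{
    const std::string::size_type slash = path.find_last_of("\\/");
    const std::string::size_type dot = path.find_last_of('.');
    const bool hasExt = dot != std::string::npos &&
                        (slash == std::string::npos || dot > slash);
    return (hasExt ? path.substr(0, dot) : path) + DefaultExtension(t);
}
```

For instance, `WithExtension("c:\\htmlpage.html", SAVETYPE_ARCHIVE)` yields `c:\htmlpage.mht`, while a dot in a directory name is not mistaken for an extension.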

We start the SaveAs() implementation by installing the hook:

// prepare SaveAs Dialog hook
//
g_hHook = SetWindowsHookEx(WH_CBT, CbtProc, NULL, GetCurrentThreadId());
if (!g_hHook)
    return false;

// make SaveAs Dialog appear
//
// cmd = OLECMDID_SAVEAS (see ./include/docobj.h)
g_bSuccess = false;
g_pWebBrowserSaveAs = this;
HRESULT hr = m_pWebBrowser->ExecWB(OLECMDID_SAVEAS, 
    OLECMDEXECOPT_PROMPTUSER, NULL, NULL);

// remove hook
UnhookWindowsHookEx(g_hHook);
g_pWebBrowserSaveAs = NULL;
g_hHook = NULL;
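If anything between `SetWindowsHookEx` and `UnhookWindowsHookEx` returns early or throws, the hook stays installed and keeps seeing every window the thread creates. A scope guard is one common way to guarantee the cleanup. The sketch below is portable C++; `SaveAsWithGuard()` and its three callbacks are hypothetical stand-ins for the Win32 calls. In the real `SaveAs()`, the cleanup action would call `UnhookWindowsHookEx(g_hHook)` and reset `g_pWebBrowserSaveAs` and `g_hHook`.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Minimal scope guard: runs a cleanup action when it goes out of scope,
// including on early return or when an exception propagates.
class ScopeGuard
{
public:
    explicit ScopeGuard(std::function<void()> onExit) : onExit_(std::move(onExit)) {}
    ~ScopeGuard() { if (onExit_) onExit_(); }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;
private:
    std::function<void()> onExit_;
};

// Hypothetical portable stand-in for SaveAs():
//   install    ~ SetWindowsHookEx(WH_CBT, ...)
//   uninstall  ~ UnhookWindowsHookEx(...) plus resetting the globals
//   execSaveAs ~ ExecWB(OLECMDID_SAVEAS, OLECMDEXECOPT_PROMPTUSER, ...)
bool SaveAsWithGuard(const std::function<bool()>& install,
                     const std::function<void()>& uninstall,
                     const std::function<bool()>& execSaveAs)
{
    if (!install())
        return false;              // hook never installed: nothing to undo
    ScopeGuard unhook(uninstall);  // cleanup guaranteed from here on
    return execSaveAs();
}
```

The guard is created only after the hook is installed, so a failed `SetWindowsHookEx` never triggers an unhook of an invalid handle.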

The hook callback procedure is just hardcore code; see for yourself:

LRESULT CALLBACK CSaveAsWebbrowser::CbtProc(int nCode, 
    WPARAM wParam, LPARAM lParam) 
{  
    // The CBT hook is called for each new window being created:
    // - HCBT_CREATEWND: the window is about to be created.
    //      We check whether it is a dialog box (window class atom
    //      0x8002, i.e. "#32770"; see Spy++) and move it off screen,
    //      since it is most likely the IE Save As dialog.
    // - HCBT_ACTIVATE: the window itself is being activated.
    //      We start a separate thread and let IE perform its own
    //      initialization steps in the meantime.
    switch (nCode)
    {
        case HCBT_CREATEWND:
        {
            HWND hWnd = (HWND)wParam;
            LPCBT_CREATEWND pcbt = (LPCBT_CREATEWND)lParam;
            LPCREATESTRUCT pcs = pcbt->lpcs;
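            // lpszClass holds a class atom here, not a string; compare the full
            // pointer-sized value (a DWORD cast truncates on 64-bit builds)
            if ((ULONG_PTR)pcs->lpszClass == 0x00008002)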
            {
                g_hWnd = hWnd;          // Get hwnd of SaveAs dialog
                pcs->x = -2 * pcs->cx;  // Move dialog off screen
            }
            break;
        }    
        case HCBT_ACTIVATE:
        {
            HWND hwnd = (HWND)wParam;
            if (hwnd == g_hWnd)
            {
                g_hWnd = NULL;
                g_bSuccess = true;

                if (g_pWebBrowserSaveAs->IsSaveAsEnabled())
                {
                    g_pWebBrowserSaveAs->SaveAsDisable();

                    CSaveAsThread *newthread = new CSaveAsThread();
                    newthread->SetKeyWnd(hwnd);
                    newthread->Config( g_pWebBrowserSaveAs->GetFilename(), 
                        g_pWebBrowserSaveAs->GetSaveAsType() );
                    newthread->StartThread();
                }
            }
            break;
        }
    }
    return CallNextHookEx(g_hHook, nCode, wParam, lParam); 
}
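The class test in `CbtProc` works because the predefined dialog box class `#32770` is identified by the class atom 32770, which is 0x8002. When a window is created from a class atom, `lpszClass` carries the atom in its low word with all higher bits zero, so the pointer value itself can be compared. The portable sketch below restates that check and the off-screen move; `IsDialogClass()` and `OffscreenX()` are hypothetical helpers, and the pointer is passed as an integer so the code compiles anywhere.

```cpp
#include <cassert>
#include <cstdint>

// Class atom of the predefined dialog box class "#32770".
constexpr std::uintptr_t kDialogClassAtom = 0x8002;
static_assert(kDialogClassAtom == 32770, "#32770 is the dialog class atom");

// Hypothetical helper mirroring the HCBT_CREATEWND test. The argument is
// the lpszClass pointer value; comparing the full pointer-sized integer
// avoids the truncation a 32-bit cast would cause on 64-bit builds.
bool IsDialogClass(std::uintptr_t lpszClassValue)
{
    return lpszClassValue == kDialogClassAtom;
}

// Hypothetical helper mirroring pcs->x = -2 * pcs->cx: the window's right
// edge ends up at -cx, i.e. left of the primary monitor's origin.
int OffscreenX(int cx)
{
    return -2 * cx;
}
```

Note that on a multi-monitor setup with a screen to the left of the primary one, a negative x may still be visible; the article's approach assumes a single primary monitor.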

In our thread, we wait until the Internet Explorer Save As dialog is visible and its fields have been filled in:

switch (::WaitForSingleObject(m_hComponentReadyEvent, m_WaitTime))
{
     ...
     if ( ::IsWindowVisible(m_keyhwnd) )
     {
         bSignaled = TRUE;
         bContinue = FALSE;
     }

     MSG msg;
     while (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
     {
         if (msg.message == WM_QUIT)
         {
              bContinue = FALSE;
              break;
         }
         TranslateMessage(&msg);
         DispatchMessage(&msg);
     }
     ...
}

// relaunch our SaveAs class, but now everything is ready to play with
if (bSignaled)
{
    CSaveAsWebbrowser surrenderNow;
    surrenderNow.Config( GetFilename(), GetSaveAsType() );
    surrenderNow.UpdateSaveAs( m_keyhwnd );
}

// delete the thread object, we don't need it anymore
delete this;
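The thread's loop is essentially "poll until the dialog is visible, or give up after a timeout". The portable sketch below isolates that pattern with `std::chrono`; `WaitUntil()` is a hypothetical helper, and the `ready` predicate stands in for `::IsWindowVisible(m_keyhwnd)`. The Win32 version additionally waits on `m_hComponentReadyEvent` and pumps messages between polls, which is omitted here.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Polls 'ready' every 'interval' until it returns true (like bSignaled =
// TRUE in the article) or until 'timeout' has elapsed.
bool WaitUntil(const std::function<bool()>& ready,
               std::chrono::milliseconds timeout,
               std::chrono::milliseconds interval)
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;)
    {
        if (ready())
            return true;
        if (std::chrono::steady_clock::now() >= deadline)
            return false;   // caller must not touch the dialog in this case
        std::this_thread::sleep_for(interval);
    }
}
```

Returning `false` on timeout matters: only when the wait succeeds should the thread go on to fill in the dialog and click Save.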

We can now overwrite the dialog's fields:

void CSaveAsWebbrowser::UpdateSaveAs(HWND hwnd)
{
    // editbox : filepath (control id = 0x047c)
    // dropdown combo : filetypes (options=complete page;
    //     archive;html only;txt) (control id = 0x0470)
    // save button : control id = 0x0001
    // cancel button : control id = 0x0002


    // select right item in the combobox
    SendMessage(GetDlgItem(hwnd, 0x0470), CB_SETCURSEL, 
        (WPARAM) m_nSaveType, 0);
    SendMessage(hwnd, WM_COMMAND, MAKEWPARAM(0x0470,CBN_CLOSEUP), 
        (LPARAM) GetDlgItem(hwnd, 0x0470));

    // set output filename
    SetWindowText(GetDlgItem(hwnd, 0x047c), m_szFilename);

    // Invoke Save button
    SendMessage(GetDlgItem(hwnd, 0x0001), BM_CLICK, 0, 0);  
}

Note that in the code above, to select the kind of HTML we want (full HTML, archive, code only, or text format), we not only select the appropriate entry in the combo box, we also send Internet Explorer a combo box CloseUp (CBN_CLOSEUP) notification. A notification is needed because selecting an entry programmatically with CB_SETCURSEL does not notify the dialog, and CloseUp is what Internet Explorer listens for to learn which kind of HTML we want. This behavior was found by trial and error.
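The following toy simulation, which is not Win32 code, illustrates the point above. The classes `ToyComboBox` and `ToySaveAsDialog` are hypothetical: the combo box changes its selection silently, as with `CB_SETCURSEL`, and the dialog only reads the selection when its CloseUp handler runs. How Internet Explorer's real dialog stores the choice is undocumented; the model only reflects the behavior the article observed.

```cpp
#include <cassert>

// Hypothetical combo box: SetCurSel changes the selection without
// notifying anyone, like CB_SETCURSEL.
class ToyComboBox
{
public:
    void SetCurSel(int index) { sel_ = index; }
    int GetCurSel() const { return sel_; }
private:
    int sel_ = 0;
};

// Hypothetical dialog that reads the combo selection only when notified.
class ToySaveAsDialog
{
public:
    ToyComboBox combo;
    // Stands in for the WM_COMMAND / CBN_CLOSEUP handler.
    void OnCloseUp() { saveType_ = combo.GetCurSel(); }
    int SaveType() const { return saveType_; }
private:
    int saveType_ = 0;   // what the dialog believes the user picked
};
```

Until `OnCloseUp()` is called, the dialog would still save using the previous type, which is exactly why `UpdateSaveAs()` sends the extra `WM_COMMAND` message.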

Conclusion

This article describes a technique to gain access to the fully fledged Save As HTML feature exposed by Internet Explorer. I have never seen an article about this topic on the 'net, even though it is clearly a compelling feature for developers building web applications. The files you may reuse from the source code provided are:

  • SaveAsWebBrowser.h, *.cpp: hook procedure; fill the dialog box data
  • SaveAsThread.h, *.cpp: auxiliary thread for synchronization with Internet Explorer

The application is just a simple MFC-based CHtmlView application embedding the web browser control.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

Addicted to reverse engineering. At work, I am developing business intelligence software in a team of smart people (independent software vendor).
 
Need a fast Excel generation component? Try xlsgen.
 


Comments and Discussions

 
Win98 Different DialogID (TDK900000, 10-May-04 5:16)
Great article, but it fails for me using IE6 on Win98. I think the reason is that the dialog IDs are different. For example, GetDlgItem(hwnd, 0x47c) returns NULL for me in UpdateSaveAs.
 
Does anyone know either
1. What the correct dialog IDs are for Win98
2. Which dll the resource is in so I can try to find out the dialog ID
3. A way around this problem.
 
Cheers
 

Re: Win98 Different DialogID (Anonymous, 11-May-04 0:10)
Apologies for answering my own question, but the DLL in question is COMDLG32.DLL, and the SAVEAS resource in it has the following ID change:
 
editbox filepath = 0x480 (0x047c on XP)
 
so if you change the code in
void CSaveAsWebbrowser::UpdateSaveAs(HWND hwnd) to use this ID when running on 98 it works fine.
 

Re: Win98 Different DialogID (Stephane Rodriguez, 11-May-04 1:18)

Thanks for the follow-up. I admit I didn't want to install a VMware Win98 guest OS only to retrieve that damn ctrl id. ;-)

Re: Win98 Different DialogID (Anonymous, 29-Aug-04 16:57)
Thanks, this follow-up saved my butt!!
Re: Win98 Different DialogID (NorrieTaylor, 30-Aug-04 19:12)
How would you retrieve that ctrl id? I have a Win98 guest running, but I can't get Spy++ to run on Win98. Now I am kinda stuck.
 
Thanks
Re: Win98 Different DialogID (Stephane Rodriguez, 30-Aug-04 20:36)

If you can't get Spy++ to run, then this doesn't bode well for the rest. You can find alternatives to Spy++ on CodeProject.
Also, I wonder why the ID provided in this thread (id = 0x480) wouldn't work for you, unless you have a totally different version of IE, for instance one where the file dialogs are hooked by a third party or something like that.
 
I admit this way of achieving Save As MHTML is not that good in practice. It's probably best to either rely on a third party (read old threads) or reuse the IE cache to get the bits and create the MHTML yourself.
Re: Win98 Different DialogID (NorrieTaylor, 31-Aug-04 18:57)
Thanks for the quick reply.
 
It seems that in Win 98 there is a problem getting the handle to the "Save Web Page" dialog.
 
In the callback proc under XP, three dialogs actually get processed in the HCBT_CREATEWND case, the "Save Web Page" dialog being the third and last. You can trace the titles of these dialogs as they pass through the callback. However, in Win 98 only two seem to pass through the callback function, neither of which has the title "Save Web Page". Yet the window still gets moved to the side.
 
Without the third and last window going through the callback proc, the handle never gets saved, so you can't manipulate the window later on.
 
Got any ideas?
 
Thanks again!!!
 

 

 

Re: Win98 Different DialogID (NorrieTaylor, 31-Aug-04 20:16)
Well, I found a workaround that I think might make this slightly more stable, and that fixes the issue I mentioned above.
 
In the callback function just replace this:
 
if (hwnd == g_hWnd)
 
with this:
 
TCHAR buff[512];
GetWindowText( hwnd, buff, 512 );
if ( _tcscmp( buff, _T("Save Web Page") ) == 0 )   // _tcscmp/_T from <tchar.h>
 
Best regards,
 
Norrie
 


Last Updated 5 Sep 2002
Article Copyright 2002 by Stephane Rodriguez.
Everything else Copyright © CodeProject, 1999-2013