Automated IE SaveAs MHTML

4 Sep 2002
This article demonstrates how to automate IE's Save As functionality

Introduction

The purpose of this article is to show how to automate the full Save As HTML feature of Internet Explorer, which is normally hidden from those using the Internet Explorer API. Saving the current document in MHTML format is just one of the available options, which include:

  • Save As MHTML (whole web page, images, ... in a single file)
  • Save As Full HTML (additional folder for images, ...)
  • Save HTML code only
  • Save As Text

Saving Silently as HTML Using the Internet Explorer API

In fact, the ability to save the current web page without showing a single dialog box is already available in C++ through the following code, with one important restriction:

C++
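LPDISPATCH lpDispatch = NULL;
IPersistFile *lpPersistFile = NULL;

// m_ctrl is an instance of the Web Browser control
lpDispatch = m_ctrl.get_Document();
if (lpDispatch != NULL)   // NULL when no document is loaded yet
{
    // the document object exposes IPersistFile for saving to disk
    HRESULT hr = lpDispatch->QueryInterface(IID_IPersistFile, (void**)&lpPersistFile);
    if (SUCCEEDED(hr))
    {
        // second argument is fRemember (FALSE, as in the original call)
        lpPersistFile->Save(L"c:\\htmlpage.html", FALSE);
        lpPersistFile->Release();
    }
    lpDispatch->Release();
}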

(caption for code above) Saving HTML code only, without dialog boxes

The restriction is that we are talking about the HTML code only, not the web page. Of course, what is interesting is to gain access to full HTML archives with images and so on.

Because there is no "public" or known way to request this feature without Internet Explorer showing one or more dialog boxes, we are going to hook the operating system to listen for all window creations, including those of dialog boxes. Then we'll ask Internet Explorer for the feature and override the file path in the dialog box without it being seen. Finally, we'll mimic the user clicking the Save button to validate the dialog box, and unhook ourselves. That's it!

(Figure 1) Hooking Internet Explorer to Save As HTML without popping up the dialog boxes

That was the short version of the workflow, but there are a few tricks along the way, and this article is an opportunity to go into detail. The code is derived from a Microsoft article on customizing Internet Explorer printing by hooking the Print dialog boxes. In our app, we have our own Save As feature:

C++
m_wbSaveAs.Config( CString("c:\\htmlpage.mhtml"), SAVETYPE_ARCHIVE );
m_wbSaveAs.SaveAs();

// where the second parameter is the type of HTML needed :
typedef enum _SaveType
{
    SAVETYPE_HTMLPAGE = 0,
    SAVETYPE_ARCHIVE,
    SAVETYPE_HTMLONLY,
    SAVETYPE_TXTONLY
} SaveType;
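
Because the SaveType value is later passed straight to CB_SETCURSEL, the order of the enumerators has to match the order of the entries in the dialog's file type combo box. The sketch below makes that dependency explicit at compile time. It also adds a hypothetical helper, DefaultExtension, which is not part of the article's source; the extensions it returns are assumptions about conventional file names, not something the dialog requires.

```cpp
#include <cassert>

// Same enumeration as above; the numeric values double as combo-box indices.
typedef enum _SaveType
{
    SAVETYPE_HTMLPAGE = 0,
    SAVETYPE_ARCHIVE,
    SAVETYPE_HTMLONLY,
    SAVETYPE_TXTONLY
} SaveType;

// Fail the build if somebody reorders the enumerators.
static_assert(SAVETYPE_HTMLPAGE == 0 && SAVETYPE_ARCHIVE == 1 &&
              SAVETYPE_HTMLONLY == 2 && SAVETYPE_TXTONLY == 3,
              "SaveType values must match the combo-box item order");

// Hypothetical helper (an assumption, not in the original source):
// suggests a conventional file extension for each save type.
inline const char* DefaultExtension(SaveType type)
{
    switch (type)
    {
        case SAVETYPE_HTMLPAGE: return ".htm";
        case SAVETYPE_ARCHIVE:  return ".mht";
        case SAVETYPE_HTMLONLY: return ".htm";
        case SAVETYPE_TXTONLY:  return ".txt";
    }
    assert(false && "unknown SaveType");
    return "";
}
```

A caller could use such a helper to build the file name passed to Config() when the user supplies only a base name.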

We start the SaveAs() implementation by installing the hook:

C++
// prepare SaveAs Dialog hook
//
g_hHook = SetWindowsHookEx(WH_CBT, CbtProc, NULL, GetCurrentThreadId());
if (!g_hHook)
    return false;

// make SaveAs Dialog appear
//
// cmd = OLECMDID_SAVEAS (see ./include/docobj.h)
g_bSuccess = false;
g_pWebBrowserSaveAs = this;
HRESULT hr = m_pWebBrowser->ExecWB(OLECMDID_SAVEAS, 
    OLECMDEXECOPT_PROMPTUSER, NULL, NULL);

// remove hook
UnhookWindowsHookEx(g_hHook);
g_pWebBrowserSaveAs = NULL;
g_hHook = NULL;
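
The hook must be removed on every path out of SaveAs(), including any early return added later. One way to guarantee that is a small scope guard. The sketch below is portable and deliberately does not call the Win32 API: the hook is modelled by a counter, and ScopeGuard and RunWithHook are hypothetical names introduced for illustration. In the article's code, the guard's action would presumably be the UnhookWindowsHookEx(g_hHook) call.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Runs the stored action when the guard goes out of scope.
class ScopeGuard
{
public:
    explicit ScopeGuard(std::function<void()> onExit)
        : m_onExit(std::move(onExit)) {}
    ~ScopeGuard() { if (m_onExit) m_onExit(); }

    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    std::function<void()> m_onExit;
};

// Stand-in for the SaveAs() body. activeHooks models the installed hook:
// incrementing it stands in for SetWindowsHookEx, decrementing it for
// UnhookWindowsHookEx.
inline bool RunWithHook(int& activeHooks, bool failMidway)
{
    assert(activeHooks >= 0);
    ++activeHooks;
    ScopeGuard unhook([&activeHooks] { --activeHooks; });

    if (failMidway)
        return false;   // early exit: the guard still removes the hook

    return true;        // normal exit: the guard removes the hook too
}
```

The same shape works with a dedicated class that stores the HHOOK directly; std::function is used here only to keep the sketch short.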

The hook callback procedure is just hardcore code; see for yourself:

C++
LRESULT CALLBACK CSaveAsWebbrowser::CbtProc(int nCode, 
    WPARAM wParam, LPARAM lParam) 
{  
    // the windows hook is notified for each new window being created :
    // - HCBT_CREATEWND : when the window is about to be created
    //      we check whether it is a dialog box (class atom = 0x00008002, 
    //      see Spy++)
    //      and we move it off screen, as it is likely to be the IE SaveAs dialog
    // - HCBT_ACTIVATE : when the window itself gets activated
    //      we run a separate thread, and let IE do its own init steps in 
    //      the meantime
    switch (nCode)
    {
        case HCBT_CREATEWND:
        {
            HWND hWnd = (HWND)wParam;
            LPCBT_CREATEWND pcbt = (LPCBT_CREATEWND)lParam;
            LPCREATESTRUCT pcs = pcbt->lpcs;
            // compare at pointer width so the cast also compiles for 64-bit targets
            if ((ULONG_PTR)pcs->lpszClass == 0x00008002)
            {
                g_hWnd = hWnd;          // Get hwnd of SaveAs dialog
                pcs->x = -2 * pcs->cx;  // Move dialog off screen
            }
            break;
        }    
        case HCBT_ACTIVATE:
        {
            HWND hwnd = (HWND)wParam;
            if (hwnd == g_hWnd)
            {
                g_hWnd = NULL;
                g_bSuccess = true;

                if (g_pWebBrowserSaveAs->IsSaveAsEnabled())
                {
                    g_pWebBrowserSaveAs->SaveAsDisable();

                    CSaveAsThread *newthread = new CSaveAsThread();
                    newthread->SetKeyWnd(hwnd);
                    newthread->Config( g_pWebBrowserSaveAs->GetFilename(), 
                        g_pWebBrowserSaveAs->GetSaveAsType() );
                    newthread->StartThread();
                }
            }
            break;
        }
    }
    return CallNextHookEx(g_hHook, nCode, wParam, lParam); 
}
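
The two decisions made in the HCBT_CREATEWND branch can be isolated into small helpers. The sketch below is portable and uses hypothetical names (IsDialogClass, OffscreenX, kDialogClassAtom) that do not appear in the article's source. It relies on two facts: the system dialog class (#32770) has the class atom 0x8002, and lpszClass carries either a pointer to a class name or an atom with all higher bits zero. Note that on a multi-monitor desktop whose virtual screen extends to the left of the primary display, a negative x coordinate may still be visible.

```cpp
#include <cassert>
#include <cstdint>

// Class atom of the system dialog box class (#32770), as reported by Spy++.
constexpr std::uintptr_t kDialogClassAtom = 0x8002;

// lpszClass is either a pointer to a class name or a 16-bit atom stored in
// a pointer-sized value. Comparing the full pointer-sized value avoids
// truncation on 64-bit builds.
inline bool IsDialogClass(const void* lpszClass)
{
    const std::uintptr_t value = reinterpret_cast<std::uintptr_t>(lpszClass);
    return value == kDialogClassAtom;
}

// X coordinate that places a window of width cx entirely to the left of
// the primary screen origin, as the hook does with pcs->x.
inline int OffscreenX(int cx)
{
    assert(cx >= 0);
    return -2 * cx;
}
```

With these helpers, the branch would read roughly as: if IsDialogClass(pcs->lpszClass), remember the window handle and set pcs->x to OffscreenX(pcs->cx).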

In our thread, we wait until the Internet Explorer Save As dialog is visible and its fields have been filled in:

C++
switch (::WaitForSingleObject(m_hComponentReadyEvent, m_WaitTime))
{
     ...
     if ( ::IsWindowVisible(m_keyhwnd) )
     {
         bSignaled = TRUE;
         bContinue = FALSE;
     }

     MSG msg ;
     while( PeekMessage(&msg, NULL, 0, 0, PM_REMOVE) )
     {
         if (msg.message == WM_QUIT)
         {
              bContinue = FALSE ;
              break ;
         }
         TranslateMessage(&msg);
         DispatchMessage(&msg);
     }
     ...
}

// relaunch our SaveAs class, but now everything is ready to play with
if (bSignaled)
{
    CSaveAsWebbrowser surrenderNow;
    surrenderNow.Config( GetFilename(), GetSaveAsType() );
    surrenderNow.UpdateSaveAs( m_keyhwnd );
}

// kill the thread, we don't care anymore about it
delete this;
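
The core of the waiting logic is a poll with a deadline: check a condition, give up after a timeout, otherwise sleep briefly and try again. The sketch below shows that pattern in portable C++. WaitUntil is a hypothetical helper, not part of the article's source, and it leaves out the message pumping that the thread above performs. In the article's context, the ready predicate would presumably wrap ::IsWindowVisible(m_keyhwnd).

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Polls ready() until it returns true or the timeout expires.
// Returns true if the condition was met, false on timeout.
inline bool WaitUntil(const std::function<bool()>& ready,
                      std::chrono::milliseconds timeout,
                      std::chrono::milliseconds interval)
{
    assert(interval.count() > 0);
    const auto deadline = std::chrono::steady_clock::now() + timeout;

    for (;;)
    {
        if (ready())
            return true;                        // dialog is ready to be driven
        if (std::chrono::steady_clock::now() >= deadline)
            return false;                       // give up instead of spinning forever
        std::this_thread::sleep_for(interval);  // yield between checks
    }
}
```

A bounded wait matters here because the dialog may never appear, for example if another dialog box was caught by the hook instead.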

We can now override the appropriate data:

C++
void CSaveAsWebbrowser::UpdateSaveAs(HWND hwnd)
{
    // editbox : filepath (control id = 0x047c)
    // dropdown combo : filetypes (options=complete page;
    //     archive;html only;txt) (control id = 0x0470)
    // save button : control id = 0x0001
    // cancel button : control id = 0x0002


    // select right item in the combobox
    SendMessage(GetDlgItem(hwnd, 0x0470), CB_SETCURSEL, 
        (WPARAM) m_nSaveType, 0);
    SendMessage(hwnd, WM_COMMAND, MAKEWPARAM(0x0470,CBN_CLOSEUP), 
        (LPARAM) GetDlgItem(hwnd, 0x0470));

    // set output filename
    SetWindowText(GetDlgItem(hwnd, 0x047c), m_szFilename);

    // Invoke Save button
    SendMessage(GetDlgItem(hwnd, 0x0001), BM_CLICK, 0, 0);  
}

In the code above, note that to select the kind of HTML we want (full HTML, archive, code only or text format), it is not enough to select the appropriate entry in the combo box; we also have to send Internet Explorer a combo-box CloseUp notification (CBN_CLOSEUP), because that is the notification Internet Explorer listens for to learn which format was chosen. This behavior was found by trial and error.
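
The WM_COMMAND parameter built with MAKEWPARAM packs the control ID into the low word and the notification code into the high word. The portable sketch below reproduces that packing with the control IDs listed in UpdateSaveAs. The constant names and the MakeWParam function are hypothetical stand-ins for the Windows macro and are not part of the article's source; the value 8 for CBN_CLOSEUP is the one defined in the Windows headers.

```cpp
#include <cstdint>

// Control IDs of the Save As dialog, as listed in UpdateSaveAs().
constexpr std::uint16_t kFileTypeComboId = 0x0470;
constexpr std::uint16_t kFileNameEditId  = 0x047c;
constexpr std::uint16_t kSaveButtonId    = 0x0001;
constexpr std::uint16_t kCancelButtonId  = 0x0002;

// Numeric value of CBN_CLOSEUP.
constexpr std::uint16_t kCbnCloseUp = 8;

// Portable equivalent of MAKEWPARAM(low, high).
constexpr std::uint32_t MakeWParam(std::uint16_t low, std::uint16_t high)
{
    return static_cast<std::uint32_t>(low) |
           (static_cast<std::uint32_t>(high) << 16);
}

// The notification sent to the dialog: combo box 0x0470 reports CBN_CLOSEUP.
static_assert(MakeWParam(kFileTypeComboId, kCbnCloseUp) == 0x00080470u,
              "wParam for the CloseUp notification");
```

The resulting value, 0x00080470, is the same literal that the C# port in the comments passes as 0x80470.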

Conclusion

This article describes a technique to gain access to the full Save As HTML feature exposed by Internet Explorer. I have never seen an article about this topic on the 'net, even though it is clearly a compelling feature for developers building web applications. The files you may want to reuse from the provided source code are:

  • SaveAsWebBrowser.h, *.cpp: hook procedure; fill the dialog box data
  • SaveAsThread.h, *.cpp: auxiliary thread for synchronization with Internet Explorer

The application is just a simple MFC-based CHtmlView application embedding the web browser control.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


Written By
France
Addicted to reverse engineering. At work, I am developing business intelligence software in a team of smart people (independent software vendor).

Need a fast Excel generation component? Try xlsgen.

Comments and Discussions

 
Here is perfectly working C# code
Madhava Maydipalle, 19-Nov-09 4:05
Thanks to all the people on this thread. Here is a working example.

Usage:
To use from an IE plug-in:

SHDocVw.IWebBrowser2 browser = GetCurrentBrowser();
MHTHelper mhtHelper = new MHTHelper();
bool bMhtFile = mhtHelper.SaveAs(browser, filePath, EnumBrowserFileSaveType.SAVETYPE_ARCHIVE);



#region ---- EnumBrowserFileSaveType ----
public enum EnumBrowserFileSaveType
{
SAVETYPE_HTMLPAGE = 0,
SAVETYPE_ARCHIVE,
SAVETYPE_HTMLONLY,
SAVETYPE_TXTONLY
}
#endregion

public class MHTHelper
{
#region ---- Constructor ----
public MHTHelper()
{
}
#endregion
#region ---- Private Attributes ----
private IWebBrowser2 webBrowser;
private string filePath;
private EnumBrowserFileSaveType saveType;
private WindowHookProc HookProcedure;
private int windowHook = 0;
private IntPtr hwndSaveAsDlg = (IntPtr)0;
private MHTHelper saveAsMht = null;

#endregion
#region ---- Public Attributes ----
public IWebBrowser2 WebBrowser
{
get { return webBrowser; }
set { webBrowser = value; }
}

public string FilePath
{
get { return filePath; }
set { filePath = value; }
}
public EnumBrowserFileSaveType SaveType
{
get { return saveType; }
set { saveType = value; }
}
#endregion
#region ---- SaveAs ----
/// <summary>
/// This function automates the following steps:
/// 1. Select the 'File->Save As' menu item in IE.
/// 2. Move the Save As dialog off screen so that the user cannot interact with it (we cannot hide it, because of IE security policy).
/// 3. Set the required file path in the Save As dialog.
/// 4. Click the Save button.
/// 5. IE automatically displays the status bar to show the extraction of the currently displayed web page into the mht file.
/// </summary>
/// <param name="webBrowser"></param>
/// <param name="pathFile"></param>
/// <param name="saveType"></param>
/// <returns></returns>
public bool SaveAs(IWebBrowser2 webBrowser, string pathFile, EnumBrowserFileSaveType saveType)
{
try
{
if ((null == webBrowser) || (0 != windowHook))
return false;
//If no path is supplied, fall back to a default name before storing it. If the file already exists, IE prompts for user action.
if (string.IsNullOrEmpty(pathFile))
pathFile = "untitled";
this.WebBrowser = webBrowser;
this.FilePath = pathFile;
this.SaveType = saveType;

this.HookProcedure = new WindowHookProc(this.SaveAsHookProc);
// prepare SaveAs dialog hook and activate it.
this.windowHook = SetWindowsHookEx(5 /*WH_CBT*/, HookProcedure, (IntPtr)0, AppDomain.GetCurrentThreadId());
if (this.windowHook == 0)
return false;
try
{
// This following code shows the save as dialog
this.saveAsMht = this;
object o = null;
webBrowser.ExecWB(SHDocVw.OLECMDID.OLECMDID_SAVEAS, SHDocVw.OLECMDEXECOPT.OLECMDEXECOPT_PROMPTUSER, ref o, ref o);
this.saveAsMht = null;
return true;
}
finally
{
//Remove our hook, whether or not the save succeeded.
UnhookWindowsHookEx(this.windowHook);
this.windowHook =0;
}
}
catch (Exception ex)
{
ExceptionHandler.Display(ex);
return false;
}
}

#endregion
#region ---- SaveAsHookProc ----
/// <summary>
/// This proc is required to handle Window messages and do appropriate action for our requirements.
/// </summary>
/// <param name="nCode"></param>
/// <param name="wParam"></param>
/// <param name="lParam"></param>
/// <returns></returns>
public int SaveAsHookProc(int nCode, IntPtr wParam, IntPtr lParam)
{
switch (nCode)
{
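case 3: // HCBT_CREATEWND
CBT_CREATEWND cw = (CBT_CREATEWND)Marshal.PtrToStructure(lParam, typeof(CBT_CREATEWND));
CREATESTRUCT cs = (CREATESTRUCT)Marshal.PtrToStructure(cw.lpcs, typeof(CREATESTRUCT));
if (cs.lpszClass == 0x00008002)
{
this.hwndSaveAsDlg = (IntPtr)wParam; // Get hwnd of SaveAs dialog
// cs is a managed copy, so assigning cs.x would not reach the native CREATESTRUCT;
// write the new x coordinate back through the pointer instead.
Marshal.WriteInt32(cw.lpcs, (int)Marshal.OffsetOf(typeof(CREATESTRUCT), "x"), -2 * cs.cx); // Move dialog off screen
}
break;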
case 5: // HCBT_ACTIVATE
IntPtr hwnd = (IntPtr)wParam;
//Here we get Save as dialog handle
if (hwnd == this.hwndSaveAsDlg && this.hwndSaveAsDlg != (IntPtr)0)
{
//Prepare a thread to act as required messages source.Also set File path and its save type.
ThreadPressOk tpok = new ThreadPressOk(hwnd, this.saveAsMht.FilePath, this.saveAsMht.SaveType);
this.hwndSaveAsDlg = (IntPtr)0;
// Create a thread to execute the task, and then
// start the thread.
new Thread((new ThreadStart(tpok.ThreadProc))).Start();
}
break;
}
//This is required to pass control to next hook if exists on this process.
return CallNextHookEx(this.windowHook, nCode, wParam, lParam);
}

#endregion
#region ---- Win32APIs ----
//Import for SetWindowsHookEx.
[DllImport("user32.dll", CharSet = CharSet.Auto, CallingConvention = CallingConvention.StdCall, SetLastError = true)]
public static extern int SetWindowsHookEx(int idHook, WindowHookProc lpfn, IntPtr hInstance, int threadId);

//Import for UnhookWindowsHookEx.
[DllImport("user32.dll", CharSet = CharSet.Auto, CallingConvention = CallingConvention.StdCall, SetLastError = true)]
public static extern bool UnhookWindowsHookEx(int idHook);

//Import for CallNextHookEx.
//Use this function to pass the hook information to next hook procedure in chain.
[DllImport("user32.dll", CharSet = CharSet.Auto, CallingConvention = CallingConvention.StdCall, SetLastError = true)]
public static extern int CallNextHookEx(int idHook, int nCode, IntPtr wParam, IntPtr lParam);
//This delegate is required to fetch our function to the thread.
public delegate int WindowHookProc(int nCode, IntPtr wParam, IntPtr lParam);

//These Win32 structures are required to handle windows messages/state.
//Note: the int-sized handle and pointer fields below assume a 32-bit process.
[StructLayout(LayoutKind.Sequential)]
public struct CBT_CREATEWND
{
public IntPtr lpcs;
int hwndInsertAfter;
};
[StructLayout(LayoutKind.Sequential)]
public struct CREATESTRUCT
{
int lpCreateParams;
int hInstance;
int hMenu;
int hwndParent;
int cy;
public int cx;
int y;
public int x;
int style;
int lpszName;
public int lpszClass;
int dwExStyle;
}
#endregion
#region ---- Thread Requirements ----
//The following class is defined here because it is used in the above class only.
//It sends all required thread messages to IE's save as dialog.
class ThreadPressOk
{
public ThreadPressOk(IntPtr hwnd, string pathFile, EnumBrowserFileSaveType saveType)
{
this.hwndDialog = hwnd;
this.pathFile = pathFile;
this.saveType = saveType;
}

IntPtr hwndDialog;
string pathFile;
EnumBrowserFileSaveType saveType;

// Imports of the User32 DLL.
[DllImport("user32.dll", CharSet = CharSet.Auto)]
public static extern IntPtr SendMessage(IntPtr hWnd, int msg, int wParam, int lParam);

[DllImport("user32.dll", CharSet = CharSet.Auto)]
public static extern IntPtr GetDlgItem(IntPtr hWnd, int nIDDlgItem);

[DllImport("user32.dll", CharSet = CharSet.Auto)]
static extern private bool SetWindowText(IntPtr hWnd, string lpString);
[DllImport("user32.dll")]
[return: MarshalAs(UnmanagedType.Bool)]
static extern bool IsWindowVisible(IntPtr hWnd);

// The thread procedure performs the message loop and place the data
public void ThreadProc()
{
//To avoid race condition, we are forcing this thread to wait until Saveas dialog is displayed.
while (!IsWindowVisible(hwndDialog))
{
Thread.Sleep(100);
Application.DoEvents();
}
Application.DoEvents();
//Get the handle to SaveType combo box on the save as dialog.
IntPtr typeB = GetDlgItem(hwndDialog, 0x0470);
//Get the handle to file path on the saveas dialog.
IntPtr nameB = GetDlgItem(hwndDialog, 0x047c);
//Get the handle to saveas on the saveas dialog.
IntPtr saveBtn = GetDlgItem(hwndDialog, 0x0001);

if (((IntPtr)0 != typeB) && ((IntPtr)0 != nameB) && ((IntPtr)0 != saveBtn) && IsWindowVisible(hwndDialog))
{
//select save type
SendMessage(typeB, 0x014E /*CB_SETCURSEL*/, (int)saveType, 0);
SendMessage(hwndDialog, 0x0111 /*WM_COMMAND*/, 0x80470/*MAKEWPARAM(0x0470, CBN_CLOSEUP)*/, (int)typeB);
// set save as filepath
SetWindowText(nameB, pathFile);
// Invoke Save button click.
SendMessage(saveBtn, 0x00F5 /*BM_CLICK*/, 0, 0);
}
// Clean up GUI - we have clicked save button.
//GC is going to do that cleanup job, so we are OK
Application.DoEvents();
//Terminate the thread.
return;
}
}
#endregion
}

Thanks
Maydipalle
