Automated IE SaveAs MHTML

Stephane Rodriguez, 4 Sep 2002
This article demonstrates how to automate IE's Save As functionality.

Introduction

This article shows how to automate Internet Explorer's full-featured Save As HTML function, which is normally hidden from developers using the Internet Explorer API. Saving the current document in MHTML format is just one of the available options:

  • Save As MHTML (whole web page, images, ... in a single file)
  • Save As Full HTML (additional folder for images, ...)
  • Save HTML code only
  • Save As Text

Saving Silently as HTML Using the Internet Explorer API

In fact, the ability to save the current web page without showing a single dialog box is already available in C++ through the following code, with one important restriction:

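LPDISPATCH lpDispatch = NULL;
IPersistFile *lpPersistFile = NULL;

// m_ctrl is an instance of the Web Browser control
lpDispatch = m_ctrl.get_Document();
if (lpDispatch != NULL)
{
    // the document object exposes IPersistFile; check the result before use
    HRESULT hr = lpDispatch->QueryInterface(IID_IPersistFile, (void**)&lpPersistFile);
    if (SUCCEEDED(hr) && lpPersistFile != NULL)
    {
        // second argument is fRemember: FALSE = do not adopt this file as current
        lpPersistFile->Save(L"c:\\htmlpage.html", FALSE);
        lpPersistFile->Release();
    }
    lpDispatch->Release();
}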

(caption for code above) Saving HTML code only, without dialog boxes

The restriction is that this saves the HTML code only, not the complete web page. What is really interesting, of course, is access to full HTML archives with images and so on.

Because there is no "public" or known way to ask for this feature without Internet Explorer showing one or more dialog boxes, we hook the operating system to listen for all window creations, including those of dialog boxes. We then ask Internet Explorer for the feature and override the file path in the dialog box without it being seen. Finally, we mimic the user clicking the Save button to validate the dialog box, and unhook ourselves. Done!

Hooking Internet Explorer to Save As HTML without popping the dialog boxes

That was the short workflow, but there are a few tricks involved, and this article is an opportunity to go into detail. The code is based on a Microsoft article about customizing Internet Explorer printing by hooking the Print dialog boxes. In our app, we have our own Save As feature:

m_wbSaveAs.Config( CString("c:\\htmlpage.mhtml"), SAVETYPE_ARCHIVE );
m_wbSaveAs.SaveAs();

// where the second parameter is the type of HTML needed :
typedef enum _SaveType
{
    SAVETYPE_HTMLPAGE = 0,
    SAVETYPE_ARCHIVE,
    SAVETYPE_HTMLONLY,
    SAVETYPE_TXTONLY
} SaveType;
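
The enum values double as indices into the dialog's file type combo box, so their order matters. As a rough sketch, the helpers below pair each save type with an extension and rewrite a path to match. DefaultExtension and WithExtension are hypothetical names that are not part of the article's source, and the extensions listed are assumptions about what the dialog typically proposes.

```cpp
#include <cassert>
#include <string>

// Same order as the article's enum, which follows the order of the
// entries in the dialog's file type combo box.
enum SaveType
{
    SAVETYPE_HTMLPAGE = 0,  // complete page plus a folder of resources
    SAVETYPE_ARCHIVE,       // single-file web archive
    SAVETYPE_HTMLONLY,      // markup only
    SAVETYPE_TXTONLY        // plain text
};

// Hypothetical helper: extension assumed for each save type.
inline const char* DefaultExtension(SaveType type)
{
    switch (type)
    {
        case SAVETYPE_HTMLPAGE: return ".htm";
        case SAVETYPE_ARCHIVE:  return ".mht";
        case SAVETYPE_HTMLONLY: return ".htm";
        case SAVETYPE_TXTONLY:  return ".txt";
    }
    return "";
}

// Hypothetical helper: replace the extension of `path`, or append one
// when the last path component has no extension.
inline std::string WithExtension(const std::string& path, SaveType type)
{
    const std::string::size_type slash = path.find_last_of("\\/");
    const std::string::size_type dot = path.find_last_of('.');
    const bool hasExt = dot != std::string::npos &&
                        (slash == std::string::npos || dot > slash);
    return (hasExt ? path.substr(0, dot) : path) + DefaultExtension(type);
}
```

Passing a path whose extension already matches the selected type may avoid surprises if the dialog adjusts the filename when the type changes.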

We start the SaveAs() implementation by installing the hook:

// prepare SaveAs Dialog hook
//
g_hHook = SetWindowsHookEx(WH_CBT, CbtProc, NULL, GetCurrentThreadId());
if (!g_hHook)
    return false;

// make SaveAs Dialog appear
//
// cmd = OLECMDID_SAVEAS (see ./include/docobj.h)
g_bSuccess = false;
g_pWebBrowserSaveAs = this;
HRESULT hr = m_pWebBrowser->ExecWB(OLECMDID_SAVEAS, 
    OLECMDEXECOPT_PROMPTUSER, NULL, NULL);

// remove hook
UnhookWindowsHookEx(g_hHook);
g_pWebBrowserSaveAs = NULL;
g_hHook = NULL;
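
The install / execute / remove sequence above has to remove the hook on every exit path. One way to guarantee that is a small scope guard, sketched here in portable C++ so it can be compiled anywhere. ScopedHook and RunSaveAs are hypothetical names, and the lambdas only stand in for SetWindowsHookEx, ExecWB and UnhookWindowsHookEx; this is not the article's implementation.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Runs `install` on construction and `remove` on destruction, but only
// when the installation succeeded.
class ScopedHook
{
public:
    ScopedHook(const std::function<bool()>& install, std::function<void()> remove)
        : m_remove(std::move(remove)), m_installed(install()) {}
    ~ScopedHook() { if (m_installed) m_remove(); }
    ScopedHook(const ScopedHook&) = delete;
    ScopedHook& operator=(const ScopedHook&) = delete;
    bool installed() const { return m_installed; }
private:
    std::function<void()> m_remove;
    bool m_installed;
};

// Stand-in for SaveAs(): records the order in which the steps run.
inline std::vector<std::string> RunSaveAs(bool hookSucceeds)
{
    std::vector<std::string> log;
    {
        ScopedHook hook(
            [&] { log.push_back("install"); return hookSucceeds; },  // would call SetWindowsHookEx
            [&] { log.push_back("remove"); });                       // would call UnhookWindowsHookEx
        if (hook.installed())
            log.push_back("exec");                                   // would call ExecWB(OLECMDID_SAVEAS, ...)
    }   // the guard removes the hook here
    return log;
}
```

With this shape, an early return or an exception between installation and removal still leaves no hook behind.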

The hook callback procedure is just hardcore code; see for yourself:

LRESULT CALLBACK CSaveAsWebbrowser::CbtProc(int nCode, 
    WPARAM wParam, LPARAM lParam) 
{  
    // the windows hook is called for each new window being created :
    // - HCBT_CREATEWND : when the window is about to be created
    //      we check whether it is a dialog box (class atom = 0x00008002, 
    //      see Spy++)
    //      and we move it off screen, as it is likely to be the IE SaveAs dialog
    // - HCBT_ACTIVATE : when the window itself gets activated
    //      we run a separate thread, and let IE do its own init steps in 
    //      the meantime
    switch (nCode)
    {
        case HCBT_CREATEWND:
        {
            HWND hWnd = (HWND)wParam;
            LPCBT_CREATEWND pcbt = (LPCBT_CREATEWND)lParam;
            LPCREATESTRUCT pcs = pcbt->lpcs;
            // compare as ULONG_PTR so the pointer is not truncated on 64-bit builds
            if ((ULONG_PTR)pcs->lpszClass == 0x00008002)
            {
                g_hWnd = hWnd;          // Get hwnd of SaveAs dialog
                pcs->x = -2 * pcs->cx;  // Move dialog off screen
            }
            break;
        }    
        case HCBT_ACTIVATE:
        {
            HWND hwnd = (HWND)wParam;
            if (hwnd == g_hWnd)
            {
                g_hWnd = NULL;
                g_bSuccess = true;

                if (g_pWebBrowserSaveAs->IsSaveAsEnabled())
                {
                    g_pWebBrowserSaveAs->SaveAsDisable();

                    CSaveAsThread *newthread = new CSaveAsThread();
                    newthread->SetKeyWnd(hwnd);
                    newthread->Config( g_pWebBrowserSaveAs->GetFilename(), 
                        g_pWebBrowserSaveAs->GetSaveAsType() );
                    newthread->StartThread();
                }
            }
            break;
        }
    }
    return CallNextHookEx(g_hHook, nCode, wParam, lParam); 
}
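
The comparison of lpszClass with 0x00008002 works because window classes can be identified by an atom stored in the low 16 bits of the string pointer, and 0x8002 (decimal 32770) is the atom of the system dialog class "#32770". The following is a portable model of that test, with std::uintptr_t standing in for the pointer; IsAtom and IsDialogClass are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Atom of the system dialog class "#32770" (0x8002 == 32770).
constexpr std::uintptr_t kDialogClassAtom = 0x8002;

// Models IS_INTRESOURCE: a value is an atom rather than a real string
// pointer when every bit above the low 16 is zero.
constexpr bool IsAtom(std::uintptr_t lpszClass)
{
    return (lpszClass >> 16) == 0;
}

// True when the class "pointer" carries the dialog class atom.
constexpr bool IsDialogClass(std::uintptr_t lpszClass)
{
    return IsAtom(lpszClass) && (lpszClass & 0xFFFF) == kDialogClassAtom;
}

static_assert(IsDialogClass(0x8002), "the dialog atom is recognised");
static_assert(!IsDialogClass(0x00408002), "a real pointer is not an atom");
```

Note that this identifies any dialog box, not specifically the Save As dialog, which is why the hook comment says the window is only "likely" to be the right one.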

In our thread, we wait until the Internet Explorer Save As dialog is ready and its fields are filled in:

switch(    ::WaitForSingleObject( m_hComponentReadyEvent, m_WaitTime) )
{
     ...
     if ( ::IsWindowVisible(m_keyhwnd) )
     {
         bSignaled = TRUE;
         bContinue = FALSE;
     }

     MSG msg ;
     while( PeekMessage(&msg, NULL, 0, 0, PM_REMOVE) )
     {
         if (msg.message == WM_QUIT)
         {
              bContinue = FALSE ;
              break ;
         }
         TranslateMessage(&msg);
         DispatchMessage(&msg);
     }
     ...
}

// relaunch our SaveAs class, but now everything is ready to play with
if (bSignaled)
{
    CSaveAsWebbrowser surrenderNow;
    surrenderNow.Config( GetFilename(), GetSaveAsType() );
    surrenderNow.UpdateSaveAs( m_keyhwnd );
}

// kill the thread, we don't care anymore about it
delete this;
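
The core of the thread is a wait loop with a timeout: check a readiness condition, give up after a while, sleep in between. A minimal portable sketch of that pattern follows. WaitUntil is a hypothetical helper, the `ready` predicate stands in for the IsWindowVisible test, and the Windows message pump from the original loop is deliberately left out.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Polls `ready` every `interval` until it returns true or `timeout`
// has elapsed. Returns true when the condition was met in time.
inline bool WaitUntil(const std::function<bool()>& ready,
                      std::chrono::milliseconds timeout,
                      std::chrono::milliseconds interval)
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;)
    {
        if (ready())
            return true;
        if (std::chrono::steady_clock::now() >= deadline)
            return false;
        std::this_thread::sleep_for(interval);
    }
}
```

A caller would only go on to fill in the dialog when the function returns true, which mirrors the bSignaled check above.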

We can now override the appropriate data:

void CSaveAsWebbrowser::UpdateSaveAs(HWND hwnd)
{
    // editbox : filepath (control id = 0x047c)
    // dropdown combo : filetypes (options=complete page;
    //     archive;html only;txt) (control id = 0x0470)
    // save button : control id = 0x0001
    // cancel button : control id = 0x0002


    // select right item in the combobox
    SendMessage(GetDlgItem(hwnd, 0x0470), CB_SETCURSEL, 
        (WPARAM) m_nSaveType, 0);
    SendMessage(hwnd, WM_COMMAND, MAKEWPARAM(0x0470,CBN_CLOSEUP), 
        (LPARAM) GetDlgItem(hwnd, 0x0470));

    // set output filename
    SetWindowText(GetDlgItem(hwnd, 0x047c), m_szFilename);

    // Invoke Save button
    SendMessage(GetDlgItem(hwnd, 0x0001), BM_CLICK, 0, 0);  
}

In the code above, it is worth noting that to select the kind of HTML we want (full HTML, archive, code only, or text format), we not only select the matching entry in the combo box but also send Internet Explorer a combo box CloseUp notification (CBN_CLOSEUP). This notification is what Internet Explorer listens for to learn which kind of HTML we want. This behavior was found by trial and error.
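
For readers unfamiliar with how a control notification is packed into WM_COMMAND, here is a portable model of MAKEWPARAM: the control id goes in the low word and the notification code in the high word. The constant names are hypothetical; the value 8 for CBN_CLOSEUP comes from winuser.h and 0x0470 is the combo box id quoted in the code above.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint16_t kCbnCloseUp      = 8;       // CBN_CLOSEUP in winuser.h
constexpr std::uint16_t kFileTypeComboId = 0x0470;  // control id quoted in the article

// Portable model of MAKEWPARAM(low, high).
constexpr std::uint32_t MakeWParam(std::uint16_t low, std::uint16_t high)
{
    return static_cast<std::uint32_t>(low) |
           (static_cast<std::uint32_t>(high) << 16);
}

// Portable models of LOWORD and HIWORD, as the receiving dialog would use them.
constexpr std::uint16_t LoWord(std::uint32_t v) { return static_cast<std::uint16_t>(v & 0xFFFF); }
constexpr std::uint16_t HiWord(std::uint32_t v) { return static_cast<std::uint16_t>(v >> 16); }

static_assert(MakeWParam(kFileTypeComboId, kCbnCloseUp) == 0x00080470u,
              "notification code in the high word, control id in the low word");
```

The dialog procedure unpacks the same value with LOWORD and HIWORD to find out which control sent which notification.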

Conclusion

This article describes a technique for gaining access to the full-featured Save As HTML function exposed by Internet Explorer. I have never seen an article about this topic on the net, even though it is clearly a compelling feature for developers building web applications. The files you may want to reuse from the provided source code are:

  • SaveAsWebBrowser.h, *.cpp: hook procedure; fill the dialog box data
  • SaveAsThread.h, *.cpp: auxiliary thread for synchronization with Internet Explorer

The application is just a simple MFC-based CHtmlView application embedding the web browser control.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

Addicted to reverse engineering. At work, I am developing business intelligence software in a team of smart people (independent software vendor).
 
Need a fast Excel generation component? Try xlsgen.
 

Comments and Discussions

 
Question: Now have this working with IE 9, VS 2010 and C++ (no CLI) [modified] (Rene Pilon, 29-Jul-11 11:50)
Answer: Re: Now have this working with IE 9, VS 2010 and C++ (no CLI) (Member 8141114, 6-Aug-11 15:26)
General: Re: Now have this working with IE 9, VS 2010 and C++ (no CLI) (Rene Pilon, 7-Aug-11 11:43)
Answer: Re: Now have this working with IE 9, VS 2010 and C++ (no CLI) (Member 8141114, 7-Aug-11 7:28)
Having taken a second look at this I have now solved my requests to a satisfactory degree.
 
1. In
void CSaveAsWebbrowser::AssignFileType(HWND hwnd)
{ 
   SendMessage(hwnd, CB_SETCURSEL, (WPARAM) 1, 0);     
   SendMessage(GetParent(hwnd), WM_COMMAND, MAKEWPARAM(GetWindowLong(hwnd, GWL_ID),CBN_CLOSEUP), (LPARAM) hwnd);
}
The first SendMessage() should read
SendMessage(hwnd, CB_SETCURSEL, (WPARAM) m_nSaveType, 0);
if you want your website to be saved in a format other than an archive.
 
2. As regards the order in which the controls are set, the Save As dialog box turned out to be rather finicky.
Say you want your website to be saved under the name "Source.txt" with save type HTMLONLY: the program fills in "Source.txt" as the filename and then selects "HTML only" from the combobox, which changes the filename to "Source.htm".
So I tried to reverse the order by storing the windows handle for the edit control in a variable and performing the selection of the save type first. The filename gets filled in correctly but the dialog box doesn't use it and reverts to the default name "yahoo.htm" ("yahoo.com" being the website of the example code).
Second approach: Performing the EnumChildWindows() function twice, first for the combobox (works) then for the edit control (didn't do anything). Resulting filename: "yahoo.htm"
Back to method 1: Filling in the edit control twice: "Source.txt" (first edit) -> "Source.htm" (after combobox selection) -> "Source.txt" (second edit) -> "Source.htm" (resulting filename).
 
Obviously the edit control needs something else before accepting the submitted string (like the combobox needing a CBN_CLOSEUP message) but I haven't figured out what this might be.
I did find out however, that performing a "Select All" (Ctrl+A) on the edit control does the trick.
 
Well, this may not be the most elegant solution, the kind someone who knows the internals of MS Windows would write, but here are the code changes:
 
Add the variable m_hFileNameEditWnd of type HWND as a public member to the class CSaveAsWebbrowser.
 
Add 3 lines to UpdateSaveAs()
bool CSaveAsWebbrowser::UpdateSaveAs(HWND hwnd)
{
  // editbox : filepath (control id = 0x047c)
  // dropdown combo : filetypes (options=complete page;archive;html only;txt) (control id = 0x0470)
  // save button : control id = 0x0001
  // cancel button : control id = 0x0002

  try
  {
    if(hwnd)
      m_bUpdateUI = TRUE;
 
    // select right item in the combobox
    HWND hSaveTypeComboWnd = GetDlgItem(hwnd, 0x0470);
    if(IsWindow(hSaveTypeComboWnd))
    {
      // Old Style
      SendMessage(hSaveTypeComboWnd, CB_SETCURSEL, (WPARAM) m_nSaveType, 0);
      SendMessage(hwnd, WM_COMMAND, MAKEWPARAM(0x0470, CBN_CLOSEUP), (LPARAM) hSaveTypeComboWnd);
      HWND hFileNameEditCtrl = GetDlgItem(hwnd, 0x047c);
      if(!hFileNameEditCtrl || !IsWindow(hFileNameEditCtrl))
        hFileNameEditCtrl = GetDlgItem(hwnd, 0x03e9);
      // set output filename
      if(hFileNameEditCtrl && IsWindow(hFileNameEditCtrl))
        ExchangeEditText(hFileNameEditCtrl, m_szFilename);
    }
    else
    {
      HWND hWnd1 = FindWindowEx(hwnd,0,"DUIViewWndClassName","");
      if(hWnd1)
      {
        HWND hWnd2 = FindWindowEx(hWnd1,0,"DirectUIHWND","");
        if(hWnd2)
        {
          m_hFileNameEditWnd = NULL;
          EnumChildWindows(hWnd2, FloatNotifySinkChildEnumProc, (LPARAM) this);
          if(m_hFileNameEditWnd)
            AssignFileName(m_hFileNameEditWnd);
        }
      }
    }
    if (m_bUpdateUI)
      SendMessage(GetDlgItem(hwnd, 0x0001), BM_CLICK, 0, 0);   // Invoke Save button
    else
      SendMessage(GetDlgItem(hwnd, 0x0002), BM_CLICK, 0, 0);   // Invoke Cancel button
  }
  catch(...){;}
  return true;
}
Change 1 line of FloatNotifySinkChildEnumProc()
BOOL CALLBACK FloatNotifySinkChildEnumProc(HWND hwnd, LPARAM lParam)
{
  if (GetWindow(hwnd, GW_OWNER))            // Check for icon title
    return TRUE;
  CSaveAsWebbrowser *saWb = (CSaveAsWebbrowser *) lParam;
  if(saWb)
  {
    // 1st, check to see if this window is of the FloatNotifySink class
    // (ANSI calls are used throughout so the char buffers match in any build)
    char szWndClassName[MAX_PATH];
    memset(szWndClassName, 0, sizeof(szWndClassName));
    GetClassNameA(hwnd, szWndClassName, MAX_PATH);
    if(lstrcmpA(szWndClassName, "FloatNotifySink") == 0)
    {
      HWND hChildControl = GetWindow(hwnd, GW_CHILD);
      if(hChildControl)
      {
        char szChildClassName[MAX_PATH];
        memset(szChildClassName, 0, sizeof(szChildClassName));
        GetClassNameA(hChildControl, szChildClassName, MAX_PATH);
        if(lstrcmpiA("COMBOBOX", szChildClassName) == 0)
        {
          // if it is a combobox - see if it has a "Edit" child window, if it does - it's the filename window, otherwise - it's the file type window
          HWND hFileNameEditWnd = FindWindowEx(hChildControl, 0, "Edit", "");
          if(hFileNameEditWnd)
            saWb->m_hFileNameEditWnd = hFileNameEditWnd;
          else
          {
            /* Uncomment to peek at the current combobox's selection and selected text
            int iSelected = 0;
            iSelected = (int) SendMessage(hChildControl, CB_GETCURSEL, 0, 0);
            if(iSelected != CB_ERR)
            {
              int iBufLen = (int) SendMessage(hChildControl, CB_GETLBTEXTLEN, (WPARAM)iSelected, 0);
              int iMallocLen = (iBufLen * 2) + 2;
              char *szBuffer = (char *) malloc(iMallocLen);
              if(szBuffer)
              {
                memset(szBuffer, 0, iMallocLen);
                SendMessage(hChildControl, CB_GETLBTEXT, (WPARAM)iSelected, (LPARAM)(LPSTR)szBuffer);
                free(szBuffer);
              }                  
            }*/
            saWb->AssignFileType(hChildControl);
          }
        }
      }
    }
  }
  return TRUE;
}
Finally to get AssignFileName() to send Ctrl+A:
void CSaveAsWebbrowser::AssignFileName(HWND hwnd)
{
	ExchangeEditText(hwnd, m_szFilename);
	SendMessage(hwnd, WM_KEYDOWN, 0x00000011, 0x001D0001); // 'CTRL'
	SendMessage(hwnd, WM_KEYDOWN, 0x00000041, 0x001E0001); // 'A'
	SendMessage(hwnd, WM_CHAR, 0x00000001, 0x001E0001);
	SendMessage(hwnd, WM_KEYUP, 0x00000041, 0xC01E0001);
	SendMessage(hwnd, WM_KEYUP, 0x00000011, 0xC01D0001);
}
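
The hexadecimal lParam values in AssignFileName() are not arbitrary: for keyboard messages the lParam packs the repeat count, the scan code and two state flags. Below is a portable sketch of that layout which reproduces two of the constants used above. MakeKeyLParam and the kScan names are hypothetical, and the sketch only covers the fields these values actually use.

```cpp
#include <cassert>
#include <cstdint>

// Portable model of the lParam of WM_KEYDOWN / WM_KEYUP / WM_CHAR:
//   bits 0-15   repeat count
//   bits 16-23  scan code
//   bit  30     previous key state (1 = key was already down)
//   bit  31     transition state   (1 = key is being released)
constexpr std::uint32_t MakeKeyLParam(std::uint16_t repeat, std::uint8_t scanCode,
                                      bool wasDown, bool released)
{
    return static_cast<std::uint32_t>(repeat)
         | (static_cast<std::uint32_t>(scanCode) << 16)
         | (wasDown  ? (1u << 30) : 0u)
         | (released ? (1u << 31) : 0u);
}

constexpr std::uint8_t kScanCtrl = 0x1D;  // scan code of the left Ctrl key
constexpr std::uint8_t kScanA    = 0x1E;  // scan code of the 'A' key

static_assert(MakeKeyLParam(1, kScanCtrl, false, false) == 0x001D0001u, "Ctrl pressed");
static_assert(MakeKeyLParam(1, kScanA, true, true) == 0xC01E0001u, "A released");
```

The wParam values follow the same logic: 0x11 is VK_CONTROL, 0x41 is the virtual key code of 'A', and the WM_CHAR value 0x01 is the control character produced by Ctrl+A.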


Last Updated 5 Sep 2002
Article Copyright 2002 by Stephane Rodriguez.
Everything else Copyright © CodeProject, 1999-2014