Capture an HTML document as an image

Rob Manderson

Apr 4, 2004

CPOL

Capturing HTML documents as images

Download source files - 4 Kb
Download demo project - 44 Kb written by Jubjub[^]

Introduction

My application allows limited editing of HTML pages using MSHTML. Each HTML page is based on a template file and the range of things the end user can do to that template file is limited. At no time is the user able to create an empty HTML page.

So obviously there has to be a mechanism in my application to allow the user to select which template a new page should be based on.

I wanted to present the user with a list of thumbnail images, each representing a template page. In order to do that I had to devise a way of taking an HTML page and converting it to an image. The alternative of presenting the user with a simple listbox with the names of the templates is a tad too early 90's.

This article is the result.

A false start

Fortunately for me my application sets a specific size limit on page size. The entire page must fit into an 800 by 600 frame without scrollbars.

My initial approach was to render the page using MSHTML, create a memory bitmap, get a handle to the MSHTML display window and do a BitBlt from the display window to my memory bitmap, then scale and save the results.

It worked well but for one minor detail.

In order to render an HTML page into an image file using the BitBlt method the page has to be visible on the screen. BitBlt can only grab bits from a device context that's had something drawn on it, and if the device context represents something that's not actually visible on the screen the Windows WM_PAINT optimisations kick in and exclude those areas from the update region. The result is that MSHTML doesn't paint onto those portions of a device context that aren't visible.

If you want to create images of something that's already on the screen, well and good. Otherwise, to create an image, you have to present that something on the screen. This makes for an awful lot of flashing as one renders HTML pages to the screen for just long enough to grab their bits via BitBlt.

Even so, I was almost happy with the result. The flashing didn't look too awful. I even ran it past a few people, showing them what it looked like as it updated images and they didn't seem to mind it too much. But it irked me. There had to be a better way.

A second approach

Some digging around in MSDN revealed the IHTMLElementRender interface. Sounds hopeful. It has a member function called DrawToDC() that sounds like a perfect fit. Which it is indeed. Once you obtain an IHTMLElementRender interface you can supply your own device context and get MSHTML to render the element to it. And once you've done that it's trivial to scale and save to a file.

As you've probably guessed, it wasn't quite as simple as that.

I'm going to present the class a little differently this time. We'll start with a simple version of the class (not present in the download) and add complexity to it as we encounter issues.

The simple version of CCreateHTMLImage

looks like this.

class CCreateHTMLImage
{
public:
    enum eOutputImageFormat
    {
        eBMP = 0,
        eJPG,
        eGIF,
        eTIFF,
        ePNG,
        eImgSize
    };

                    CCreateHTMLImage();
    virtual         ~CCreateHTMLImage();

    BOOL            SetSaveImageFormat(eOutputImageFormat format);

    BOOL            CreateImage(
                        IHTMLDocument2 *pDoc, 
                        LPCTSTR szDestFilename, 
                        CSize srcSize, 
                        CSize outputSize);

protected:
    int             GetEncoderClsid(const WCHAR* format, CLSID* pClsid);

private:
    static LPCTSTR  m_ImageFormats[eImgSize];
    CLSID           m_encoderClsid;
};

This version of the class creates an image from an existing HTML document. The constructor initialises the saved image format as a jpeg file (you can override this by calling SetSaveImageFormat() passing one of the eOutputImageFormat constants). The guts of the work is done in the CreateImage() member function which looks like this.

BOOL CCreateHTMLImage::CreateImage(
        IHTMLDocument2 *pDoc, 
        LPCTSTR szDestFilename, 
        CSize srcSize, 
        CSize outputSize)
{
    USES_CONVERSION;
    ASSERT(szDestFilename);
    ASSERT(AfxIsValidString(szDestFilename));
    ASSERT(pDoc);

    //  Get our interfaces before we create anything else
    IHTMLElement       *pElement = (IHTMLElement *) NULL;
    IHTMLElementRender *pRender = (IHTMLElementRender *) NULL;

    //  Let's be paranoid...
    if (pDoc == (IHTMLDocument2 *) NULL)
        return FALSE;

    pDoc->get_body(&pElement);

    if (pElement == (IHTMLElement *) NULL)
        return FALSE;

    pElement->QueryInterface(IID_IHTMLElementRender, (void **) &pRender);

    if (pRender == (IHTMLElementRender *) NULL)
    {
        //  Don't leak the body element on the way out
        pElement->Release();
        return FALSE;
    }

    CFileSpec fsDest(szDestFilename);
    CBitmapDC destDC(srcSize.cx, srcSize.cy);

    pRender->DrawToDC(destDC);

    CBitmap *pBM = destDC.Close();
    
    Bitmap *gdiBMP = Bitmap::FromHBITMAP(HBITMAP(pBM->GetSafeHandle()), NULL);
    Image  *gdiThumb = gdiBMP->GetThumbnailImage(outputSize.cx, outputSize.cy);

    gdiThumb->Save(T2W(fsDest.GetFullSpec()), &m_encoderClsid);
    delete gdiBMP;
    delete gdiThumb;
    delete pBM;

    //  Release the interfaces we obtained above
    pRender->Release();
    pElement->Release();
    return TRUE;
}

This takes a pointer to an IHTMLDocument2 interface, an output filename and a couple of CSize objects. You'd have obtained the IHTMLDocument2 interface from a loaded HTML document in an instance of MSHTML somewhere in your program. For example, if you wanted to create an image of the document in an app that used CHtmlView you'd obtain the interface by calling GetHTMLDocument() on that view.

We do my usual bunch of ASSERTs on the parameters. Then we get a pointer to an IHTMLElement interface that represents the body of the HTML document. Once we've got that we can do a QueryInterface() for an IHTMLElementRender interface which represents all the visual aspects of the document. We can't get the interface directly from the document because the document isn't an element, it contains elements.

If we got this far without encountering an error it's time to create the device context we want to paint the document into. For this I used Anneke Sicherer-Roetman's excellent CBitmapDC class which you can find here[^]. The srcSize object is used to set the size of the destination device context. The IHTMLElementRender::DrawToDC() function doesn't do scaling. If the source HTML needs 1000 pixels of width to draw the entire horizontal extent but you pass it a device context only 500 pixels wide you'll get only the left half of the HTML.

Once MSHTML has rendered our IHTMLElement into the device context we create a GDI+ Bitmap object using the contents of the CBitmapDC and then create an image from the Bitmap object using the outputSize object to specify the image dimensions. GDI+ takes care of scaling the full size image to the size we want. A save, a bit of cleanup and we're done.
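Assuming the scaling call stretches the bitmap to exactly the width and height requested, an outputSize whose aspect ratio differs from srcSize would distort the thumbnail. A caller who wants to avoid that could work out a matching size first. The sketch below is one way to do it in plain C++; the Size struct is a stand-in for CSize and fitInside() is a hypothetical helper, neither is part of the class in the download.

```cpp
#include <algorithm>

// Stand-in for MFC's CSize so the sketch compiles without MFC.
struct Size
{
    int cx;
    int cy;
};

// Hypothetical helper: the largest size with the same aspect ratio as src
// that fits inside box. Returns {0, 0} if any dimension is not positive.
Size fitInside(Size src, Size box)
{
    if (src.cx <= 0 || src.cy <= 0 || box.cx <= 0 || box.cy <= 0)
        return Size{0, 0};

    // Compare box.cx / src.cx with box.cy / src.cy by cross-multiplying,
    // which avoids floating point.
    long long widthScale  = static_cast<long long>(box.cx) * src.cy;
    long long heightScale = static_cast<long long>(box.cy) * src.cx;

    Size out;

    if (widthScale <= heightScale)
    {
        // Width is the limiting dimension.
        out.cx = box.cx;
        out.cy = static_cast<int>(static_cast<long long>(box.cx) * src.cy / src.cx);
    }
    else
    {
        // Height is the limiting dimension.
        out.cy = box.cy;
        out.cx = static_cast<int>(static_cast<long long>(box.cy) * src.cx / src.cy);
    }

    // Never return a zero-pixel dimension for a valid input.
    out.cx = std::max(out.cx, 1);
    out.cy = std::max(out.cy, 1);
    return out;
}
```

For an 800 by 600 source and an 80 by 60 box this returns 80 by 60 unchanged; for a 100 by 100 box it returns 100 by 75.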

The other members of this class take care of the details of the saved image format and, since they're protected, they're of little interest unless we plan to derive a new class from this one.

The GetEncoderClsid() function is taken from MSDN documentation and is used to get the correct image codec for the image format we want.
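The format string handed to GetEncoderClsid() is a MIME type such as image/png, and the function reports the index of the matching encoder or -1 if there isn't one. The contents of m_ImageFormats aren't shown in this article, so the table below is an assumption: it uses the MIME types of the standard GDI+ encoders. The sketch models only the lookup, in plain C++, with a fixed table standing in for the encoder list GDI+ would return; kMimeTypes, MimeTypeFor() and FindEncoderIndex() are hypothetical names.

```cpp
#include <cwchar>

enum eOutputImageFormat
{
    eBMP = 0,
    eJPG,
    eGIF,
    eTIFF,
    ePNG,
    eImgSize
};

// Assumed table: one MIME type per eOutputImageFormat value.
static const wchar_t* const kMimeTypes[eImgSize] =
{
    L"image/bmp",
    L"image/jpeg",
    L"image/gif",
    L"image/tiff",
    L"image/png"
};

// Returns the MIME type for a format constant, or nullptr if the value is
// outside the enum's range.
const wchar_t* MimeTypeFor(int format)
{
    if (format < 0 || format >= eImgSize)
        return nullptr;

    return kMimeTypes[format];
}

// Models the search loop: walk the list comparing MIME types and return the
// index of the first match, or -1 if nothing matches. The real function
// walks the encoder list reported by GDI+ and also copies out the CLSID.
int FindEncoderIndex(const wchar_t* mimeType)
{
    if (mimeType == nullptr)
        return -1;

    for (int i = 0; i < eImgSize; ++i)
    {
        if (std::wcscmp(kMimeTypes[i], mimeType) == 0)
            return i;
    }

    return -1;
}
```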

Hang on a moment!

Surely this class presents exactly the same problems as the false start approach discussed above? It can only create an image from an existing HTML document already on the screen. That's right. But this is the simple version of the class.

If we want to create an image from a document stored somewhere else (hard disk or intranet or internet) we have to do a little more work. We have to load the document using MSHTML, get an IHTMLDocument2 interface on the document and then call our class to create the image.

The full version of CCreateHTMLImage

which is included in the download, looks like this.

class CCreateHTMLImage : public CWnd
{
protected:
    DECLARE_DYNCREATE(CCreateHTMLImage)
    DECLARE_EVENTSINK_MAP()
    enum eEnums
    {
        CHILDBROWSER = 100,
    };
public:
    enum eOutputImageFormat
    {
        eBMP = 0,
        eJPG,
        eGIF,
        eTIFF,
        ePNG,
        eImgSize
    };

                    CCreateHTMLImage();
    virtual         ~CCreateHTMLImage();

    BOOL            Create(CWnd *pParent);
    BOOL            SetSaveImageFormat(eOutputImageFormat format);

    BOOL            CreateImage(
                        IHTMLDocument2 *pDoc, 
                        LPCTSTR szDestFilename, 
                        CSize srcSize, 
                        CSize outputSize);
    BOOL            CreateImage(
                        LPCTSTR szSrcFilename, 
                        LPCTSTR szDestFilename, 
                        CSize srcSize, 
                        CSize outputSize);

protected:
    CComPtr<IWebBrowser2> m_pBrowser;
    CWnd            m_pBrowserWnd;

    virtual BOOL    CreateControlSite(
                        COleControlContainer* pContainer, 
                        COleControlSite** ppSite, 
                        UINT nID, 
                        REFCLSID clsid);
    virtual void    DocumentComplete(LPDISPATCH pDisp, VARIANT* URL);
    int             GetEncoderClsid(const WCHAR* format, CLSID* pClsid);

private:
    static LPCTSTR  m_ImageFormats[eImgSize];
    CLSID           m_encoderClsid;
};

A few changes should jump out at you. The first is that the full version of the class is derived from CWnd whereas the simple version wasn't. This indicates that at least some of the changes I made to allow the conversion of an HTML document to an image somehow involve the creation of a window. You don't yet know the half of it!

All functions that were present in the simple version of the class are unchanged in the full version. You'll see that I added another CreateImage() overload. This one takes a source document name instead of an IHTMLDocument2 interface pointer.

This new function is the reason I added all the new stuff to the full version of the class, so let's start with it and work outwards.

Loading an external document

Initially I started out trying to use the IHTMLDocument2 interface directly. Something like this.

IHTMLDocument2 *pDoc = (IHTMLDocument2 *) NULL;

if (CoCreateInstance(
        CLSID_HTMLDocument, 
        NULL, 
        CLSCTX_INPROC_SERVER, 
        IID_IHTMLDocument2, 
        (void**) &pDoc) == S_OK)
{
    if (pDoc != (IHTMLDocument2 *) NULL)
    {
        //  Do stuff
    }
}

This works and we get a document interface we can work with. There's one small problem. There's no way to load a document directly. We can call IHTMLDocument2::write() to render a string containing HTML but that means we have to load our document contents into a string. That'll work just fine with local files but what if you want to image a website on the net? All I want is to create images - not write a full blown http: protocol handler.

Ok, scratch that approach. What about using an IWebBrowser2 interface? The code to instantiate one is almost identical to the preceding code snippet so I won't repeat it, just substitute IWebBrowser2 wherever you see IHTMLDocument2. To load a document we simply navigate to it using either IWebBrowser2::Navigate() or IWebBrowser2::Navigate2().

So I coded it up and tested. The Navigate2() call returned success but the document didn't load. Or at least, if it did, the interface's ReadyState never changed to let me know it had finished. Obviously we can't go rendering the document into a device context until we know it's loaded and indeed, querying the IWebBrowser2 interface for the IHTMLDocument2 interface we need always returned a NULL interface pointer, indicating that the document doesn't yet exist.

Repeating the test on a dummy application based on CHtmlView reveals what we already knew. The IWebBrowser2::ReadyState does change as the document loads and, once the document has finished loading, we can query the IWebBrowser2 interface for an IHTMLDocument2 interface and get back a valid interface pointer.

Hmm, so what are we doing differently? Well the first and most obvious difference is that we're instantiating an instance of IWebBrowser2 without a matching display window. As we'll see a little later, this window is quite important to the IWebBrowser2 interface even though nowhere in the documentation is this stated.

It was time to investigate how CHtmlView does things.

CHtmlView

is an MFC class. Fortunately we have the source code to MFC. That means we can go look at a working example of something and figure out what we're doing wrong or not doing at all.

The first thing we find (in afxhtml.h) is the class definition. There's a lot of stuff in there, most of which doesn't concern us. What's of interest is a CWnd member variable called m_wndBrowser. Aha. We know that CHtmlView is ultimately derived from CWnd and is therefore already a window. So why does it need a member variable of type CWnd? Let's have a look at the relevant code in CHtmlView::Create() to see what's going on (viewhtml.cpp).

BOOL CHtmlView::Create(LPCTSTR lpszClassName, LPCTSTR lpszWindowName,
                        DWORD dwStyle, const RECT& rect, CWnd* pParentWnd,
                        UINT nID, CCreateContext* pContext)
{
    // create the view window itself
    m_pCreateContext = pContext;

    if (!CView::Create(lpszClassName, lpszWindowName,
                dwStyle, rect, pParentWnd,  nID, pContext))
    {
        return FALSE;
    }

    // assure that control containment is on
    AfxEnableControlContainer();

    RECT rectClient;
    GetClientRect(&rectClient);

    // create the control window
    // AFX_IDW_PANE_FIRST is a safe but arbitrary ID
    if (!m_wndBrowser.CreateControl(CLSID_WebBrowser, lpszWindowName,
                WS_VISIBLE | WS_CHILD, rectClient, this, AFX_IDW_PANE_FIRST))
    {
        DestroyWindow();
        return FALSE;
    }

    // cache the dispinterface
    LPUNKNOWN lpUnk = m_wndBrowser.GetControlUnknown();
    HRESULT hr = lpUnk->QueryInterface(IID_IWebBrowser2, (void**) &m_pBrowserApp);

    if (!SUCCEEDED(hr))
    {
        m_pBrowserApp = NULL;
        m_wndBrowser.DestroyWindow();
        DestroyWindow();
        return FALSE;
    }

    return TRUE;
}

The view window creates itself and then creates a child control as an ActiveX object using the CLSID_WebBrowser identifier. If that succeeds it queries the child for the Web Browser's IUnknown interface and uses that interface to get an IWebBrowser2 interface which it caches away for later use.

Ok, things are starting to fall into place. Instead of blindly creating an IWebBrowser2 interface out of thin air we should create an instance of the Web Browser control and get our IWebBrowser2 interface from it.

First lesson from CHtmlView

Let's duplicate what CHtmlView does and create our Web Browser control as a child control of our class. We'll discuss why the extra level of indirection a little later in the article.

Our creation sequence is (if we want to create images for pages we haven't already got loaded in some instance of MSHTML somewhere):

Create an instance of the CCreateHTMLImage class.
Call the Create() method on the class.
Call the CreateImage() method once for each image we want to create.

Once this is done we can call Navigate2() on the Web Browser child window and expect the document to load. Which it does. Let's have a look at the function.

BOOL CCreateHTMLImage::CreateImage(
            LPCTSTR szSrcFilename, 
            LPCTSTR szDestFilename, 
            CSize srcSize, 
            CSize outputSize)
{
    ASSERT(GetSafeHwnd());
    ASSERT(IsWindow(GetSafeHwnd()));
    ASSERT(szSrcFilename);
    ASSERT(AfxIsValidString(szSrcFilename));
    ASSERT(szDestFilename);
    ASSERT(AfxIsValidString(szDestFilename));

    CRect rect(CPoint(0, 0), srcSize);

    //  The WebBrowser window size must be set to our srcSize
    //  else it won't render everything
    MoveWindow(&rect);
    m_pBrowserWnd.MoveWindow(&rect);

    COleVariant   vUrl(szSrcFilename, VT_BSTR),
                  vFlags(long(navNoHistory | 
                              navNoReadFromCache | 
                              navNoWriteToCache), VT_I4),
                  vNull(LPCTSTR(NULL), VT_BSTR);
    COleSafeArray vPostData;

    if (m_pBrowser->Navigate2(&vUrl, &vFlags, &vNull, &vPostData, &vNull) == S_OK)
        //  We have to pump messages to ensure the event handler (DocumentComplete)
        //  is called.
        RunModalLoop();
    else
        return FALSE;

    //  We only get here when DocumentComplete has been called, which calls 
    //  EndModalLoop and causes RunModalLoop to exit.
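    IDispatch *pDisp = (IDispatch *) NULL;
    HRESULT   hr = m_pBrowser->get_Document(&pDisp);

    if (FAILED(hr) || pDisp == (IDispatch *) NULL)
        return FALSE;

    //  Ask for the document interface instead of casting the IDispatch pointer
    IHTMLDocument2 *pDoc = (IHTMLDocument2 *) NULL;

    hr = pDisp->QueryInterface(IID_IHTMLDocument2, (void **) &pDoc);
    pDisp->Release();

    if (FAILED(hr) || pDoc == (IHTMLDocument2 *) NULL)
        return FALSE;

    BOOL bResult = CreateImage(pDoc, szDestFilename, srcSize, outputSize);

    pDoc->Release();
    return bResult;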
}

If we get through the gauntlet of my usual ASSERT checks on the input parameters we create a rectangle with the dimensions implied by the srcSize parameter and set the Web Browser to those dimensions. If we don't set the Web Browser size correctly we won't get an image that accurately reflects the contents of the HTML document. Then we set up a bunch of COleVariant objects with our source document name, some flags and call the Navigate2() method. If that method succeeds we fall into a call to the CWnd::RunModalLoop() function.

This is very important. My first stabs at this solution used a combination of Sleep() and some polling to try and determine when the document had finished loading. The result was deadlock. It turns out that once you've initiated a Navigate2() operation (and many other operations on the Web Browser control) you have to let the message pump run. The message pump can be the application's main pump, but if you use that one you have no way to synchronously interact with the CCreateHTMLImage class. This won't matter if all you want to create is one image. But if you have a list of images you want to create, one after the other, you have to wait for the first one to complete before you can even start on the second one.

So we start the navigation and then drop into a RunModalLoop(). Some time later the document will finish loading and fire the DocumentComplete() event. That event handler in our class does nothing much except call EndModalLoop(), which lets us fall out of the RunModalLoop() function and allow processing in the CCreateHTMLImage::CreateImage() function to continue. That processing consists of obtaining an IHTMLDocument2 interface and calling code we've already discussed.
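The reason polling deadlocks and a modal loop doesn't can be shown with a small platform-neutral model. Nothing here is MFC: MiniPump and FakeBrowser are hypothetical names that only mirror the roles of the message pump, RunModalLoop(), EndModalLoop() and the DocumentComplete handler. The point of the sketch is that the completion event is a queued message, so it can't arrive until something dispatches the queue.

```cpp
#include <deque>
#include <functional>
#include <utility>

// Model of a single-threaded message pump.
class MiniPump
{
public:
    // Queue a message for later delivery, as a posted window message would be.
    void post(std::function<void()> msg)
    {
        m_queue.push_back(std::move(msg));
    }

    // Dispatch queued messages until endModalLoop() is called. Unlike the
    // real RunModalLoop(), which waits for more messages, this model gives
    // up when the queue runs dry. Returns true if endModalLoop() ended it.
    bool runModalLoop()
    {
        m_running = true;

        while (m_running && !m_queue.empty())
        {
            std::function<void()> msg = std::move(m_queue.front());

            m_queue.pop_front();
            msg();
        }

        bool ended = !m_running;

        m_running = false;
        return ended;
    }

    // Called from inside a message handler to stop the loop.
    void endModalLoop()
    {
        m_running = false;
    }

private:
    std::deque<std::function<void()>> m_queue;
    bool                              m_running = false;
};

// Model of a browser whose "document complete" event is a queued message.
struct FakeBrowser
{
    bool loaded = false;

    // Starts a navigation. Completion is only queued here, never delivered.
    void navigate(MiniPump& pump)
    {
        pump.post([this, &pump]
        {
            loaded = true;          // the document has finished loading
            pump.endModalLoop();    // what our DocumentComplete handler does
        });
    }
};
```

After navigate() returns, loaded is still false and would stay false however long the caller slept or polled; it only becomes true once runModalLoop() dispatches the queued message.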

So why do we have an embedded Browser Window instead of being a Web Browser ourselves?

If you've made it this far you might be wondering why our class mimics CHtmlView to the extent of having an embedded CWnd variable that is the real Web Browser control.

It wasn't obvious to me as I wrote the code that this would be necessary. I wrote the class so that it, itself, became an instance of the Web Browser and was able to create images of HTML documents without those documents ever flashing up on the screen. It all looked good.

Trouble in paradise

But on closer examination of the images I suddenly realised there were a couple of artifacts that shouldn't have been there. Scrollbars!

If you've used CHtmlView you know there's a function, OnGetHostInfo() that gives you the opportunity to modify visual aspects of the Web Browser, including whether the browser shows scrollbars or not. So now it was time to dive back into the implementation of CHtmlView, once again, to see how I could duplicate that functionality.

It turns out that the reason CHtmlView embeds an instance of the Web Browser, instead of being an instance itself, is so that it can act as the Web Browser's parent. This is important because it means that CHtmlView (or our class) can create a COleControlSite to host the Web Browser control and respond to interface queries. If our class was directly an instance of the Web Browser that would mean that queries from the Web Browser would be directed to our parent, to code we don't necessarily control, and code that almost certainly doesn't know that we want to respond to the GetHostInfo query with the answer that scrollbars oughtn't to be displayed.

I'll concede that you probably do have control over the parent object and could implement a COleControlSite in the parent. But that's definitely the wrong place to do it. Why should the parent have to know that some arbitrary class it uses needs a COleControlSite, let alone the specifics of a response to a particular query?

Without going into an extended discussion of how CWnd derived classes host ActiveX controls let's look at what we have to implement in our class to make it all work. Our Create() call creates the child ActiveX control by calling CreateControl() specifying the CLSID for the specific ActiveX control we want.

CreateControl() does various things including calling the virtual function CreateControlSite(). The default implementation in CWnd does nothing except return a NULL site pointer. When we look at CHtmlView's implementation we see that it sets the site pointer to an instance of CHtmlControlSite. Ok, let's go look at that class. It turns out to be derived from COleControlSite (which shouldn't be a surprise) but it also implements all the member functions of the IDocHostUIHandler interface. This matters because the way the Web Browser control interrogates its container to determine UI states, such as whether or not to show scroll bars, is via a QueryInterface() on its parent's automation interface, requesting an IDocHostUIHandler interface.

Unfortunately we can't use CHtmlControlSite in our class for two reasons. The first is that the class definition appears in viewhtml.cpp rather than in a header file we can include. We could work around that easily enough by cutting and pasting the definition into our own header file. But it still won't work because the class has built-in knowledge of CHtmlView virtual methods. We could mess around in our class trying to duplicate the vtable structure to reuse CHtmlControlSite but it's just not worth the trouble (to say nothing of being a maintenance nightmare). Instead I cut and pasted the entire class definition and implementation, changed the name and modified the member functions to do what I needed. In fact, the only member function I want doing anything is the GetHostInfo() function. All the others do nothing (but must be implemented anyway).
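What the GetHostInfo() handler has to do is small: set the flag that tells the Web Browser not to show scrollbars. The sketch below models just that step in plain C++ so it compiles without the Windows SDK. HostUiInfo is a cut-down stand-in for DOCHOSTUIINFO (the real structure has further members), the constant repeats the value of DOCHOSTUIFLAG_SCROLL_NO from mshtmhst.h, and GetHostInfoModel() is a hypothetical name. It is not the code in the download.

```cpp
#include <cstdint>

// Same value as DOCHOSTUIFLAG_SCROLL_NO in mshtmhst.h.
const std::uint32_t kDocHostUiFlagScrollNo = 0x8;

// Cut-down stand-in for DOCHOSTUIINFO.
struct HostUiInfo
{
    std::uint32_t cbSize;
    std::uint32_t dwFlags;
    std::uint32_t dwDoubleClick;
};

// Model of the one handler that does real work. The genuine
// IDocHostUIHandler::GetHostInfo() returns an HRESULT; a bool stands in
// for success or failure here.
bool GetHostInfoModel(HostUiInfo* pInfo)
{
    if (pInfo == nullptr)
        return false;

    pInfo->cbSize = static_cast<std::uint32_t>(sizeof(HostUiInfo));
    pInfo->dwFlags |= kDocHostUiFlagScrollNo;   // no scrollbars, please
    pInfo->dwDoubleClick = 0;                   // 0 is DOCHOSTUIDBLCLK_DEFAULT
    return true;
}
```

The flag is OR-ed in rather than assigned so that any other flags already present in the structure survive.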

Phew! That's a lot of work just to get rid of some scrollbars on a tiny thumbnail image. But remember that the class can be used to capture full sized images simply by specifying the output image size. On a full sized image the scrollbars are probably undesirable.

Using the class

Is almost trivial. Include the header file, declare an instance of CCreateHTMLImage and use it. Remember that there are two ways to use it. The first is when you want to capture an image of an existing page already rendered somewhere in your application.

CCreateHTMLImage cht;

cht.CreateImage(m_pDoc, csOutputFile, CSize(800, 600), CSize(80, 60));

which assumes that m_pDoc is a pointer to an IHTMLDocument2 interface. This example captures the image at 800 by 600 but saves a thumbnail of 80 by 60.

The second way to use the class is when all you have is a filename or URL to the page you want to capture.

CCreateHTMLImage cht;

cht.Create(this);
cht.CreateImage(csSourceFile, csOutputFile, CSize(800, 600), CSize(80, 60));

which does the same except that it takes care of loading the source file (or URL) and then captures the image output to a file. The Create() function needs a pointer to a CWnd derived object which must be a top-level window. I use my CMainFrame window.

Oh, don't forget to initialise GDI+ - you can find out how in this excellent article[^].

Dependencies

The class uses some other code found on CodeProject.

CBitmapDC[^]
CFileSpec[^]

History

4 April 2004 - Initial Version.

4 April 2004 - Updated the download to include the required header files.

9 April 2004 - Added a demo project written by Jubjub[^]

19 May 2004 - Updated the demo project.