My application allows limited editing of HTML pages using MSHTML. Each HTML
page is based on a template file and the range of things the end user can do to
that template file is limited. At no time is the user able to create an empty
page, so there has to be a mechanism in my application that lets the user
select which template a new page should be based on.
I wanted to present the user with a list of thumbnail images, each representing
a template page. In order to do that I had to devise a way of taking an HTML page and
converting it to an image. The alternative, presenting the user with a simple
listbox containing the names of the templates, is a tad too early-'90s.
This article is the result.
A false start
Fortunately, my application imposes a strict limit on page size: the entire
page must fit into an 800 by 600 frame without scrollbars.
My initial approach was to render the page using MSHTML, create a memory bitmap, get a handle to the
MSHTML display window and do a
BitBlt from the display window to my memory bitmap,
then scale and save the results.
It worked well but for one minor detail.
In order to render an HTML page into an image file this way,
the page has to be visible on the screen.
BitBlt can only grab bits from a
device context that's had something drawn on it, and if the device context represents something
that's not actually visible on the screen the Windows
WM_PAINT optimisations kick
in and exclude those areas from the update region. The result is that MSHTML doesn't paint onto
those portions of a device context that aren't visible.
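To make the clipping problem concrete, here is a toy model in portable C++ (not Win32 code). It treats the capturable area as the intersection of the window's rectangle with the screen's rectangle, ignoring overlapping windows and the other subtleties of a real update region. The Rect type and the visiblePart() function are hypothetical names used only for this sketch.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical rectangle type for illustration; right/bottom are exclusive.
struct Rect {
    int left, top, right, bottom;
    int width() const { return right - left; }
    int height() const { return bottom - top; }
};

// Returns the part of `window` that lies on `screen`. In this simplified
// model only that part receives WM_PAINT updates, so only those pixels
// would ever be drawn and available to BitBlt.
Rect visiblePart(const Rect& window, const Rect& screen) {
    Rect r{std::max(window.left, screen.left), std::max(window.top, screen.top),
           std::min(window.right, screen.right), std::min(window.bottom, screen.bottom)};
    if (r.right < r.left) r.right = r.left;   // no horizontal overlap: zero width
    if (r.bottom < r.top) r.bottom = r.top;   // no vertical overlap: zero height
    return r;
}
```

For an 800 by 600 page placed at (700, 500) on a 1024 by 768 screen, only a 324 by 268 corner would be painted, which is why the page had to be shown in full before grabbing it.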
If you want to create images of something that's already on the screen, well and good.
Otherwise, to create an image, you have to present that something on the screen. This makes
for an awful lot of flashing as one renders HTML pages to the screen for just long enough to grab their images.
Even so, I was almost happy with the result. The flashing didn't look too awful. I even ran it past
a few people, showing them what it looked like as it updated images and they didn't seem to mind it
too much. But it irked me. There had to be a better way.
A second approach
Some digging around in MSDN revealed the
IHTMLElementRender interface. Sounds hopeful. It
has a member function called DrawToDC()
that sounds like a perfect fit. And indeed it is.
Once you obtain an
IHTMLElementRender interface you can supply your own device context and
get MSHTML to render the element into it. Once you've done that it's trivial to scale the result and save it to an image file.
As you've probably guessed, it wasn't quite as simple as that.
I'm going to present the class a little differently this time. We'll start with a simple version of the
class (not present in the download) and add complexity to it as we encounter issues.
The simple version of CCreateHTMLImage
looks like this.
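class CCreateHTMLImage
{   // excerpt: access specifiers and other members omitted
    enum eOutputImageFormat { eBMP = 0, /* ... */ eImgSize };
    BOOL CreateImage(IHTMLDocument2 *pDoc, LPCTSTR szDestFilename, CSize srcSize, CSize outputSize);
    BOOL SetSaveImageFormat(eOutputImageFormat format);
    int GetEncoderClsid(const WCHAR* format, CLSID* pClsid);
    static LPCTSTR m_ImageFormats[eImgSize];
};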
This version of the class creates an image from an existing HTML document. The constructor initialises the saved image
format as a JPEG file (you can override this by calling SetSaveImageFormat()
and passing one of the eOutputImageFormat
constants). The guts of the work are done in the CreateImage()
function, which looks like this.
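// Get our interfaces before we create anything else
IHTMLElement *pElement = (IHTMLElement *) NULL;
IHTMLElementRender *pRender = (IHTMLElementRender *) NULL;
// Let's be paranoid...
if (pDoc == (IHTMLDocument2 *) NULL)
    return FALSE;
pDoc->get_body(&pElement);
if (pElement == (IHTMLElement *) NULL)
    return FALSE;
pElement->QueryInterface(IID_IHTMLElementRender, (void **) &pRender);
if (pRender == (IHTMLElementRender *) NULL)
    return FALSE;
// Paint the body, unscaled, into a memory DC of srcSize
CBitmapDC destDC(srcSize.cx, srcSize.cy);
pRender->DrawToDC(destDC);
CBitmap *pBM = destDC.Close();
// Let GDI+ scale the full-size bitmap down to outputSize
Bitmap *gdiBMP = Bitmap::FromHBITMAP(HBITMAP(pBM->GetSafeHandle()), NULL);
Image *gdiThumb = gdiBMP->GetThumbnailImage(outputSize.cx, outputSize.cy);
// ... save gdiThumb with the selected encoder, then release and delete everything ...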
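This takes a pointer to an
IHTMLDocument2 interface, an output filename and a couple of
CSize objects (the source size and the output size).
You'd have obtained the IHTMLDocument2
interface from a loaded HTML document in an instance of MSHTML somewhere in
your program. For example, if you wanted to create an image of the document in an app that used
CHtmlView, you'd obtain the interface by calling GetHtmlDocument()
on that view and querying the returned IDispatch for IHTMLDocument2.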
We do my usual bunch of
ASSERTs on the parameters. Then we get a pointer to an IHTMLElement
interface that represents the body of the HTML document. Once we've got that we can do a
QueryInterface() for the IHTMLElementRender interface, which represents all the visual aspects of the document. We can't get the
IHTMLElementRender interface directly from the document because the document isn't an element; it contains elements.
If we got this far without encountering an error it's time to create the device context we want to paint the document into.
For this I used Anneke Sicherer-Roetman's excellent
CBitmapDC class, which is available on CodeProject. The
srcSize object is used to set
the size of the destination device context. The
IHTMLElementRender::DrawToDC() function doesn't do scaling: if the
source HTML needs 1000 pixels of width to draw its entire horizontal extent but you pass it a device context only 500 pixels
wide, you'll get only the left half of the HTML.
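The no-scaling behaviour can be modelled with a plain pixel buffer. The sketch below is portable C++ with hypothetical names (Surface, drawUnscaled); it is not how MSHTML works internally, it only reproduces the effect described above: copying a 1000-pixel-wide source into a 500-pixel-wide destination keeps the left half and silently drops the rest.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a device context: a row-major 32-bit pixel buffer.
struct Surface {
    int width, height;
    std::vector<std::uint32_t> pixels;   // zero-initialised
    Surface(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h) {}
    std::uint32_t& at(int x, int y) { return pixels[index(x, y)]; }
    std::uint32_t at(int x, int y) const { return pixels[index(x, y)]; }
private:
    std::size_t index(int x, int y) const {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        return static_cast<std::size_t>(y) * width + x;
    }
};

// Copies src into dest pixel-for-pixel with no scaling: anything beyond
// dest's width or height is simply lost, like drawing into a DC that is too small.
void drawUnscaled(const Surface& src, Surface& dest) {
    for (int y = 0; y < dest.height && y < src.height; ++y)
        for (int x = 0; x < dest.width && x < src.width; ++x)
            dest.at(x, y) = src.at(x, y);
}
```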
Once MSHTML has rendered our
IHTMLElement into the device context we create a GDI+
Bitmap object using the contents of the
CBitmapDC and then create an image from the
Bitmap object using the
outputSize object to specify the image dimensions. GDI+ takes care of scaling the full size image to the size
we want. A save, a bit of cleanup and we're done.
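As a rough illustration of what scaling the full-size image involves, here is a nearest-neighbour downscale over a raw pixel buffer. This is a hypothetical portable stand-in, not GDI+ code: GetThumbnailImage uses its own resampling, so only the coordinate mapping (each output pixel reads the source pixel at x * srcW / outW, y * srcH / outH) carries over.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: shrink a row-major srcW x srcH image to outW x outH
// by picking, for every output pixel, the nearest source pixel to its top-left.
std::vector<std::uint32_t> downscale(const std::vector<std::uint32_t>& src,
                                     int srcW, int srcH, int outW, int outH) {
    assert(srcW > 0 && srcH > 0 && outW > 0 && outH > 0);
    assert(src.size() == static_cast<std::size_t>(srcW) * srcH);
    std::vector<std::uint32_t> out(static_cast<std::size_t>(outW) * outH);
    for (int y = 0; y < outH; ++y) {
        // Source row for this output row (64-bit to avoid overflow on big images).
        int sy = static_cast<int>(static_cast<long long>(y) * srcH / outH);
        for (int x = 0; x < outW; ++x) {
            int sx = static_cast<int>(static_cast<long long>(x) * srcW / outW);
            out[static_cast<std::size_t>(y) * outW + x] =
                src[static_cast<std::size_t>(sy) * srcW + sx];
        }
    }
    return out;
}
```

With the article's numbers, an 800 by 600 capture reduced to 80 by 60 takes every tenth pixel in each direction.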
The other members of this class take care of the details of the saved image format and are of little interest
unless we plan to derive a new class from this one. The
GetEncoderClsid() function is taken from the MSDN documentation and is used to get the correct image codec for
the image format we want.
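The MSDN GetEncoderClsid() sample walks the list of installed GDI+ encoders, compares each one's MIME type with the requested one (for example L"image/jpeg") and returns the index of the match, or -1 if there is none. The portable sketch below mirrors that contract with hypothetical types: CodecInfo stands in for ImageCodecInfo and a string stands in for the CLSID. The enum values other than eBMP, and the contents of the MIME table, are assumptions, since the article doesn't list them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for GDI+'s ImageCodecInfo: only the fields the lookup needs.
struct CodecInfo {
    std::wstring mimeType;
    std::wstring id;   // stands in for the encoder's CLSID
};

// Assumed subset of eOutputImageFormat; only eBMP = 0 appears in the article.
enum eOutputImageFormat { eBMP = 0, eJPG, ePNG, eImgSize };

// Table indexed by the enum, playing the role of m_ImageFormats[eImgSize].
const wchar_t* const kImageFormats[eImgSize] = {
    L"image/bmp", L"image/jpeg", L"image/png"
};

// Same contract as the MSDN sample: store the matching encoder's id and
// return its index, or return -1 if no installed encoder matches.
int findEncoder(const std::vector<CodecInfo>& encoders, const wchar_t* format,
                std::wstring* pId) {
    for (std::size_t j = 0; j < encoders.size(); ++j) {
        if (encoders[j].mimeType == format) {
            *pId = encoders[j].id;
            return static_cast<int>(j);
        }
    }
    return -1;
}
```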
Hang on a moment!
Surely this class presents exactly the same problems as the false start approach discussed above? It can only create an image
from an existing HTML document already on the screen. That's right. But this is the simple version of the class.
If we want to create an image from a document stored somewhere else (hard disk or intranet or internet) we have to do a
little more work. We have to load the document using MSHTML, get an
IHTMLDocument2 interface on the document and
then call our class to create the image.
The full version of CCreateHTMLImage,
which is included in the download, looks like this.
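class CCreateHTMLImage : public CWnd
{   // excerpt: access specifiers and other members omitted
    enum { CHILDBROWSER = 100 /* , ... */ };
    enum eOutputImageFormat { eBMP = 0, /* ... */ eImgSize };
    BOOL Create(CWnd *pParent);
    BOOL CreateImage(IHTMLDocument2 *pDoc, LPCTSTR szDestFilename,
                     CSize srcSize, CSize outputSize);
    BOOL CreateImage(LPCTSTR szSrcFilename, LPCTSTR szDestFilename,
                     CSize srcSize, CSize outputSize);
    BOOL SetSaveImageFormat(eOutputImageFormat format);
    virtual BOOL CreateControlSite(COleControlContainer* pContainer,
                                   COleControlSite** ppSite, UINT nID, REFCLSID clsid);
    virtual void DocumentComplete(LPDISPATCH pDisp, VARIANT* URL);
    int GetEncoderClsid(const WCHAR* format, CLSID* pClsid);
    static LPCTSTR m_ImageFormats[eImgSize];
};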
A few changes should jump out at you. The first is that the full version of the class is derived from
CWnd whereas the simple version wasn't. This indicates that at least some of the changes I made to allow the conversion of an HTML document
to an image involve the creation of a window. You don't yet know the half of it!
All functions that were present in the simple version of the class are unchanged in the full version. You'll see that I
added a second CreateImage() overload. This one takes a source document name instead of an
IHTMLDocument2 interface pointer.
This new function is the reason I added all the new stuff to the full version of the class, so let's start with it.
Loading an external document
I started out trying to use the IHTMLDocument2
interface directly, with something like this.
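IHTMLDocument2 *pDoc = (IHTMLDocument2 *) NULL;
if (CoCreateInstance(CLSID_HTMLDocument, NULL, CLSCTX_INPROC_SERVER,
                     IID_IHTMLDocument2, (void**) &pDoc) == S_OK
    && pDoc != (IHTMLDocument2 *) NULL)
{
    // Do stuff
    pDoc->Release();
}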
This works and we get a document interface we can work with. There's one small problem: there's no way to load a document
directly. We can call IHTMLDocument2::write()
to render a string containing HTML, but that means we have to load
our document contents into a string. That'll work just fine with local files, but what if you want to image a website on the
net? All I want is to create images, not write a full-blown browser.
OK, scratch that approach. What about using an
IWebBrowser2 interface? The code to instantiate one is almost
identical to the preceding code snippet, so I won't repeat it; just substitute
IWebBrowser2 wherever you see
IHTMLDocument2 and use CLSID_WebBrowser as the class ID. To load a document we simply navigate to it using either
Navigate() or Navigate2().
So I coded it up and tested it. The
Navigate2() call returned success but the document didn't load. Or at least,
if it did, the interface's
ReadyState never changed to let me know it had finished. Obviously we can't go rendering
the document into a device context until we know it's loaded and, indeed, querying the
IWebBrowser2 interface for the
IHTMLDocument2 interface we need always returned a NULL interface pointer, indicating that the document doesn't exist.
Repeating the test on a dummy application based on
CHtmlView reveals what we already knew. The
IWebBrowser2::ReadyState does change as the document loads and, once the document has finished
loading, we can query the
IWebBrowser2 interface for an
IHTMLDocument2 interface and get back a
valid interface pointer.
Hmm, so what are we doing differently? Well, the first and most obvious difference is that we're creating an instance
of IWebBrowser2 without a matching display window. As we'll see a little later, this window is quite
important to the
IWebBrowser2 interface, even though the documentation doesn't say so anywhere.
It was time to investigate how
CHtmlView does things.
CHtmlView is an MFC class. Fortunately we have the source code to MFC. That means we can go look at a working example of
something and figure out what we're doing wrong or not doing at all.
The first thing we find (in
afxhtml.h) is the class definition. There's a lot of stuff in there, most of which
doesn't concern us. What's of interest is a
CWnd member variable called
m_wndBrowser. Aha. We know that
CHtmlView is ultimately derived from
CWnd and is therefore already a window. So why does it
need a member variable of type
CWnd? Let's have a look at the relevant code in
viewhtml.cpp to see what's going on.
BOOL CHtmlView::Create(LPCTSTR lpszClassName, LPCTSTR lpszWindowName,
    DWORD dwStyle, const RECT& rect, CWnd* pParentWnd,
    UINT nID, CCreateContext* pContext)
{
    // create the view window itself
    m_pCreateContext = pContext;
    if (!CView::Create(lpszClassName, lpszWindowName,
            dwStyle, rect, pParentWnd, nID, pContext))
        return FALSE;
    // assure that control containment is on
    AfxEnableControlContainer();
    RECT rectClient;  GetClientRect(&rectClient);
    // create the control window
    // AFX_IDW_PANE_FIRST is a safe but arbitrary ID
    if (!m_wndBrowser.CreateControl(CLSID_WebBrowser, lpszWindowName,
            WS_VISIBLE | WS_CHILD, rectClient, this, AFX_IDW_PANE_FIRST))
    {
        DestroyWindow();
        return FALSE;
    }
    // cache the dispinterface
    LPUNKNOWN lpUnk = m_wndBrowser.GetControlUnknown();
    HRESULT hr = lpUnk->QueryInterface(IID_IWebBrowser2, (void**) &m_pBrowserApp);
    if (!SUCCEEDED(hr))
    {
        m_pBrowserApp = NULL;
        m_wndBrowser.DestroyWindow();
        DestroyWindow();
        return FALSE;
    }
    return TRUE;
}
The view window creates itself and then creates a child control as an ActiveX object using the
CLSID_WebBrowser identifier. If that succeeds it asks the child for the Web Browser's
IUnknown interface and uses that interface
to get an
IWebBrowser2 interface, which it caches away for later use.
Ok, things are starting to fall into place. Instead of blindly creating an
IWebBrowser2 interface out of
thin air we should create an instance of the Web Browser control and get our
IWebBrowser2 interface from it.
First lesson from CHtmlView
Let's duplicate what
CHtmlView does and create our Web Browser control as a child control of our class.
We'll discuss the reason for this extra level of indirection later in the article.
Our creation sequence (if we want to create images of pages we haven't already got loaded in some instance of MSHTML) is:
- Create an instance of the CCreateHTMLImage class.
- Call the Create() method on the class.
- Call the CreateImage() method once for each image we want to create.
Once this is done we can call Navigate2()
on the Web Browser child window and expect the document to load, which
it does. Let's have a look at the function.
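CRect rect(CPoint(0, 0), srcSize);
// The WebBrowser window size must be set to our srcSize
// else it won't render everything
// ... resize the Web Browser child window to rect here ...
COleVariant vUrl(szSrcFilename, VT_BSTR), vFlags(long(0), VT_I4),  // flag value assumed
            vNull(LPCTSTR(NULL), VT_BSTR);
COleSafeArray vPostData;
if (m_pBrowser->Navigate2(&vUrl, &vFlags, &vNull, &vPostData, &vNull) == S_OK)
{
    // We have to pump messages to ensure the event handler (DocumentComplete) is called.
    RunModalLoop();
    // We only get here when DocumentComplete has been called, which calls
    // EndModalLoop and causes RunModalLoop to exit.
    IDispatch *pDisp = (IDispatch *) NULL;
    IHTMLDocument2 *pDoc = (IHTMLDocument2 *) NULL;
    if (m_pBrowser->get_Document(&pDisp) == S_OK && pDisp != (IDispatch *) NULL)
    {
        pDisp->QueryInterface(IID_IHTMLDocument2, (void **) &pDoc);  // don't just cast the IDispatch
        pDisp->Release();
    }
    if (pDoc != (IHTMLDocument2 *) NULL)
        return CreateImage(pDoc, szDestFilename, srcSize, outputSize);
}
return FALSE;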
If we get through the gauntlet of my usual
ASSERT checks on the input parameters, we create a rectangle
with the dimensions implied by the
srcSize parameter and set the Web Browser to those dimensions. If we don't
set the Web Browser size correctly we won't get an image that accurately reflects the contents of the HTML document.
Then we set up a bunch of
COleVariant objects with our source document name and some flags, and call the
Navigate2() method. If that method succeeds we fall into a call to the RunModalLoop() function.
This is very important. My first stabs at this solution used a combination of
Sleep() and some polling to
try to determine when the document had finished loading. The result was deadlock. It turns out that once you've initiated a
Navigate2() operation (and many other operations on the Web Browser control) you have to let the message pump
run. The message pump can be the application's main pump, but if you use that one you have no way to interact synchronously with
the CCreateHTMLImage class. This won't matter if all you want to create is one image. But if you have a
list of images you want to create, one after the other, you have to wait for the first one to complete before you can even
start on the second one.
So we start the navigation and then drop into a
RunModalLoop(). Some time later the document will finish
loading and fire the
DocumentComplete() event. That event handler in our class does nothing much except call
EndModalLoop(), which lets us fall out of the
RunModalLoop() function and allows processing in the
CCreateHTMLImage::CreateImage() function to continue. That processing consists of obtaining an
IHTMLDocument2 interface and calling code we've already discussed.
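The interplay between Navigate2(), DocumentComplete(), RunModalLoop() and EndModalLoop() can be modelled without Windows. The sketch below is a hypothetical single-threaded simulation; MessageQueue, FakeBrowser and ImageCreator are invented names, not MFC classes. Navigation only posts a completion message, a nested loop dispatches messages until the completion handler clears a flag, and a polling loop that never dispatches messages never sees the document arrive.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical model of a window's message queue.
class MessageQueue {
public:
    void post(std::function<void()> msg) { queue_.push_back(std::move(msg)); }
    // Dispatches one message; returns false when the queue is empty.
    bool dispatchOne() {
        if (queue_.empty()) return false;
        std::function<void()> msg = std::move(queue_.front());
        queue_.pop_front();
        msg();
        return true;
    }
private:
    std::deque<std::function<void()>> queue_;
};

// Model of the browser: navigate() returns at once, like Navigate2();
// completion arrives later as a message that only runs if someone pumps.
class FakeBrowser {
public:
    explicit FakeBrowser(MessageQueue& q) : q_(q) {}
    std::function<void()> onDocumentComplete;   // plays the DocumentComplete event
    void navigate(const std::string& url) {
        q_.post([this, url] {
            loaded_ = url;
            if (onDocumentComplete) onDocumentComplete();
        });
    }
    const std::string& loaded() const { return loaded_; }
private:
    MessageQueue& q_;
    std::string loaded_;
};

// Model of the second CreateImage(): start navigating, then run a nested loop.
class ImageCreator {
public:
    ImageCreator(MessageQueue& q, FakeBrowser& b) : q_(q), b_(b) {
        // The handler only ends the nested loop, like calling EndModalLoop().
        b_.onDocumentComplete = [this] { waiting_ = false; };
    }
    std::string createImage(const std::string& url) {
        b_.navigate(url);
        waiting_ = true;
        // Like RunModalLoop(): keep dispatching until the handler fires.
        while (waiting_ && q_.dispatchOne()) {}
        if (waiting_) return std::string();   // queue ran dry without completion
        return b_.loaded();                   // the real class renders and saves here
    }
private:
    MessageQueue& q_;
    FakeBrowser& b_;
    bool waiting_ = false;
};

// For contrast: polling without dispatching never sees completion
// (bounded here so that it terminates instead of deadlocking).
bool pollWithoutPumping(FakeBrowser& b, const std::string& url, int tries) {
    b.navigate(url);
    for (int i = 0; i < tries; ++i) {
        if (b.loaded() == url) return true;   // a Sleep() would sit here
    }
    return false;
}
```

Each createImage() call returns only after its own document has "loaded", so a list of images can be produced one after another. One difference from the real thing: when the model's queue runs dry it gives up, whereas RunModalLoop() idles until new messages arrive.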
So why do we have an embedded Browser Window instead of being a Web Browser ourselves?
If you've made it this far you might be wondering why our class mimics
CHtmlView to the
extent of having an embedded
CWnd variable that is the real Web Browser control.
It wasn't obvious to me as I wrote the code that this would be necessary. I wrote the class so that it, itself, became
an instance of the Web Browser and was able to create images of HTML documents without those documents ever flashing up
on the screen. It all looked good.
Trouble in paradise
But on closer examination of the images I suddenly realised there were a couple of artifacts that shouldn't
have been there. Scrollbars!
If you've used
CHtmlView you know there's a function,
OnGetHostInfo(), that gives you the
opportunity to modify visual aspects of the Web Browser, including whether or not the browser shows scrollbars. So it was
time to dive back into the implementation of
CHtmlView once again, to see how I could duplicate that functionality.
It turns out that the reason
CHtmlView embeds an instance of the Web Browser, instead of being an instance itself,
is so that it can act as the Web Browser's parent. This is important because it means that
CHtmlView (or our class)
can create a
COleControlSite to host the Web Browser control and respond to interface queries. If our class were
itself an instance of the Web Browser, queries from the Web Browser would be directed to our parent:
to code we don't necessarily control, and code that almost certainly doesn't know that we want to respond to the
GetHostInfo query with the answer that scrollbars oughtn't to be displayed.
I'll concede that you probably do have control over the parent object and could implement a
COleControlSite in the parent. But that's definitely the wrong place to do it. Why should the parent have to know that some arbitrary class
it uses needs a
COleControlSite, let alone the specifics of a response to a particular query?
Without going into an extended discussion of how
CWnd-derived classes host ActiveX controls, let's look at what
we have to implement in our class to make it all work. Our
Create() call creates the child ActiveX control by calling
CreateControl() and specifying the
CLSID for the specific ActiveX control we want.
CreateControl() does various things, including calling the virtual function
CreateControlSite(). The default implementation in
CWnd does nothing except return a NULL site pointer. When we look at
CHtmlView's implementation we see that it sets the site pointer to an instance of CHtmlControlSite.
OK, let's go look at that class. It turns out to be derived from
COleControlSite (which shouldn't be a surprise),
but it also implements all the member functions of the
IDocHostUIHandler interface. This matters because the way
the Web Browser control interrogates its container to determine UI states, such as whether or not to show scroll bars, is via a
QueryInterface() on its parent's automation interface, requesting an IDocHostUIHandler interface.
Unfortunately we can't use
CHtmlControlSite in our class, for two reasons. The first is that the class definition is in
viewhtml.cpp rather than in a header file we can include. We could work around that easily enough by
cutting and pasting the definition into our own header file. But it still won't work, because the class has built-in knowledge
of CHtmlView virtual methods. We could mess around in our class trying to duplicate the CHtmlView
structure to reuse
CHtmlControlSite, but it's just not worth the trouble (to say nothing of being a maintenance
nightmare). Instead I cut and pasted the entire class definition and implementation, changed the name and modified the member
functions to do what I needed. In fact, the only member function I want doing anything is the GetHostInfo()
function. All the others do nothing (but must be implemented anyway).
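The design reason for the custom site can be shown with a small model: the browser asks whichever site hosts it for UI preferences, and only a site that answers the GetHostInfo() query turns scrollbars off. The classes below are hypothetical portable C++ stand-ins. In a real site, GetHostInfo() would fill in the dwFlags member of a DOCHOSTUIINFO structure, for example with DOCHOSTUIFLAG_SCROLL_NO; the constant kScrollNo is chosen to match that flag's value of 0x8 in mshtmhst.h.

```cpp
#include <cassert>

// Hypothetical stand-in for DOCHOSTUIFLAG_SCROLL_NO.
constexpr unsigned kScrollNo = 0x8;

// Models the dwFlags part of DOCHOSTUIINFO.
struct HostInfo {
    unsigned flags = 0;
};

// Models a plain control site: it doesn't answer the UI query,
// so the browser falls back to its defaults.
class ControlSite {
public:
    virtual ~ControlSite() = default;
    virtual bool getHostInfo(HostInfo&) { return false; }
};

// Models the renamed copy of CHtmlControlSite: only getHostInfo() does anything.
class NoScrollSite : public ControlSite {
public:
    bool getHostInfo(HostInfo& info) override {
        info.flags |= kScrollNo;
        return true;
    }
};

// Models the browser deciding whether to draw scrollbars for a given site.
bool browserShowsScrollbars(ControlSite& site) {
    HostInfo info;
    if (!site.getHostInfo(info)) return true;   // no answer: default scrollbars
    return (info.flags & kScrollNo) == 0;
}
```

Because our class owns the site, the answer lives next to the code that needs it rather than in some unrelated parent window.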
Phew! That's a lot of work just to get rid of some scrollbars on a tiny thumbnail image. But remember that the class
can be used to capture full-sized images simply by specifying the output image size. On a full-sized image the scrollbars are far more obvious.
Using the class
It's almost trivial. Include the header file, declare an instance of CCreateHTMLImage
and use it. Remember that there
are two ways to use it. The first is when you want to capture an image of an existing page already rendered somewhere in your application:
cht.CreateImage(m_pDoc, csOutputFile, CSize(800, 600), CSize(80, 60));
which assumes that
m_pDoc is a pointer to an
IHTMLDocument2 interface. This example captures the image
at 800 by 600 but saves a thumbnail of 80 by 60.
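The example keeps the 4:3 shape of the 800 by 600 capture in its 80 by 60 thumbnail. If you want thumbnails that fit a fixed box without distortion, a helper along these lines could compute the outputSize argument. fitThumbnail() and the Size struct (a stand-in for CSize) are hypothetical and not part of the class.

```cpp
#include <algorithm>
#include <cassert>

struct Size { int cx, cy; };   // hypothetical stand-in for MFC's CSize

// Largest size that fits inside `box` while keeping the aspect ratio of `src`.
// Integer cross-multiplication avoids floating-point rounding surprises.
Size fitThumbnail(Size src, Size box) {
    assert(src.cx > 0 && src.cy > 0 && box.cx > 0 && box.cy > 0);
    // Is box.cx / src.cx <= box.cy / src.cy?  If so, width is the limit.
    if (static_cast<long long>(box.cx) * src.cy <= static_cast<long long>(box.cy) * src.cx) {
        int cy = static_cast<int>(static_cast<long long>(src.cy) * box.cx / src.cx);
        return {box.cx, std::max(cy, 1)};
    }
    // Otherwise height is the limit.
    int cx = static_cast<int>(static_cast<long long>(src.cx) * box.cy / src.cy);
    return {std::max(cx, 1), box.cy};
}
```

For example, fitting an 800 by 600 capture into an 80 by 80 box gives 80 by 60, which could then be passed as the last argument to CreateImage().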
The second way to use the class is when all you have is a filename or URL to the page you want to capture.
cht.CreateImage(csSourceFile, csOutputFile, CSize(800, 600), CSize(80, 60));
which does the same except that it takes care of loading the source file (or URL) before capturing the image and saving it to a file.
For this second form you must call Create() first; that function needs a pointer to a
CWnd-derived object, which must be a top-level window.
I use my
Oh, don't forget to initialise GDI+; you can find out how in this excellent article.
The class uses some other code found on CodeProject.
4 April 2004 - Initial Version.
4 April 2004 - Updated the download to include the required header files.
9 April 2004 - Added a demo project written by Jubjub
19 May 2004 - Updated the demo project.