Rendering an IHTMLElement to an Image File Using GDI+

Han Bo Sun

4 Dec 2006

Capture an HTML document as an image and save it.

Download demo project - 28.4 Kb

Introduction

I recently found an article called "Capture an HTML Document as an Image". I really liked it, but there were two things I didn't like. The first is the way it renders the image to a file: the article uses a third-party GDI wrapper class to handle the image rendering, and I thought it would be cleaner to do the same thing with GDI+. The second is that the article does not show how the width and height of the snapshot are calculated. I ran into this issue myself in my experimental implementation, and I describe the answer I found later in this article.

The biggest challenge I encountered was that I couldn't remember how to render an image to a file using GDI+. Once the drawing is on an Image object, that object's Save() method can store its data in a file, but I had forgotten how to get that far. So I went back to online research. Two hours later, I found another article, "Creating a Watermarked Photograph with GDI+ for .NET", which discusses how to add a watermark to an image and then save the image back to a file. Combining what I learned from both articles, I was able to create a clean solution, which I am ready to share with everyone.

What is the Good Use of This Article

It depends on how you view the situation. I think the best use of this code is to take a snapshot of a web page and store it as an image for later viewing. This can be applied to automated web testing: from time to time, the automated test can take snapshots of a web page so that a tester can verify its progress.

How the Design Works

Whenever I write code, especially when I attempt to solve a complicated development problem, I first write a single method that contains all the code needed to solve the problem, and then re-factor that code into smaller blocks. The goal of the re-factoring is to turn the original code block into objects that are loosely coupled and highly cohesive. Without wasting any more time, let's look at the first implementation:

The Source Code of the First Implementation

C++

void CMainFrame::OnBnClickedButtonSnapshot()
{
   // The filter patterns must not contain spaces.
   _TCHAR BASED_CODE szFilter[] = 
      _T("JPEG Files (*.jpg;*.jpeg)" ) 
      _T("|*.jpg;*.jpeg|All Files (*.*)|*.*||");
   // The second argument is the default extension, without dot or wildcard.
   CFileDialog dlg(FALSE, _T("jpg"), _T(""), 
                   OFN_HIDEREADONLY | OFN_OVERWRITEPROMPT, 
                   szFilter, this, 0);
   if (dlg.DoModal() != IDOK)
   {
      return;
   }

   CString sFileName = dlg.GetPathName();

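   // DYNAMIC_DOWNCAST returns NULL if the active view is not a CHtmlView.
   CHtmlView* pView = DYNAMIC_DOWNCAST(CHtmlView, this->GetActiveView());
   if (pView == NULL)
   {
      AfxMessageBox(_T("The active view is not an HTML view."));
      return;
   }

   // GetHtmlDocument() returns an AddRef'ed pointer; Attach() takes over
   // that reference, whereas the CComPtr constructor would add another one.
   CComPtr<IDispatch> spDisp;
   spDisp.Attach(pView->GetHtmlDocument());
   CComPtr<IHTMLDocument2> spDoc;
   if (!spDisp || FAILED(spDisp->QueryInterface(IID_IHTMLDocument2, 
                                    (void**)&spDoc)))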
   {
      AfxMessageBox(_T("Unable to get the HTML Document off the browser."));
      return;
   }

   CComPtr<IHTMLElement> spBody;
   // get_body() can succeed and still return NULL while the page is loading.
   if (FAILED(spDoc->get_body(&spBody)) || !spBody)
   {
      AfxMessageBox(_T("Unable to get the body of the HTML Document."));
      return;
   }

   CComPtr<IHTMLElementRender> spElemRender;
   if (FAILED(spBody->QueryInterface(IID_IHTMLElementRender, 
             (void**)&spElemRender)))
   {
      AfxMessageBox(_T("Unable to create render of the body element."));
      return;
   }

   long cx=0, cy=0;
   spBody->get_offsetWidth(&cx);
   spBody->get_offsetHeight(&cy);

   Bitmap myBmp(cx, cy);
   Graphics g(&myBmp);
   HDC mydc = g.GetHDC();
   if (mydc != NULL)
   {
      spElemRender->DrawToDC(mydc);
      g.ReleaseHDC(mydc);
   }

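   CLSID jpegClsid;
   if (GetEncoderClsid(_T("image/jpeg"), &jpegClsid) < 0)
   {
      AfxMessageBox(_T("Unable to get the CLSID for JPEG."));
      return;
   }
   if (myBmp.Save((LPCTSTR)sFileName, &jpegClsid, NULL) != Ok)
   {
      AfxMessageBox(_T("Unable to save the JPEG image."));
   }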
}

Additional Functions Used

I also used a function called GetEncoderClsid(). It is not part of the GDI+ API; it is a helper function that I took from MSDN. Here is what it looks like:

C++

int GetEncoderClsid(const WCHAR* format, CLSID* pClsid)
{
   UINT  num = 0;  // number of image encoders
   UINT  size = 0; // size of the image encoder array in bytes

   ImageCodecInfo* pImageCodecInfo = NULL;

   GetImageEncodersSize(&num, &size);
   if(size == 0)
      return -1;  // Failure

   pImageCodecInfo = (ImageCodecInfo*)(malloc(size));
   if(pImageCodecInfo == NULL)
      return -1;  // Failure

   GetImageEncoders(num, size, pImageCodecInfo);

   for(UINT j = 0; j < num; ++j)
   {
      if( wcscmp(pImageCodecInfo[j].MimeType, format) == 0 )
      {
         *pClsid = pImageCodecInfo[j].Clsid;
         free(pImageCodecInfo);
         return j;  // Success
      }
   }

   free(pImageCodecInfo);
   return -1;  // Failure
}

Code Walkthrough

After reading the two articles listed in the "Introduction", I did some trial-and-error coding. The result was really bad code, which I am not going to show here since it is lost to oblivion (it was replaced by the first implementation). In the end, I figured out how the whole process works. Here is how it works:

1. Use the CHtmlView in the MFC project. This gives us access to the web page.
2. Do some COM-related operations to get a reference to the IHTMLElement of the entire web page. In the code above, this is the QueryInterface() call for IHTMLDocument2 followed by get_body().
3. Get a reference to the IHTMLElementRender interface of that IHTMLElement.
4. Initialize GDI+.
5. Create a GDI+ Bitmap object. This object should have the size of the web page. When it is created, the Bitmap has nothing on it. If you save it, you will see that it is a bitmap filled with black.
6. Create a GDI+ Graphics object, passing the address of the Bitmap object to the constructor of the Graphics object. This associates the Bitmap object with the Graphics object. I will explain this more later.
7. Get direct access to the device context of the Graphics object by calling its GetHDC() method.
8. Use the IHTMLElementRender object's DrawToDC() method to render the HTML element to the device context.
9. Release the device context; the drawing is completed.
10. Finally, use the Bitmap object's Save() method to save the drawing to the file. The operation is completed.

Note that all the methods of the GDI+ classes use wide characters. In the sample code above, I used _T("string value"), which works because I set my project to use the Unicode character set rather than the multi-byte character set. This makes _T() strings and CString wide-character strings, so they can be passed straight to GDI+.
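If the project were built with the multi-byte character set instead, narrow strings would have to be widened before they reach GDI+. The following is only a minimal, portable sketch of that step: Widen() is a hypothetical helper that is not part of the demo project, and it relies on the standard std::mbsrtowcs(), which converts according to the current C locale. In an MFC/ATL project, MultiByteToWideChar() or the CA2W conversion class would be the more usual choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper: converts a narrow string to a wide string using the
// current C locale. Returns an empty string if the input cannot be converted.
std::wstring Widen(const std::string& narrow)
{
    std::mbstate_t state = std::mbstate_t();
    const char* src = narrow.c_str();

    // First pass: a null destination only counts the wide characters needed.
    const std::size_t length = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (length == static_cast<std::size_t>(-1) || length == 0)
    {
        return std::wstring();
    }

    // Second pass: convert into a buffer of exactly that size.
    std::wstring wide(length, L'\0');
    state = std::mbstate_t();
    src = narrow.c_str();
    const std::size_t written = std::mbsrtowcs(&wide[0], &src, length, &state);
    assert(written == length);
    (void)written;
    return wide;
}
```

The result of Widen() could then be passed, through c_str(), to any GDI+ method that expects a const WCHAR*.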

Why Associate the Bitmap with the Graphics Object

Here is a rough idea of why it is necessary to associate the Bitmap with the Graphics object. Think of the two objects as having two separate responsibilities; an analogy helps. Consider the Bitmap object as the canvas and the Graphics object as a human painter. The Bitmap object provides the rendering context on which the drawing can be applied, stored, and viewed later. The Graphics object performs the rendering operations on the canvas, that is, on the Bitmap object. In addition to providing the rendering context, the Bitmap object also provides the means to save that context to disk.
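The canvas-and-painter idea can be modelled with a toy example. This is not GDI+ code: Canvas and Painter are made-up stand-ins for Bitmap and Graphics, kept deliberately tiny to show who owns the pixels and who draws on them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for Bitmap: owns the pixels, which start out as 0 (black).
class Canvas
{
public:
    Canvas(int width, int height)
        : width_(width), height_(height),
          pixels_(static_cast<std::size_t>(width) * height, 0) {}

    unsigned Pixel(int x, int y) const { return pixels_[Index(x, y)]; }
    void SetPixel(int x, int y, unsigned color) { pixels_[Index(x, y)] = color; }

private:
    std::size_t Index(int x, int y) const
    {
        assert(x >= 0 && x < width_ && y >= 0 && y < height_);
        return static_cast<std::size_t>(y) * width_ + x;
    }

    int width_;
    int height_;
    std::vector<unsigned> pixels_;
};

// Toy stand-in for Graphics: owns no pixels, it only draws on the canvas
// it was associated with at construction time.
class Painter
{
public:
    explicit Painter(Canvas* canvas) : canvas_(canvas) {}

    void FillRect(int left, int top, int width, int height, unsigned color)
    {
        for (int y = top; y < top + height; ++y)
            for (int x = left; x < left + width; ++x)
                canvas_->SetPixel(x, y, color);
    }

private:
    Canvas* canvas_;  // not owned; the canvas must outlive the painter
};
```

Because the painter only borrows the canvas, the canvas has to outlive it. The same reasoning applies to a Graphics object created from a Bitmap.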

Sample Snapshot Using the Test App

Let me show you a sample snapshot I took on the Yahoo! home page:

The Implementation After Code Re-factoring

Honestly, even after the code re-factoring, I was not satisfied with the implementation details. Since this is a tutorial, though, I don't believe I have to clean up the code to the point that it looks production ready. What I have done is take the first implementation and separate it into pieces so that each piece is more independent. There are still dependencies between the pieces, but they are not as closely coupled as before. I will explain them one piece at a time. The first piece is a pseudo factory that returns the encoder CLSID for different image types. It looks like this:

C++

// header ImageRender.h
BOOL GetEncoderClsid(LPCWSTR format, CLSID* pClsid);

class ImageFormatFactory
{
public:
    enum IMAGEFORMAT { JPEG, GIF, TIFF, BMP };
    static BOOL GetFormatCLSID(IMAGEFORMAT fmt,
        CLSID* CLSIDVal);
};

....
// source file: ImageRender.cpp
BOOL GetEncoderClsid(LPCWSTR format, CLSID* pClsid)
{
   UINT  num = 0;          // number of image encoders
   UINT  size = 0;         // size of the image encoder array in bytes

   ImageCodecInfo* pImageCodecInfo = NULL;

   GetImageEncodersSize(&num, &size);
   if(size == 0)
   {
      return FALSE;
   }

   pImageCodecInfo = (ImageCodecInfo*)(malloc(size));
   if(pImageCodecInfo == NULL)
   {
      return FALSE;
   }

   GetImageEncoders(num, size, pImageCodecInfo);

   for(UINT j = 0; j < num; ++j)
   {
      if(wcscmp(pImageCodecInfo[j].MimeType, format) == 0)
      {
         *pClsid = pImageCodecInfo[j].Clsid;
         free(pImageCodecInfo);
         return TRUE;  // Success
      }    
   }

   free(pImageCodecInfo);
   return FALSE;  // Failure
}

///////////////////////////////////////////////////
BOOL ImageFormatFactory::GetFormatCLSID(ImageFormatFactory::IMAGEFORMAT fmt,
   CLSID* CLSIDVal)
{
   // for my own project and for the sake of demonstration, we
   // only support 4 types for now.
   BOOL retVal = FALSE;
   switch(fmt)
   {
   case JPEG:
      retVal = GetEncoderClsid(L"image/jpeg", CLSIDVal);
      break;
   case GIF:
      retVal = GetEncoderClsid(L"image/gif", CLSIDVal);
      break;
   case TIFF:
      retVal = GetEncoderClsid(L"image/tiff", CLSIDVal);
      break;
   case BMP:
      retVal = GetEncoderClsid(L"image/bmp", CLSIDVal);
      break;
   default:
      retVal = FALSE;
      break;
   }

   return retVal;
}

I extended the original design by adding support for returning different CLSIDs for different image file formats; the first implementation only supports JPEG image files. Choosing one of the four supported formats through ImageFormatFactory::GetFormatCLSID() is easier and safer than calling GetEncoderClsid() with a string parameter, because a mistyped MIME string cannot slip through. With this layering, I can test each layer to make sure it works correctly and integrates correctly with the others.

To make the implementation a bit safer still, I could remove the declaration of GetEncoderClsid() from the header file, which would limit users to ImageFormatFactory::GetFormatCLSID(). I prefer to keep it available so that users can bypass the factory and call GetEncoderClsid() directly to choose additional formats. This is a dangerous thing to do, but it is easily fixed. Note that the above implementation of a factory is not a very good one; it is only sufficient to get the result I needed.
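One possible way to keep the format list in a single place is a small lookup table that maps the enumeration to the MIME string, which could then be passed to GetEncoderClsid(). This is only a sketch under some assumptions: ImageFormat, FormatEntry, kFormatTable, and MimeTypeFor() are hypothetical names that do not appear in the demo project, and the PNG row is an addition to the four formats above.

```cpp
#include <cassert>
#include <cwchar>

// Hypothetical enumeration; mirrors IMAGEFORMAT above and adds PNG.
enum class ImageFormat { Jpeg, Gif, Tiff, Bmp, Png };

struct FormatEntry
{
    ImageFormat format;
    const wchar_t* mimeType;
};

// One row per supported format. Supporting a new format means adding a row.
static const FormatEntry kFormatTable[] = {
    { ImageFormat::Jpeg, L"image/jpeg" },
    { ImageFormat::Gif,  L"image/gif"  },
    { ImageFormat::Tiff, L"image/tiff" },
    { ImageFormat::Bmp,  L"image/bmp"  },
    { ImageFormat::Png,  L"image/png"  },
};

// Returns the MIME type for fmt, or nullptr if fmt is not in the table.
const wchar_t* MimeTypeFor(ImageFormat fmt)
{
    for (const FormatEntry& entry : kFormatTable)
    {
        if (entry.format == fmt)
        {
            return entry.mimeType;
        }
    }
    return nullptr;
}
```

Compared with the switch statement, the table keeps each format and its string side by side, at the cost of a linear search over a handful of entries.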

Next, I wrote a class called ImageRender. It wraps the functionality of GDI+ and only exposes enough interfaces for an outside class to do the rendering. It looks like this:

C++

// header ImageRender.h
class ImageRender
{
private:
   Bitmap* bmp;
   Graphics* g;
   HDC bmpHdc;

protected:
   Bitmap* GetBitmap();
   Graphics* GetGraphics();

public:
   ImageRender();
   ImageRender(int cx, int cy);
   virtual ~ImageRender();
   void Destroy();
   BOOL CreateImage(int cx, int cy);
   HDC GetDC();
   void ReleaseDC();
   BOOL SaveToFile(LPCWSTR fileName,
      const CLSID* clsidVal);
};

...

// source file: ImageRender.cpp
ImageRender::ImageRender()
   : bmp(NULL),
   g(NULL),
   bmpHdc(NULL)
{
}

ImageRender::ImageRender(int cx, int cy)
   : bmp(new Bitmap(cx, cy)),
   g(new Graphics(bmp)),
   bmpHdc(NULL)
{
}

ImageRender::~ImageRender()
{
   Destroy();
}

void ImageRender::Destroy()
{
   if (bmpHdc != NULL && bmp != NULL && g != NULL)
   {
      g->ReleaseHDC(bmpHdc);
      bmpHdc = NULL;
   }
    
   // Delete the Graphics first: it draws on the Bitmap,
   // so it must not outlive it.
   if (g != NULL)
   {
      delete g;
      g = NULL;
   }

   if (bmp != NULL)
   {
      delete bmp;
      bmp = NULL;
   }
}

BOOL ImageRender::CreateImage(int cx, int cy)
{
   if (bmp == NULL && g == NULL)
   {
      bmp = new Bitmap(cx, cy);
      g = new Graphics(bmp);
      return TRUE;
   }

   return FALSE;
}

HDC ImageRender::GetDC()
{
   if (g == NULL || bmp == NULL)
   {
      return NULL;
   }
   if (bmpHdc == NULL)
   {
      // Hand out the DC only once until ReleaseDC() is called.
      bmpHdc = g->GetHDC();
   }
   return bmpHdc; 
}

void ImageRender::ReleaseDC()
{
   if (bmpHdc == NULL || g == NULL || bmp == NULL)
   {
      return;
   }
   g->ReleaseHDC(bmpHdc);
   bmpHdc = NULL;
}

BOOL ImageRender::SaveToFile(LPCWSTR fileName,
   const CLSID* clsidVal)
{
   if (bmp == NULL)
   {
      return FALSE;  // no image has been created yet
   }
   Status retVal = bmp->Save(fileName, clsidVal, NULL);
   return (retVal == Ok);
}

This implementation is fun. My intention is for the wrapper to expose only the HDC to the caller. The caller can draw anything on the DC and release it when done. The wrapper then takes care of saving the image to a file with the SaveToFile() method. I think this is a good design because, if I want to extend it, I can add methods to the wrapper and keep the GDI+ functionality hidden underneath. Users do not have to worry about how to perform GDI+ operations; all they need to do is find the right method of the wrapper and call it. I can also test the implementation by writing a sample application that uses the wrapper.
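A caller that forgets to release the DC, or returns early before reaching the release call, leaves the DC locked. One way to guard against that is a small scope guard that releases the DC in its destructor. The sketch below is an illustration under assumptions rather than part of the demo project: ScopedDc is a hypothetical template that works with any type offering GetHDC() and ReleaseHDC(), and FakeGraphics is a stand-in so that the sketch builds without the GDI+ headers.

```cpp
#include <cassert>
#include <utility>

// Acquires a device context from any type that offers GetHDC()/ReleaseHDC()
// and releases it automatically at the end of the enclosing scope.
template <typename GraphicsT>
class ScopedDc
{
public:
    // The handle type is whatever GetHDC() returns (HDC for GDI+).
    using Handle = decltype(std::declval<GraphicsT&>().GetHDC());

    explicit ScopedDc(GraphicsT& graphics)
        : graphics_(graphics), dc_(graphics.GetHDC()) {}

    ~ScopedDc()
    {
        if (dc_)
        {
            graphics_.ReleaseHDC(dc_);
        }
    }

    // Copying would release the same DC twice, so it is disabled.
    ScopedDc(const ScopedDc&) = delete;
    ScopedDc& operator=(const ScopedDc&) = delete;

    Handle get() const { return dc_; }

private:
    GraphicsT& graphics_;
    Handle dc_;
};

// Stand-in for the GDI+ Graphics class; it only counts the calls.
struct FakeGraphics
{
    int acquired = 0;
    int released = 0;

    void* GetHDC() { ++acquired; return this; }
    void ReleaseHDC(void* dc) { assert(dc == this); ++released; }
};
```

With GDI+ available, ScopedDc<Graphics> would be expected to work the same way, since Graphics offers the same two methods.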

Finally, let's take a look at the pieces that are glued together into the application that can take a snapshot of the web page:

C++

void CMainFrame::OnBnClickedButtonSnapshot()
{
   CString sFileName;
   if (!GetFullPathFileName(this, sFileName))
   {
      return;
   }

   long cx=0, cy=0;
   GetWebBowserCtrlSize(this, cx, cy);

   CComPtr<IHTMLElementRender> spElemRender;
   if (!GetHtmlPageBodyRender(this, &spElemRender))
   {
      return;
   }

   CLSID jpegClsid;
   if (!GetEncoderClsid(_T("image/jpeg"), &jpegClsid))
   {
      AfxMessageBox(_T("Unable to get the CLSID for JPEG."));
      return;
   }

   ImageRender ir;
   if (ir.CreateImage((int)cx, (int)cy))
   {
      HDC renderDC = ir.GetDC();
      if (renderDC != NULL)
      {
         spElemRender->DrawToDC(renderDC);
         ir.ReleaseDC();

         if (!ir.SaveToFile(sFileName, &jpegClsid))
         {
            AfxMessageBox(_T("Unable to save the JPEG image."));
            return;
         }
      }
   }
}

As you can see, within the class CMainFrame, I did some code re-factoring too. I extracted several different operations out and made them into different local functions. Here they are; the first one is the one that gets the full path of the file name we want to save:

C++

BOOL GetFullPathFileName(CMainFrame* appFrame, CString& retFileName)
{
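   // The filter patterns must not contain spaces.
   _TCHAR BASED_CODE szFilter[] = 
     _T("JPEG Files (*.jpg;*.jpeg)|*.jpg;") 
     _T("*.jpeg|All Files (*.*)|*.*||");
   // The second argument is the default extension, without dot or wildcard.
   CFileDialog dlg(FALSE, _T("jpg"), _T(""), 
                   OFN_HIDEREADONLY | OFN_OVERWRITEPROMPT, 
                   szFilter, appFrame, 0);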
   if (dlg.DoModal() != IDOK)
   {
      return FALSE;
   }

   retFileName = dlg.GetPathName();
   return TRUE;
}
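The dialog's "All Files" filter lets the user type any extension, while the snapshot code always saves JPEG data. A possible refinement is to derive the encoder MIME type from the chosen file name and pass it to GetEncoderClsid(). MimeTypeFromPath() below is a hypothetical helper, not part of the demo project, and falling back to JPEG for unknown extensions is an assumption made for this sketch.

```cpp
#include <cassert>
#include <cwctype>
#include <string>

// Returns the encoder MIME type suggested by the extension of path.
// Falls back to L"image/jpeg" when the extension is missing or unknown.
const wchar_t* MimeTypeFromPath(const std::wstring& path)
{
    // A dot only counts if it belongs to the last path component.
    const std::wstring::size_type slash = path.find_last_of(L"\\/");
    const std::wstring::size_type dot = path.find_last_of(L'.');
    if (dot == std::wstring::npos ||
        (slash != std::wstring::npos && dot < slash))
    {
        return L"image/jpeg";
    }

    // Compare the extension case-insensitively.
    std::wstring ext = path.substr(dot + 1);
    for (wchar_t& ch : ext)
    {
        ch = static_cast<wchar_t>(std::towlower(ch));
    }

    if (ext == L"jpg" || ext == L"jpeg") return L"image/jpeg";
    if (ext == L"png")                   return L"image/png";
    if (ext == L"gif")                   return L"image/gif";
    if (ext == L"bmp")                   return L"image/bmp";
    if (ext == L"tif" || ext == L"tiff") return L"image/tiff";
    return L"image/jpeg";
}
```

For example, a path ending in page.PNG would select L"image/png", while a path without an extension would keep the JPEG default.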

The next one is the function that returns a COM pointer to the IHTMLElementRender interface of the body of the HTML document:

C++

BOOL GetHtmlPageBodyRender(CMainFrame* appFrame, IHTMLElementRender** retRender)
{
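   // DYNAMIC_DOWNCAST returns NULL if the active view is not a CHtmlView.
   CHtmlView* pView = DYNAMIC_DOWNCAST(CHtmlView, appFrame->GetActiveView());
   if (pView == NULL)
   {
      AfxMessageBox(_T("The active view is not an HTML view."));
      return FALSE;
   }

   // GetHtmlDocument() returns an AddRef'ed pointer; Attach() takes over
   // that reference, whereas the CComPtr constructor would add another one.
   CComPtr<IDispatch> spDisp;
   spDisp.Attach(pView->GetHtmlDocument());
   CComPtr<IHTMLDocument2> spDoc;
   if (!spDisp || FAILED(spDisp->QueryInterface(IID_IHTMLDocument2, (void**)&spDoc)))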
   {
      AfxMessageBox(_T("Unable to get the HTML Document off the browser."));
      return FALSE;
   }

   CComPtr<IHTMLElement> spBody;
   // get_body() can succeed and still return NULL while the page is loading.
   if (FAILED(spDoc->get_body(&spBody)) || !spBody)
   {
      AfxMessageBox(_T("Unable to get the body of the HTML Document."));
      return FALSE;
   }

   CComPtr<IHTMLElementRender> spElemRender;
   if (FAILED(spBody->QueryInterface(IID_IHTMLElementRender, 
      (void**)&spElemRender)))
   {
      AfxMessageBox(_T("Unable to create render of the body element."));
      return FALSE;
   }

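   // Detach() hands our reference over to the caller. A plain assignment
   // would let the local CComPtr release the interface on return.
   *retRender = spElemRender.Detach();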

   return TRUE;
}

At last, I fixed a bug I found in my original design. Rather than taking the size (width and height) of the entire document, I now take only the size of the visible portion of the document. The problem was that when I attempted to take a snapshot of the entire page, some portion of the image I got back was black. I think it is only possible to take a snapshot of the visible portion of the web page. Here is the function that returns the size of the web browser control:

C++

void GetWebBowserCtrlSize(CMainFrame* appFrame, long& cx, long& cy)
{
   cx = 0;
   cy = 0;
   // DYNAMIC_DOWNCAST returns NULL if the active view is not a CHtmlView.
   CHtmlView* pView = DYNAMIC_DOWNCAST(CHtmlView, appFrame->GetActiveView());
   if (pView != NULL)
   {
      cx = pView->GetWidth();
      cy = pView->GetHeight();
   }
}
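If the size of the document is also known, for example from get_offsetWidth() and get_offsetHeight() as in the first implementation, the snapshot size can be limited to whichever is smaller, the document or the control. The following sketch shows that calculation; SnapshotSize and ClampToVisible() are hypothetical names, and rejecting sizes that are not positive is an assumption made here so that an empty bitmap is never created.

```cpp
#include <cassert>

struct SnapshotSize
{
    long cx;
    long cy;
    bool valid;  // false when there is nothing to capture
};

// Limits the snapshot to the part of the document that the control shows.
// docCx/docCy: size of the document body; viewCx/viewCy: size of the control.
SnapshotSize ClampToVisible(long docCx, long docCy, long viewCx, long viewCy)
{
    SnapshotSize size;
    size.cx = (docCx < viewCx) ? docCx : viewCx;
    size.cy = (docCy < viewCy) ? docCy : viewCy;
    size.valid = (size.cx > 0 && size.cy > 0);
    return size;
}
```

A caller would check the valid flag before creating the bitmap, and skip the snapshot when it is false.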

Last Thoughts

That's it. I hope you have enjoyed the article. The cool thing about this tutorial is that any web page element can be rendered to a file as long as its IHTMLElement can be acquired. Another cool thing you have learned is how to capture an image and save it to a file through GDI+. I certainly learned both while working on this simple project.

Bugs

If you think there is a bug, I would like to know about it. Please feel free to leave a comment below this article, and I will fix it. Thanks.

History

First draft - 9/15/2006.
Finished - 11/26/2006.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

Written By

Han Bo Sun

Team Leader The Judge Group

United States
