
Image Capture Whole Web Page using C#

By Douglas M. Weems

Capture whole web pages as a single image using C#.
C#, .NET 1.1, WinXP, Windows, .NET, VS, VS.NET2003, Dev, QA

Posted: 22 Jun 2005
Updated: 22 Jun 2005

Sample Image - capture.gif

Introduction

This article presents a C# routine for capturing an entire web page as an image. Many capture examples show how to grab a screenshot, but not how to capture content that lies below the visible, scrolled region of an application. The most common example of such a scrolling, or “run-over”, window is a web page.

This application grabs the page and, as a bonus, demonstrates how to let the client adjust the size of the image and the quality of the JPEG. It also shows how to write the name of the web page onto the image, draw Standard Resolution Guides, save a bitmap as a JPEG, and open the directory where the captures are stored.

Background

In a recent application, I wanted to provide our Quality Assurance testers the ability to capture an entire web page. I wanted them to do this by clicking a button from within a BHO (Browser Helper Object) that is used for another testing task. I also wanted to reduce the size of the capture, because the images are e-mailed and can quickly fill up our mailbox quotas.

Using the code

The easiest way to use this code is to download the source and trim out any functions that are not wanted (capture quality, image size, URL writing, guides, or the open-directory function). Once the trimmed program compiles without errors, copy the source and its dependencies into the desired project.

The first issue to face when copying the source code into a project is the need to reference SHDocVw.dll and MSHTML.dll. In Visual Studio, go to Project, Add Reference, and select the COM tab. Scroll down to the Microsoft section, select "Microsoft Internet Controls", and then find and select "Microsoft HTML Object Library" (see the above image).

After adding the references, add the following using directives to the project. (A few other directives are needed if the code is not placed in a form.)

using System.Text;
using System.Runtime.InteropServices;
using System.Diagnostics;
using System.IO;
using System.Drawing.Imaging;
using SHDocVw;
using mshtml;

Import user32 functions

[DllImport("user32.dll", CharSet=CharSet.Auto)]
public static extern IntPtr FindWindowEx(IntPtr parent /*HWND*/, 
  IntPtr next /*HWND*/, string sClassName, IntPtr sWindowTitle);

[DllImport("user32.dll", ExactSpelling=true, CharSet=CharSet.Auto)] 
public static extern IntPtr GetWindow(IntPtr hWnd, int uCmd); 

//Note: the handle is passed as an int here, which assumes a 32-bit process.
[DllImport("user32.dll")]
public static extern int GetClassName(int h, StringBuilder s, int nMaxCount);

[DllImport("user32.dll")]
private static extern bool PrintWindow(IntPtr hwnd, IntPtr hdcBlt, uint nFlags);

public const int GW_CHILD = 5; 
public const int GW_HWNDNEXT = 2;

Find an open browser and assign a browser document for it.

 SHDocVw.WebBrowser m_browser = null;
 SHDocVw.ShellWindows shellWindows = new SHDocVw.ShellWindowsClass();

 //Find the first available browser window.
 //The application can easily be modified to loop through and
 //capture all open windows.
 string filename;
 foreach (SHDocVw.WebBrowser ie in shellWindows)
 {
     filename = Path.GetFileNameWithoutExtension(ie.FullName).ToLower();
     if (filename.Equals("iexplore"))
     {
         m_browser = ie;
         break;
     }
 }
 if (m_browser == null)
 {
     MessageBox.Show("No Browser Open");
     return;
 }

 //Assign Browser Document
 mshtml.IHTMLDocument2 myDoc = (mshtml.IHTMLDocument2)m_browser.Document;

The width and height of the web page must be determined, along with the size of the visible browser area on the client's screen.

 //Set scrolling on.
 myDoc.body.setAttribute("scroll", "yes", 0);

 //Get the full (scrollable) height and width of the page.
 int heightsize = (int)myDoc.body.getAttribute("scrollHeight", 0);
 int widthsize = (int)myDoc.body.getAttribute("scrollWidth", 0);

 //Get the height and width of the visible area of the page.
 int screenHeight = (int)myDoc.body.getAttribute("clientHeight", 0);
 int screenWidth = (int)myDoc.body.getAttribute("clientWidth", 0);

To capture the whole web page, fragments of the page have to be grabbed and stitched together. After the first fragment is captured, the browser is scrolled for the next capture. As the fragments are captured, they are stitched into a target bitmap. The process is repeated until the whole page is captured. For pages that are wider than the client's screen, the page is scrolled horizontally, and then the above process is repeated.
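The scrolling scheme described above can be sketched independently of the browser. The following is a minimal illustration, not the article's own code: the class name `ScrollPlanner` and the method `Offsets` are hypothetical, and it assumes each step overlaps the previous fragment by a few pixels and that the last fragment stops at the maximum scroll position.

```csharp
using System;
using System.Collections;

// Hypothetical helper (not part of the article's source): plans the
// scroll positions needed to cover a page with overlapping fragments.
public class ScrollPlanner
{
    // pageSize: full scrollable size of the page (e.g. scrollHeight).
    // viewSize: size of the visible area (e.g. clientHeight).
    // overlap:  pixels shared between neighbouring fragments.
    public static int[] Offsets(int pageSize, int viewSize, int overlap)
    {
        if (viewSize <= overlap)
            throw new ArgumentException("viewSize must be larger than overlap");

        // The browser cannot scroll past this position.
        int maxScroll = Math.Max(0, pageSize - viewSize);
        int step = viewSize - overlap;

        ArrayList offsets = new ArrayList();
        for (int pos = 0; pos < maxScroll; pos += step)
        {
            offsets.Add(pos);
        }
        // The final fragment is aligned with the end of the page.
        offsets.Add(maxScroll);

        return (int[])offsets.ToArray(typeof(int));
    }
}
```

For a 1200-pixel page, a 500-pixel visible area and a 5-pixel overlap, `ScrollPlanner.Offsets(1200, 500, 5)` yields the offsets 0, 495 and 700. The same call can be made with the widths to plan the horizontal passes.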

 //Get bitmap to hold screen fragment.
 Bitmap bm = new Bitmap(screenWidth, screenHeight, 
    System.Drawing.Imaging.PixelFormat.Format16bppRgb555);
 
 //Create a target bitmap to draw into.
 //URLExtraLeft, URLExtraHeight and trimHeight are not defined in this excerpt.
 Bitmap bm2 = new Bitmap(widthsize + URLExtraLeft, heightsize + 
    URLExtraHeight - trimHeight, 
         System.Drawing.Imaging.PixelFormat.Format16bppRgb555);
 Graphics g2 = Graphics.FromImage(bm2);
 
 Graphics g = null;
 IntPtr hdc;
 Image screenfrag = null;
 int brwTop = 0;
 int brwLeft = 0;
 int myPage = 0;
 IntPtr myIntptr = (IntPtr)m_browser.HWND;
 
 //Get inner browser window.
 IntPtr hwnd = GetWindow(myIntptr, GW_CHILD);
 StringBuilder sbc = new StringBuilder(256);
 
 //Get Browser "Document" Handle: walk the child windows until the
 //"Shell DocObject View" is found, then take its
 //"Internet Explorer_Server" child.
 while (hwnd != IntPtr.Zero)
 {
     GetClassName(hwnd.ToInt32(), sbc, 256);
 
     if (sbc.ToString().IndexOf("Shell DocObject View", 0) > -1)
     {
         hwnd = FindWindowEx(hwnd, IntPtr.Zero, 
             "Internet Explorer_Server", IntPtr.Zero);
         break;
     }
     hwnd = GetWindow(hwnd, GW_HWNDNEXT);
 }
 
 //Count the vertical fragments needed (for bottom up screen drawing).
 //Each scroll step is 5 pixels short of a full screen, so the count
 //is based on that step to make sure the end of the page is covered.
 while ((myPage * (screenHeight - 5)) < heightsize)
 {
     myDoc.body.setAttribute("scrollTop", (screenHeight - 5) * myPage, 0);
     ++myPage;
 }
 
 //Rollback the page count by one
 --myPage;
 
 int myPageWidth = 0;
 while ((myPageWidth * (screenWidth - 5)) < widthsize)
 {
     myDoc.body.setAttribute("scrollLeft", (screenWidth - 5) * myPageWidth, 0);
     brwLeft = (int)myDoc.body.getAttribute("scrollLeft", 0);
     for (int i = myPage; i >= 0; --i)
     {
         //Shoot visible window
         g = Graphics.FromImage(bm);
         hdc = g.GetHdc();
         myDoc.body.setAttribute("scrollTop", (screenHeight - 5) * i, 0);
         //Read the position back, because the browser limits it to the end of the page.
         brwTop = (int)myDoc.body.getAttribute("scrollTop", 0);
         PrintWindow(hwnd, hdc, 0);
         g.ReleaseHdc(hdc);
         g.Flush();
         g.Dispose();
         //GetHbitmap returns a GDI handle owned by the caller; it should be
         //freed with DeleteObject (gdi32.dll), which is not imported here.
         screenfrag = Image.FromHbitmap(bm.GetHbitmap());
         g2.DrawImage(screenfrag, brwLeft + URLExtraLeft, brwTop + 
            URLExtraHeight);
         screenfrag.Dispose();
     }
     ++myPageWidth;
 }

Finally, save the target bitmap to a time-stamped JPEG file.
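The saving code itself is only in the download, so the following is a rough sketch of how such a step might look, not the author's implementation. The class name `JpegSaver`, the method `SaveTimeStamped`, the "Capture_" file name prefix and the time stamp format are all assumptions made for illustration; the GDI+ calls (`ImageCodecInfo.GetImageEncoders`, `Encoder.Quality`, `Image.Save`) are standard System.Drawing APIs.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

// Hypothetical helper (not part of the article's source).
public class JpegSaver
{
    // Looks up the installed JPEG encoder by its MIME type.
    private static ImageCodecInfo FindJpegEncoder()
    {
        foreach (ImageCodecInfo codec in ImageCodecInfo.GetImageEncoders())
        {
            if (codec.MimeType == "image/jpeg")
                return codec;
        }
        return null;
    }

    // Saves the image as a JPEG whose name contains the current time.
    // quality: 0 (smallest file) to 100 (best image).
    // Returns the full path of the file that was written.
    public static string SaveTimeStamped(Image image, string directory, long quality)
    {
        ImageCodecInfo jpeg = FindJpegEncoder();
        if (jpeg == null)
            throw new InvalidOperationException("No JPEG encoder found.");

        Directory.CreateDirectory(directory);

        // The file name pattern is an assumption, chosen so names sort by time.
        string name = "Capture_" + DateTime.Now.ToString("yyyyMMdd_HHmmss") + ".jpg";
        string path = Path.Combine(directory, name);

        EncoderParameters parameters = new EncoderParameters(1);
        try
        {
            parameters.Param[0] = new EncoderParameter(
                System.Drawing.Imaging.Encoder.Quality, quality);
            image.Save(path, jpeg, parameters);
        }
        finally
        {
            parameters.Dispose();
        }
        return path;
    }
}
```

With the variables from the listing above, a call such as `JpegSaver.SaveTimeStamped(bm2, @"C:\Captures", 60L)` would write the stitched bitmap; the directory and the quality value here are only examples.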

Points of Interest

I had a lot of fun and suffered a lot of frustration with this project. The captures are really nice. Try it out on one of the "Code Project" pages.

Not shown in this article, but available in the source, is the code that saves the file as a JPEG. I tried GIF and bitmap, but settled on JPEG for size. The main goal was to be able to e-mail these files without taking up a lot of our mailbox quota.

In the actual application, I have an option to copy the file to the clipboard. I was never able to get the clipboard image into a "device-dependent bitmap" state that didn't take up much space. I would copy the image and paste it into my Outlook e-mail, only to have the e-mail be about 1 MB in size. When I opened the JPEG in Photoshop, selected it, copied it, and pasted it into Outlook, the Adobe device-dependent bitmap was under 100 KB. The same happened with the simple Windows Paintbrush application.

Because of time constraints, I settled on just copying the JPEG file to Outlook. Any solutions on how to turn a large device-independent bitmap into a bitmap with a small memory footprint would be welcomed.
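Besides JPEG quality, reducing the pixel dimensions is the other lever for file size mentioned in this article. The sketch below shows one possible way to scale a capture by a percentage before saving it. It is an illustration only: the class `CaptureScaler`, the method `Scale` and the choice of bicubic interpolation are assumptions, not taken from the downloadable source.

```csharp
using System;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;

// Hypothetical helper (not part of the article's source).
public class CaptureScaler
{
    // Returns a copy of the source image scaled to the given
    // percentage (1 to 100) of its original width and height.
    public static Bitmap Scale(Image source, int percent)
    {
        if (percent < 1 || percent > 100)
            throw new ArgumentOutOfRangeException("percent");

        // Never let a dimension collapse to zero pixels.
        int width = Math.Max(1, source.Width * percent / 100);
        int height = Math.Max(1, source.Height * percent / 100);

        Bitmap result = new Bitmap(width, height, PixelFormat.Format24bppRgb);
        using (Graphics g = Graphics.FromImage(result))
        {
            // Bicubic interpolation keeps small text more legible when shrinking.
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            g.DrawImage(source, 0, 0, width, height);
        }
        return result;
    }
}
```

For example, `CaptureScaler.Scale(bm2, 50)` would return a bitmap with half the width and half the height of the stitched capture, which can then be saved as a JPEG.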

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

Douglas M. Weems


My experience with programming began with Turbo Pascal while working on my Physics degree back in 1989.

After getting out of school, I used pre-VBA Excel macros to write some really fancy applications to help with the job I was doing. This inspired me to try to write "Windows" programs and to search out Visual Basic 3.0.

I wrote a bunch of small applications and ran them against Access and FoxPro. However, this still wasn't my primary job.

In 1994, I went on my first contract, a 3-month deal that turned into 3 years. I learned a lot more about development. Development was in VB3, VB4 and ASP. I got a chance to admin NT4 and SQL Server 6 and 6.5.

After moving on to another company, I spent another 2 years with VB and then 5 years with Java and JSP.

In March of 2004, I installed Visual Studio 2003. I tasted C#, and became hopelessly addicted.

My other interests are my 3 sons, my wife, metal detecting, yard work, travel and learning new things.

Location: Atlanta, Georgia
Occupation: Web Developer
Location: Canada

Editor: Smitha Vijayan