
Microsoft Web Browser Automation using C#

16 Nov 2003
An article on axWebBrowser/MSHTML automation using Visual C#.

Sample Image - mshtml_automation.jpg

Introduction

The Microsoft Web Browser COM control adds browsing, document viewing, and downloading capabilities to your applications. Parsing and rendering of HTML documents in the WebBrowser control is handled by the MSHTML component, an Active Document that exposes the Dynamic HTML (DHTML) Object Model and hosts ActiveX controls and script languages. The WebBrowser control merely acts as a container for the MSHTML component and implements navigation and related functions. MSHTML can be automated using IDispatch and IConnectionPointContainer-style automation interfaces, which enable a host to drive MSHTML through the object model.

Note

If you are not using the Visual Studio .NET IDE, use the Windows Forms ActiveX Control Importer (Aximp.exe) to convert the type definitions in a COM type library for an ActiveX control into a Windows Forms control. For instance, to generate the interop DLLs for the ActiveX browser component from the command line, run aximp ..\system32\shdocvw.dll, adjusting the path so that it points to your system32 directory. A form that uses the AxSHDocVw.AxWebBrowser class can then be compiled with: csc /r:SHDocVw.dll,AxSHDocVw.dll YourForm.cs.

Using the code

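Simple automation scenario: load the Google home page, type a query into the search box, and click the "I'm Feeling Lucky" button.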

Image 2

To automate this task, first add a Microsoft Web Browser object to an empty C# Windows application. In the Visual Studio .NET IDE, right-click the Toolbox, choose "Customize Toolbox...", and pick "Microsoft Web Browser" from the COM components list. This adds an "Explorer" control to the "General" section of the Toolbox.

C#
//
// navigate to google on Form load
//
private void Form1_Load(object sender, System.EventArgs e)
{
    object loc = "http://www.google.com/";
    object null_obj_str = "";
    System.Object null_obj = 0;
    this.axWebBrowser1.Navigate2(ref loc , ref null_obj, 
          ref null_obj, ref null_obj_str, ref null_obj_str);
}

Next, open the Solution Explorer, add a reference to the Microsoft HTML Object Library (MSHTML) from the COM components list, and implement the following code.

C#
using mshtml;

//
// Field used to track which step of the automation runs next, so that
// later DocumentComplete events do not re-run an earlier step.
//
private int Task = 1;

private void axWebBrowser1_DocumentComplete(object sender, 
         AxSHDocVw.DWebBrowserEvents2_DocumentCompleteEvent e)
{
    switch(Task)
    {
        case 1:

            // Casting the control's Document is enough; there is no need
            // to create a new HTMLDocumentClass instance first.
            HTMLDocument myDoc = (HTMLDocument) axWebBrowser1.Document;

            // a quick look at the google html source reveals: 
            // <INPUT maxLength="256" size="55" name="q">
            //
            HTMLInputElement otxtSearchBox = 
               (HTMLInputElement) myDoc.all.item("q", 0);

            otxtSearchBox.value = "intel corp";

            // google html source for the I'm Feeling Lucky Button:
            // <INPUT type=submit value="I'm Feeling Lucky" name=btnI>
            //
            HTMLInputElement btnSearch = 
               (HTMLInputElement) myDoc.all.item("btnI", 0);
            btnSearch.click();

            Task++;
            break;

        case 2:

            // continuation of automated tasks...
            break;
    }
}
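
The Task counter in the handler above is in effect a small state machine: each DocumentComplete event runs the next step and then moves on. As a rough sketch only, the same idea can be written without the switch statement. StepSequencer, Add, RunNext and Demo are hypothetical names made up for this illustration and are not part of the article's download; the steps are plain delegates, so the sketch runs without the browser control.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (an assumption, not the article's code): it models
// the Task counter as an ordered list of steps.
public sealed class StepSequencer
{
    private readonly List<Action> steps = new List<Action>();
    private int next; // index of the step to run on the next event

    public int CompletedSteps { get { return next; } }

    public void Add(Action step)
    {
        if (step == null) throw new ArgumentNullException("step");
        steps.Add(step);
    }

    // Intended to be called once per DocumentComplete event. Returns false
    // when every step has already run, so extra events are ignored.
    public bool RunNext()
    {
        if (next >= steps.Count) return false;
        Action step = steps[next];
        next++; // advance before running so a re-entrant call cannot repeat the step
        step();
        return true;
    }
}

public static class Demo
{
    public static void Main()
    {
        List<string> log = new List<string>();
        StepSequencer seq = new StepSequencer();
        seq.Add(delegate { log.Add("fill search box and click"); });
        seq.Add(delegate { log.Add("continue with result page"); });

        // Simulate three DocumentComplete events; the third finds no step left.
        seq.RunNext();
        seq.RunNext();
        bool ran = seq.RunNext();

        Console.WriteLine(string.Join(", ", log.ToArray()));
        Console.WriteLine(ran);
    }
}
```

In a form, the DocumentComplete handler would simply call RunNext() on a sequencer field. Note one deliberate difference: this sketch advances the counter before running the step, whereas the article's handler increments Task after calling click().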

References

MSDN

History

  • Version 1.0 - November 16th 2003 - Original Submission
  • Version 1.1 - November 17th 2003 - Modified axWebBrowser event

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



Written By
Kentdome LLC
United States
Biography in progress ;-)

Comments and Discussions

 
Re: Corrections
mjzalewski, 19-Nov-03 12:42
See if this makes sense

Download_Complete works here because this page (www.google.com) has only two downloads: 1) the main HTML page and 2) the Google image.

The only way you get into trouble is if 1) the Google image fires Download_Complete first or 2) the .Document property axWebBrowser1.Document is not available immediately after the HTML page downloads.

So when you tell the browser component to navigate to http://www.google.com, it begins downloading the HTML page. This page is very short, and probably gets served from cache. So I'm thinking that the HTML page downloads completely before the browser even begins downloading the image. So 1) probably never occurs.

Maybe some expert out there can tell me if 2) ever applies (i.e., the Download_Complete for an HTML page fires, but the .Document property is not available until the Document_Complete event fires). My guess is that access to the .Document property is synchronized somehow so that it always returns the valid DOM object.

I know from my own experience that it is quite possible to navigate through the DOM object before all the images finish. So that tells me that .Document is available after the Download_Complete of the HTML page. But I do agree that the original code (which used Download_Complete, but did not check that the completed download was the HTML page) was in error.

But here is why I think C# is OK with that: In this case, the first Download_Complete will always correspond to the HTML page, which is short and simple. If you follow my analysis, it should be possible to write the same program with C++, (and attach to the Download_Complete event), and C++ will work the same way as C#. Although in either case, if you started with something more complicated than www.google.com, chances are the original example will not work.

To me, it seems clear that the reason C# works is not because C# is so slow that it cannot execute 3 statements in the time between the first Download_Complete and Document_Complete event. I would like to see a C++ example, written to attach to the Download_Complete event, which fails in the manner described.
