HTML Gets Hooked

3 Sep 2002
This article shows how to take control of the content browsed in web pages while surfing

Image 1
Interactively highlight blocks in any arbitrary web page

This article shows how to take control of the content of web pages while surfing. A few applications are described and provided in the demo app. Although the demo app is written in C#, the audience is much wider, since the code can be translated directly to C++, VB or other languages. Besides, most of the HTML hooking is done in JavaScript.

HTML hooking?

After six major releases of Internet Explorer, users are still stuck with a rather basic browser. As long as all you need is to browse, everything is fine. Internet Explorer makes surfing easy, but what if you want to reuse an interesting part of a web page, subscribe to content updates, automate processing and so on? None of this is addressed in Internet Explorer 6.0, and those who try to take control of HTML face a giant gap:

  • HTML is a language with display semantics. It cannot describe components, tables of contents, and so on. That is why the industry is trying hard to introduce new languages. However, it will take many years to replace HTML, all the more so given the downturn in high-tech investment.
  • HTML is served by stateful web servers that rely on cookies and session ids, which in turn makes web pages hard to work with.
  • HTML is the output of higher-level server-side languages used by web developers. It is thus a flattened form with no other aim or ability than to be displayed as is.
  • Web sites no longer use plain HTML: every web developer uses a mix of JavaScript and dynamic, multi-layered, event-driven HTML. To reuse HTML in your application, you need both of these run-times.

Technically speaking, HTML hooking is a way for developers to subscribe to specific browser events with the goal of providing end-users with "browser++" software: applications whose aim is to browse as smartly as possible and make the web a better, more reliable place to work with.

In some ways HTML is already fully hookable, since behaviours triggered by click events can be attached to almost every HTML tag. But this doesn't really result in applications, because those events work with HTML inside the same highly protected web page space, giving the developer very few hooking capabilities, if any, and in turn very few additional features to the end-user.

The Internet Explorer API allows us to host a web browser instance and subscribe to specific events, such as being signaled that a page has been loaded and is in interactive mode. By taking advantage of this event, a few other tweaks, and the fact that the Internet Explorer API also exposes the Document Object Model, we are going to apply changes to the HTML between the moment a web page has just been loaded and the moment it is ready and displayed. This gives us the ability to control what is actually seen and how it behaves. Let us begin with a first example.

Highlighting blocks in an arbitrary web page

Starting from a standard Form-based C# application, we drop the web browser control onto the form and subscribe to the event fired when the web page is ready, namely NavigateComplete2, handled here by OnNavigateComplete:

Image 2
Subscribing for the page-ready event

When the page is ready, if we want to change the HTML code or attach events, we can use a method called execScript, available on the IHTMLWindow2 interface, and pass it JavaScript code:
C#
// event called when the web browser updates its view and finishes 
// parsing a new web page
private void OnNavigateComplete(object sender, 
                   AxSHDocVw.DWebBrowserEvents2_NavigateComplete2Event e)
{
    String code = <...some javascript code...>

    // exec Javascript
    //
    IHTMLDocument2 doc = (IHTMLDocument2) this.axWebBrowser1.Document;
    if (doc != null)
    {
        IHTMLWindow2 parentWindow = doc.parentWindow;
        if (parentWindow != null)
            parentWindow.execScript(code, "javascript");
    }
}

That is where the JavaScript magic comes in. How can we highlight blocks? This raises two questions: what is a block when all we have is a hierarchical tree of HTML tags (the infamous DOM), and how do we do the highlighting?

The answer to the first question is obvious to an experienced web designer. Needless to say, 90% of web pages use <table> tags to position content on the page. Luckily for us, this lets us assume that table blocks are in fact web components: navigation bars, main content, credit bar, and so on. Of course this is not always true, but it is true often enough. Just try it: that's demonstration by example!

HTML reverse engineering will be discussed in another article.

The second answer follows from the first. We check the HTML elements under the mouse cursor. The processing needs to be fast enough not to slow down the surfing experience needlessly. We simply use the DOM to walk up the parents of the current element, looking for a <table> tag. Once we have found it, we change its border and background color on the fly so that it stands out. Luckily, each change we make is automatically reflected in the web page without a full refresh, which is one of the benefits of dynamic HTML. Here is the JavaScript code (boxify.js):

JavaScript
  document.onmouseover = dohighlight;
  document.onmouseout = dohighlightoff;

  var BGCOLOR = "#444444";
  var BORDERCOLOR = "#FF0000";

  function dohighlight()
  {
    var elem = window.event.srcElement;

    while (elem!=null && elem.tagName!="TABLE")
        elem = elem.parentElement;

    if (elem==null) return;

    if (elem.border==0)
    {
        elem.border = 1;

        // store current values in custom tag attributes
        //
        elem.oldcolor = elem.style.backgroundColor; // store backgroundcolor
        elem.style.backgroundColor = BGCOLOR; // new background color

        elem.oldbordercolor = elem.style.borderColor; // same with bordercolor
        elem.style.borderColor = BORDERCOLOR;

        var rng = document.body.createTextRange();
        rng.moveToElementText(elem);

// following code is in comment but ready to use if required
// -> it can select the highlighted box content
// -> or put automatically the content in the clipboard to ease copy/paste
/*      var bCopyToClipboardMode = 1;
        if (!bCopyToClipboardMode)
            rng.select();
        else
            rng.execCommand("Copy"); */
    }
  }

  function dohighlightoff()
  {
    var elem = window.event.srcElement;

    while (elem!=null && elem.tagName!="TABLE")
        elem = elem.parentElement;

    if (elem==null) return;

    if (elem.border==1)
    {
        elem.border = 0;

        // recover values from custom tag attribute values
        elem.style.backgroundColor = elem.oldcolor;
        elem.style.borderColor = elem.oldbordercolor;
    }
  }
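
Both handlers above repeat the same walk up the parent chain. As a minimal sketch, that walk could be factored into a helper and tried out without a browser. The name `findEnclosing` and the plain objects standing in for DOM elements are hypothetical, introduced here for illustration only; they are not part of boxify.js.

```javascript
// Hypothetical helper (not part of boxify.js): climb the parentElement
// chain until an element with the wanted tag name is found.
function findEnclosing(elem, tagName) {
  while (elem != null && elem.tagName !== tagName) {
    elem = elem.parentElement;
  }
  return elem || null; // null when no ancestor matches
}

// Plain objects imitating the DOM chain <table><tr><td><b>
var table = { tagName: "TABLE", parentElement: null };
var row = { tagName: "TR", parentElement: table };
var cell = { tagName: "TD", parentElement: row };
var bold = { tagName: "B", parentElement: cell };

var hit = findEnclosing(bold, "TABLE");  // the table object
var miss = findEnclosing(table, "DIV");  // null, no DIV above the table
```

In a real page the handlers would call `findEnclosing(window.event.srcElement, "TABLE")` and keep the border and color logic unchanged.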

To play with interactive highlighting, there is a combo-box in the right-hand corner of the application. Here is how it was built: we dropped the component from the Toolbox window onto the form, inserted the selectable options in the Items collection from the Properties window, and chose "DropDownList" as the combo-box style to disable editing. One thing we couldn't do from the Properties window was select the initial index, so we had to add code for it manually: this.comboBox1.SelectedIndex = 0;. The result is this combo-box:

Image 3
Adding an unusual combo-box to a web browser app

As the combo-box suggests, this article comes with a few other hooks to play with. Let me first show how state switching is managed:

C#
protected enum NavState
{
    None,
    NoPopup,
    Boxify
};


// event called when selection changes in the combobox
private void OnNavigationModeChanged(object sender, 
                                     System.EventArgs e)
{
    if ( comboBox1.Text=="NoPopup" )
    {
        NavigationState = NavState.NoPopup;
    }
    else if ( comboBox1.Text=="Boxify" )
    {
        NavigationState = NavState.Boxify;
    }
    else
    {
        NavigationState = NavState.None;
    }

    // synchronize UI
    SyncUI("");
}


// event called when the web browser updates its view and finishes 
// parsing a new web page
private void OnNavigateComplete(object sender, 
               AxSHDocVw.DWebBrowserEvents2_NavigateComplete2Event e)
{
    String sURL = (string) e.uRL;
    if (sURL=="about:blank")
        return;

    SyncUI( sURL );

}



// applogic
//

protected void SyncUI(String sURL)
{
    if (sURL.Length>0)
        textBox1.Text = sURL; // update UI

    String code;

    if ( NavigationState == NavState.NoPopup )
    {
        // squeeze down onload events (when web page loads)
        String code1 =	"document.onload=null;" +
                        "window.onload=null;" +
                        "for (i=0; i<window.frames.length; i++) { " +
                        " window.frames[i].document.onload=null;" + 
                        "window.frames[i].onload=null; };";

        // squeeze down onunload events (when web page is closed)
        String code2 =	"document.onunload=null;" +
                        "window.onunload=null;" +
                        "for (i=0; i<window.frames.length; i++) { " +
                        " window.frames[i].document.onunload=null;" + 
                        "window.frames[i].onunload=null; };";

        code = code1 + code2;

     }
     else if ( NavigationState == NavState.Boxify )
     {
         // read boxify.js
         FileStream fin = new FileStream("boxify.js", FileMode.Open, 
                                    FileAccess.Read, FileShare.ReadWrite) ;
         StreamReader tr = new StreamReader(fin) ;
         code = tr.ReadToEnd();
         tr.Close();
         fin.Close();

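         // note: a missing file makes FileStream throw FileNotFoundException;
         // an empty result means the file exists but has no content
         if (code.Length==0) Console.WriteLine("boxify.js is empty");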
     }
     else
     {
         // stop boxify.js
         //
         code = "document.onmouseover = null; document.onmouseout = null;"  ;
     }

     // exec Javascript
     //
     IHTMLDocument2 doc = (IHTMLDocument2) this.axWebBrowser1.Document;
     if (doc != null)
     {
          IHTMLWindow2 parentWindow = doc.parentWindow;
          if (parentWindow != null)
               parentWindow.execScript(code, "javascript");
     }

}
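
The script selection done by SyncUI can also be pictured on its own, independently of the C# host. The following is only a sketch written in JavaScript: `scriptForMode` is a hypothetical name, and `boxifySource` is assumed to hold the text of boxify.js, supplied by the caller.

```javascript
// Hypothetical sketch of SyncUI's script selection (not the author's code).
// boxifySource is assumed to be the text of boxify.js.
function scriptForMode(mode, boxifySource) {
  if (mode === "NoPopup") {
    // build the statements that null one handler on the document,
    // the window and each top-level frame
    var clear = function (evt) {
      return "document." + evt + "=null;" +
             "window." + evt + "=null;" +
             "for (var i=0; i<window.frames.length; i++) { " +
             "window.frames[i].document." + evt + "=null;" +
             "window.frames[i]." + evt + "=null; };";
    };
    return clear("onload") + clear("onunload");
  }
  if (mode === "Boxify") {
    return boxifySource;
  }
  // any other mode: unhook the boxify.js handlers
  return "document.onmouseover = null; document.onmouseout = null;";
}
```

The returned string is what would be handed to execScript; keeping the selection in one function makes each mode easy to inspect in isolation.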

Banning popups

Another nice HTML hooking technique prevents popups from opening. Web designers commonly execute JavaScript when a page loads or when the user leaves it, and many use this to open popup pages (adult sites especially). What we do, once the page is ready, is overwrite these "callbacks" and force them to null.

Because the DOM is a rather rich object model, it is not enough to force null at the document level (the object representing the web page); we also need to do it at the window level and at every sub-window level, known as frames.

See the NoPopup branch of SyncUI in the code above.
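
The injected script only reaches the top-level frames. As a sketch, assuming nested frames should be covered too, the same idea can be written as a recursive function. `clearPageHooks` and `fakeWindow` are hypothetical names, and the plain objects below merely imitate the `document` and `frames` members of a real window so the logic can be followed outside a browser.

```javascript
// Hypothetical sketch: null the load/unload handlers of a window,
// its document, and every frame below it, at any depth.
// In a real browser, frames from another domain may refuse access,
// which this sketch does not handle.
function clearPageHooks(win) {
  win.onload = null;
  win.onunload = null;
  if (win.document) {
    win.document.onload = null;
    win.document.onunload = null;
  }
  var frames = win.frames || [];
  for (var i = 0; i < frames.length; i++) {
    clearPageHooks(frames[i]);
  }
}

// Stand-ins for a page containing a frame that contains another frame
function popup() {}
function fakeWindow(frames) {
  return { onload: popup, onunload: popup,
           document: { onload: popup, onunload: popup },
           frames: frames };
}
var inner = fakeWindow([]);
var middle = fakeWindow([inner]);
var root = fakeWindow([middle]);

clearPageHooks(root); // all three levels now have null handlers
```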

Saving HTML for reuse

Although saving HTML for reuse deserves an article of its own, here are the few lines of code needed to do just that. In C++ we would cast the IHTMLDocument2 interface to IPersistFile and call its Save() method; in C# the replacement for IPersistFile is UCOMIPersistFile, in the System.Runtime.InteropServices namespace. Here is what is needed to store the HTML code on your hard drive using C#:
C#
IHTMLDocument2 doc = (IHTMLDocument2) this.axWebBrowser1.Document;
UCOMIPersistFile pf = (UCOMIPersistFile) doc;
pf.Save(@"c:\myhtmlpage.html",true);
It's that easy.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


Written By
France France
Addicted to reverse engineering. At work, I am developing business intelligence software in a team of smart people (independent software vendor).
