Mouse Gestures for Internet Explorer

Ralph Hare

Sep 28, 2003

Adding mouse gesture recognition to Internet Explorer.

Introduction
Implementation
Tracking the mouse
Gesture recognition
Browser actions
- Hit test - IE 6.0
- Hit test - IE 5.0
Conflicts with other Browser Helper Objects
Conflicts with right-left scripts
Using Mouse Gestures
Improvements
Revision History

Introduction

I've read a lot about Opera, so a couple of months ago I finally got around to downloading it. There's loads of stuff that I really liked about Opera, but the real killer feature was the mouse gesture support. After a few hours of browsing using mouse gestures I was really hooked.

Unfortunately I found that a number of pages, especially ones that I have to use regularly for work, weren't being rendered properly (I'm not sure if it was Opera rendering the page incorrectly, or the HTML that was incorrectly formatted - I suspect the latter).

This article discusses how I've added mouse gesture support to Internet Explorer. The code presented isn't intended to detail every step in the implementation (hopefully the source code will do this); rather it's intended to provide a quick sketch of the steps involved.

Important: To build the source project you'll need to have WTL 7.0 installed; it's available as a free download from Microsoft. See also Easy installation of WTL.

Implementation

I identified three key things I'd have to be able to do to add mouse gesture support to Internet Explorer:

Track the movement of the mouse in the browser window
Pass the mouse track through a gesture recognition algorithm
Provide actions for the most common mouse gestures

Tracking the mouse

Using the Spy application that comes with Visual Studio, I identified that the window that renders the HTML (and therefore the one I'd want to track mouse movement within) was of class Internet Explorer_Server. All I needed to do was somehow subclass this window (see Create your own controls - the art of subclassing and Subclassing controls in ATL dialogs using WTL); then I could add a message handler for all the mouse messages and be well on my way.

There were two ways I could see of being able to subclass the Internet Explorer_Server window:

Embed a WebBrowser control in (for example) an MFC dialog application, and subclass the window in the OnInitDialog method
Somehow inject my code into Internet Explorer

Option 2 was far more attractive - who would want yet another web browser application?

I was familiar with Jeffrey Richter's technique of injecting DLLs into another process (see API hooking revealed) when I came across John Osborn's Popup Window Blocker. In his article, John discusses how to write a Browser Helper Object (BHO - a DLL that will attach itself to every instance of Internet Explorer) in ATL. Just what I wanted.

I knocked up a quick ATL project, implemented the IObjectWithSite interface, and pretty soon I had access to the HWND of the web-browser:

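    STDMETHODIMP BrowserHelperObject::SetSite( IUnknown *pUnkSite )
    {
    // pUnkSite is NULL when IE releases the BHO
        CComQIPtr< IWebBrowser2 >   pWebBrowser = pUnkSite;
        if( pWebBrowser )
        {
            SHANDLE_PTR hwndValue = 0;      // declared as long in older SDKs
            if( SUCCEEDED( pWebBrowser->get_HWND( &hwndValue ) ) )
            {
                HWND    hWnd = reinterpret_cast< HWND >( hwndValue );
                // ... start watching hWnd (the IEFrame window) here
            }
        }

    // let ATL store the site pointer (assumes the BHO derives from
    // IObjectWithSiteImpl)
        return IObjectWithSiteImpl< BrowserHelperObject >::SetSite( pUnkSite );
    }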

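The HWND is a handle to a window of class IEFrame. A quick look with Spy showed that IE 6.0 uses the following window class hierarchy: the IEFrame window contains a Shell DocObject View window, which in turn contains the Internet Explorer_Server window that renders the page.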

Child window creation

At the point when the SetSite method of our browser helper object is called, the Shell DocObject View and Internet Explorer_Server windows haven't yet been created.

Version 1.0 of the plug-in watched for the creation of the Shell DocObject View window by (temporarily) subclassing the IEFrame window and watching for the WM_PARENTNOTIFY message:

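    class Observer
    {
    public:
        virtual ~Observer() {}

        virtual void OnChildCreate( HWND hWnd )
        {
            enum { BUFFER_SIZE = 128 };

            TCHAR   buffer[ BUFFER_SIZE ];
            if( ::GetClassName( hWnd, buffer, BUFFER_SIZE ) != 0 &&
                ::lstrcmp( buffer, _T( "Shell DocObject View" ) ) == 0 )
            {
            //
            // the Shell DocObject View window has been created; it can
            // now be watched for the creation of Internet Explorer_Server
            //
            }
        }
    };

    class ParentNotifyTracker : public CWindowImpl< ParentNotifyTracker >
    {
    public:
        ParentNotifyTracker() : m_pObserver( NULL ) {}

        void Advise( HWND hWnd, Observer *pObserver )
        {
        //
        // subclass the window hWnd and associate the m_pObserver
        // member with the call back interface pointer pObserver
        //
            m_pObserver = pObserver;
            SubclassWindow( hWnd );
        }

        BEGIN_MSG_MAP( ParentNotifyTracker )
            MESSAGE_HANDLER( WM_PARENTNOTIFY, OnParentNotify )
        END_MSG_MAP()

        LRESULT OnParentNotify( UINT, WPARAM wParam, 
                          LPARAM lParam, BOOL &bHandled )
        {
        //
        // for WM_CREATE, lParam holds the handle of the new child window
        //
            if( LOWORD( wParam ) == WM_CREATE && m_pObserver )
            {
                m_pObserver->OnChildCreate( (HWND)lParam );
            }

        //
        // set bHandled to FALSE and return zero to ensure the message is 
        // propagated through to the real parent
        //
            bHandled = FALSE;
            return 0;
        }

    private:
        Observer    *m_pObserver;
    };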

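This worked quite well for IE 6.0, but I found that when displaying a multiframe document, IE 5.0 uses a deeper window hierarchy, in which each frame gets its own Shell Embedding window containing a lower-level Internet Explorer_Server window (IE 6.0 always uses the simple hierarchy above).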

Each frame is displayed in one of the lower-level Internet Explorer_Server windows. Initially I thought all I would have to do was extend the ParentNotifyTracker mechanism to watch for the creation of Shell Embedding windows, and track the mouse gestures in multiple Internet Explorer_Server windows.

Unfortunately, when the Shell Embedding window is created it doesn't send a WM_PARENTNOTIFY message (it would appear that the WS_EX_NOPARENTNOTIFY style is set on the Shell Embedding window). An alternative approach to watching window creation was required.

Windows Hooks

I knew from reading API hooking revealed that I could add a hook to trap all the window creation messages. However as stated in MSDN (and also in the API hooking article):

Hooks tend to slow down the system because they increase the amount of processing the system must perform for each message. You should install a hook only when necessary, and remove it as soon as possible.

As far as I'm aware, Internet Explorer is a multi-SDI application; i.e. multiple documents (or web browsers) are opened within the same process, each document having its own thread to pump messages. Therefore, I only need to hook the messages for a specific thread (rather than installing a global hook) to receive notifications of windows being created. While this does impose a greater overhead than subclassing a window, there doesn't appear to be a noticeable increase in load using this technique.

If there's sufficient interest, I'd consider writing an article describing in more detail how I've utilised the hook (I do some interesting stuff with thread local storage) but for now WindowHook.cpp and WindowHook.h in the source code should provide enough info to be going on with.
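Windows hook procedures are plain callbacks with no user-data parameter, so a thread-specific hook needs some other way to reach the object that installed it; thread local storage fits well because each browser window runs on its own thread. The portable sketch below models that idea with C++11 `thread_local`. `HookState`, `InstallHook`, `RemoveHook` and `HookCallback` are hypothetical names, the Win32 calls appear only in comments, and this is not the code in WindowHook.cpp.

```cpp
#include <string>
#include <vector>

// Hypothetical per-thread state that a hook procedure needs to reach.
struct HookState
{
    std::vector<std::string> createdClasses;   // window classes created on this thread
};

// One pointer per thread. Hook procedures have no user-data parameter,
// so the callback finds the state for its own thread through this variable.
thread_local HookState *t_pHookState = nullptr;

// Stand-in for installing a thread-specific hook; in Win32 this is where
// SetWindowsHookEx would be called with GetCurrentThreadId().
void InstallHook( HookState *pState )
{
    t_pHookState = pState;
}

// Stand-in for removing the hook (UnhookWindowsHookEx in Win32).
void RemoveHook()
{
    t_pHookState = nullptr;
}

// Stand-in for the static hook procedure: it receives only the event data
// (here, the class name of a newly created window) and recovers the
// per-thread state through t_pHookState.
void HookCallback( const std::string &className )
{
    if( t_pHookState != nullptr )
        t_pHookState->createdClasses.push_back( className );
}
```

Events raised on a thread with no hook installed are simply dropped, and a second browser thread would see its own `t_pHookState` rather than this one.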

When I wrote version 1.0 of the plug-in, I didn't realise that when Internet Explorer displays an HTML document and then navigates to (for example) an XML document, a new Internet Explorer_Server window is created. At that point the plug-in stopped being able to track the mouse. Using the hook mechanism, I can see the creation and destruction of every Internet Explorer_Server window.

MouseTracker

Once we've been notified of the creation of our Shell DocObject View window, we can use the same class to watch this window for the creation of the Internet Explorer_Server window. Now that we've got our Internet Explorer_Server window, we can permanently subclass the window to receive all mouse events:

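    typedef std::vector< POINT >    Path;

    class MouseTracker : public CWindowImpl< MouseTracker >
    {
    public:
    //
    // call back interface used to report a completed mouse track
    //
        struct Watcher
        {
            virtual ~Watcher() {}
            virtual void OnLeftButtonUp( const Path &path ) = 0;
            virtual void OnRightButtonUp( const Path &path ) = 0;
        };

        BEGIN_MSG_MAP( MouseTracker )
            MESSAGE_HANDLER( WM_MOUSEMOVE, OnMouseMove )
            MESSAGE_HANDLER( WM_RBUTTONDOWN, OnRightButtonDown )
            MESSAGE_HANDLER( WM_RBUTTONUP, OnRightButtonUp )
            MESSAGE_HANDLER( WM_LBUTTONDOWN, OnLeftButtonDown )
            MESSAGE_HANDLER( WM_LBUTTONUP, OnLeftButtonUp )
            MESSAGE_HANDLER( WM_MOUSEWHEEL, OnMouseWheel )
        END_MSG_MAP()

    private:
    //
    // message handlers (implementations omitted)
    //
        LRESULT OnMouseMove( UINT, WPARAM, LPARAM, BOOL & );
        LRESULT OnRightButtonDown( UINT, WPARAM, LPARAM, BOOL & );
        LRESULT OnRightButtonUp( UINT, WPARAM, LPARAM, BOOL & );
        LRESULT OnLeftButtonDown( UINT, WPARAM, LPARAM, BOOL & );
        LRESULT OnLeftButtonUp( UINT, WPARAM, LPARAM, BOOL & );
        LRESULT OnMouseWheel( UINT, WPARAM, LPARAM, BOOL & );
    };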

We start tracking the mouse when the user holds a mouse button down (WM_LBUTTONDOWN and WM_RBUTTONDOWN). While the button is held down, we store each point the mouse visits (from the WM_MOUSEMOVE message) in some suitable container (e.g. an STL vector), and we complete tracking when the user lets go of the mouse button (WM_LBUTTONUP and WM_RBUTTONUP). When the user releases the button, we inform our client of the mouse track via the OnLeftButtonUp and OnRightButtonUp methods on the Watcher callback interface.

If the user moves the mouse out of the window rectangle and releases the button while we're tracking, we won't receive the button up notification (it will get sent to whatever window the mouse is currently over). However, once we've started tracking a gesture, we want to keep tracking until the user lets go of the mouse button. We solve this problem by calling ::SetCapture when we start tracking (WM_LBUTTONDOWN and WM_RBUTTONDOWN) and ::ReleaseCapture when we've finished tracking (WM_LBUTTONUP and WM_RBUTTONUP).
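The tracking logic itself doesn't depend on Win32, so it can be modelled as a small state machine. In this portable sketch, `OnButtonDown`, `OnMove` and `OnButtonUp` stand in for the WM_*BUTTONDOWN, WM_MOUSEMOVE and WM_*BUTTONUP handlers, and the `m_captured` flag marks where ::SetCapture and ::ReleaseCapture would be called. `PathRecorder` and its members are illustrative names, not the plug-in's own.

```cpp
#include <functional>
#include <vector>

struct Point { long x; long y; };
typedef std::vector<Point> Path;

enum class Button { Left, Right };

class PathRecorder
{
public:
    // Called with the completed path when the tracked button is released.
    std::function<void( Button, const Path & )> onComplete;

    void OnButtonDown( Button button, Point pt )
    {
        m_path.clear();
        m_path.push_back( pt );
        m_button = button;
        m_tracking = true;
        m_captured = true;      // Win32: ::SetCapture( hWnd )
    }

    void OnMove( Point pt )
    {
        // Only record while a button is held; ignore plain hover movement.
        if( m_tracking )
            m_path.push_back( pt );
    }

    void OnButtonUp( Button button, Point pt )
    {
        if( !m_tracking || button != m_button )
            return;
        m_path.push_back( pt );
        m_tracking = false;
        m_captured = false;     // Win32: ::ReleaseCapture()
        if( onComplete )
            onComplete( m_button, m_path );
    }

    bool IsCaptured() const { return m_captured; }

private:
    Path    m_path;
    Button  m_button = Button::Left;
    bool    m_tracking = false;
    bool    m_captured = false;
};
```

Because the capture is held for the whole gesture, the button up event always arrives at the recorder, even if the cursor has left the window.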

BrowserWatcher

The BrowserWatcher class maintains a list (actually implemented as a std::map) of MouseTracker objects, one for each Internet Explorer_Server window created. Whenever an Internet Explorer_Server window is destroyed, the corresponding MouseTracker object is removed from the list and destroyed.
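A minimal sketch of that bookkeeping, assuming the map is keyed on the window handle: here `WindowId` stands in for an HWND and `Tracker` for MouseTracker, and both names, like `TrackerRegistry`, are hypothetical. Holding each tracker in a `std::unique_ptr` means erasing the map entry also destroys the tracker.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>

// Hypothetical stand-in for a window handle (an HWND in the real plug-in).
typedef std::uintptr_t WindowId;

// Placeholder for the per-window tracker; the real class subclasses the window.
struct Tracker
{
    explicit Tracker( WindowId id ) : window( id ) {}
    WindowId window;
};

class TrackerRegistry
{
public:
    // An Internet Explorer_Server window was created: start tracking it.
    void OnServerCreated( WindowId id )
    {
        // emplace leaves an existing entry untouched
        m_trackers.emplace( id, std::make_unique<Tracker>( id ) );
    }

    // The window was destroyed: erasing the entry destroys its tracker.
    void OnServerDestroyed( WindowId id )
    {
        m_trackers.erase( id );
    }

    Tracker *Find( WindowId id ) const
    {
        auto it = m_trackers.find( id );
        return it == m_trackers.end() ? nullptr : it->second.get();
    }

    std::size_t Count() const { return m_trackers.size(); }

private:
    std::map<WindowId, std::unique_ptr<Tracker>> m_trackers;
};
```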

Gesture recognition

So far, we've been able to subclass the IE web browser window and record the mouse movements within. Now to make sense of the movements, but where to start? I did a quick search of the CodeProject web site, which turned up a fantastic article on Mouse gesture recognition by Konstantin Boukreev. With a minimal amount of tweaking, I was able to integrate the source from his GestureApp into my browser helper object:

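    enum GesturePattern : int;  // opaque declaration; the full definition
                                // (same underlying type) lists the gestures

    class GestureTracker : public MouseTracker::Watcher
    {
    public:
        struct Observer
        {
            virtual ~Observer() {}
            virtual void OnMouseGesture( GesturePattern pattern ) = 0;
        };

    // MouseTracker::Watcher
    private:
        void OnLeftButtonUp( const Path &path );

        void OnRightButtonUp( const Path &path )
        {
        //
        // Recognize returns -1 if the path doesn't match a known gesture
        //
            int pattern = m_gesture.Recognize( path );
            if( pattern != -1 )
            {
                m_pObserver->OnMouseGesture( 
                    static_cast< GesturePattern >( pattern ) );
            }
        }

    private:
        Gesture     m_gesture;
        Observer    *m_pObserver;
    };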

See GestureTracker.cpp and GestureTracker.h in the source code for full implementation details.

Many thanks to Konstantin who has kindly allowed me to reuse his source in my plug-in.
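Konstantin's recognizer is far more sophisticated than anything that fits here. To show the basic step of turning a recorded Path into a gesture, the sketch below uses a much simpler, swapped-in technique: it quantizes each sufficiently long movement into Up, Down, Left or Right and collapses repeats, so dragging straight down yields "D". `ToDirections` and its `minStep` threshold are illustrative assumptions and are not part of GestureTracker.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

struct Point { long x; long y; };
typedef std::vector<Point> Path;

// Quantize a mouse path into a string of compass moves (U, D, L, R).
// Movements shorter than minStep pixels are accumulated until they are
// long enough, which filters out hand jitter. Screen y grows downwards.
std::string ToDirections( const Path &path, long minStep = 20 )
{
    std::string result;
    if( path.empty() )
        return result;

    Point anchor = path.front();
    for( const Point &pt : path )
    {
        long dx = pt.x - anchor.x;
        long dy = pt.y - anchor.y;
        if( std::labs( dx ) < minStep && std::labs( dy ) < minStep )
            continue;   // not far enough from the last anchor yet

        char dir;
        if( std::labs( dx ) > std::labs( dy ) )
            dir = dx > 0 ? 'R' : 'L';
        else
            dir = dy > 0 ? 'D' : 'U';

        // Collapse repeated directions: "DDD" becomes "D".
        if( result.empty() || result.back() != dir )
            result.push_back( dir );
        anchor = pt;
    }
    return result;
}
```

A lookup table from strings such as "D" or "DR" to actions would then play the role of the GesturePattern values.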

Browser actions

My intention here was to replicate most of the functionality provided by Opera. For the most part this proved really easy to implement, either by using the functionality exposed by the IWebBrowser2 interface or by sending Windows messages directly to the IE frame.

Hit test - IE 6.0

The gesture I was most keen to replicate was the ability to open a link in a new window (hold the right button and move down). Initially this seemed quite simple to implement:

    #include <atlbase.h>
    #include <exdisp.h>     // IWebBrowser2, InternetExplorer coclass
    #include <mshtml.h>     // IHTMLDocument2, IHTMLElement

    void HitTest( IWebBrowser2 *pWB, const POINT &pt )
    {
        HRESULT     hr;
    //
    // retrieve the active document from the Web Browser
    //
        CComPtr< IDispatch >    pDisp;
        hr = pWB->get_Document( &pDisp );
        if( FAILED( hr ) || !pDisp )
            return;

    //
    // is it an HTML document?
    //
        CComQIPtr< IHTMLDocument2 > pDoc = pDisp;
        if( pDoc )
        {
        //
        // retrieve the element (if any) at the point; pt must be in the
        // browser window's client coordinates
        //
            CComPtr< IHTMLElement > pElement;
            hr = pDoc->elementFromPoint( pt.x, pt.y, &pElement );
            if( FAILED( hr ) || !pElement )
                return;

        //
        // is the element (or any of its parents) an anchor element?
        // The user could have clicked on an image embedded within
        // an anchor:
        //
        //      <a href="blah"><img src="img.jpg"></a>
        //
        // in which case elementFromPoint will return the IMG element.
        //
        // The implementation of GetHRefFromAnchor is omitted here, but
        // it ascends the element hierarchy until it reaches an anchor
        // element or the top of the document. GetHRefFromAnchor returns
        // the HREF attribute of the anchor element, or an empty string
        // if no anchor could be found.
        //
            CComBSTR    url = GetHRefFromAnchor( pElement );

            if( url.Length() )
            {
            //
            // open the link in a new IE instance (created hidden)
            //
                CComPtr< IWebBrowser2 >   pApp;
                hr = pApp.CoCreateInstance( __uuidof( InternetExplorer ) );
                if( SUCCEEDED( hr ) )
                {
                    CComVariant vUrl( url.m_str );
                    CComVariant vEmpty;
                    hr = pApp->Navigate2( &vUrl, &vEmpty, &vEmpty, 
                                          &vEmpty, &vEmpty );
                    pApp->put_Visible( VARIANT_TRUE );
                }
            }
        }
    }
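The walk that GetHRefFromAnchor performs can be modelled with a plain tree. In the plug-in itself the walk would go through IHTMLElement::get_parentElement, get_tagName and getAttribute; the `Element` struct below is a hypothetical stand-in so the logic can be shown on its own, and this is a sketch rather than the omitted implementation.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for an IHTMLElement: tag name, href attribute
// (empty if absent) and a pointer to the parent element.
struct Element
{
    std::string   tagName;
    std::string   href;
    const Element *parent;
};

// Case-insensitive comparison, since HTML tag names are not case-sensitive.
static bool EqualsNoCase( const std::string &a, const std::string &b )
{
    if( a.size() != b.size() )
        return false;
    for( std::size_t i = 0; i < a.size(); ++i )
    {
        if( std::tolower( static_cast<unsigned char>( a[i] ) ) !=
            std::tolower( static_cast<unsigned char>( b[i] ) ) )
            return false;
    }
    return true;
}

// Ascend from the element under the cursor until an anchor is found,
// returning its href, or an empty string at the top of the document.
std::string GetHRefFromAnchor( const Element *pElement )
{
    for( const Element *p = pElement; p != nullptr; p = p->parent )
    {
        if( EqualsNoCase( p->tagName, "A" ) )
            return p->href;
    }
    return std::string();
}
```

For `<a href="blah"><img src="img.jpg"></a>`, starting from the IMG element returns the anchor's href after one step up the tree.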

This worked fine when viewing single-frame pages, but when viewing multiframe pages (e.g. Google's USENET browser), I'd often end up navigating either to my default home page or, worse, to a random link on the document.

Consider a single-frame document, and let's assume that the position of our link is (x₀, y₀) relative to the top left corner of the browser window.

If we embed the same page in a multiframe document, let's assume that the position of the same link is now (x₁, y₁) relative to the top left corner of the browser window.

Now when we call elementFromPoint with (x₁, y₁), we're returned a pointer to the frame element (which also implements IWebBrowser2). We need to call elementFromPoint again, this time against the frame's web browser, after adjusting our point by the offset (x_f, y_f) of the frame within the main document, i.e. using (x₁ − x_f, y₁ − y_f).

Our code now looks something like this:

    void HitTest( IWebBrowser2 *pWB, const POINT &pt )
    {
        HRESULT     hr;

        CComPtr< IDispatch >    pDisp;
        hr = pWB->get_Document( &pDisp );
        if( FAILED( hr ) || !pDisp )
            return;

        CComQIPtr< IHTMLDocument2 > pDoc = pDisp;
        if( pDoc )
        {
            CComPtr< IHTMLElement > pElement;
            hr = pDoc->elementFromPoint( pt.x, pt.y, &pElement );
            if( FAILED( hr ) || !pElement )
                return;

        //
        // a frame element also exposes IWebBrowser2, so if the point is
        // over a frame, recurse into the frame's document
        //
            CComQIPtr< IWebBrowser2 >   pFrame = pElement;
            if( pFrame )
            {
            //
            // GetElementOffset retrieves the offset of the given
            // element relative to the origin of the web browser
            //
                SIZE    offset    = GetElementOffset( pElement );
                POINT   ptInFrame = { pt.x - offset.cx, pt.y - offset.cy };
                HitTest( pFrame, ptInFrame );
                return;
            }

            CComBSTR    url = GetHRefFromAnchor( pElement );

            if( url.Length() )
            {
                CComPtr< IWebBrowser2 >   pApp;
                hr = pApp.CoCreateInstance( __uuidof( InternetExplorer ) );
                if( SUCCEEDED( hr ) )
                {
                    CComVariant vUrl( url.m_str );
                    CComVariant vEmpty;
                    hr = pApp->Navigate2( &vUrl, &vEmpty, &vEmpty, 
                                          &vEmpty, &vEmpty );
                    pApp->put_Visible( VARIANT_TRUE );
                }
            }
        }
    }
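The recursion in HitTest amounts to repeatedly subtracting each frame's offset (x_f, y_f) until no further frame contains the point. The portable sketch below models a document as a list of rectangular frames so that arithmetic can be checked in isolation; `Frame`, `Doc` and `MapToInnermost` are hypothetical names, and frame borders, scrolling and zoom are deliberately ignored.

```cpp
#include <vector>

struct Point { long x; long y; };

// Hypothetical model of a document: rectangular frames, each positioned
// at (x, y) relative to its parent document and holding a nested document.
struct Doc;

struct Frame
{
    long  x, y;             // offset (x_f, y_f) of the frame in its parent
    long  width, height;
    const Doc *content;
};

struct Doc
{
    std::vector<Frame> frames;
};

// Follow the point down through nested frames, converting it into each
// frame's coordinate system, as the recursive HitTest does. Returns the
// innermost document and writes the point relative to that document.
const Doc *MapToInnermost( const Doc *pDoc, Point pt, Point &ptOut )
{
    for( const Frame &f : pDoc->frames )
    {
        bool inside = pt.x >= f.x && pt.x < f.x + f.width &&
                      pt.y >= f.y && pt.y < f.y + f.height;
        if( inside && f.content != nullptr )
        {
            Point ptInFrame = { pt.x - f.x, pt.y - f.y };
            return MapToInnermost( f.content, ptInFrame, ptOut );
        }
    }
    ptOut = pt;
    return pDoc;
}
```

For example, a link at (250, 30) in the top-level window, inside a frame whose origin is at (200, 0), is looked up at (50, 30) within the frame's own document.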

Hit test - IE 5.0

All this worked really well with IE 6.0, but you'll recall that IE 5.0 uses a different multiframe window hierarchy from IE 6.0.

As we've now got a different window hierarchy, I had to reinvestigate the HitTest for the 'open link in a new window' feature (remember we don't have embedded frame elements any more, we have embedded windows). A quick Google and I came across the following mechanism for retrieving the document associated with a particular window:

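    #include <oleacc.h>     // ObjectFromLresult (link with oleacc.lib)

    const UINT  WM_HTML_GETOBJECT = ::RegisterWindowMessage( 
        _T( "WM_HTML_GETOBJECT" ) );

    HRESULT GetDocument( HWND hWnd, IHTMLDocument2 **ppDocument )
    {
        DWORD_PTR   res = 0;
        if( ::SendMessageTimeout( 
                            hWnd, 
                            WM_HTML_GETOBJECT, 
                            0, 
                            0, 
                            SMTO_ABORTIFHUNG, 
                            1000, 
                            &res 
                            ) == 0 )
        {
            return E_FAIL;
        }

    //
    // convert the LRESULT returned by the Internet Explorer_Server
    // window into an IHTMLDocument2 interface pointer
    //
        return ::ObjectFromLresult( 
                                static_cast< LRESULT >( res ), 
                                IID_IHTMLDocument2, 
                                0, 
                                reinterpret_cast< void ** >( ppDocument )
                                );
    }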

Our hit test can now be implemented as:

    HRESULT HitTest( HWND hWnd, const POINT &pt )
    {
    //
    // pt must be relative to hWnd's client area
    //
        CComPtr< IHTMLDocument2 >    pDoc;
        HRESULT hr = GetDocument( hWnd, &pDoc );

        if( SUCCEEDED( hr ) && pDoc )
        {
            CComPtr< IHTMLElement > pElement;
            hr = pDoc->elementFromPoint( pt.x, pt.y, &pElement );

            //
            // Extract the url and create the web browser window as before
            //
        }

        return hr;
    }

I've now run the plug-in against IE 5.5 and (thankfully!) it has the same window class hierarchy as IE 6.0. Note that IE 4.0 does not support browser helper objects so the plug-in won't work on this platform.

Conflicts with other Browser Helper Objects

A number of users of version 1.0 of the plug-in informed me that it crashed whenever they performed any gesture using the right mouse button. Of course, everything worked fine on my development machine ;-)!!

tong_du noted that while he had the Google toolbar installed the mouse gestures were dead. As soon as he uninstalled the Google toolbar, the mouse gestures worked fine.

Jeff Fitzsimons debugged the code and found that the SynthesiseRightClick method failed to (temporarily) unsubclass the window. As a result, the WM_RBUTTONDOWN/WM_RBUTTONUP messages that the plug-in sent to synthesise the click were delivered straight back to the plug-in's own handler, so SynthesiseRightClick was called recursively until the stack overflowed. (See Re: Right click problem, too.)

The call to UnsubclassWindow was failing because the WNDPROC stored by the window (see GetWindowLong and GWL_WNDPROC) differed from the WNDPROC stored in the window's thunk (see pp. 423-429 of ATL Internals for an explanation of how ATL employs thunks); ATL won't restore the original window procedure when someone else's procedure has been installed on top of its own. I suspect, though this is a guess, that the culprit is another ATL CWindowImplBaseT that also subclasses the Internet Explorer_Server window.

I got around this by setting a flag (m_isSuspended) in the MouseTracker class; while the flag is set, all messages are ignored. See the BEGIN_MSG_MAP_EX macro in MouseTracker.h.

Many thanks to tong_du, Jeff Fitzsimons and Rich Buckley for their help in tracking this problem. Cheers Guys!!!
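The effect of the m_isSuspended flag can be shown without any Windows code. In the sketch below, a hypothetical `Tracker::HandleRightButtonDown` stands in for the subclassed handler, and `SynthesiseRightClick` re-sends the click to the same handler, as happens when unsubclassing fails. With the flag set, the re-sent message is passed straight through instead of recursing. Only the flag name comes from the plug-in; the counters exist purely to make the behaviour observable.

```cpp
// Minimal model of the re-entrancy problem: the handler synthesises a
// right click, which is delivered straight back to the same handler.
class Tracker
{
public:
    int passedThrough = 0;   // messages handed on to the original window procedure
    int depth = 0;           // deepest nesting of the handler observed

    // Stand-in for the subclassed WM_RBUTTONDOWN handler.
    void HandleRightButtonDown( int nesting = 1 )
    {
        if( nesting > depth )
            depth = nesting;

        if( m_isSuspended )
        {
            // Suspended: ignore the message and let the real window see it.
            ++passedThrough;
            return;
        }
        SynthesiseRightClick( nesting );
    }

private:
    void SynthesiseRightClick( int nesting )
    {
        m_isSuspended = true;       // used instead of UnsubclassWindow()
        // The synthesised click re-enters the handler because the window is
        // still subclassed, but it now passes straight through.
        HandleRightButtonDown( nesting + 1 );
        m_isSuspended = false;
    }

    bool m_isSuspended = false;
};
```

Without the flag, each synthesised click would call SynthesiseRightClick again, which is exactly the unbounded recursion described above.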

Conflicts with right-left scripts

A few users of the plug-in had noted that when using it with right-left scripts (like Hebrew or Azeri (Cyrillic)), keyboard input was getting garbled. After a couple of frustrating months being unable to recreate this on my dev machine(s), I finally managed to track down the problem. The original mouse gestures plug-in used the MBCS character set. As soon as I rebuilt the plug-in to use the Unicode character set, keyboard input worked the same with or without the plug-in.

I guess having MBCS and Unicode message handlers on a window isn't a good idea! Many thanks to lkj1, orlink and Eli Stern who gave me much valuable feedback in trying to resolve this bug.

Using Mouse Gestures

Once the mouse gestures plug-in has been installed/registered, all you need to do is open a new instance of Internet Explorer. If everything is working OK, you should have a new entry 'Mouse Gestures...' in IE's Tools menu.

Clicking on this option should bring up the Mouse Gestures configuration dialog box.

To get the idea of how to use mouse gestures:

Move the mouse over the bitmap (currently showing two cars in a circle)
Hold down the right mouse button, and drag the mouse down - you won't have to move the mouse too far
Let go of the right mouse button

Hopefully the bitmap has now been updated to show a down arrow.

It's as easy as that. Scroll the items in the Gesture combo box to see the gestures currently supported and the actions associated with the gesture.

To associate your own actions with a particular gesture, simply select the gesture (or perform it over the image), select the action, and click on 'Apply'.

User Actions

To support a wider range of actions, the Mouse Gestures plug-in allows you to associate keyboard shortcuts with a gesture.

Set the keyboard focus to the 'Shortcut key' edit box, and then hold down the required keys. The edit box will update to show the shortcut key. Note that invalid shortcuts (like the Ctrl key on its own) aren't allowed, and will be displayed as 'None' in the shortcut key window.

Now, whenever you perform the appropriate gesture, the Mouse Gestures plug-in will send the equivalent key strokes to Internet Explorer.
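The rule that a modifier key on its own isn't a valid shortcut can be sketched as a small formatting function for the 'Shortcut key' box. The `Shortcut` struct, its fields, and the decision to accept a plain key with no modifier are all assumptions made for illustration; the plug-in's own representation isn't described here.

```cpp
#include <string>

// Hypothetical representation of a keyboard shortcut: modifier flags plus
// the name of the main key ("" when only modifiers are held down).
struct Shortcut
{
    bool        ctrl  = false;
    bool        alt   = false;
    bool        shift = false;
    std::string key;
};

// Format the shortcut for display, or "None" if it isn't valid: a shortcut
// needs a non-modifier key. (Accepting a bare key with no modifiers is an
// assumption of this sketch.)
std::string Describe( const Shortcut &s )
{
    if( s.key.empty() )
        return "None";      // e.g. the Ctrl key on its own

    std::string text;
    if( s.ctrl )  text += "Ctrl+";
    if( s.alt )   text += "Alt+";
    if( s.shift ) text += "Shift+";
    return text + s.key;
}
```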

The 'Open in new window' action

A final special note about the 'Open in new foreground/background window' action:

If the gesture is started over a hyperlink, the target of the hyperlink is opened in a new window (the same as if you'd right clicked on the hyperlink and selected 'Open in New Window').
If the gesture is started over a region of highlighted text, then the Mouse Gestures plug-in will treat the text as a hyperlink target, and try to open the new window at that page.
If the gesture is started anywhere else in the browser window, then a new window is opened at your home page.
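The three rules above form a simple priority order, sketched here as a pure function. Its inputs are assumptions for illustration: the href of the hyperlink under the start of the gesture (empty if none), the highlighted text under the start of the gesture (empty if none), and the home page URL. Trimming whitespace from the selection is also an assumption, not documented plug-in behaviour.

```cpp
#include <string>

// Pick the URL for the 'Open in new window' action, following the three
// rules in order: hyperlink, then highlighted text, then the home page.
std::string ChooseNewWindowUrl( const std::string &linkHref,
                                const std::string &selectionAtStart,
                                const std::string &homePage )
{
    if( !linkHref.empty() )
        return linkHref;            // gesture started over a hyperlink

    // Treat highlighted text as a URL, trimming surrounding whitespace.
    const char *ws = " \t\r\n";
    std::string::size_type first = selectionAtStart.find_first_not_of( ws );
    if( first != std::string::npos )
    {
        std::string::size_type last = selectionAtStart.find_last_not_of( ws );
        return selectionAtStart.substr( first, last - first + 1 );
    }

    return homePage;                // anywhere else: open the home page
}
```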

Mouse Trails

One of the most requested enhancements to the plug-in was the ability to draw mouse trails. This seemed like an interesting thing to do, so when I finally committed to doing it, I looked around to see what had already been done in this area. I found a plug-in for the Firefox browser, but the trails it generated were basic at best. I wanted something that looked like XP, not Win 3.1!

After much prototyping, I've come up with something that ticks all the right boxes. To see how the mouse trails work, select the 'Mouse Trails' sheet of the Mouse Gestures configuration dialog.

Make sure that 'Enable mouse trails' is ticked, and perform a mouse gesture over the exclamation icon; a trail is drawn along the path of your gesture.

When you end the gesture, the trail fades into the background. You might need to experiment to find the best combination of settings for your PC. Note: you'll need a fairly powerful graphics card to get the full mouse trail experience. The trails will draw on a 2MB video card, but they can make the mouse a little jerky.

Details of how the mouse trail window is implemented are available in MouseTrail.h and TrailWindow.h in the attached source code.

Improvements

Some extra functionality that I eventually intend to add:

Add gesture support to applications that use an embedded Microsoft Web Browser control (like MSDN and Outlook).
Update the installer so that the DLL can be removed even when IE or Explorer has an instance of the DLL open.

Any other suggestions welcome.

Revision History

25 September 2003: Initial version (1.0)
04 November 2003: Version 1.0.0.4 - Bug fixes
09 April 2004: Version 1.0.1.1 - Updated to work with right-left scripts and for Visual Studio .NET 2003
27 May 2004: Version 1.0.3.1 - Bug fixes (special thanks to dchris_med!)
10 June 2004: Version 1.0.4.1 - Added new actions and gestures
24 June 2004: Version 1.1.0.2 - Added support for keyboard shortcuts (& bug fixes)
07 September 2005: Version 1.2.0.1 - Added mouse trails, IE7 (Beta 1) support