Communicating from the Browser to a Desktop Application

MarkGladding

May 17, 2009

CPOL

A very simple application that uses text to speech to speak out loud the currently selected block of text in your web browser.

Download source code - 134 KB

Introduction

There are many scenarios where it's useful to integrate a desktop application with the web browser. Given that most users spend the majority of their time today surfing the web in their browser of choice, it can make sense to provide some sort of integration with your desktop application. Often, this will be as simple as providing a way to export the current URL or a selected block of text to your application. For this article, I've created a very simple application that uses text to speech to speak out loud the currently selected block of text in your web browser.

Internet Explorer provides many hooks for integrating application logic into the browser, the most popular being support for adding custom toolbars. There are great articles explaining how to do this on Code Project, such as this Win32 article and this .NET article. You can also create toolbars for Firefox using their Chrome plug-in architecture which uses XML for the UI layout and JavaScript for the application logic (see CodeProject article).

What about the other browsers, Google Chrome, Safari, and Opera? Given there is no common plug-in architecture used by all browsers, you can see a huge development effort is required to provide a separate toolbar implementation for each browser.

It would be nice to be able to write one toolbar that could be used across all browsers. This is not possible today, but you can achieve almost the same effect using Bookmarklets.

Bookmarklets

A bookmarklet is a special URL that will run a JavaScript application when clicked. The JavaScript will execute in the context of the current page. Like any other URL, it can be bookmarked and added to your Favourites menu or placed on a Favourites toolbar. Here is a simple example of a bookmarklet:

javascript:alert('Hello World.');

Here is a slightly more complex example that will display the currently selected text in a message box:

<a href="javascript:var q = ''; if (window.getSelection) 
  q = window.getSelection().toString(); else if (document.getSelection) 
  q = document.getSelection(); else if (document.selection) 
  q = document.selection.createRange().text; if(q.length > 0) 
  alert(q); else alert('You must select some text first');">Show Selected Text</a>

Show Selected Text

Select a block of text on the page and click the above link. The text will be shown in a message box.

Output from the Show Selected Text bookmarklet

Now, drag the bookmarklet and drop it on your Favourites toolbar (for IE, you need to right click, select Add to Favourites, and then create it in the Favourites Bar). Navigate to a new page, select a block of text, and click the Show Selected Text button. Once again, the selected text will be shown in a message box.

You can see the potential of bookmarklets. You can create a bookmarklet for each command of your application and display them on a web page. The user can then select the commands they want to use and add them to their Favourites (either the toolbar or menu).

The downside is it's slightly more effort to install than a single toolbar, but on the upside, it gives the user a lot of flexibility. They need only choose the commands they're interested in and can choose whether they want them accessible from a toolbar or menu.

From a developer's perspective, bookmarklets are great as they're supported by all the major browsers. The only thing you need to worry about is making sure your JavaScript code handles differences in browser implementations, something that is well documented and understood these days (although still a right pain).

Communicating with a Desktop Application from JavaScript

Bookmarklets allow you to execute an arbitrary block of JavaScript code at the click of a button, but how do you use this to communicate with a desktop application?

The answer is to build a web server into your desktop application and issue commands from your JavaScript code using HTTP requests.

Now, before you baulk at the idea of building a web server into your application, it's actually very simple. You don't need a complete web server implementation. You just need to be able to process simple HTTP GET requests. A basic implementation is as follows.

1. Listen for a socket connection on a specific port (80 is the default for the HTTP protocol, but you should use a different port for your application).
2. Accept a connection and read the HTTP GET request.
3. Extract the URL from the GET header and execute the associated command in your application.
4. Send an HTTP response back to the browser.
5. Close the connection.
6. Go back to step 1.
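
As a rough illustration of the middle steps in the list above (reading the request, extracting the URL, and sending a response), here is a minimal sketch of the parsing and response-header logic. It is written in JavaScript purely for illustration; BrowserSpeak itself does this work in C#. The names parseRequestLine and buildResponseHead are hypothetical and are not part of the sample code.

```javascript
// Illustrative only: hypothetical helpers, not part of BrowserSpeak.

// Extract the method and request target from the first line of a raw
// HTTP request, e.g. "GET /speaktext/dummy.gif?text=Hello HTTP/1.1".
// Returns null for anything other than a simple GET request.
function parseRequestLine(rawRequest) {
    var firstLine = rawRequest.split("\r\n")[0];
    var parts = firstLine.split(" ");
    if (parts.length !== 3 || parts[0] !== "GET") {
        return null;
    }
    return { method: parts[0], target: parts[1], version: parts[2] };
}

// Build the status line and headers for a response that carries
// bodyLength bytes of GIF data. The image bytes follow the blank line.
function buildResponseHead(bodyLength) {
    return "HTTP/1.1 200 OK\r\n" +
           "Content-Type: image/gif\r\n" +
           "Content-Length: " + bodyLength + "\r\n" +
           "Connection: close\r\n" +
           "\r\n";
}
```

The target returned by parseRequestLine is the part of the request that identifies the command and its arguments.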

The .NET Framework 2.0 has an HttpListener class and associated HttpListenerRequest and HttpListenerResponse classes that allow you to implement the above in a few lines of code.

On the browser, you need a way of issuing HTTP requests from your JavaScript code. There are a number of ways of issuing HTTP requests from JavaScript.

The simplest is to write a new URL into the document.location property. This will cause the browser to navigate to the new location. However, this is not what we want. We don't want to direct the user to a new page when they click one of our bookmarklets. Instead, we just want to issue a command to our application while remaining on the same page.

This sounds like a job for AJAX and the XMLHttpRequest object. AJAX provides a convenient means of issuing requests to a web server in the background without affecting the current page. However, there is one important restriction placed on XMLHttpRequest called the same-origin policy. Browsers restrict XMLHttpRequest calls to the same domain as that used to serve the current page. For example, if you are viewing a page from codeproject.com, you can only issue requests to codeproject.com. A request to another domain (e.g., google.com) will be blocked by the browser. This is an important security measure that ensures malicious scripts cannot send information to a completely different server behind the scenes without your knowledge.

This restriction means that we cannot use XMLHttpRequest to communicate with our desktop application. Remember that JavaScript bookmarklets are executed in the context of the current page. We need to be able to send a request from any domain (e.g., codeproject.com) to our desktop application, which will be in the localhost domain.

In order to overcome this problem, I turned to Google for inspiration. Google needs to be able to do precisely this in order to gather analytics information for a site. If you're not familiar with Google Analytics, it can be used to gather a multitude of information about the visitors to your web site, such as the number of visitors, where they came from, and the pages on your site they visit. All this information is collected, transferred back to Google, and appears in various reports in your Analytics account.

To add tracking to your site, you simply add a call to the Google Analytics JavaScript at the bottom of every page of your site.

Whenever a visitor lands on your page, the JavaScript runs and the visitor details are sent back to your Google Analytics account.

The question is, how does Google do this? Surely they can't use XMLHttpRequest, as it would violate the same-origin policy? They don't. Instead, they use what can only be described as a very clever technique.

Using the JavaScript Image Class to Make Asynchronous Cross-Domain Requests

The JavaScript Image class is a very simple class that can be used to asynchronously load an image. To request an image, you simply set the src property to the URL of the image. If the image loads successfully, the onload handler is called. If an error occurs, the onerror handler is called. Unlike XMLHttpRequest, image requests are not subject to the same-origin policy. The source image can be located on any server. It doesn't need to be hosted on the same site as the current page.
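
A minimal sketch of this pattern is shown here. The helper name requestViaImage is hypothetical, and the third parameter is an assumption added only so the function can be exercised outside a browser; in a page you would omit it and the built-in Image class is used.

```javascript
// Illustrative only: requestViaImage is a hypothetical helper.
// ImageCtor is optional and defaults to the browser's Image class.
function requestViaImage(url, onError, ImageCtor) {
    var Ctor = ImageCtor || Image;
    var image = new Ctor(1, 1);   // A 1x1 image that is never added to the page.
    image.onerror = onError;      // Called if the request fails.
    image.src = url;              // Assigning src starts the request.
    return image;
}
```

In a browser, a call might look like requestViaImage(someUrl, function () { alert("Request failed"); }), where someUrl can point at any server.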

We can use this behaviour to send arbitrary requests to our desktop application (or any domain for that matter) if we realize the source URL can contain any information, including a querystring. The only requirement is that it returns an image. Here is an example URL:

<a href="http://localhost:60024/speaktext/dummy.gif?text=Hello%20world">
  http://localhost:60024/speaktext/dummy.gif?text=Hello%20world</a>

We can easily map this URL to the following command in our application.

public void speaktext(string text);
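
To illustrate the mapping, here is a rough sketch that splits such a URL into a command name and its arguments. It is written in JavaScript for illustration only (the real application does this in C#), and parseCommandUrl is a hypothetical name. It assumes the standard URL class is available, as it is in modern browsers and Node.js.

```javascript
// Illustrative only: parseCommandUrl is a hypothetical helper.
function parseCommandUrl(requestUrl) {
    var url = new URL(requestUrl);

    // The path looks like "/speaktext/dummy.gif"; the first non-empty
    // segment is treated as the command name.
    var segments = url.pathname.split("/").filter(function (s) {
        return s.length > 0;
    });

    // Collect the querystring arguments. The values returned by
    // searchParams are already percent-decoded.
    var args = {};
    url.searchParams.forEach(function (value, name) {
        args[name] = value;
    });

    return { command: segments[0] || null, args: args };
}
```

For the example URL, this yields the command name speaktext and a single text argument containing "Hello world".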

In order to ensure the request completes without error, a 1x1 pixel GIF image is returned. This image is never actually shown to the user. A tiny image is used to minimize the number of bytes being transmitted.

The most important point to realize is all communication is one way, from the browser to the desktop application. There is no way of sending information from the desktop application back to the browser. However, for many applications, this is not a problem.

Google uses the JavaScript Image technique to send information about the visitors to your pages (hosted on yourdomain.com) back to your Google Analytics account (hosted on google.com).

Maximum URL Length

You need to be aware that URLs have a maximum length that varies from browser to browser. Internet Explorer is the most restrictive of the major browsers, with a limit of 2,083 characters. This restricts the amount of information you can send in a single request. If you need to send a large amount of information, you'll need to break it up into smaller chunks and send multiple requests. The sample application, BrowserSpeak, uses this technique to speak arbitrarily large blocks of text.
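
The chunking arithmetic can be sketched as follows. The helper name splitIntoChunks is hypothetical; the sample application performs the equivalent split inline.

```javascript
// Illustrative only: splitIntoChunks is a hypothetical helper.
// Splits text into consecutive pieces of at most maxLength characters.
function splitIntoChunks(text, maxLength) {
    if (maxLength < 1) {
        throw new Error("maxLength must be at least 1");
    }
    var chunks = [];
    for (var start = 0; start < text.length; start += maxLength) {
        // substring() clamps the end index to the length of the text,
        // so the final chunk may be shorter than maxLength.
        chunks.push(text.substring(start, start + maxLength));
    }
    return chunks;
}
```

For example, 3200 characters split with a limit of 1500 gives three chunks of 1500, 1500, and 200 characters, so three requests are needed.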

Text Encoding

The browser will automatically encode a URL you pass to Image.src as UTF-8. However, when passing arbitrary text as part of a URL, you will need to escape the '&' and '=' characters. These characters are used to delimit the name/value pairs (or arguments) that are passed in the querystring portion of the URL. This can be done using the JavaScript escape() function, which is what BrowserSpeak uses. Note that escape() is now deprecated and leaves the '+' character unescaped; encodeURIComponent() escapes it and is the more robust choice.
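
As a small illustration of why the delimiters must be escaped, the sketch below builds a querystring from name/value pairs. The helper name buildQueryString is hypothetical, and it uses encodeURIComponent() rather than escape(); that substitution is an assumption of this sketch and not what the sample code does.

```javascript
// Illustrative only: buildQueryString is a hypothetical helper.
// Each name and value is escaped so that any '&' or '=' inside the
// data cannot be mistaken for a delimiter between arguments.
function buildQueryString(args) {
    var pairs = [];
    for (var name in args) {
        if (Object.prototype.hasOwnProperty.call(args, name)) {
            pairs.push(encodeURIComponent(name) + "=" +
                       encodeURIComponent(args[name]));
        }
    }
    return pairs.length > 0 ? "?" + pairs.join("&") : "";
}
```

With this helper, the text "Q&A = fun" becomes text=Q%26A%20%3D%20fun, so the server sees a single text argument rather than several.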

Avoiding the Cache

Web browsers will cache images (as well as many other resources) locally to avoid making multiple requests back to the server for the same resource. This behaviour is disastrous for our application. The first command will make it through to our desktop application, and the browser will cache dummy.gif locally. Subsequent requests will never reach our desktop application as they can be satisfied from the local cache.

There are a couple of solutions to this problem. One answer is to set the cache expiry directives in the HTTP response to instruct the browser never to cache the result.

The other approach, which is used for the BrowserSpeak application, is to ensure every request has a unique URL. This is done by appending a timestamp containing the current date and time. For example:

var request = "http://" + server + "/" + 
  command + "/dummy.gif" + args + 
  "&timestamp=" + new Date().getTime();

BrowserSpeak - A Concrete Example

It's now time to put all this theory into practice and create a sample application that has some real world use.

BrowserSpeak is a C# application that will speak the text on a web page out loud. It can be used when you are tired of reading large passages of text from the screen. It uses the System.Speech.Synthesis component found in the .NET Framework 3.0 for the text to speech functionality.

The BrowserSpeak Application

BrowserSpeak provides the following commands, available through its web interface and through its UI.

SpeakText
StopSpeaking
PauseSpeaking
ResumeSpeaking

It also provides a BufferText command available from the web interface. This command is used to send a block of text from the web browser to the desktop application. The text is split into chunks of up to 1500 characters, so it's not limited by the maximum size of a URL. It's used by the Speak Selected bookmarklet to transfer the selected text to the BrowserSpeak application prior to speaking.

BrowserSpeak uses the following bookmarklets (drag these onto your Favourites bar to use from your browser):

The Speak Selected command is the most complex and also the most interesting. It's listed below:

// A bookmarklet to send a speaktext command to the BrowserSpeak application.

var server = "localhost:60024";
// Change the port number for your app to something unique.

var maxreqlength = 1500;
// This is a conservative limit that should work with all browsers.

var selectedText = _getSelectedText();

if(selectedText)
{
    _bufferText(escape(selectedText));
    _speakText();
}

void 0;
// Return from bookmarklet, ensuring no result is displayed.

function _getSelectedText()
{
    // Get the current text selection using
    // a cross-browser compatible technique.

    if (window.getSelection) 
        return window.getSelection().toString();
    else if (document.getSelection) 
        return document.getSelection(); 
    else if (document.selection) 
        return document.selection.createRange().text; 

    return null;
}

function _formatCommand(command, args)
{
    // Add a timestamp to ensure the URL is always unique and hence
    // will never be cached by the browser.

    return "http://" + server + "/" + command + 
           "/dummy.gif" + args + 
           "&timestamp=" + new Date().getTime(); 
}

function _speakText()
{
    var image = new Image(1,1); 
    image.onerror = function() { _showerror(); };
    image.src = _formatCommand("speaktext", "?source=" + escape(document.URL)); 
}


function _bufferText(text)
{
    var clearExisting = "true"; 
    var reqs = Math.floor((text.length + maxreqlength - 1) / maxreqlength);
    for(var i = 0; i < reqs; i++)
    {
        var start = i * maxreqlength;
        var end = Math.min(text.length, start + maxreqlength);
        var image = new Image(1,1); 
        image.onerror = function() { _showerror(); };
        image.src = _formatCommand("buffertext", 
          "?totalreqs=" + reqs + "&req=" + (i + 1) + 
          "&text=" + text.substring(start, end) + 
          "&clear=" + clearExisting); 
        clearExisting = "false";
    }
}

function _showerror() 
{
    // Display the most likely reason for an error 
    alert("BrowserSpeak is not running. You must start BrowserSpeak first."); 
}

Most of the code is self-explanatory. However, it's important to explain the behaviour of the _bufferText() loop. If the escaped text being sent is longer than 1500 characters, then multiple requests will be made. Remember that as far as the browser is concerned, it's requesting an image. Modern browsers will issue multiple image requests in parallel. This will cause multiple buffertext commands to be issued in parallel. Not only that, it's quite possible the requests will arrive out of order at the BrowserSpeak desktop application. Therefore, every request includes the parameters req (the request number) and totalreqs (the total number of requests). This allows the BrowserSpeak application to reassemble the text into the correct order.
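
The reassembly idea can be sketched as follows. This is an illustration in JavaScript only, not the C# code used by BrowserSpeak; createTextAssembler is a hypothetical name, and for simplicity the sketch ignores the clear parameter and resets itself once a complete block has been assembled.

```javascript
// Illustrative only: createTextAssembler is a hypothetical helper.
function createTextAssembler() {
    var parts = [];     // Chunks stored by position (req - 1).
    var received = 0;   // Number of distinct chunks seen so far.

    return {
        // req is the 1-based request number, totalReqs the total number
        // of requests. Returns the complete text once every chunk has
        // arrived, otherwise null.
        add: function (req, totalReqs, text) {
            if (parts[req - 1] === undefined) {
                received++;   // Count each position only once.
            }
            parts[req - 1] = text;
            if (received < totalReqs) {
                return null;
            }
            var complete = parts.join("");
            parts = [];
            received = 0;
            return complete;
        }
    };
}
```

Because each chunk is stored by its request number, the order in which the requests arrive does not matter.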

Managing Bookmarklet Code

The code for a bookmarklet must be formatted into a single line. For very small applications, this is not a problem. However, when you start to develop larger, more complex applications, you will want to develop your code over multiple lines, with plenty of whitespace and comments. I've found that using a JavaScript minifier, in particular the free YUI Compressor, is a great way of turning a normal chunk of JavaScript into a single line suitable for use in a bookmarklet. Ideally, you'd add this step into your automated build process.
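
As a rough sketch of the final packaging step, the function below turns an already minified script into a bookmarklet URL. The name toBookmarklet is hypothetical, and wrapping the code in a function is an assumption of this sketch rather than something the sample does (the sample ends with void 0 instead).

```javascript
// Illustrative only: toBookmarklet is a hypothetical helper.
// It assumes the source has already been minified.
function toBookmarklet(minifiedSource) {
    // Wrapping the code in a function keeps its variables out of the
    // page's global scope. Provided the code does not return a value,
    // the call evaluates to undefined, so the browser stays on the
    // current page.
    var wrapped = "(function(){" + minifiedSource + "})();";

    // Percent-encode the script so that spaces, quotes, and line breaks
    // survive being stored as a URL.
    return "javascript:" + encodeURIComponent(wrapped);
}
```

The resulting string can be used directly as the href of a link that the user drags to their Favourites bar.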

The C# Application

The main application-specific logic lives in the MainForm class. First, it starts an HttpCommandDispatcher instance in the constructor, responsible for receiving and dispatching HTTP commands (sent from the bookmarklets). The MainForm class then listens for various events and updates the UI to reflect its current state.

It listens for button clicks and issues the appropriate commands.
It listens to changes in state from the SpeechController and updates the button states appropriately (e.g., enable the Stop button when speech is playing).
It listens for changes to the TextBuffer (updated by the BufferTextCommand) and reflects its contents in the TextBox on the main form.
It listens for the HttpCommandDispatcher.RequestReceived event and displays the received requests in the Requests window.

The HttpCommandDispatcher listens for HTTP requests using the HttpListener class found in System.Net. When a request is received, it extracts the command from the URL, looks up the appropriate HttpCommand, and calls the HttpCommand.Execute() method. It will also send a response with a dummy.gif image (this is preloaded and stored in a byte[] array).

A word about text encoding and extracting arguments from the URL. The HttpListenerRequest has a QueryString property that is a name/value collection containing the arguments received in the querystring portion of the URL. Unfortunately, I found you couldn't use this property as the argument values are not correctly decoded from their UTF-8 encoding. Instead, I parse the RawUrl property manually and call the HttpUtility.UrlDecode() method on each argument value. This correctly handles the UTF-8 encoded strings we receive from JavaScript.

You will probably recognize the Command pattern. You must derive a class from HttpCommand for every command you wish to make available through the HTTP interface. Each command must be added using the HttpCommandDispatcher.AddCommand() method.

An abstract TextCommand is provided for use by commands that need to receive large amounts of text from the browser (e.g. the SpeakTextCommand). A TextCommand will listen to the TextBuffer and call the abstract TextAvailable method whenever new text arrives. Derived classes need to override this method and execute their operation whenever this method is called. This handles the case where the command arrives from the browser before all the text the command operates on has arrived.

One additional piece of functionality that's provided but not actually used in the sample application is the ImageLocator class. This class will take an HTTP request for an image, look up the appropriate image from the application's resources, and return an image in the requested format. For example, you can view the icon used for the About button using the following URL:

<a href="http://localhost:60024/house.png">http://localhost:60024/house.png</a>

The classes described above live in the HttpServer namespace and are pretty much decoupled from the BrowserSpeak application. You should be able to lift these out and drop them into your own application without change.

HttpListener and Vista

If you try to run BrowserSpeak on Vista, you will get an Access Denied exception when the application starts the HttpListener. Vista doesn't let standard users register and listen for requests to a specific URL. You could run your application as Administrator, but a better approach is to grant all users permission to register and listen on your application's URL. That way, you can run your application as a standard user. You can do this using the netsh utility. To grant users permission on BrowserSpeak's URL, execute the following command from a command prompt running as Administrator.

netsh http add urlacl url=http://localhost:60024/ user=BUILTIN\Users listen=yes

This setting is persistent and will survive reboots. When you deploy your application, you should execute this command as part of your installer.

Text to Speech

The text to speech functionality used by the application is found in the SpeechController class. Thanks to the functionality provided in the System.Speech.Synthesis namespace found in .NET 3.0, this class does almost nothing. It merely delegates through to an instance of the Microsoft SpeechSynthesizer class. If you wanted to remove the dependency on .NET 3.0, you could reimplement the SpeechController class and use COM-interop to access the native Microsoft Speech APIs (SAPI).

Final Thoughts

I hope I've demonstrated the power of bookmarklets in this article and given you some ideas on how to provide useful integration between the web browser and a desktop application.

There are a couple of things I couldn't get working to my satisfaction using this technique:

There seems no way of specifying the tooltip displayed for bookmarklets. Most browsers just show a portion of the JavaScript code - not particularly enlightening for the user.
There doesn't seem to be a reliable way of specifying an associated icon for a bookmarklet. Safari doesn't use icons at all on its Favourites bar. I was hoping the other browsers would at least use the website's favicon. You could even use a different favicon for each command by placing each bookmarklet on its own page and specifying unique favicons for each page using HTML similar to the following in the page header.

<link rel="shortcut icon" href="speaktext.ico" type="image/vnd.microsoft.icon"/>

Unfortunately, the browsers don't seem to use the favicon for bookmarklets.

I've made use of this technique in the latest version of my commercial text to speech application, Text2Go. I've also used a variation of this technique to add a menu of JavaScript bookmarklets in Internet Explorer 8's Accelerator preview window.