
An Asynchronous Pluggable Protocol Handler for data: URLs

By Rama Krishna Vavilala, 28 Jan 2006
 


Introduction

I first encountered the data: protocol when I saw the JavaScript Draw site, an AJAX implementation of a scribble application. The problem with the site was that it did not work with Internet Explorer. Going through the source code, I found that one of the reasons it did not work was that it used data: URLs as the source for dynamically created images, which Internet Explorer does not support.

The data: protocol is described in RFC 2397. Currently, the only browsers that support the data: protocol are Opera and Mozilla Firefox. This article describes an asynchronous pluggable protocol implementation to support the data: protocol in Internet Explorer. One possible use of the data: protocol is to embed small images in the HTML itself to avoid server hits. It can also be useful in AJAX applications like JavaScript Draw to return images, encoded in base64, as the response text.

The data: URL format

The protocol itself, as described in the RFC, is quite simple. The format is:

    dataurl    := "data:" [ mediatype ] [ ";base64" ] "," data
    mediatype  := [ type "/" subtype ] *( ";" parameter )
    data       := *urlchar
    parameter  := attribute "=" value

The media type indicates the type of the data and its encoding. The default media type is text/plain;charset=US-ASCII. For an image, the media type can be image/gif, image/png, etc. The optional base64 part of the URL indicates that the data in the URL is base64 encoded. Although the primary use of base64 encoding in URLs is for binary data, it can also be used to encode text. Finally, the data portion of the URL is the actual encoded data represented by the URL.
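To make the grammar concrete, here is a minimal sketch in portable C++ that splits a data: URL into the pieces described above. It is not the article's ATL regular-expression parser; the DataUrlParts type and the SplitDataUrl function are hypothetical names used only for illustration, and the scheme comparison is assumed to be case sensitive for brevity.

```cpp
#include <string>

// Hypothetical result type; the field names are illustrative only.
struct DataUrlParts {
    bool valid = false;
    std::string mediaType;  // media type with parameters, without ";base64"
    bool base64 = false;    // true when the ";base64" marker is present
    std::string data;       // still percent-encoded or base64-encoded
};

// Splits "data:[mediatype][;base64],data" into its parts.
DataUrlParts SplitDataUrl(const std::string& url) {
    DataUrlParts parts;
    const std::string scheme = "data:";
    if (url.compare(0, scheme.size(), scheme) != 0) return parts;

    // The comma separating the header from the data is mandatory.
    const std::string::size_type comma = url.find(',', scheme.size());
    if (comma == std::string::npos) return parts;

    std::string header = url.substr(scheme.size(), comma - scheme.size());

    // Strip a trailing ";base64" marker if there is one.
    const std::string marker = ";base64";
    if (header.size() >= marker.size() &&
        header.compare(header.size() - marker.size(), marker.size(), marker) == 0) {
        parts.base64 = true;
        header.erase(header.size() - marker.size());
    }

    if (header.empty()) {
        header = "text/plain;charset=US-ASCII";  // default from RFC 2397
    } else if (header[0] == ';') {
        header = "text/plain" + header;          // shorthand: only parameters given
    }

    parts.mediaType = header;
    parts.data = url.substr(comma + 1);
    parts.valid = true;
    return parts;
}
```

For example, SplitDataUrl("data:image/gif;base64,R0lGOD") yields the media type image/gif with the base64 flag set, while SplitDataUrl("data:,Hello") falls back to the default media type.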

The next sections describe how the protocol was implemented.

Parsing the URL

ATL's regular expression classes come in pretty handy to parse the data: URLs. The following is a regular expression to parse the URL:

data:{(.*?/.*?)}?(;{.*?}={.*?})?{;(base64)?}?,{.*}

The regular expression captures the various portions of the URL into five different groups:

Group  Capture
0      The type/subtype portion of the media type (the MIME type)
1      The attribute name of any additional parameter specified with the MIME type
2      The value of the attribute captured in group 1
3      The base64 string
4      The actual data string

After capturing the different portions of the URL, the Base64 encoded data is converted into bytes. The ATL function Base64Decode is used for this.

    int nReqLen = Base64DecodeGetRequiredLength(strData.GetLength());
    m_pvData = new BYTE[nReqLen]; 
    int nDestLen = nReqLen; 
    bRet = Base64Decode(strData, strData.GetLength(), m_pvData, &nDestLen) != 0; 
    m_dwDataLength = nDestLen;
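For readers without ATL at hand, the decoding step can be illustrated with a small portable sketch. This is not how ATL's Base64Decode is implemented; DecodeBase64 and Base64Value are hypothetical helpers, and the sketch assumes that characters outside the base64 alphabet can simply be skipped and that decoding stops at the first padding character.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Maps one base64 character to its 6-bit value, or -1 for anything else.
static int Base64Value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Decodes base64 text, stopping at the first '=' and skipping characters
// outside the alphabet (for example line breaks).
std::vector<std::uint8_t> DecodeBase64(const std::string& text) {
    std::vector<std::uint8_t> out;
    std::uint32_t accumulator = 0;  // holds the most recent 6-bit groups
    int bits = 0;                   // number of not-yet-emitted bits
    for (char c : text) {
        if (c == '=') break;
        const int value = Base64Value(c);
        if (value < 0) continue;
        accumulator = (accumulator << 6) | static_cast<std::uint32_t>(value);
        bits += 6;
        if (bits >= 8) {
            // A full byte is available: emit the top 8 pending bits.
            bits -= 8;
            out.push_back(static_cast<std::uint8_t>((accumulator >> bits) & 0xFF));
        }
    }
    return out;
}
```

Decoding "SGVsbG8=" with this function yields the five bytes of the ASCII string Hello.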

Converting the Text Data to Unicode

If the data format is text, the text is converted into Unicode so that Internet Explorer can handle it correctly. The encoding of the source data comes from the charset attribute specified in the parameter portion of the media type. An example of such a URL is data:text/plain;charset=iso-8859-8-i,%f9%ec%e5%ed, which is some Hebrew text encoded in ISO-8859-8-i.
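In the example URL, each %xx sequence stands for one byte of the encoded text. The article does not show where those escapes are turned back into bytes, so the following is only a sketch of that step in portable C++; PercentDecode and HexValue are hypothetical names, and malformed escapes are assumed to be copied through unchanged.

```cpp
#include <string>

// Converts one hexadecimal digit to its value, or -1 if it is not a hex digit.
static int HexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Replaces every well-formed "%xx" escape with the byte it represents.
std::string PercentDecode(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            const int hi = HexValue(in[i + 1]);
            const int lo = HexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out.push_back(in[i]);  // ordinary character or malformed escape
    }
    return out;
}
```

Applied to "%f9%ec%e5%ed", this produces the four bytes 0xF9 0xEC 0xE5 0xED, which still have to be interpreted in the declared charset.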

To convert the multibyte text to Unicode, we use the familiar MultiByteToWideChar function, which requires a numeric code page identifier. A little research revealed that the IMultiLanguage2 interface in the MLang API can be used to obtain the code page identifier from a charset name.

CComPtr<IMultiLanguage2> spMLang;
if (SUCCEEDED(hr = spMLang.CoCreateInstance(CLSID_CMultiLanguage)))
{
    MIMECSETINFO mi;
    if (SUCCEEDED(hr = spMLang->GetCharsetInfo(CComBSTR(GetCharset()), &mi)))
    ...
}

The MIMECSETINFO structure is declared as:

typedef struct tagMIMECSETINFO { 
 UINT uiCodePage; 
 UINT uiInternetEncoding; 
 WCHAR wszCharset[MAX_MIMECSET_NAME]; 
} MIMECSETINFO, *PMIMECSETINFO;

At first glance, it seems that the uiCodePage member gives the required code page identifier. In my experience, however, uiCodePage holds the required value in some circumstances and uiInternetEncoding in others. Unfortunately, I could not locate any documentation describing when to use which. As a result, the code to convert charsets becomes a little ugly: it tries uiInternetEncoding first and falls back to uiCodePage if the conversion fails.

int nSrcLen = strData.GetLength();
UINT uCodePage = mi.uiInternetEncoding;
int nWideChar = MultiByteToWideChar(uCodePage, 0, 
               (LPCSTR)strData, nSrcLen, NULL, 0);
if (nWideChar == 0)
{ 
    uCodePage = mi.uiCodePage;
    nWideChar = MultiByteToWideChar(uCodePage, 0, 
                (LPCSTR)strData, nSrcLen, NULL, 0);
}
if (nWideChar != 0)
{
    WCHAR* sz = new WCHAR[nWideChar + 1];
    MultiByteToWideChar(uCodePage, 0, 
                (LPCSTR)strData, nSrcLen, sz + 1, nWideChar);
    m_pvData = (BYTE*)sz;
    m_dwDataLength = (nWideChar + 1) * 2; 
    //If data is in Unicode it should have unicode lead bytes
    m_pvData[0] = 0xFF;
    m_pvData[1] = 0xFE;
}

Once the characters are converted to a Unicode stream of bytes, the byte stream needs to be prefixed with the Unicode lead bytes (the byte order mark) so that Internet Explorer recognizes the data as Unicode. The lead bytes are 0xFF followed by 0xFE.
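The resulting byte layout can be shown without any Windows APIs. The sketch below assumes the text is already available as UTF-16 code units and writes them in little-endian order after the two lead bytes; ToUtf16LeWithBom is a hypothetical helper, not part of the article's source.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Serializes UTF-16 code units as little-endian bytes, prefixed with the
// lead bytes 0xFF 0xFE.
std::vector<std::uint8_t> ToUtf16LeWithBom(const std::u16string& text) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve((text.size() + 1) * 2);  // same size as (nWideChar + 1) * 2
    bytes.push_back(0xFF);
    bytes.push_back(0xFE);
    for (char16_t unit : text) {
        bytes.push_back(static_cast<std::uint8_t>(unit & 0xFF));  // low byte first
        bytes.push_back(static_cast<std::uint8_t>(unit >> 8));    // then high byte
    }
    return bytes;
}
```

For a single character such as "A", the output is four bytes: the two lead bytes, then 0x41 and 0x00.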

The URL parsing gave us the data and the MIME type of the data. The actual implementation of the pluggable protocol is pretty simple.

Implementing the Asynchronous Pluggable Protocol Handler

An asynchronous pluggable protocol handler is a COM object that implements the IInternetProtocol and IInternetProtocolInfo interfaces. For Internet Explorer to use the URL scheme handled by the protocol, registration entries need to be added under HKEY_CLASSES_ROOT\PROTOCOLS\Handler. The following is an extract from the .rgs file for the protocol COM object.

HKCR
{ ...
    NoRemove PROTOCOLS
    {
        NoRemove Handler
        {
            ForceRemove data = s 'data: pluggable protocol'
            {
                val CLSID = s '{C79BF22F-25C4-4D3D-8183-14149EAB9C0C}'
            }
        }
    }
}

The only interesting methods in the implementation of the pluggable protocol handler are IInternetProtocol::Start and IInternetProtocol::Read. Internet Explorer (actually, urlmon.dll) calls IInternetProtocol::Start to tell the handler that data needs to be downloaded from a given URL. The handler parses the URL and downloads the data, notifying the caller of its progress through the caller-supplied IInternetProtocolSink callback interface. Depending on the status information it receives from the handler, the caller then calls IInternetProtocol::Read to read the data in chunks. The Start method of the data protocol handler is implemented as:

STDMETHODIMP CDataPluggableProtocol::Start(
    LPCWSTR szUrl,
    IInternetProtocolSink *pIProtSink,
    IInternetBindInfo *pIBindInfo,
    DWORD grfSTI,
    DWORD dwReserved)
{
    HRESULT hr = S_OK;

    if (m_url.Parse(szUrl))
    {
        m_dwPos = 0;

        CAtlString strData = m_url.GetDataString();
            
        pIProtSink->ReportProgress(BINDSTATUS_FINDINGRESOURCE, strData);
        pIProtSink->ReportProgress(BINDSTATUS_CONNECTING, strData);
        pIProtSink->ReportProgress(BINDSTATUS_SENDINGREQUEST, strData);
        pIProtSink->ReportProgress(BINDSTATUS_VERIFIEDMIMETYPEAVAILABLE, 
                                      m_url.GetMimeType());
        pIProtSink->ReportData(BSCF_FIRSTDATANOTIFICATION, 0, 
                                  m_url.GetDataLength());
        pIProtSink->ReportData(BSCF_LASTDATANOTIFICATION | 
                                  BSCF_DATAFULLYAVAILABLE, 
                                  m_url.GetDataLength(), 
                                  m_url.GetDataLength());
        
    }
    else
    {
        if (grfSTI & PI_PARSE_URL)
            hr = S_FALSE;
    }

    return hr;
}

The function parses the URL which automatically extracts the data. The code then sends a series of notifications to the caller. The important call is ReportProgress(BINDSTATUS_VERIFIEDMIMETYPEAVAILABLE, m_url.GetMimeType()); which indicates the MIME type of the data to the caller so that the caller can handle the data accordingly. The caller then calls IInternetProtocol::Read to read the data.
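The Read method is not listed in the article. The sketch below shows only the chunked-copy logic such a method needs, written as plain C++ rather than as a COM method. BufferReader is a hypothetical stand-in for the handler's state (the article's class keeps the equivalent values in m_pvData, m_dwDataLength and m_dwPos), and mapping the boolean result to S_OK and S_FALSE is an assumption about how a real handler would report it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the handler's decoded data and read position.
struct BufferReader {
    std::vector<unsigned char> data;
    std::size_t pos = 0;

    // Copies up to `capacity` bytes into `dest` and stores the count in
    // `copied`. Returns true when bytes were delivered and false when nothing
    // was copied (buffer exhausted or zero capacity). A real Read would
    // presumably map these to S_OK and S_FALSE.
    bool Read(void* dest, std::size_t capacity, std::size_t& copied) {
        assert(pos <= data.size());
        copied = std::min(capacity, data.size() - pos);
        if (copied == 0) return false;
        std::memcpy(dest, data.data() + pos, copied);
        pos += copied;  // the next call continues where this one stopped
        return true;
    }
};
```

The caller is expected to keep calling Read until it reports that no bytes remain, which is why the position is advanced on every successful call.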

Testing the Protocol Handler

The protocol handler is automatically registered when the project is built. Once the handler is registered, data: URLs start working in Internet Explorer. The protocol handler has been tested with the data: URL tests at the mozilla.com testing website. It passes all the tests except one, which fails because of Internet Explorer's limit on URL length. So far, no security issues have been identified. I welcome readers to point out any possible security issues with the protocol handler.

History

  • January 28, 2006 - Initial release.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Rama Krishna Vavilala
Architect
United States

Article Copyright 2006 by Rama Krishna Vavilala