
An Asynchronous Pluggable Protocol Handler for data: URLs

28 Jan 2006, CPOL
This article describes an asynchronous pluggable protocol implementation to support the data: protocol, as described in RFC 2397, in Internet Explorer.


Introduction

I first encountered the data: protocol when I saw the JavaScript Draw site, an AJAX implementation of a scribble application. The site did not work in Internet Explorer. Going through the source code, I found that one of the reasons was its use of data: URLs as the source for dynamically created images, which Internet Explorer does not support.

The data: protocol is described in RFC 2397. At the time of writing, the only browsers that support the data: protocol are Opera and Mozilla Firefox. This article describes an asynchronous pluggable protocol implementation that adds support for the data: protocol to Internet Explorer. One possible use of the data: protocol is to embed small images in the HTML itself to avoid server hits. It can also be useful in AJAX applications like JavaScript Draw to return images, encoded in base64, as the response text.

The data: URL format

The protocol itself, as described in the RFC, is quite simple. The format is:

dataurl    := "data:" [ mediatype ] [ ";base64" ] "," data
mediatype  := [ type "/" subtype ] *( ";" parameter )
data       := *urlchar
parameter  := attribute "=" value

The media type indicates the type of the data and its encoding. The default media type is text/plain;charset=US-ASCII. For an image, the media type can be image/gif, image/png, etc. The optional base64 part of the URL indicates that the data in the URL is encoded in base64 format. Although the primary use of base64 encoding in URLs is for binary data, it can also be used to encode text. Finally, the data portion of the URL is the actual encoded data represented by the URL.

The next sections describe how the protocol was implemented.

Parsing the URL

ATL's regular expression classes are handy for parsing data: URLs. In ATL's regular expression syntax, braces delimit the capture groups. The following regular expression parses the URL:

data:{(.*?/.*?)}?(;{.*?}={.*?})?{;(base64)?}?,{.*}

The regular expression captures the various portions of the URL into five different groups:

Group  Capture
0      The type/subtype portion of the media type or the MIME type
1      The attribute name of any additional parameter specified with the MIME type
2      The value of the attribute captured in group 1
3      The base64 string
4      The actual data string
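
For readers without ATL at hand, the same five-way split can be sketched with the standard library's std::regex. This is not the article's parser: the DataUrl struct, its field names, and parseDataUrl are hypothetical, the pattern uses ECMAScript syntax instead of ATL's brace syntax, and, like the expression above, it captures at most one parameter. Group numbers here start at 1 rather than 0.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical result type; the field names are illustrative only.
struct DataUrl {
    std::string mimeType;    // type/subtype, e.g. "image/gif"
    std::string paramName;   // attribute name, e.g. "charset"
    std::string paramValue;  // attribute value, e.g. "US-ASCII"
    bool isBase64 = false;   // true when ";base64" precedes the comma
    std::string data;        // everything after the comma, still encoded
};

// Splits a data: URL into its parts. Returns false if the input does not
// look like a data: URL. Assumption: at most one ";attribute=value" pair.
bool parseDataUrl(const std::string& url, DataUrl& out) {
    static const std::regex pattern(
        R"(data:([^;,/]+/[^;,]+)?(?:;([^;,=]+)=([^;,]*))?(;base64)?,(.*))");
    std::smatch m;
    if (!std::regex_match(url, m, pattern)) {
        return false;
    }
    // RFC 2397: an omitted media type means text/plain;charset=US-ASCII.
    out.mimeType = m[1].matched ? m[1].str() : "text/plain";
    if (m[2].matched) {
        out.paramName = m[2].str();
        out.paramValue = m[3].str();
    } else if (!m[1].matched) {
        out.paramName = "charset";
        out.paramValue = "US-ASCII";
    } else {
        out.paramName.clear();
        out.paramValue.clear();
    }
    out.isBase64 = m[4].matched;
    out.data = m[5].str();
    return true;
}
```

Calling parseDataUrl on the Hebrew example later in this article would yield the MIME type text/plain, the parameter charset=iso-8859-8-i, no base64 flag, and the still percent-encoded data string.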

After capturing the different portions of the URL, the Base64-encoded data (when the base64 marker is present) is converted into bytes using the ATL function Base64Decode.

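// Upper bound on the decoded size for the given encoded length
int nReqLen = Base64DecodeGetRequiredLength(strData.GetLength());
m_pvData = new BYTE[nReqLen];
// In: size of the buffer; out: number of bytes actually decoded
int nDestLen = nReqLen;
bRet = Base64Decode(strData, strData.GetLength(), m_pvData, &nDestLen) != 0;
m_dwDataLength = nDestLen;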
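
To show what the decoding step does without depending on ATL, here is a minimal portable sketch of a Base64 decoder. It is not the ATL implementation; decodeBase64 and base64Value are hypothetical names, and the choice to skip whitespace and stop at the first padding character is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Maps one Base64 alphabet character to its 6-bit value, or -1 if invalid.
static int base64Value(unsigned char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Decodes Base64 text into bytes. Returns false on an invalid character.
// Assumption: whitespace is skipped and decoding stops at the first '='.
bool decodeBase64(const std::string& in, std::vector<std::uint8_t>& out) {
    out.clear();
    std::uint32_t buffer = 0;  // accumulates 6-bit groups
    int bits = 0;              // number of not yet emitted bits in buffer
    for (unsigned char c : in) {
        if (c == '=') break;
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n') continue;
        const int v = base64Value(c);
        if (v < 0) return false;
        buffer = (buffer << 6) | static_cast<std::uint32_t>(v);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<std::uint8_t>((buffer >> bits) & 0xFF));
        }
    }
    return true;
}
```

Every four input characters produce three output bytes, which is why a size estimate based on the encoded length, as in the ATL code above, is sufficient for allocating the buffer.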

Converting the Text Data to Unicode

If the data format is text, the text is converted into Unicode so that Internet Explorer can handle it correctly. The encoding of the source data comes from the charset attribute specified in the parameter portion of the media type. An example of such a URL is data:text/plain;charset=iso-8859-8-i,%f9%ec%e5%ed - which is some Hebrew text encoded in ISO-8859-8-i.
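
The data in the example URL above is percent-encoded, so it has to be turned back into raw bytes before any charset conversion. The following is a minimal sketch of that step; percentDecode and hexValue are hypothetical names, and the article's source may perform the unescaping differently. Malformed escapes are passed through unchanged, which is a design choice of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Returns the value of a hexadecimal digit, or -1 if c is not one.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Replaces each %XX escape with the byte it encodes. A '%' that is not
// followed by two hex digits is copied through as a literal character.
std::vector<std::uint8_t> percentDecode(const std::string& in) {
    std::vector<std::uint8_t> out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            const int hi = hexValue(in[i + 1]);
            const int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<std::uint8_t>(hi * 16 + lo));
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out.push_back(static_cast<std::uint8_t>(in[i]));
    }
    return out;
}
```

For the Hebrew example, percentDecode("%f9%ec%e5%ed") yields the four bytes F9 EC E5 ED, which are then interpreted in the charset named by the URL.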

To convert the multibyte text to Unicode, we use the MultiByteToWideChar function, which requires a numeric codepage identifier. A little research revealed that the IMultiLanguage2 interface in the MLang API can be used to obtain the codepage identifier from a charset name.

CComPtr<IMultiLanguage2> spMLang;
if (SUCCEEDED(hr = spMLang.CoCreateInstance(CLSID_CMultiLanguage)))
{
    MIMECSETINFO mi;
    if (SUCCEEDED(hr = spMLang->GetCharsetInfo(CComBSTR(GetCharset()), &mi)))
    ...
}

The MIMECSETINFO structure is declared as:

typedef struct tagMIMECSETINFO { 
 UINT uiCodePage; 
 UINT uiInternetEncoding; 
 WCHAR wszCharset[MAX_MIMECSET_NAME]; 
} MIMECSETINFO, *PMIMECSETINFO;

At first glance, it seems that the uiCodePage member gives the required code page identifier. In my experience, however, uiCodePage is the required value in some circumstances and uiInternetEncoding in others. Unfortunately, I could not locate any documentation describing when to use which. As a result, the conversion code tries uiInternetEncoding first and falls back to uiCodePage if that fails, which makes it a little ugly.

int nSrcLen = strData.GetLength();
UINT uCodePage = mi.uiInternetEncoding;
int nWideChar = MultiByteToWideChar(uCodePage, 0, 
               (LPCSTR)strData, nSrcLen, NULL, 0);
if (nWideChar == 0)
{ 
    uCodePage = mi.uiCodePage;
    nWideChar = MultiByteToWideChar(uCodePage, 0, 
                (LPCSTR)strData, nSrcLen, NULL, 0);
}
if (nWideChar != 0)
{
    WCHAR* sz = new WCHAR[nWideChar + 1];
    MultiByteToWideChar(uCodePage, 0, 
                (LPCSTR)strData, nSrcLen, sz + 1, nWideChar);
    m_pvData = (BYTE*)sz;
    m_dwDataLength = (nWideChar + 1) * 2; 
    // Unicode data must start with the lead bytes FF FE; the first WCHAR
    // slot (sz[0]) was left free for them.
    m_pvData[0] = 0xFF;
    m_pvData[1] = 0xFE;
}

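Once the characters are converted to a Unicode stream of bytes, the byte stream needs to be prefixed with the Unicode lead bytes (the byte order mark) so that Internet Explorer treats the data as Unicode. The lead bytes are 0xFF followed by 0xFE, which mark the stream as little-endian UTF-16.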
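
The layout of the resulting buffer can be illustrated with a small portable sketch. To avoid the MLang and MultiByteToWideChar dependencies, it assumes the source text is ISO-8859-1 (Latin-1), where each byte maps directly to the Unicode code point of the same value. That charset is swapped in purely for illustration, and latin1ToUtf16LeWithBom is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Converts Latin-1 text to little-endian UTF-16 bytes, prefixed with the
// lead bytes FF FE. Assumption: input is ISO-8859-1, so each input byte
// becomes one 16-bit unit whose high byte is zero.
std::vector<std::uint8_t> latin1ToUtf16LeWithBom(const std::string& text) {
    std::vector<std::uint8_t> out;
    out.reserve(2 + text.size() * 2);
    out.push_back(0xFF);  // lead bytes (byte order mark, little-endian)
    out.push_back(0xFE);
    for (unsigned char c : text) {
        out.push_back(c);     // low byte first
        out.push_back(0x00);  // high byte
    }
    return out;
}
```

The total length is two bytes for the lead bytes plus two bytes per character, matching the (nWideChar + 1) * 2 computation in the code above.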

The URL parsing gave us the data and the MIME type of the data. The actual implementation of the pluggable protocol is pretty simple.

Implementing the Asynchronous Pluggable Protocol Handler

An asynchronous pluggable protocol handler is a COM object that implements the IInternetProtocol and the IInternetProtocolInfo interfaces. For Internet Explorer to use the URL scheme handled by the protocol, the registration entries need to be added at HKEY_CLASSES_ROOT\PROTOCOLS\Handler. The following is an extract from the .rgs file for the protocol COM object.

HKCR
{ ...
    NoRemove PROTOCOLS
    {
        NoRemove Handler
        {
            ForceRemove data = s 'data: pluggable protocol'
            {
                val CLSID = s '{C79BF22F-25C4-4D3D-8183-14149EAB9C0C}'
            }
        }
    }
}

The only interesting methods in the implementation of the pluggable protocol handler are IInternetProtocol::Start and IInternetProtocol::Read. Internet Explorer (actually, urlmon.dll) calls IInternetProtocol::Start to tell the handler that data needs to be downloaded from a given URL. The pluggable protocol handler parses the URL and downloads the data. It notifies the caller of its progress through IInternetProtocolSink, a callback interface supplied by the caller. Depending on the status information received from the protocol handler, the caller then calls IInternetProtocol::Read to read the data in chunks. The Start method of the data protocol handler is implemented as:

STDMETHODIMP CDataPluggableProtocol::Start(
    LPCWSTR szUrl,
    IInternetProtocolSink *pIProtSink,
    IInternetBindInfo *pIBindInfo,
    DWORD grfSTI,
    DWORD dwReserved)
{
    HRESULT hr = S_OK;

    if (m_url.Parse(szUrl))
    {
        m_dwPos = 0;

        CAtlString strData = m_url.GetDataString();
            
        pIProtSink->ReportProgress(BINDSTATUS_FINDINGRESOURCE, strData);
        pIProtSink->ReportProgress(BINDSTATUS_CONNECTING, strData);
        pIProtSink->ReportProgress(BINDSTATUS_SENDINGREQUEST, strData);
        pIProtSink->ReportProgress(BINDSTATUS_VERIFIEDMIMETYPEAVAILABLE, 
                                      m_url.GetMimeType());
        pIProtSink->ReportData(BSCF_FIRSTDATANOTIFICATION, 0, 
                                  m_url.GetDataLength());
        pIProtSink->ReportData(BSCF_LASTDATANOTIFICATION | 
                                  BSCF_DATAFULLYAVAILABLE, 
                                  m_url.GetDataLength(), 
                                  m_url.GetDataLength());
        
    }
    else
    {
        if (grfSTI & PI_PARSE_URL)
            hr = S_FALSE;
    }

    return hr;
}

The function parses the URL, which also extracts and decodes the data. The code then sends a series of notifications to the caller. The important call is ReportProgress(BINDSTATUS_VERIFIEDMIMETYPEAVAILABLE, m_url.GetMimeType()), which tells the caller the MIME type of the data so that the caller can handle the data accordingly. The caller then calls IInternetProtocol::Read to read the data.
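
The Read method is not listed in this article. Since all the data is already in memory after Start, a chunked read over a buffer with a current position (like m_dwPos above) is presumably all that is needed. The following portable sketch shows that idea only; BufferReader and ReadStatus are hypothetical, with MoreData and EndOfData standing in for the S_OK and S_FALSE results a real IInternetProtocol::Read would return.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Stand-ins for the S_OK / S_FALSE results of a real Read implementation.
enum class ReadStatus { MoreData, EndOfData };

// Hypothetical helper that serves an in-memory buffer in caller-sized chunks.
class BufferReader {
public:
    explicit BufferReader(std::vector<std::uint8_t> data)
        : data_(std::move(data)), pos_(0) {}

    // Copies up to 'capacity' bytes into 'dest' and advances the position.
    // Reports EndOfData only when nothing was left to copy.
    ReadStatus read(void* dest, std::size_t capacity, std::size_t* bytesRead) {
        const std::size_t remaining = data_.size() - pos_;
        if (remaining == 0) {
            *bytesRead = 0;
            return ReadStatus::EndOfData;
        }
        const std::size_t n = std::min(capacity, remaining);
        if (n > 0) {
            std::memcpy(dest, data_.data() + pos_, n);
        }
        pos_ += n;
        *bytesRead = n;
        return ReadStatus::MoreData;
    }

private:
    std::vector<std::uint8_t> data_;
    std::size_t pos_;  // plays the role of m_dwPos
};
```

A caller would loop on read until it reports EndOfData, which mirrors how the caller of the protocol handler keeps calling Read until no data remains.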

Testing the Protocol Handler

The protocol handler is automatically registered when the project is built. Once the handler is registered, data: URLs will start working in Internet Explorer. The protocol handler has been tested with the data: URL Tests at the mozilla.com testing website. The handler passes all the tests except one, which fails because of Internet Explorer's limit on URL length. So far, no security issues have been identified. I welcome readers to point out any possible security issues with the protocol handler.

History

  • January 28, 2006 - Initial release.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

