
An Asynchronous Pluggable Protocol Handler for data: URLs

By Rama Krishna Vavilala

This article describes an asynchronous pluggable protocol implementation to support the data: protocol, as described in RFC 2397, in Internet Explorer.
VC8.0, Windows, Visual Studio, MFC, ATL, Dev
Posted: 28 Jan 2006

Sample Image - DataProtocol.gif

Introduction

I first encountered the data: protocol when I saw the JavaScript Draw site, an AJAX implementation of a scribble application. The problem with the site was that it did not work with Internet Explorer. Going through the source code, I found that one of the reasons was its use of data: URLs as the source for dynamically created images, which Internet Explorer does not support.

The data: protocol is described in RFC 2397. Currently, the only browsers that support the data: protocol are Opera and Mozilla Firefox. This article describes an asynchronous pluggable protocol implementation to support the data: protocol in Internet Explorer. One possible use of the data: protocol is to embed small images in the HTML itself to avoid server hits. It can also be useful in AJAX applications like JavaScript Draw to return images, encoded in base64, as the response text.

The data: URL format

The protocol itself, as described in the RFC, is quite simple. The format is:

    dataurl    := "data:" [ mediatype ] [ ";base64" ] "," data
    mediatype  := [ type "/" subtype ] *( ";" parameter )
    data       := *urlchar
    parameter  := attribute "=" value

The media type indicates the type of the data and its encoding. The default media type is text/plain;charset=US-ASCII. For an image, the media type can be image/gif, image/png, etc. The optional base64 part of the URL indicates that the data in the URL is encoded in base64 format. Although the primary use of base64 encoding in URLs is for binary data, it can also be used to encode text. Finally, the data portion of the URL is the actual encoded data.

The next sections describe how the protocol was implemented.

Parsing the URL

ATL's regular expression classes come in pretty handy to parse the data: URLs. The following is a regular expression to parse the URL:

data:{(.*?/.*?)}?(;{.*?}={.*?})?{;(base64)?}?,{.*}

The regular expression captures the various portions of the URL into five different groups:

Group   Capture
0       The type/subtype portion of the media type (the MIME type)
1       The attribute name of any additional parameter specified with the MIME type
2       The value of the attribute captured in group 1
3       The base64 string
4       The actual data string
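
For readers without ATL, the same five captures can be approximated with the standard library. The following is a minimal sketch, not the article's code: it uses std::regex (ECMAScript syntax, where groups are written with parentheses rather than the braces seen in the ATL pattern above), and the DataUrlParts struct and ParseDataUrl function are hypothetical names introduced for illustration. It also applies the RFC 2397 default of text/plain;charset=US-ASCII when the media type is omitted entirely.

```cpp
#include <regex>
#include <string>

// Hypothetical result type; the field names are illustrative only.
struct DataUrlParts {
    std::string mimeType;   // type/subtype, e.g. "image/gif"
    std::string attribute;  // first parameter name, e.g. "charset"
    std::string value;      // first parameter value, e.g. "US-ASCII"
    bool isBase64 = false;  // true when ";base64" is present
    std::string data;       // still encoded (base64 or percent-escaped)
};

// Splits a data: URL into its parts. Returns false if the text does not
// look like a data: URL. Only the first media type parameter is captured,
// which mirrors the single attribute/value pair in the table above.
bool ParseDataUrl(const std::string& url, DataUrlParts& out) {
    static const std::regex re(
        R"(data:([^/;,]+/[^;,]+)?(?:;([^=;,]+)=([^;,]+))?(;base64)?,(.*))");
    std::smatch m;
    if (!std::regex_match(url, m, re)) {
        return false;
    }
    out = DataUrlParts();
    out.mimeType = m[1].matched ? m[1].str() : "text/plain";
    if (m[2].matched) {
        out.attribute = m[2].str();
        out.value = m[3].str();
    } else if (!m[1].matched) {
        // RFC 2397 default when the whole media type is omitted.
        out.attribute = "charset";
        out.value = "US-ASCII";
    }
    out.isBase64 = m[4].matched;
    out.data = m[5].str();
    return true;
}
```

Calling ParseDataUrl on data:image/gif;base64,R0lGODlh would fill mimeType with image/gif, set isBase64, and leave the still-encoded payload in data for the decoding step.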

After capturing the different portions of the URL, the Base64 encoded data is converted into bytes. The ATL function Base64Decode is used for this.

    int nReqLen = Base64DecodeGetRequiredLength(strData.GetLength());
    m_pvData = new BYTE[nReqLen]; 
    int nDestLen = nReqLen; 
    bRet = Base64Decode(strData, strData.GetLength(), m_pvData, &nDestLen) != 0; 
    m_dwDataLength = nDestLen;
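
To make the decoding step concrete outside of ATL, here is a small hand-written base64 decoder in standard C++. It is only a sketch of what a function like Base64Decode does conceptually; the names Base64Value and DecodeBase64 are hypothetical, and the choices to skip whitespace and stop at the first padding character are assumptions of this sketch rather than documented ATL behavior.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Maps one base64 character to its 6-bit value, or -1 if it is not
// part of the standard alphabet.
static int Base64Value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Decodes standard base64 text into bytes. Returns false when an invalid
// character is found. Whitespace is skipped and '=' ends the input.
bool DecodeBase64(const std::string& text, std::vector<std::uint8_t>& out) {
    out.clear();
    std::uint32_t buffer = 0;  // accumulates 6-bit groups
    int bits = 0;              // number of valid bits in buffer
    for (char c : text) {
        if (c == '=') break;
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n') continue;
        const int v = Base64Value(c);
        if (v < 0) return false;
        buffer = (buffer << 6) | static_cast<std::uint32_t>(v);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<std::uint8_t>((buffer >> bits) & 0xFF));
        }
    }
    return true;
}
```

For instance, decoding the text SGVsbG8= with this sketch yields the five bytes of the ASCII string Hello.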

Converting the Text Data to Unicode

If the data format is text, the text is converted into Unicode so that Internet Explorer can handle it correctly. The encoding of the source data comes from the charset attribute specified in the parameter portion of the media type. An example of such a URL is data:text/plain;charset=iso-8859-8-i,%f9%ec%e5%ed - which is some Hebrew text encoded in ISO-8859-8-i.
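
The data portion of such a URL is percent-escaped, so the raw bytes have to be recovered before any charset conversion can happen. The article does not show that step; the following is a minimal sketch in standard C++, with hypothetical helper names (HexValue, PercentDecode) and the assumption that malformed escapes are simply copied through unchanged.

```cpp
#include <cstddef>
#include <string>

// Returns the numeric value of a hexadecimal digit, or -1 if c is not one.
static int HexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Replaces every %XX escape with the byte it encodes. The result is a raw
// byte string in whatever charset the URL declares; no charset conversion
// is performed here.
std::string PercentDecode(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '%' && i + 2 < text.size()) {
            const int hi = HexValue(text[i + 1]);
            const int lo = HexValue(text[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out.push_back(text[i]);  // ordinary character or malformed escape
    }
    return out;
}
```

Applied to %f9%ec%e5%ed from the example URL, this produces the four bytes 0xF9 0xEC 0xE5 0xED, which still need to be interpreted in the declared charset.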

To convert the multibyte text to Unicode, we use the MultiByteToWideChar function, which requires a UINT code page identifier. A little bit of research revealed that the IMultiLanguage2 interface in the MLang API can be used to obtain the code page identifier from a named charset.

CComPtr<IMultiLanguage2> spMLang;
if (SUCCEEDED(hr = spMLang.CoCreateInstance(CLSID_CMultiLanguage)))
{
    MIMECSETINFO mi;
    if (SUCCEEDED(hr = spMLang->GetCharsetInfo(CComBSTR(GetCharset()), &mi)))
    ...
}

The MIMECSETINFO structure is declared as:

typedef struct tagMIMECSETINFO { 
 UINT uiCodePage; 
 UINT uiInternetEncoding; 
 WCHAR wszCharset[MAX_MIMECSET_NAME]; 
} MIMECSETINFO, *PMIMECSETINFO;

At first glance, it seems that the uiCodePage member will give the required code page identifier. In my experience, however, uiCodePage is the required code page in some circumstances, and uiInternetEncoding is the required value in others. Unfortunately, I could not locate any documentation describing when to use which. As a result, the conversion code tries uiInternetEncoding first and falls back to uiCodePage, which makes it a little ugly.

int nSrcLen = strData.GetLength();
UINT uCodePage = mi.uiInternetEncoding;
int nWideChar = MultiByteToWideChar(uCodePage, 0, 
               (LPCSTR)strData, nSrcLen, NULL, 0);
if (nWideChar == 0)
{ 
    uCodePage = mi.uiCodePage;
    nWideChar = MultiByteToWideChar(uCodePage, 0, 
                (LPCSTR)strData, nSrcLen, NULL, 0);
}
if (nWideChar != 0)
{
    WCHAR* sz = new WCHAR[nWideChar + 1];
    MultiByteToWideChar(uCodePage, 0, 
                (LPCSTR)strData, nSrcLen, sz + 1, nWideChar);
    m_pvData = (BYTE*)sz;
    m_dwDataLength = (nWideChar + 1) * 2; 
    // Unicode data must start with the Unicode lead bytes (byte order mark)
    m_pvData[0] = 0xFF;
    m_pvData[1] = 0xFE;
}

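Once the characters are converted to a Unicode stream of bytes, the byte stream needs to be prefixed with the Unicode lead bytes (the byte order mark) so that Internet Explorer recognizes the data as Unicode. The lead bytes are 0xFF followed by 0xFE, which is the byte order mark for little-endian UTF-16.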
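
The buffer layout can be illustrated without the Windows API. The sketch below is an assumption-laden stand-in, not the handler's code: it handles only ISO-8859-1 input, chosen because each Latin-1 byte maps directly to the Unicode code point of the same value, so no code page tables are needed. The function name Latin1ToUtf16LeWithBom is hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Converts ISO-8859-1 bytes to little-endian UTF-16 and prefixes the result
// with the byte order mark 0xFF 0xFE. Other charsets need a real conversion
// routine such as MultiByteToWideChar; this sketch covers Latin-1 only.
std::vector<std::uint8_t> Latin1ToUtf16LeWithBom(const std::string& bytes) {
    std::vector<std::uint8_t> out;
    out.reserve(2 + bytes.size() * 2);
    out.push_back(0xFF);  // byte order mark, low byte first
    out.push_back(0xFE);
    for (unsigned char c : bytes) {
        out.push_back(c);     // low byte of the UTF-16 code unit
        out.push_back(0x00);  // high byte is zero for all Latin-1 characters
    }
    return out;
}
```

For a two-character input the result is six bytes long, which matches the (nWideChar + 1) * 2 length computed in the article's code: one extra 16-bit slot for the lead bytes plus two bytes per character.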

The URL parsing gave us the data and the MIME type of the data. The actual implementation of the pluggable protocol is pretty simple.

Implementing the Asynchronous Pluggable Protocol Handler

An asynchronous pluggable protocol handler is a COM object that implements the IInternetProtocol and the IInternetProtocolInfo interfaces. For Internet Explorer to use the URL scheme handled by the protocol, the registration entries need to be added at HKEY_CLASSES_ROOT\PROTOCOLS\Handler. The following is an extract from the .rgs file for the protocol COM object.

HKCR
{ ...
    NoRemove PROTOCOLS
    {
        NoRemove Handler
        {
            ForceRemove data = s 'data: pluggable protocol'
            {
                val CLSID = s '{C79BF22F-25C4-4D3D-8183-14149EAB9C0C}'
            }
        }
    }
}

The only interesting methods in the implementation of the pluggable protocol handler are IInternetProtocol::Start and IInternetProtocol::Read. Internet Explorer (actually, urlmon.dll) calls IInternetProtocol::Start to tell the handler that data needs to be downloaded from a given URL. The handler parses the URL and downloads the data, notifying the caller of its progress through the caller-supplied IInternetProtocolSink callback interface. Depending on the status information it receives from the handler, the caller then calls IInternetProtocol::Read to read the data in chunks. The Start method of the data protocol handler is implemented as:

STDMETHODIMP CDataPluggableProtocol::Start(
    LPCWSTR szUrl,
    IInternetProtocolSink *pIProtSink,
    IInternetBindInfo *pIBindInfo,
    DWORD grfSTI,
    DWORD dwReserved)
{
    HRESULT hr = S_OK;

    if (m_url.Parse(szUrl))
    {
        m_dwPos = 0;

        CAtlString strData = m_url.GetDataString();
            
        pIProtSink->ReportProgress(BINDSTATUS_FINDINGRESOURCE, strData);
        pIProtSink->ReportProgress(BINDSTATUS_CONNECTING, strData);
        pIProtSink->ReportProgress(BINDSTATUS_SENDINGREQUEST, strData);
        pIProtSink->ReportProgress(BINDSTATUS_VERIFIEDMIMETYPEAVAILABLE, 
                                      m_url.GetMimeType());
        pIProtSink->ReportData(BSCF_FIRSTDATANOTIFICATION, 0, 
                                  m_url.GetDataLength());
        pIProtSink->ReportData(BSCF_LASTDATANOTIFICATION | 
                                  BSCF_DATAFULLYAVAILABLE, 
                                  m_url.GetDataLength(), 
                                  m_url.GetDataLength());
        
    }
    else
    {
        if (grfSTI & PI_PARSE_URL)
            hr = S_FALSE;
    }

    return hr;
}

The function parses the URL, which also extracts and decodes the data. The code then sends a series of notifications to the caller. The important call is ReportProgress(BINDSTATUS_VERIFIEDMIMETYPEAVAILABLE, m_url.GetMimeType()), which reports the MIME type of the data so that the caller can handle the data accordingly. The caller then calls IInternetProtocol::Read to read the data.
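
The Read side is not listed in the article, so the following is only a guess at its general shape, written in portable C++ rather than COM. The BufferReader class is hypothetical; it stands in for the handler's decoded buffer and read position (m_dwPos in the Start listing). Returning true while bytes are delivered and false once nothing is left is modeled on the common S_OK / S_FALSE convention for Read, and that mapping is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical stand-in for the handler's buffer and read position.
class BufferReader {
public:
    explicit BufferReader(std::vector<std::uint8_t> data)
        : data_(std::move(data)), pos_(0) {}

    // Copies up to `capacity` bytes into `dest` and advances the position.
    // Returns true if any bytes were delivered (think S_OK) and false once
    // the buffer is exhausted (think S_FALSE).
    bool Read(void* dest, std::size_t capacity, std::size_t& bytesRead) {
        bytesRead = 0;
        if (pos_ >= data_.size()) {
            return false;
        }
        bytesRead = std::min(capacity, data_.size() - pos_);
        std::memcpy(dest, data_.data() + pos_, bytesRead);
        pos_ += bytesRead;
        return true;
    }

private:
    std::vector<std::uint8_t> data_;
    std::size_t pos_;
};
```

A caller would invoke Read in a loop with a fixed-size buffer until it returns false, much as urlmon.dll keeps calling IInternetProtocol::Read until no more data is available.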

Testing the Protocol Handler

The protocol handler is automatically registered when the project is built. Once the handler is registered, data: URLs will start working in Internet Explorer. The protocol handler has been tested with the data: URL tests at the mozilla.com testing website. The handler passes all the tests except one, which fails because of Internet Explorer's URL length limit. So far, no security issues have been identified. Readers are welcome to report any possible security issues with the protocol handler.

History

  • January 28, 2006 - Initial release.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).

About the Author

Rama Krishna Vavilala


Occupation: Architect
Location: United States

Last Updated: 28 Jan 2006
Editor: Smitha Vijayan
Copyright 2006 by Rama Krishna Vavilala