URLEncode

25 Jun 2001
Convert a string using the URLEncode method

I use these functions to prepare POST strings containing XML data. The first function, URLEncode1, uses less memory but is slower than the second, URLEncode2. Both functions take a CString as the input parameter and return a CString.

The demo project contains sample usage and reports the execution time of both functions.

Source

Helper function

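// Converts a value in the range 0..15 to its uppercase hexadecimal digit.
inline BYTE toHex(const BYTE &x)
{
    return x > 9 ? x + 55 : x + 48; // 55 == 'A' - 10, 48 == '0'
}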

URLEncode1

CString URLEncode1(CString sIn)
{
    CString sOut;

    const int nLen = sIn.GetLength() + 1;

    LPBYTE pOutTmp = NULL;
    LPBYTE pOutBuf = NULL;
    LPBYTE pInTmp = NULL;
    LPBYTE pInBuf = (LPBYTE)sIn.GetBuffer(nLen);

    // count the characters that expand to three bytes ("%XX");
    // alphanumerics are copied and whitespace becomes a single '+'
    int k = 0;

    pInTmp = pInBuf;
    while (*pInTmp)
    {
        if (!isalnum(*pInTmp) && !isspace(*pInTmp))
            k++;
        pInTmp++;
    }

    // alloc out buffer: each counted character needs two extra bytes
    pOutBuf = (LPBYTE)sOut.GetBuffer(nLen + 2 * k);

    if (pOutBuf)
    {
        pInTmp = pInBuf;
        pOutTmp = pOutBuf;

        // do encoding
        while (*pInTmp)
        {
            if (isalnum(*pInTmp))
                *pOutTmp++ = *pInTmp;
            else if (isspace(*pInTmp))
                *pOutTmp++ = '+';
            else
            {
                *pOutTmp++ = '%';
                *pOutTmp++ = toHex(*pInTmp >> 4);
                *pOutTmp++ = toHex(*pInTmp % 16);
            }
            pInTmp++;
        }

        *pOutTmp = '\0';
        sOut.ReleaseBuffer();
    }
    sIn.ReleaseBuffer();
    return sOut;
}

URLEncode2

CString URLEncode2(CString sIn)
{
    CString sOut;

    const int nLen = sIn.GetLength() + 1;

    LPBYTE pOutTmp = NULL;
    LPBYTE pOutBuf = NULL;
    LPBYTE pInTmp = NULL;
    LPBYTE pInBuf = (LPBYTE)sIn.GetBuffer(nLen);

    // alloc out buffer for the worst case: every character becomes "%XX"
    // (3 * length), plus one byte for the terminator
    pOutBuf = (LPBYTE)sOut.GetBuffer(nLen * 3 - 2);

    if (pOutBuf)
    {
        pInTmp = pInBuf;
        pOutTmp = pOutBuf;

        // do encoding
        while (*pInTmp)
        {
            if (isalnum(*pInTmp))
                *pOutTmp++ = *pInTmp;
            else if (isspace(*pInTmp))
                *pOutTmp++ = '+';
            else
            {
                *pOutTmp++ = '%';
                *pOutTmp++ = toHex(*pInTmp >> 4);
                *pOutTmp++ = toHex(*pInTmp % 16);
            }
            pInTmp++;
        }

        *pOutTmp = '\0';
        sOut.ReleaseBuffer();
    }
    sIn.ReleaseBuffer();
    return sOut;
}
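
The memory trade-off between the two functions comes down to how the output size is chosen. The following is only a rough sketch of that arithmetic, using std::string instead of CString so it compiles without MFC; the names exactEncodedLength and worstCaseEncodedLength are hypothetical and not part of the article's code. Neither value includes the terminating null.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Exact size (the two-pass idea of URLEncode1): count the bytes that become "%XX".
// Alphanumerics are copied and whitespace becomes one '+', so neither grows.
std::size_t exactEncodedLength(const std::string& in)
{
    std::size_t escaped = 0;
    for (unsigned char c : in)
        if (!std::isalnum(c) && !std::isspace(c))
            ++escaped;
    assert(escaped <= in.size()); // sanity check: cannot escape more bytes than exist
    return in.size() + 2 * escaped;
}

// Worst case (the one-pass idea of URLEncode2): assume every byte becomes "%XX".
std::size_t worstCaseEncodedLength(const std::string& in)
{
    return 3 * in.size();
}
```

For the five-character input "a b&c" only '&' needs escaping, so the exact size is 7 while the worst-case estimate is 15. The extra scan buys the smaller allocation.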

Modifications

26.06.2001 - changed the output buffer memory allocation (thanks to Marc Brooks and Matthias)

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


Written By
Web Developer
Poland

Comments and Discussions

 
Question: This is slightly broken, and to clear up some confusion
KevinSW, 21-Sep-12 1:35
To work on a project recently I had to figure this stuff out.
From reading countless questions and requests, there is apparently some confusion, probably due to the progressive manner in which the web standards were developed and standardized over time.

The terms "URL" and "encoding" are also pretty ambiguous.
There is URL encoding for an actual address/location, and then there is encoding for form data, commonly of the "application/x-www-form-urlencoded" type.
Related in scope are "URL", "URI", and "URN" (search for and study these on Wikipedia).

In either case certain text characters are not allowed; these are known as "unsafe" or "reserved".
They must be "percent encoded": each such character is represented by a '%' followed by its two-digit hexadecimal value. E.g. the '$' (USA dollar sign) character must be encoded as "%24".
Reference: http://en.wikipedia.org/wiki/Percent-encoding
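
As an illustration of the '$' to "%24" example, here is a small sketch in portable C++. The helper names hexDigit and percentEncodeByte are made up for this example and do not come from the article.

```cpp
#include <cassert>
#include <string>

// Maps a value 0..15 to its uppercase hexadecimal digit.
static char hexDigit(unsigned value)
{
    assert(value < 16);
    return "0123456789ABCDEF"[value];
}

// Returns the three-character percent-encoded form of one byte.
std::string percentEncodeByte(unsigned char byte)
{
    std::string out = "%";
    out += hexDigit(byte >> 4);   // high nibble
    out += hexDigit(byte & 0x0F); // low nibble
    return out;
}
```

Here percentEncodeByte('$') yields "%24", since the ASCII code of '$' is 0x24.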

Unfortunately, with mixed and sometimes ambiguous standards, the ' ' (space) character can be represented as a single '+' or percent encoded as %20. Apparently either way will work for URL locations/addresses, but for forms it needs to be of the '+' variety.

Unfortunately, on Windows most if not all (all that I tried anyhow) base Windows API functions, like the WinINet API InternetCanonicalizeUrl(), just do the percent encoding type. And even then the results and manner are pretty inconsistent.
This probably comes from the issue that in an actual URL location/address you don't want to encode the '/' path character, as it's part of the address, but in form data you do.

So when working with a URL location/address you want to encode one way, and with the form parts you want to encode a slightly different way.
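
To make the difference concrete, here is one possible sketch with a single encoder and a mode switch. The Target enum and the encodeFor function are hypothetical names, and writing a space as %20 in the address case is an assumption chosen for the example (the text above notes that either form appears to work there).

```cpp
#include <cassert>
#include <string>

enum class Target { UrlPath, FormData }; // hypothetical mode switch

static bool isUnreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '_' || c == '.' || c == '~';
}

std::string encodeFor(const std::string& in, Target target)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(in.size() * 3);
    for (unsigned char c : in)
    {
        // '/' is kept only when encoding an address, where it separates path segments.
        if (isUnreserved(c) || (c == '/' && target == Target::UrlPath))
            out += static_cast<char>(c);
        else if (c == ' ' && target == Target::FormData)
            out += '+'; // form data writes a space as '+'
        else
        {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    assert(out.size() <= in.size() * 3); // no byte grows beyond "%XX"
    return out;
}
```

With the input "a b/c", the address mode gives "a%20b/c" and the form mode gives "a+b%2Fc".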

I verified these behaviors (September 2012) against the standard documents and the behavior of the MS C# facilities like "System.Web.HttpUtility.UrlEncode()" for HTML forms (although I won't be using C# myself).

The OP's code is a little broken. The isalnum() check covers most of the "unreserved characters" but misses the four punctuation characters '-', '_', '.' and '~' (see the Wikipedia page above for reference).
Some people have attempted to fix this in code posted here, but have one or more of the four characters wrong.
Also, some of the code here incorrectly replaces the line break characters ('\r' and '\n') with the '+' space encoding. Maybe that is "filtering", if that was what was desired?
The standard says these should just be percent encoded (%0D and %0A).
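
The line break problem follows from testing with isspace(), which accepts more than the space character. A minimal sketch of the two tests, assuming the default "C" locale; broadSpaceTest and strictSpaceTest are names invented for this example.

```cpp
#include <cassert>
#include <cctype>

// The broad test: in the "C" locale isspace() is true for
// ' ', '\t', '\n', '\v', '\f' and '\r'.
bool broadSpaceTest(int c)
{
    assert(c >= 0 && c <= 255); // isspace() requires an unsigned char value
    return std::isspace(c) != 0;
}

// The strict test: only the space character itself maps to '+'.
bool strictSpaceTest(int c)
{
    return c == ' ';
}
```

An encoder that uses the broad test turns '\r' and '\n' into '+', while one that uses the strict test lets them fall through to the percent-encoding branch.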

Also, using the term "Unicode aware" is ambiguous. In discussion one should say "UTF-8".
Anything else, like attempts at UTF-16 style, is not standard and not used (for obvious reasons like endianness).
And as stated at http://w3techs.com/technologies/overview/character_encoding/all, statistically ~73% of web pages are UTF-8, growing steadily each year.
The original code already covers this, since any other character (in particular those in the 128 to 255 range) is correctly percent encoded.
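
A short sketch of what this means for UTF-8 input: every byte of a multi-byte sequence is 0x80 or higher, so each one is escaped on its own. The function name escapeNonAscii is hypothetical, and the sketch uses an explicit range check instead of isalnum() because the result of isalnum() for bytes above 127 depends on the active locale.

```cpp
#include <cassert>
#include <string>

// Percent-encodes every byte >= 0x80 and copies ASCII bytes unchanged.
// This isolates the UTF-8 part of the job; it is not a complete URL encoder.
std::string escapeNonAscii(const std::string& utf8)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : utf8)
    {
        if (c < 0x80)
            out += static_cast<char>(c);
        else
        {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    assert(out.size() <= utf8.size() * 3); // no byte grows beyond "%XX"
    return out;
}
```

The letter 'é' is the two bytes 0xC3 0xA9 in UTF-8, so "caf\xC3\xA9" comes out as "caf%C3%A9".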

The rules for encoding form data with "application/x-www-form-urlencoded" encoding (for both ASCII and its close cousin UTF-8) are:
1) If a character is unreserved, just copy it.
if(isalnum(c) || (c == '-') || (c == '_') || (c == '.') || (c == '~')) *out = c;

2) If the character is a space, make it a '+' char.
if(c == ' ') *out = '+';

3) Percent encode anything else. This covers all the UTF-8 bytes too in the process.
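
Put together, the three rules above could look roughly like this. It is a sketch in portable C++ with std::string rather than CString, the name formUrlEncode is hypothetical, and the unreserved set is exactly the one listed in rule 1.

```cpp
#include <cassert>
#include <string>

// Rule 1: ASCII letters, digits and the four punctuation characters are copied.
static bool isUnreservedChar(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '_' || c == '.' || c == '~';
}

std::string formUrlEncode(const std::string& in)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(in.size() * 3); // worst case: every byte becomes "%XX"
    for (unsigned char c : in)
    {
        if (isUnreservedChar(c))
            out += static_cast<char>(c); // rule 1: copy
        else if (c == ' ')
            out += '+';                  // rule 2: space becomes '+'
        else
        {
            out += '%';                  // rule 3: percent encode the byte
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    assert(out.size() <= in.size() * 3);
    return out;
}
```

For example, formUrlEncode("Hello World") gives "Hello+World", and a "\r\n" pair comes out as "%0D%0A" rather than "++".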

modified 21-Sep-12 23:21pm.

General: Here's a much simpler method that supports Unicode
dc_200025-Apr-11 0:59
Question: wtf is "if (!isalnum(*pInTmp) && !isalnum(*pInTmp))" ?
s98769026-Jul-07 23:34
General: URLEncode2() Unicode version
mcanti, 2-May-06 2:28
General: Unicode solution
angelo moscati, 4-Jul-05 3:41
General: Re: Unicode solution
angelo moscati, 4-Jul-05 5:24
General: Not support UNICODE
chinkuanyeh, 10-Oct-04 21:28
General: CR LF support
little.mole, 3-Jul-04 10:29
Question: mistake?
3m2u16-Mar-04 23:02
Question: how to converts a string that has been encoded for transmission in a URL into a decoded string?
rafaelcn24-Mar-03 6:25
Answer: Re: how to converts a string that has been encoded for transmission in a URL into a decoded string?
little.mole, 3-Jul-04 11:23
Question: Unicode?
AlexMarbus, 6-Nov-02 3:11
General: not portable
pamela, 7-Oct-01 2:42
General: Re: not portable
Anonymous, 17-May-03 9:15
Question: How to add a dialog before Windows's explore working
suss, 12-Jul-01 14:48
General: Windows already does (most) of this, methinks...
Arnt Witteveen, 4-Jul-01 0:57
General: Re: Windows already does (most) of this, methinks...
Ryszard Krakowiak, 9-Jul-01 4:35
General: Re: Windows already does (most) of this, methinks...
suss, 23-Jul-01 15:29
General: Re: Windows already does (most) of this, methinks...
Anonymous, 29-Aug-02 1:29
General: Minor correction in UrlEncode1
Marc Brooks, 25-Jun-01 18:14
General: GetBuffer instead of new
suss, 25-Jun-01 5:08
General: Re: GetBuffer instead of new
Wictor Wilén, 1-Aug-01 10:59
