
URLEncode

Ryszard Krakowiak, 25 Jun 2001
Convert string using URLEncode method

URLEncode

I use these functions to prepare POST strings with XML data. The first function, URLEncode1, uses less memory but is slower than the second, URLEncode2. Both functions take a CString as the input parameter and return a CString.

The demo project contains sample usage and measures the execution time of both functions.

Source

Helper function

// Convert a nibble (0..15) to its uppercase hexadecimal ASCII digit.
inline BYTE toHex(const BYTE &x)
{
    return x > 9 ? x + 'A' - 10 : x + '0';
}

URLEncode1

CString URLEncode1(CString sIn)
{
    CString sOut;

    // Input length including the terminating NUL
    // (the byte-wise processing below assumes a non-Unicode build).
    const int nLen = sIn.GetLength() + 1;

    LPBYTE pInBuf = (LPBYTE)sIn.GetBuffer(nLen);
    LPBYTE pInTmp = pInBuf;

    // Count the characters that will be percent-encoded
    // (neither alphanumeric nor whitespace); each one grows by two bytes.
    int k = 0;
    while (*pInTmp)
    {
        if (!isalnum(*pInTmp) && !isspace(*pInTmp))
            k++;
        pInTmp++;
    }

    // Allocate the output buffer: input length plus two extra bytes
    // per percent-encoded character.
    LPBYTE pOutBuf = (LPBYTE)sOut.GetBuffer(nLen + 2 * k);

    if (pOutBuf)
    {
        LPBYTE pOutTmp = pOutBuf;
        pInTmp = pInBuf;

        // Do the encoding.
        while (*pInTmp)
        {
            if (isalnum(*pInTmp))
                *pOutTmp++ = *pInTmp;               // letters and digits are copied
            else if (isspace(*pInTmp))
                *pOutTmp++ = '+';                   // whitespace becomes '+'
            else
            {
                *pOutTmp++ = '%';                   // everything else becomes %XX
                *pOutTmp++ = toHex(*pInTmp >> 4);
                *pOutTmp++ = toHex(*pInTmp % 16);
            }
            pInTmp++;
        }

        *pOutTmp = '\0';
        sOut.ReleaseBuffer();
    }
    sIn.ReleaseBuffer();
    return sOut;
}

URLEncode2

CString URLEncode2(CString sIn)
{
    CString sOut;

    // Input length including the terminating NUL
    // (the byte-wise processing below assumes a non-Unicode build).
    const int nLen = sIn.GetLength() + 1;

    LPBYTE pInBuf = (LPBYTE)sIn.GetBuffer(nLen);

    // Allocate the output buffer for the worst case, where every
    // character becomes %XX: 3 * (nLen - 1) bytes plus the NUL.
    LPBYTE pOutBuf = (LPBYTE)sOut.GetBuffer(nLen * 3 - 2);

    if (pOutBuf)
    {
        LPBYTE pInTmp = pInBuf;
        LPBYTE pOutTmp = pOutBuf;

        // Do the encoding.
        while (*pInTmp)
        {
            if (isalnum(*pInTmp))
                *pOutTmp++ = *pInTmp;               // letters and digits are copied
            else if (isspace(*pInTmp))
                *pOutTmp++ = '+';                   // whitespace becomes '+'
            else
            {
                *pOutTmp++ = '%';                   // everything else becomes %XX
                *pOutTmp++ = toHex(*pInTmp >> 4);
                *pOutTmp++ = toHex(*pInTmp % 16);
            }
            pInTmp++;
        }

        *pOutTmp = '\0';
        sOut.ReleaseBuffer();
    }
    sIn.ReleaseBuffer();
    return sOut;
}
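
The difference between the two functions comes down to how the output buffer is sized. The following is a minimal sketch of both sizing strategies, not the article's code: it uses std::string instead of CString so that it compiles without MFC, and the function names exactEncodedLength and worstCaseEncodedLength are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Strategy of URLEncode1: scan the input once and count the bytes
// that will be written as "%XX" (hypothetical helper name).
std::size_t exactEncodedLength(const std::string& in)
{
    std::size_t escaped = 0;
    for (unsigned char c : in)
        if (!std::isalnum(c) && !std::isspace(c))
            ++escaped;
    return in.size() + 2 * escaped;   // each escaped byte grows from 1 to 3
}

// Strategy of URLEncode2: skip the scan and assume every byte is escaped
// (hypothetical helper name).
std::size_t worstCaseEncodedLength(const std::string& in)
{
    return in.size() * 3;
}
```

Both lengths exclude the terminating NUL. The first costs an extra pass over the input, the second can reserve up to three times the memory actually needed.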

Modifications

26.06.2001 - changed output buffer memory allocation (thanks to Marc Brooks and Matthias)

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

Ryszard Krakowiak
Web Developer
Poland
No Biography provided

Comments and Discussions

 
Question: This is slightly broken, and to clear up some confusion [modified] (KevinSW, 21-Sep-12 1:35)
To work on a project recently I had to figure this stuff out.
From reading countless questions and requests, there is apparently some confusion, probably due to the progressive manner in which the web standards were developed and standardized over time.
 
On top of that, the terms "URL" and "encoding" are pretty ambiguous.
There is URL encoding for an actual address/location, and then there is encoding for form data, commonly of the "application/x-www-form-urlencoded" type.
Also in scope are the related terms "URL", "URI", and "URN" (search for and study these on Wikipedia).
 
In either case certain text characters are not allowed; these are known as "unsafe" or "reserved".
These are "percent encoded": each such character is represented by a '%' followed by its two-digit hex value. E.g. the '$' (USA dollar sign) character must be encoded as "%24".
Reference: http://en.wikipedia.org/wiki/Percent-encoding[^]
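
As a rough illustration of percent encoding a single byte, here is a minimal sketch in portable C++. The helper name percentEncodeByte is hypothetical and is not taken from the article or from any Windows API.

```cpp
#include <string>

// Hypothetical helper: write one byte as '%' followed by two uppercase hex digits.
std::string percentEncodeByte(unsigned char b)
{
    static const char kHex[] = "0123456789ABCDEF";
    std::string out = "%";
    out += kHex[b >> 4];      // high nibble
    out += kHex[b & 0x0F];    // low nibble
    return out;
}
```

With this sketch, percentEncodeByte('$') gives "%24", since '$' is byte 0x24 in ASCII.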
 
Unfortunately, with mixed and sometimes ambiguous standards, the ' ' (space) character can be represented as a single '+' or percent encoded as %20. Apparently either way will work for URL locations/addresses, but for forms it needs to be of the '+' variety.
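
The two space conventions can be shown side by side. This is only a sketch: encodeSpaces is a hypothetical helper that touches nothing but spaces, so it is not a complete encoder.

```cpp
#include <string>

// Hypothetical helper: replace each space with '+' (form style)
// or with "%20" (plain percent encoding). All other bytes are copied.
std::string encodeSpaces(const std::string& in, bool formStyle)
{
    std::string out;
    for (char c : in) {
        if (c == ' ')
            out += formStyle ? "+" : "%20";
        else
            out += c;
    }
    return out;
}
```

Because '+' stands for a space in form data, a literal '+' in the input has to be percent encoded as %2B by a full encoder, otherwise the receiver would decode it as a space.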
 
For Windows, unfortunately, most if not all (all that I tried anyhow) base Windows API functions, like the WinINet API InternetCanonicalizeUrl(), just do the percent encoding type. And even then the results and manner are pretty inconsistent.
This probably comes from the fact that in an actual URL location/address you don't want to encode the '/' path character, as it's part of the address, but in form data you do.
 
So when working with a URL location/address you want to encode one way, and with the form parts you want to encode a slightly different way.
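
One way to picture the difference is a single routine with a switch for the '/' character. The sketch below is an assumption for illustration only: encodeComponent and its keepSlash parameter are hypothetical, and this is not how InternetCanonicalizeUrl() or any other Windows API behaves.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: percent-encode every byte except the unreserved
// characters, optionally leaving '/' intact for URL paths.
std::string encodeComponent(const std::string& in, bool keepSlash)
{
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        bool unreserved = std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~';
        if (unreserved || (keepSlash && c == '/')) {
            out += static_cast<char>(c);   // copied unchanged
        } else {
            out += '%';                    // written as %XX
            out += kHex[c >> 4];
            out += kHex[c & 0x0F];
        }
    }
    return out;
}
```

Called with keepSlash set to true, "a/b c" keeps its path separator and becomes "a/b%20c"; with keepSlash set to false the same input becomes "a%2Fb%20c".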
 
I verified these behaviors (September 2012) against the standards documents and the behavior of the MS C# facilities like "System.Web.HttpUtility.UrlEncode()" for HTML forms (although I won't be using C# myself).
 
The OP's code is a little broken. The isalnum() test covers most of the "unreserved characters" but misses the four punctuation chars '-', '_', '.' and '~' (see the Wikipedia page above for reference).
Some people have attempted to fix this in code posted here, but have one or more of the four characters wrong.
Also, some of the code here incorrectly replaces the line break chars ('\r' and '\n') with the space '+' encoding. Maybe it's "filtering", if that was what was desired?
The standard says these should just be percent encoded (%0D and %0A).
 
Also, using the term "Unicode aware" is ambiguous. In discussion one should say "UTF-8".
Anything else, like attempts at UTF-16 style, is not standard and not used (for obvious reasons like endianness).
And as it states here http://w3techs.com/technologies/overview/character_encoding/all[^] statistically ~73% of web pages are UTF-8, and the share is growing steadily each year.
The original code already covers this, since any other byte (in particular those in the 128 to 255 range) is correctly percent encoded.
 
The rules for encoding form data for "application/x-www-form-urlencoded" encoding (for both ASCII and its close cousin UTF-8) are:
1) If a character is unreserved, just copy it.
if(isalnum(c) || (c == '-') || (c == '_') || (c == '.') || (c == '~')) *out = c;
2) If the character is a space, make it a '+' char.
if(c == ' ') *out = '+';
3) Anything else, percent encode it. This covers all the UTF-8 bytes too in the process.
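
Put together, the three rules could look roughly like the sketch below. It is written against std::string rather than CString so it compiles without MFC, the name formUrlEncode is hypothetical, and it assumes the input is already a byte string (ASCII or UTF-8) and that the default "C" locale is in effect for isalnum().

```cpp
#include <cctype>
#include <string>

// Hypothetical function applying the three rules to a byte string.
std::string formUrlEncode(const std::string& in)
{
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(in.size() * 3);              // worst case: every byte becomes %XX
    for (unsigned char c : in) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);     // rule 1: unreserved, copy
        } else if (c == ' ') {
            out += '+';                      // rule 2: space becomes '+'
        } else {
            out += '%';                      // rule 3: percent encode the byte
            out += kHex[c >> 4];
            out += kHex[c & 0x0F];
        }
    }
    return out;
}
```

With this sketch a CR LF pair comes out as "%0D%0A" rather than "++", and the two UTF-8 bytes 0xC3 0xA9 come out as "%C3%A9".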

modified 21-Sep-12 23:21pm.

General: Here's a much simpler method that supports Unicode (dc_200025-Apr-11 0:59)
Question: wtf is "if (!isalnum(*pInTmp) && !isalnum(*pInTmp))" ? (s98769026-Jul-07 23:34)
General: URLEncode2() Unicode version (mcanti, 2-May-06 2:28)
General: Unicode solution (angelo moscati, 4-Jul-05 3:41)
General: Re: Unicode solution (angelo moscati, 4-Jul-05 5:24)
General: Not support UNICODE (chinkuanyeh, 10-Oct-04 21:28)
General: CR LF support (little.mole, 3-Jul-04 10:29)
Question: mistake? (3m2u16-Mar-04 23:02)
Question: how to converts a string that has been encoded for transmission in a URL into a decoded string? (rafaelcn, 24-Mar-03 6:25)
Answer: Re: how to converts a string that has been encoded for transmission in a URL into a decoded string? (little.mole, 3-Jul-04 11:23)
Question: Unicode? (AlexMarbus, 6-Nov-02 3:11)
General: not portable (pamela, 7-Oct-01 2:42)
General: Re: not portable (Anonymous, 17-May-03 9:15)
Question: How to add a dialog before Windows's explore working (NewLearnXZX, 12-Jul-01 14:48)
General: Windows already does (most) of this, methinks... (Arnt Witteveen, 4-Jul-01 0:57)
General: Re: Windows already does (most) of this, methinks... (Ryszard Krakowiak, 9-Jul-01 4:35)
General: Re: Windows already does (most) of this, methinks... (wangjj, 23-Jul-01 15:29)
General: Re: Windows already does (most) of this, methinks... (Anonymous, 29-Aug-02 1:29)
General: Minor correction in UrlEncode1 (Marc Brooks, 25-Jun-01 18:14)
General: GetBuffer instead of new (Anonymous, 25-Jun-01 5:08)
General: Re: GetBuffer instead of new (Wictor Wilén, 1-Aug-01 10:59)

Last Updated 26 Jun 2001
Article Copyright 2001 by Ryszard Krakowiak