String Manipulations in Unicode

11 May 2011, GPL3, 3 min read
Tips and tricks on String manipulations in Unicode (Windows Programming context)
This is an old version of the currently published article.

Introduction

Whether you write a Win32 application or an MFC application, you may want to add Unicode support. Unicode lets your application handle text in any language. So how do you manipulate strings in a Unicode build? I will show you how.

Unicode Tips and Tricks

In the project properties, set Character Set to "Use Unicode Character Set". This defines the UNICODE and _UNICODE macros, and the generic-text (TCHAR) names used below then resolve to their wide-character versions. Let's talk about the code.

To pass a string literal as a function parameter, wrap it in the _T or TEXT macro. For example:

C++
AfxMessageBox(_T("You clicked it!"));

_T() comes from the tchar.h header, and TEXT() is the equivalent macro from the Windows SDK headers. When _UNICODE is defined, the generic-text names in tchar.h map to the wide-character functions; otherwise they map to the multibyte (char) versions. Writing against these names lets the same code build either way. Try to avoid plain char arrays and char* in such code, because they have to be converted before you can use them with your controls, and repeated conversion is tedious.

Use TCHAR instead of char and TCHAR* instead of char*. TCHAR* can also be written LPTSTR, and const TCHAR* can be written LPCTSTR. Use LPCTSTR when a string is passed to a function that must not modify it.
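
To make the mapping idea concrete, here is a minimal sketch of the mechanism in portable C++. The names DemoChar, DEMO_T, DemoLen and DEMO_UNICODE are hypothetical stand-ins that I made up for TCHAR, _T, _tcslen and _UNICODE; the real definitions in tchar.h are more elaborate.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for TCHAR, _T and _tcslen, so the mechanism can be
// seen without <tchar.h>. DEMO_UNICODE plays the role of _UNICODE.
#define DEMO_UNICODE

#ifdef DEMO_UNICODE
typedef wchar_t DemoChar;                 // wide characters
#define DEMO_T(x) L##x                    // turns "text" into L"text"
inline std::size_t DemoLen(const DemoChar* s) { return std::wcslen(s); }
#else
typedef char DemoChar;                    // narrow characters
#define DEMO_T(x) x                       // leaves the literal unchanged
inline std::size_t DemoLen(const DemoChar* s) { return std::strlen(s); }
#endif

// The same source line compiles in either mode.
inline std::size_t GreetingLength() {
    const DemoChar* greeting = DEMO_T("You clicked it!");
    return DemoLen(greeting);
}
```

Removing the DEMO_UNICODE definition switches every use of DemoChar, DEMO_T and DemoLen to the narrow versions without touching GreetingLength.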

To calculate the length of a string, use the _tcslen function:

C++
len = _tcslen(str);
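
One thing that is easy to get wrong here: the length is a count of characters (strictly, wchar_t code units), not bytes. The sketch below uses wcslen, which is what _tcslen maps to in a Unicode build; the helper names CharCount and BytesWithNull are my own, chosen for illustration.

```cpp
#include <cstddef>
#include <cwchar>

// Number of wchar_t units, excluding the terminating null.
inline std::size_t CharCount(const wchar_t* s) {
    return std::wcslen(s);
}

// Bytes needed to store the string including its terminating null.
inline std::size_t BytesWithNull(const wchar_t* s) {
    return (std::wcslen(s) + 1) * sizeof(wchar_t);
}
```

Use the character count for buffer sizes passed to the string functions, and the byte count only for raw memory calls such as malloc or memcpy.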

To compare strings, use _tcscmp and _tcsncmp instead of strcmp and strncmp. Here's an example:

C++
if (!_tcsncmp(line, _T("desiredtext"), 11))
	AfxMessageBox(_T("Got desired text."));
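
The hard-coded 11 above has to be kept in sync with the literal by hand. A small sketch of a prefix test that computes the length instead, using wcsncmp (the Unicode mapping of _tcsncmp); StartsWith is a hypothetical helper name, not a library function.

```cpp
#include <cwchar>

// True when `text` begins with `prefix`. The comparison length is taken
// from the prefix itself, so it cannot drift out of sync with the literal.
inline bool StartsWith(const wchar_t* text, const wchar_t* prefix) {
    return std::wcsncmp(text, prefix, std::wcslen(prefix)) == 0;
}
```

With this helper, the check could be written as StartsWith(line, L"desiredtext").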

To copy strings, use _tcscpy_s and _tcsncpy_s instead of strcpy and strncpy. In C++, when the destination is a fixed-size array, the buffer size is deduced automatically:

C++
_tcsncpy_s(dest, srcstr, 20);

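Note that strncpy (and _tcsncpy) does not append a terminating null when the source is at least as long as the count, so remember to set the null yourself after copying in that case. strcpy and the _s variants always null-terminate the destination.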
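
Since strncpy-style functions can leave the destination without a terminating null when the source is too long, here is a minimal sketch of a copy helper that always terminates and reports truncation. CopyTruncated is a hypothetical name of my own, and it uses only standard wide-character calls rather than the _s functions.

```cpp
#include <cstddef>
#include <cwchar>

// Copies at most destCount - 1 characters and always null-terminates.
// Returns false if src had to be truncated or the arguments are unusable.
inline bool CopyTruncated(wchar_t* dest, std::size_t destCount,
                          const wchar_t* src) {
    if (dest == nullptr || src == nullptr || destCount == 0)
        return false;
    std::size_t srcLen = std::wcslen(src);
    std::size_t n = (srcLen < destCount - 1) ? srcLen : destCount - 1;
    std::wmemcpy(dest, src, n);   // copy the part that fits
    dest[n] = L'\0';              // terminate explicitly
    return n == srcLen;
}
```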

For string concatenation, use _tcscat_s and _tcsncat_s instead of strcat and strncat. Here, the second parameter is the size of the destination buffer, in characters (not bytes):

C++
_tcsncat_s(timestamp, 20, str, i);
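
To show what the size parameter protects against, here is a rough portable sketch of a bounded append. AppendBounded is a hypothetical helper of my own; unlike _tcsncat_s, which treats an undersized destination as an error, this version truncates and reports it through the return value.

```cpp
#include <cstddef>
#include <cwchar>

// Appends up to maxChars characters of src to dest without writing more than
// destCount characters in total (terminating null included).
// Returns false if something had to be dropped.
inline bool AppendBounded(wchar_t* dest, std::size_t destCount,
                          const wchar_t* src, std::size_t maxChars) {
    std::size_t used = 0;
    while (used < destCount && dest[used] != L'\0')
        ++used;                              // bounded search for the end
    if (used >= destCount)
        return false;                        // dest is not terminated
    std::size_t room = destCount - used - 1; // space left before the null
    std::size_t want = std::wcslen(src);
    if (want > maxChars)
        want = maxChars;
    std::size_t n = (want < room) ? want : room;
    std::wmemcpy(dest + used, src, n);
    dest[used + n] = L'\0';
    return n == want;
}
```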

To split a string into tokens, use _tcstok_s instead of strtok:

C++
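LPTSTR next_token = NULL;
LPTSTR token = _tcstok_s(str, delim, &next_token);	// first token
// Pass NULL instead of str on later calls to get the remaining tokens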
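
The call above returns only the first token. A minimal sketch of the full loop, written with the standard three-argument wcstok (which takes the same delimiter and context arguments as _tcstok_s); SplitTokens is a hypothetical helper name.

```cpp
#include <cwchar>
#include <string>
#include <vector>

// Splits a copy of `text` on any character found in `delims`.
inline std::vector<std::wstring> SplitTokens(const std::wstring& text,
                                             const wchar_t* delims) {
    std::vector<std::wstring> tokens;
    // wcstok writes nulls into its input, so tokenize a private copy.
    std::vector<wchar_t> buffer(text.begin(), text.end());
    buffer.push_back(L'\0');

    wchar_t* context = nullptr;
    for (wchar_t* tok = std::wcstok(buffer.data(), delims, &context);
         tok != nullptr;
         tok = std::wcstok(nullptr, delims, &context)) {
        tokens.emplace_back(tok);
    }
    return tokens;
}
```

Note that consecutive delimiters are skipped, so empty fields do not produce empty tokens.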

To convert a string to an integer, you can use the _ttoi() function:

C++
CString str = _T("10");
CString temp;
temp.Format(_T(" length: %d"), _ttoi(str));
AfxMessageBox(str+temp);

Previously, I wasn't aware of this function, so I wrote the following functions to do the task. Working through them is a good way to understand how such a conversion works.

C++
int GetEquivValue(TCHAR ch, int base) {
	int value = -1;

	// Invalid base
	if (base < 2 || base > 16)
		return 0;

	if (ch >= _T('0') && ch <= _T('9'))
		value = ch - _T('0');
	else if (ch >= _T('a') && ch <= _T('f'))
		value = ch - _T('a') + 10;	// lowercase hex digit
	else if (ch >= _T('A') && ch <= _T('F'))
		value = ch - _T('A') + 10;	// uppercase hex digit

	// Anything that is not a digit in this base counts as 0
	if (value < 0 || value >= base)
		return 0;
	return value;
}

int SAatoib(LPCTSTR str, int base) {
	int res = 0;
	size_t i, len;

	len = _tcslen(str);
	for (i = 0; i < len; i++) {
		// Shift the digits seen so far one place left, then add the new one
		res = res * base + GetEquivValue(str[i], base);
	}
	return res;
}

To convert a string of decimal digits to an integer, you may write:

C++
int decVal = SAatoib((LPTSTR )NumStr, 10);

To convert a string of hexadecimal digits, use:

C++
int decVal = SAatoib((LPTSTR )NumStr, 16);

To convert a string of octal digits, use:

C++
int decVal = SAatoib((LPTSTR )NumStr, 8);
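
For comparison, the standard library can do the same conversion for any base and also tell you when the input was not a valid number. This is a sketch using wcstol, which is what _tcstol maps to in a Unicode build; TryParseLong is a hypothetical wrapper name of my own.

```cpp
#include <cerrno>
#include <cwchar>

// Parses all of `text` as an integer in the given base (2 to 36).
// Returns false on empty input, trailing junk, or overflow.
inline bool TryParseLong(const wchar_t* text, int base, long* out) {
    if (text == nullptr || out == nullptr || *text == L'\0')
        return false;
    wchar_t* end = nullptr;
    errno = 0;
    long value = std::wcstol(text, &end, base);
    // end == text: nothing parsed; *end != 0: leftover characters
    if (end == text || *end != L'\0' || errno == ERANGE)
        return false;
    *out = value;
    return true;
}
```

Unlike a conversion that silently returns 0, this lets the caller tell the string "0" apart from a string that is not a number at all.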

To format a string printf-style, you can always use the CString::Format function. For example, to build an IP address string:

C++
unsigned int ipsegval[4] = {1, 2, 3, 4};	// unsigned int matches the %u specifier
CString ip;
ip.Format(_T("%u.%u.%u.%u"), ipsegval[0], ipsegval[1], ipsegval[2], ipsegval[3]);
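
Outside MFC, the same formatting can be sketched with the standard swprintf, which takes the buffer size in characters. FormatIp is a hypothetical helper name, and the 16-character buffer assumes each segment is at most 255.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// Builds a dotted-quad string; returns an empty string if it does not fit.
inline std::wstring FormatIp(unsigned a, unsigned b, unsigned c, unsigned d) {
    wchar_t buf[16];  // "255.255.255.255" plus the terminating null
    int written = std::swprintf(buf, sizeof(buf) / sizeof(buf[0]),
                                L"%u.%u.%u.%u", a, b, c, d);
    if (written < 0)
        return std::wstring();
    return std::wstring(buf, static_cast<std::size_t>(written));
}
```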

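Now suppose you have a CString, but you need it as a const char* or char*. In a Unicode build, the cast from CString to LPCTSTR is valid, but a cast to const char* is not, because the characters are wide. But there is a way: construct a CStringA from the CString, which converts the text to narrow characters. Here's an example: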

C++
CString ipaddrstr(_T("1.2.3.4"));
CStringA ipaddrstrA(ipaddrstr);
ipaddr = inet_addr(ipaddrstrA);

Note: the inet_addr function requires a const char*.
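
If you are not using CString, a wide-to-narrow conversion can be sketched with the standard wcstombs function. ToNarrow is a hypothetical helper name; the conversion depends on the current C locale, so in the default locale only plain ASCII text such as an IP address is guaranteed to convert.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Converts using the current C locale. Returns an empty string if some
// character cannot be represented in that locale.
inline std::string ToNarrow(const std::wstring& wide) {
    // MB_CUR_MAX is the most bytes one character can need in this locale.
    std::vector<char> buf(wide.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), wide.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        return std::string();
    return std::string(buf.data(), n);
}
```

The result's c_str() can then be passed to a function that expects a const char*.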

I hope this post helps beginners.

History

  • 11th May, 2011: Version 3
  • 9th May, 2011: Version 2
  • 6th May, 2011: Initial version

License

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3).


Written By
Software Developer Oracle
United States
Interested in mathematics and English literature.
