
When Two Strings are Equal on Windows

14 Nov 2010 | CPOL | 3 min read

It seems to be a trivial and obvious task, but in fact it’s not, and I nearly slipped over it. I wanted to dynamically load some shader resources, and my app crashed at runtime. I didn’t know why, so I debugged it and found the culprit. It was hiding in two places: the philosophy behind Windows string comparisons and the MultiByteToWideChar function.

Code Snippet 1

C++ code that does not work and throws an exception:

C++
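// This version is broken on purpose: it is the code that crashed.
wchar_t* ConvertToWchar_t(const char * txt)
{
    size_t argLength = strlen(txt);
    wchar_t * result = new wchar_t[argLength+1];
    // Bug: ZeroMemory takes a size in bytes, so this clears only part of the
    // wchar_t buffer; it would need (argLength+1)*sizeof(wchar_t).
    ZeroMemory(result,argLength+1);
    // Bugs: the source and destination are swapped (result is passed as the
    // UTF-8 input, txt as the UTF-16 output); cchWideChar == 0 only asks for
    // the required buffer size and converts nothing; and MB_COMPOSITE is not
    // allowed with CP_UTF8 (only 0 or MB_ERR_INVALID_CHARS), so the call
    // fails with ERROR_INVALID_FLAGS.
    int convertedCharacters = MultiByteToWideChar
        (CP_UTF8,MB_COMPOSITE,(LPCSTR)result,(int)argLength,(LPWSTR)txt,0);
    if(!convertedCharacters)
    {
        DWORD lastErr = GetLastError();
        switch(lastErr)
        {
        case ERROR_INSUFFICIENT_BUFFER:
            throw exception("insufficient buffer");
        case ERROR_NO_UNICODE_TRANSLATION:
            throw exception("invalid unicode char");
        case ERROR_INVALID_FLAGS:
            throw exception("invalid flags");
        case ERROR_INVALID_PARAMETER:
            throw exception("error invalid parameter");
        }
    }
    // Bug: even when the call "succeeds", result is returned unconverted.
    return result;
}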

As you know, Windows works in Unicode internally (its wide-character APIs use UTF-16), and your code should too. This requires programmers to know a few things about it, for example how to convert from char* to wchar_t* and when two strings are equal. Don’t forget that the same Unicode text can be written in several different ways. Bear in mind that if you want to compare two strings, they have to be normalized before the comparison! Normalization is the process of converting text into one agreed-upon representation, so that strings that are linguistically the same also consist of the same sequence of code points and can be compared reliably.

In many cases, Unicode allows multiple representations of what is, linguistically, the same string. For example:

  • Capital A with dieresis (umlaut) can be represented either as a single Unicode code point “Ä” (U+00C4) or the combination of Capital A and the combining Dieresis character (“A” + “¨”, that is, U+0041 U+0308). Similar considerations apply for many other characters with diacritic marks.
  • Capital A itself can be represented either in the usual manner (Latin Capital Letter A, U+0041) or by Fullwidth Latin Capital Letter A (U+FF21). Similar considerations apply for the other simple Latin letters (both uppercase and lowercase) and for the katakana characters used in writing Japanese.
  • The string “fi” can be represented either by the characters “f” and “i” (U+0066 U+0069) or by the ligature “ﬁ” (U+FB01). Similar considerations apply for many other combinations of characters for which Unicode defines ligatures.
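To make the point about different binary forms concrete, here is a minimal sketch of a UTF-8 encoder applied to the examples from the list. EncodeUtf8 is a hypothetical helper written only for this illustration (it is not a Windows API), and it does not reject surrogates or code points above U+10FFFF.

```cpp
#include <string>

// Hypothetical helper: encodes a sequence of Unicode code points as UTF-8.
// Illustration only: no validation of surrogates or out-of-range values.
std::string EncodeUtf8(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text) {
        if (cp < 0x80) {                    // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {            // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {          // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                            // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For instance, EncodeUtf8(U"\u00C4") yields the two bytes C3 84, while EncodeUtf8(U"A\u0308") yields the three bytes 41 CC 88, and the ligature U+FB01 becomes EF AC 81 instead of 66 69 for “fi”. A byte-wise comparison (strcmp, memcmp, or == on std::string) therefore reports these pairs as different even though they display identically.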

Which binary representation you get depends on the normalization form. Windows supports the four forms defined by the Unicode standard: C, D, KC and KD (NFC, NFD, NFKC and NFKD). See the full description of available norms here. To check whether your wchar_t* string is normalized, use the IsNormalizedString function; to normalize it, use the NormalizeString function. But what if you’re programming in .NET rather than against the unmanaged Windows API? You have to take care of this there too, just in a different way. The C# code below is a small program I wrote for myself to check it.
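The idea behind NormalizeString can be pictured with a deliberately tiny, portable sketch. ToyNormalizeKC is a hypothetical function that hard-codes a handful of table entries: it applies compatibility decomposition to the ligatures U+FB01 and U+FB03 and to the fullwidth U+FF21, and then canonically composes a few letters with the combining diaeresis U+0308. A real implementation (NormalizeString with NormalizationKC, or ICU) uses the full Unicode data tables and also reorders combining marks, so treat this only as an illustration of what normalization does, not as a replacement for it.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical toy normalizer: a tiny hand-written subset of NFKC.
// Illustration only; the tables below cover just the examples in the text.
std::u32string ToyNormalizeKC(const std::u32string& in)
{
    // Step 1: compatibility decomposition for a few characters.
    static const std::map<char32_t, std::u32string> kCompat = {
        {U'\uFB01', U"fi"},   // LATIN SMALL LIGATURE FI
        {U'\uFB03', U"ffi"},  // LATIN SMALL LIGATURE FFI
        {U'\uFF21', U"A"},    // FULLWIDTH LATIN CAPITAL LETTER A
    };
    std::u32string decomposed;
    for (char32_t c : in) {
        auto it = kCompat.find(c);
        if (it != kCompat.end())
            decomposed += it->second;
        else
            decomposed += c;
    }

    // Step 2: canonical composition of a base letter + COMBINING DIAERESIS.
    static const std::map<std::pair<char32_t, char32_t>, char32_t> kCompose = {
        {{U'A', U'\u0308'}, U'\u00C4'},
        {{U'O', U'\u0308'}, U'\u00D6'},
        {{U'a', U'\u0308'}, U'\u00E4'},
        {{U'o', U'\u0308'}, U'\u00F6'},
    };
    std::u32string out;
    for (char32_t c : decomposed) {
        if (!out.empty()) {
            auto it = kCompose.find({out.back(), c});
            if (it != kCompose.end()) {
                out.back() = it->second;  // merge base + mark into one code point
                continue;
            }
        }
        out += c;
    }
    return out;
}
```

After this step, the decomposed string U"A\u0308\uFB03n" and the precomposed U"\u00C4ffin" both become U"\u00C4ffin", so an ordinary == comparison of the normalized strings succeeds, while comparing the raw strings fails.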

Code Snippet 2

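C#
using System;

class Program
{
    static void Main(string[] args)
    {
        // "A" + COMBINING DIAERESIS (U+0308), LATIN SMALL LIGATURE FFI (U+FB03), "n"
        string s = "A\u0308\uFB03n";
        // Precomposed "Ä" (U+00C4) followed by the separate letters "ffin"
        string s2 = "Äffin";
        String sm1 = new String( s.ToCharArray() );
        String sm2 = new String( s2.ToCharArray() );

        // String.Compare(string, string) is culture-sensitive (current culture).
        if (String.Compare( s, s2 ) == 0)
        {
            Console.WriteLine( "String.Compare(s, s2): equal" );
        }
        else
        {
            Console.WriteLine( "String.Compare(s, s2): not equal" );
        }

        // The == operator is an ordinal comparison of UTF-16 code units.
        if (s == s2)
        {
            Console.WriteLine( "s == s2: equal" );
        }
        else
        {
            Console.WriteLine( "s == s2: not equal" );
        }

        // String.Equals(string) is ordinal as well.
        if (sm1.Equals( sm2 ))
        {
            Console.WriteLine( "sm1.Equals(sm2): equal" );
        }
        else
        {
            Console.WriteLine( "sm1.Equals(sm2): not equal" );
        }

        if (String.Compare( sm1, sm2 ) == 0)
        {
            Console.WriteLine( "String.Compare(sm1, sm2): equal" );
        }
        else
        {
            Console.WriteLine( "String.Compare(sm1, sm2): not equal" );
        }
    }
}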

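It turns out that not every comparison above returns true, although linguistically they all should. The == operator and String.Equals perform an ordinal comparison: they compare the UTF-16 code units one by one, so the decomposed “A” + U+0308 and the ligature U+FB03 never match the precomposed “Äffin”. Only String.Compare, which is culture-sensitive, gave the correct result, but even this code isn’t perfect, because it silently depends on the current thread’s culture. To satisfy security rules, always perform string comparisons with an explicit, appropriate CultureInfo (or StringComparison value); then the result is both correct and predictable. C# programming isn’t as easy as it seems if you want to do your job right and not only get it working in one particular runtime environment.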

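In the unmanaged Win32 environment, you had better use functions like lstrcmpi to compare two strings. Microsoft provides an article on how to perform safe string comparisons and on internationalization features. Be careful with the Win32 MultiByteToWideChar function: its MB_PRECOMPOSED and MB_COMPOSITE flags look like a magic normalization switch, but they do not work the way you might hope. For CP_UTF8, for example, dwFlags must be 0 or MB_ERR_INVALID_CHARS, and any other flag makes the call fail with ERROR_INVALID_FLAGS, so the call in Code Snippet 1 cannot succeed. See the post here to learn more about it. If you want to convert a char* to a wchar_t*, use the mbstowcs_s function instead; note that it interprets the input according to the current C locale (see setlocale), not necessarily as UTF-8. There is an MSDN article that shows some basic string format conversions here.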

Code Snippet 3

A safer char* to wchar_t* conversion function based on mbstowcs_s:

C++
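#include <cerrno>     // EILSEQ, EINVAL
#include <cstdlib>    // mbstowcs_s, _TRUNCATE (Microsoft CRT)
#include <cstring>    // strlen
#include <stdexcept>  // std::runtime_error

// Converts a narrow string, interpreted according to the current C locale,
// into a newly allocated wide string. The caller must delete[] the result.
wchar_t* ConvertToWchar_t(const char * txt)
{
    size_t argLength = strlen(txt);
    // The conversion never produces more wide characters than there are input
    // bytes, so argLength+1 elements (including the terminating null) suffice.
    // The trailing () zero-initializes every wchar_t element, unlike
    // ZeroMemory(result,argLength+1), which clears bytes, not elements.
    wchar_t * result = new wchar_t[argLength+1]();
    size_t convertedChars = 0;
    // mbstowcs_s reports failure through its errno_t return value,
    // not through GetLastError().
    errno_t err = mbstowcs_s(&convertedChars, result, argLength+1, txt, _TRUNCATE);
    if(err != 0)
    {
        delete[] result;  // do not leak the buffer when throwing
        switch(err)
        {
        case EILSEQ:
            throw std::runtime_error("invalid multibyte character");
        case EINVAL:
            throw std::runtime_error("invalid parameter");
        default:          // e.g. STRUNCATE if the output had to be truncated
            throw std::runtime_error("conversion failed");
        }
    }
    return result;
}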
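Note that mbstowcs_s is specific to the Microsoft CRT (and to the optional Annex K of C11), and the function above still hands the caller a raw new[] buffer. For code that must also build elsewhere, a similar conversion can be sketched with the standard std::mbsrtowcs, using the common two-pass pattern: first ask for the required length, then convert. ToWide is a hypothetical name chosen for this sketch; like mbstowcs_s, it interprets the input according to the current C locale, and the resulting wchar_t is UTF-16 on Windows but usually UTF-32 on Linux.

```cpp
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical portable helper: narrow string (current C locale) -> std::wstring.
std::wstring ToWide(const char* txt)
{
    std::mbstate_t state{};
    const char* src = txt;
    // First pass: with a null destination, mbsrtowcs only counts the wide
    // characters needed (excluding the terminating null).
    std::size_t needed = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (needed == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence for the current locale");

    // Second pass: convert exactly 'needed' characters. std::wstring manages
    // its own terminating null, so no extra slot is required.
    std::wstring result(needed, L'\0');
    state = std::mbstate_t{};
    src = txt;
    std::mbsrtowcs(&result[0], &src, needed, &state);  // input already validated
    return result;
}
```

Returning std::wstring means the caller no longer has to remember delete[]. For example, ToWide("shader.fx") yields L"shader.fx" in the default "C" locale.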

If you want to know more, take a look at this article. MSDN also has a sample that shows how to perform normalization correctly.

Filed under: C#, C/C++, CodeProject
Tagged: C++

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Student
Poland
I'm a student of CAD/CAM design at Warsaw University of Technology, currently in the fifth and final year, and I graduated with a BSc in computer science from the same university in February 2010. I'm interested in Windows programming technologies, geometric modelling, programming of numerically controlled devices, virtual reality and computer graphics. You can follow me on Facebook: http://www.facebook.com/profile.php?id=1249817870 or LinkedIn: http://pl.linkedin.com/in/pytelg.

Comments and Discussions

 
It’s seems to be a trivial and obvious task, but in fact ... it is!!! (Nick Gorlov, 17-Nov-10 20:40)
Re: It’s seems to be a trivial and obvious task, but in fact ... it is!!! (pytelg, 20-Nov-10 8:38)
