Introduction
In order to allow your programs to be used in international
markets it is worth making your application Unicode or MBCS
aware. The Unicode character set is a "wide character" set
(2 bytes per character for most characters, in the UTF-16
encoding used by Windows) that covers the characters of virtually
every written language, including technical symbols and special
publishing characters. A multibyte character set (MBCS) uses
either 1 or 2 bytes per character and is used for character sets
that contain large numbers of different characters (e.g. Asian
language character sets).
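Windows stores wide text as UTF-16. "2 bytes per character" holds for most characters, but characters outside the Basic Multilingual Plane need a pair of 16-bit units (a surrogate pair). The following C++ sketch uses the standard char16_t and std::u16string types as a stand-in for Windows wide strings (an assumption; the helper names are hypothetical) and counts both the bytes and the characters in a UTF-16 string:

```cpp
#include <cstddef>
#include <string>

// Number of bytes the text occupies in UTF-16, excluding the terminator.
std::size_t Utf16Bytes(const std::u16string& s)
{
    return s.size() * sizeof(char16_t);
}

// Number of characters (code points) in well-formed UTF-16: the trailing
// half of each surrogate pair (0xDC00..0xDFFF) is skipped, so a pair
// counts as one character.
std::size_t Utf16CodePoints(const std::u16string& s)
{
    std::size_t count = 0;
    for (char16_t unit : s) {
        if (unit < 0xDC00 || unit > 0xDFFF) {
            ++count;
        }
    }
    return count;
}
```

For u"A\u00E9\U0001F600" (A, é and an emoji) the string holds 4 code units, Utf16Bytes returns 8 and Utf16CodePoints returns 3.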
Which character set you use depends on the language and the
operating system. Unicode generally requires more space than MBCS,
since each character takes 2 bytes. It is also faster on Windows
NT, which uses Unicode internally: non-Unicode strings passed to
and from the operating system must be translated, incurring
overhead. However, Unicode is largely unsupported on Windows 95,
so MBCS may be a better choice there. Note that if you wish to
develop applications for Windows CE, all applications must be
compiled as Unicode.
Using MBCS or Unicode
The best way to use Unicode or MBCS - or indeed even ASCII -
in your programs is to use the generic text mapping macros
provided by Visual C++. That way you can simply use a single
define to swap between Unicode, MBCS and ASCII without having to
do any recoding.
To use MBCS or Unicode, define either _MBCS or _UNICODE in your
project. For Unicode you should also define UNICODE (without the
leading underscore), which is the symbol the Windows headers test,
and specify wWinMainCRTStartup as the entry point symbol in your
Project settings. Please note that if both _MBCS and _UNICODE are
defined, the result will be unpredictable.
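One way to catch a misconfigured project early is a small guard header included before any other header. The sketch below assumes such a header (its name and the CharsetName helper are hypothetical, not part of Visual C++). It stops the build if both _UNICODE and _MBCS are defined, and keeps UNICODE (tested by the Windows headers) in step with _UNICODE (tested by TCHAR.H and MFC):

```cpp
// Hypothetical guard header, e.g. "charset_config.h", included first.

// _UNICODE and _MBCS are mutually exclusive: refuse to build with both.
#if defined(_UNICODE) && defined(_MBCS)
#error "Define either _UNICODE or _MBCS, not both"
#endif

// Keep the Windows-header symbol and the C runtime/MFC symbol in step.
#if defined(_UNICODE) && !defined(UNICODE)
#define UNICODE
#endif
#if defined(UNICODE) && !defined(_UNICODE)
#define _UNICODE
#endif

// Reports which character set this translation unit was compiled for.
inline const char* CharsetName()
{
#if defined(_UNICODE)
    return "Unicode";
#elif defined(_MBCS)
    return "MBCS";
#else
    return "SBCS";
#endif
}
```

Built with neither symbol defined, CharsetName() returns "SBCS".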

Generic Text mappings and portable functions
The generic text mappings replace the standard char and LPSTR
types with the generic TCHAR and LPTSTR types. These map to
different underlying types and functions depending on whether you
have compiled with Unicode or MBCS (or neither) defined. The
simplest way to use the TCHAR type is to use the CString
class - it is extremely flexible and does most of the work for
you.
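To see how one source file can compile for either character set, here is a simplified sketch of the kind of typedef TCHAR.H relies on. MY_UNICODE, my_tchar, my_tstring and CountChar are hypothetical names chosen so the sketch does not clash with the real headers, and std::basic_string stands in for CString:

```cpp
#include <cstddef>
#include <string>

// MY_UNICODE plays the role of _UNICODE, my_tchar the role of TCHAR.
#ifdef MY_UNICODE
typedef wchar_t my_tchar;   // Unicode build: wide characters
#else
typedef char my_tchar;      // MBCS/ASCII build: byte-sized characters
#endif

// A string class built on the generic type, standing in for CString.
typedef std::basic_string<my_tchar> my_tstring;

// Written once against my_tchar; compiles unchanged in either build.
inline std::size_t CountChar(const my_tstring& s, my_tchar c)
{
    std::size_t n = 0;
    for (my_tchar ch : s) {
        if (ch == c) {
            ++n;
        }
    }
    return n;
}
```

Compiling with MY_UNICODE defined switches my_tchar to wchar_t; callers would then also need wide literals, which is the job of the _T() macro.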
In conjunction with the generic character type, there is a set
of generic string manipulation functions prefixed by _tcs.
For instance, instead of using the strrev
function in your code, you should use the _tcsrev
function which will map to the correct function depending on
which character set you have compiled for. The table below
demonstrates:
| #define | Compiled version | Example |
| _UNICODE | Unicode (wide-character) | _tcsrev maps to _wcsrev |
| _MBCS | Multibyte-character | _tcsrev maps to _mbsrev |
| None (the default: neither _UNICODE nor _MBCS defined) | SBCS (ASCII) | _tcsrev maps to strrev |
Each str* function has a corresponding _tcs* function that
should be used instead. See the TCHAR.H file for all the mappings
and macros that are available, or look up the online help for the
string function in question to find the equivalent portable
function.
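The function mappings work the same way as the type mapping: one generic name is #defined to either the narrow or the wide C runtime function. This sketch mimics that approach with hypothetical names (MY_UNICODE for _UNICODE, my_tcslen and my_tcscpy for _tcslen and _tcscpy, and a CopyText helper); it is not the contents of TCHAR.H:

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define my_tcslen std::wcslen   // wide-character versions
#define my_tcscpy std::wcscpy
#else
typedef char my_tchar;
#define my_tcslen std::strlen   // byte-character versions
#define my_tcscpy std::strcpy
#endif

// Copies src into dest, whose capacity is given in characters (not bytes).
// Returns false, leaving dest untouched, if src and its null do not fit.
inline bool CopyText(my_tchar* dest, std::size_t capacity, const my_tchar* src)
{
    std::size_t len = my_tcslen(src);
    if (len + 1 > capacity) {
        return false;
    }
    my_tcscpy(dest, src);
    return true;
}
```

Passing the capacity as sizeof(buf) / sizeof(buf[0]) keeps the call correct in both builds.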
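Note: Do not use the str* family of functions with Unicode
strings. Unicode strings are likely to contain embedded null bytes
(every ASCII character has a zero high byte), and the str*
functions treat the first null byte as the end of the string.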
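The effect is easy to demonstrate. In this sketch char16_t stands in for a Windows wide character (an assumption), and the two helper names are hypothetical. WideLength counts 16-bit characters correctly, while MisusedStrlen deliberately hands the same bytes to strlen:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Correct: counts 16-bit characters up to the 16-bit null terminator.
inline std::size_t WideLength(const char16_t* wide)
{
    return std::char_traits<char16_t>::length(wide);
}

// Wrong on purpose: strlen scans bytes and stops at the first zero byte,
// which for ASCII text stored as UTF-16 lies inside the first character.
inline std::size_t MisusedStrlen(const char16_t* wide)
{
    return std::strlen(reinterpret_cast<const char*>(wide));
}
```

For u"Hello", WideLength returns 5, while MisusedStrlen returns 1 on a little-endian machine such as x86 (and 0 on a big-endian one).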
The next important point is that each literal string should be
enclosed by the TEXT() (or _T()) macro. This macro prepends an L
to the literal string if the project is being compiled for
Unicode, and does nothing if MBCS or ASCII is being used. For
instance, the string _T("Hello") will be interpreted as "Hello" in
MBCS or ASCII builds, and as L"Hello" in Unicode builds. If you
are working in Unicode and do not use the _T() macro, you may get
compiler warnings or errors wherever a narrow literal is passed
where a wide string is expected.
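The mechanism behind _T() is token pasting. This sketch reproduces the idea under hypothetical names (MY_UNICODE for _UNICODE, MY_T for _T, my_tchar for TCHAR). The two-level definition ensures that a macro argument is expanded before the L prefix is pasted on:

```cpp
#include <cstddef>

#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_T_IMPL(x) L##x     // "Hello" becomes L"Hello"
#else
typedef char my_tchar;
#define MY_T_IMPL(x) x        // "Hello" is left as it is
#endif

// Extra level of indirection: x is fully macro-expanded here before
// MY_T_IMPL applies ## to it.
#define MY_T(x) MY_T_IMPL(x)

// The literal's element type always matches my_tchar, so this line
// compiles unchanged in both builds.
static const my_tchar kGreeting[] = MY_T("Hello");
```

Without MY_T, the declaration would fail to compile in the wide build, because a narrow literal cannot initialize a wchar_t array.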
Note that you can use ASCII and Unicode within the same
program, but not within the same string.
All MFC functions except for database class member functions
are Unicode aware. This is because many database drivers themselves
do not handle Unicode, and so there was no point in writing Unicode
aware MFC classes to wrap these drivers.
Converting between Generic types and ASCII
ATL provides a number of very useful macros for converting
between different character formats. The basic form of these
macros is X2Y(), where X is the source format and Y is the
destination format. Possible conversion formats are shown in the
following table.
| String type | Abbreviation |
| ASCII (LPSTR) | A |
| Wide (LPWSTR) | W |
| OLE (LPOLESTR) | OLE |
| Generic (LPTSTR) | T |
| Const | C |
Thus, A2W converts an LPSTR to an LPWSTR,
OLE2T converts an LPOLESTR to an LPTSTR, and
so on.
There are also const forms (denoted by a C)
that convert to a const string. For instance, A2CT
converts from LPSTR to LPCTSTR.
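The ATL macros themselves are Windows-only. As a rough portable analogue of A2W and W2A, the sketch below uses the C runtime's mbstowcs and wcstombs, including the null-destination form (supported by POSIX and the Microsoft C runtime) that returns the required length. MyA2W and MyW2A are hypothetical names, and unlike the macros they return std::wstring and std::string objects that own their memory. The conversion follows the current C locale, which by default handles only ASCII, whereas the ATL macros by default convert using the Windows ANSI code page:

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// "A2W": narrow (multibyte) to wide. Returns an empty string on invalid input.
std::wstring MyA2W(const char* src)
{
    std::size_t len = std::mbstowcs(nullptr, src, 0);   // wide chars needed
    if (len == static_cast<std::size_t>(-1)) {
        return std::wstring();
    }
    std::vector<wchar_t> buf(len + 1);
    std::mbstowcs(buf.data(), src, len + 1);
    return std::wstring(buf.data(), len);
}

// "W2A": wide to narrow (multibyte). Returns an empty string if any
// character cannot be represented in the current locale.
std::string MyW2A(const wchar_t* src)
{
    std::size_t len = std::wcstombs(nullptr, src, 0);   // bytes needed
    if (len == static_cast<std::size_t>(-1)) {
        return std::string();
    }
    std::vector<char> buf(len + 1);
    std::wcstombs(buf.data(), src, len + 1);
    return std::string(buf.data(), len);
}
```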
When using the string conversion macros you need to include
the USES_CONVERSION macro at the beginning of
your function:
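#include <atlconv.h>   // ATL string conversion macros (USES_CONVERSION, A2T, ...)

void foo(LPSTR lpsz)
{
    USES_CONVERSION;   // must come before any conversion macro is used
    // ...
    LPTSTR szGeneric = A2T(lpsz);
    // ...
}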
Two caveats on using the conversion macros:
- Never use the conversion macros inside a tight loop. The
macros allocate their buffers on the stack (using _alloca),
and that memory is not released until the function returns,
so converting inside a loop can consume a lot of stack space
and results in slow code. It is better to perform the
conversion outside the loop and pass the converted value
into the loop.
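- Never return the result of the macros directly from a
function, unless the return type implies that a copy of
the data is made before returning. The converted string
lives in memory allocated on the function's own stack,
which is no longer valid once the function returns. For
instance, if you have a function that returns an LPTSTR,
then do not do the following:
LPTSTR BadReturn(LPSTR lpsz)
{
    USES_CONVERSION;
    return A2T(lpsz);   // BAD (Unicode builds): points into this function's stack
}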
Instead, you should return the value as a CString, which
makes a copy of the string before the function returns:
CString GoodReturn(LPSTR lpsz)
{
    USES_CONVERSION;
    return A2T(lpsz);   // OK: the CString constructor copies the text
}
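Both caveats translate directly to portable C++. In this sketch std::wstring plays the role of CString (an assumption), and WidenAscii, GoodReturn and CountMatches are hypothetical helpers. The conversion is hoisted out of the loop, and results are returned as objects that own a copy of the text:

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// ASCII-only widening: each byte becomes one wchar_t. Adequate only for
// 7-bit ASCII; real code would use a code-page-aware conversion.
std::wstring WidenAscii(const char* src)
{
    return std::wstring(src, src + std::strlen(src));
}

// Safe to return: the std::wstring owns its copy of the converted text,
// just as returning a CString does.
std::wstring GoodReturn(const char* src)
{
    return WidenAscii(src);
}

// Converts once, before the loop, then reuses the converted value.
std::size_t CountMatches(const char* needle, const std::vector<std::wstring>& items)
{
    const std::wstring wideNeedle = WidenAscii(needle);   // hoisted conversion
    std::size_t matches = 0;
    for (const std::wstring& item : items) {
        if (item == wideNeedle) {
            ++matches;
        }
    }
    return matches;
}
```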
Tips and Traps
The TRACE statement
The TRACE macros have a few cousins - namely
the TRACE0, TRACE1, TRACE2
and TRACE3 macros. These macros allow you to
specify a format string (as in the normal TRACE
macro), and either 0, 1, 2 or 3 parameters, without the need to
enclose your literal format string in the _T()
macro. For instance,
TRACE(_T("This is trace statement number %d\n"), 1);
can be written
TRACE1("This is trace statement number %d\n", 1);
Viewing Unicode strings in the debugger
If you are using Unicode in your application and wish to view Unicode strings
in the debugger, go to Tools | Options | Debug and check
"Display Unicode Strings".
The Length of strings
Be careful when performing operations that depend on the size
or length of a string. For instance, CString::GetLength
returns the number of characters in a string, NOT the size in
bytes. If you were to write the string to a CArchive
object, then you would need to multiply the length of the string
by the size of each character in the string to get the number of
bytes to write:
CString str = _T("Hello, World");
archive.Write( str, str.GetLength( ) * sizeof( TCHAR ) );
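The same arithmetic can be written as a small template. In this sketch std::basic_string stands in for CString (an assumption) and ByteCount is a hypothetical name: the byte count is the character count multiplied by the size of one character.

```cpp
#include <cstddef>
#include <string>

// Bytes occupied by the characters of s, excluding the terminating null:
// the equivalent of GetLength() * sizeof(TCHAR).
template <typename CharT>
std::size_t ByteCount(const std::basic_string<CharT>& s)
{
    return s.length() * sizeof(CharT);
}
```

For "Hello, World" this gives 12 bytes as a std::string and 24 bytes as a std::u16string (UTF-16, as in a Unicode build).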
Reading and Writing ASCII text files
If you are using Unicode or MBCS then you need to be careful
when writing ASCII files. The safest and easiest way to write
text files is to use the CStdioFile class provided with MFC.
Just use the CString class and the ReadString and WriteString
member functions and nothing should go wrong. However, if you
use the CFile class and its associated Read and Write
functions, for example with the following code:
CFile file(...);
CString str = _T("This is some text");
file.Write( str, (str.GetLength()+1) * sizeof( TCHAR ) );
instead of
CStdioFile file(...);
CString str = _T("This is some text");
file.WriteString(str);
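then the results will be significantly different. In a Unicode
build, the first snippet writes the string's raw in-memory bytes
(two per character, plus a two-byte terminating null), so a
program reading the file as 8-bit text sees a zero byte after
every ASCII letter.
[Figure: the lines of text from files created with the first and second snippets respectively, viewed in WordPad.]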
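The byte layout produced by the first snippet can be modelled without touching a file. In this sketch char16_t stands in for TCHAR in a Unicode build (an assumption) and the helper names are hypothetical. RawBytes copies the in-memory bytes including the terminator, as file.Write does, and AsciiBytes stores the same ASCII text one byte per character for comparison:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Like file.Write(str, (str.GetLength()+1) * sizeof(TCHAR)):
// copies every byte of every 16-bit character, terminator included.
std::vector<unsigned char> RawBytes(const std::u16string& s)
{
    std::vector<unsigned char> out((s.size() + 1) * sizeof(char16_t));
    std::memcpy(out.data(), s.c_str(), out.size());   // c_str() ends in a null
    return out;
}

// The same text stored one byte per character (ASCII input assumed).
std::vector<unsigned char> AsciiBytes(const std::u16string& s)
{
    std::vector<unsigned char> out;
    for (char16_t c : s) {
        out.push_back(static_cast<unsigned char>(c));
    }
    return out;
}
```

For u"Hi", RawBytes yields 6 bytes, 4 of them zero, while AsciiBytes yields just 'H' and 'i'.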
Not all structures use the generic text mappings
For instance, in rich edit control versions earlier than 2.0 the
CHARFORMAT structure uses a char[] for the szFaceName field,
instead of the TCHAR[] you might expect. You must be careful not
to blindly change "..." to _T("...") without first checking. In
this case, you would probably need to convert from TCHAR to char
before copying any data to the szFaceName field.
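A conversion of that kind might look like the following sketch. FakeCharFormat and SetFaceName are hypothetical stand-ins, not the real CHARFORMAT definition; the 32-byte field matches LF_FACESIZE. The sketch uses wcstombs, so the result depends on the C locale (the default "C" locale handles only ASCII names), and it refuses names that would not fit rather than truncating them:

```cpp
#include <cstddef>
#include <cstdlib>

// Stand-in for a structure whose face-name field is char[] in every build.
struct FakeCharFormat {
    char szFaceName[32];   // 32 == LF_FACESIZE in the Windows headers
};

// Converts a wide face name into the char field. Returns false if the name
// cannot be converted or does not fit together with its terminating null.
bool SetFaceName(FakeCharFormat& cf, const wchar_t* name)
{
    std::size_t needed = std::wcstombs(nullptr, name, 0);   // bytes, no null
    if (needed == static_cast<std::size_t>(-1) || needed >= sizeof(cf.szFaceName)) {
        return false;
    }
    std::wcstombs(cf.szFaceName, name, sizeof(cf.szFaceName));   // writes the null too
    return true;
}
```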
Copying text to the Clipboard
This is one area where you may need to use ASCII and Unicode
in the same program, since the CF_TEXT clipboard format holds
ANSI (8-bit) text only. NT-based systems also offer the
CF_UNICODETEXT format if you wish to put Unicode text on the
clipboard.
Installing the Unicode MFC libraries
The Unicode versions of the MFC libraries are
not copied to your hard drive unless you select them during a
Custom installation. They are not copied during other types of
installation. If you attempt to build or run an MFC Unicode
application without the MFC Unicode files, you may get errors.
(From the online docs) To copy the files to
your hard drive, rerun Setup, choose Custom installation,
clear all other components except "Microsoft Foundation
Class Libraries," click the Details button, and
select both "Static Library for Unicode" and
"Shared Library for Unicode."

Comments

This article reproduced... (Steve_Harris, 2:02 27 Jan '10)
...here!

Some of the Russian characters not displayed. (Member 2975111, 18:41 19 Jan '10)
We have a VC++ application which is built using the Unicode character set. Everything is working fine, but some of the Russian characters are not displayed; they come out as ‘?’. This happens mostly for strings displayed on buttons.

Russian chars in multibyte configuration (WR1270, 15:38 18 Sep '09)
Hi.
I have a huge program written in a multibyte configuration, and recently I was asked to translate it to Russian.
When I change the caption of a button to a Russian string using SetWindowText, the caption changes to something like "???????", showing question marks instead of Russian characters.
When I create a new project set to Unicode instead of multibyte, SetWindowText works fine with Russian.
Does this mean I must change my program to Unicode? If so, I'll have to fix thousands of lines of code, wrapping them with _T() and replacing all occurrences of std::string with CString. Is that true?

facing problem in text to its Unicode value conversion (Member 4417050, 2:56 8 Jul '09)
The scenario is...
I have changed the font of a textbox to a font for another language. Now I am typing some text...
How can I get the corresponding Unicode values of that text?

This article referenced on a (stupid) Patent (ddrogahn, 11:55 23 Feb '09)
US Patent 7308399 - Searching for and updating translations in a terminology database http://www.patentstorm.us/patents/7308399/claims.html
Apparently, converting text encodings, and searching a database is patent worthy. How they drag it out to 45 claims is even worse.

Reply: Woohoo! My 5 seconds of fame.

How do I test code on a US-English computer (Allan Braun, 11:35 24 Jul '08)
I need to create files with names in Unicode characters to test my code. How is this done on a computer based on US-English?

Clarification (Raghavendra Pise, 20:02 17 Dec '07)
Hi,
I have an XTension for QuarkXPress Server developed in VC++. I built this project as a Win32 application (_MBCS). Now I have to make it compatible with Unicode as well as ASCII characters.
Please suggest whether I should use wchar_t or TCHAR to accomplish my task. I have read some documents on Unicode but still don't understand.

Unicode Problem (Mark_VC, 10:07 19 Jul '07)
I am replacing English with Polish in my application (only the user messages). I am not using Unicode. What happens is not what I expected: the menu displays the correct Polish letters, but other parts of the program, such as controls and buttons, do not display Polish characters. They do display the characters that Polish and English share. The program should either not display Polish at all, or display it all correctly. Why are my controls not displaying Polish characters? (Some insight: there are about 10 Polish characters that differ from English, and that's where the problem is.) Any ideas?

Reply: I am going to try answering it myself first. Maybe the menu, which is in the non-client area, is drawn by the operating system. Since the operating system (XP) supports Unicode characters, so does the menu in my application. The client area is where the problem lies, because the operating system has no control over it. The Polish characters are not displayed on the buttons and are instead represented by vertical bars. I am still looking for a genuine answer and an easy way to fix it.

Unicode - A Warning (tshavel, 10:08 25 Jan '07)
For anybody out there who is developing an application for which there exists even the most minuscule, remote possibility that any fragment of it will ever need to support Asian characters: do not - and hear me good, DO NOT - assume it will be a minor fix at a later date. Define Unicode, and begin using wide characters now, for every string and string operation in the entire application. The nightmare of trying to retrofit an application, and all of the code linking to it, is beyond what I can describe in words.

C++ 6.0 (adam_smith_2003, 4:29 12 Jul '06)
Hello,
Can someone please help me? I have been trying to do this for two days and am going mad.
I am using VC++ 6.0 and have a static piece of text in Arabic that I need to print out. I am using Windows XP.
I have used this example code but am not sure about the wide-char stuff; I don't seem to be able to use it with the TextOut function, which is what I am using to print to a printer. I don't really want to use MFC, but I will if I need to.
The most I have managed to print out is question marks (????).
Any help REALLY appreciated!
Adam

Reply: It is going to be hard without MFC. Did you enable Unicode? I would suggest using an outside text editor that supports Arabic text in Unicode, and copying that text into your application by copy and paste (where you will be using pDC->TextOut(*your copied text here*)). I think it might work. Let me know.
Mark

Strange things... (Leonhardt Wille, 0:58 12 Jul '06)
Hi there, I don't know whether my problem is Unicode dependent or not but I'll post it here anyway - maybe someone can help. I use a MSHTML control to enter cyrillic ANSI text (charset windows-1251) which really works fine. I load some russian text looking like "Ñèãàðåòû" (cigarettes) into the control and the correct russian text "Сигареты" is displayed. When I now retrieve the Text from the control by getting the InnerHtml of the Body element, I receive a BSTR with the correct russian text. To save this text I have to convert it to ANSI because our target platform only supports ANSI. Now all I do is a disdainful csHtml = OLE2T(bsBody);, and from that point on I only have question marks instead of Russian chars. I also tried a locale-aware conversion with VarUI1FromStr which brought the same result. I don't know what to do now - Software like Dreamweaver etc. is also able to save russian text into ANSI files...    

Reply: I found it... It's so easy!
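int nBytes = WideCharToMultiByte(1251, 0, bsHtml, bsHtml.Length(), csHtml.GetBuffer(bsHtml.Length()), bsHtml.Length(), NULL, NULL);
csHtml.ReleaseBuffer(nBytes);  // set the CString length to the bytes actually written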
I think I'll have lunch now...

What is the difference between _TCHAR* and LPTSTR? (ana_v123, 8:13 10 Jul '06)
What is the difference between _TCHAR* and LPTSTR ?
Following function gives error (_UNICODE is defined in project settings - MFC Application in VC++6)
void MyFunc(LPTSTR lpszName) { _TCHAR szName[10]; _tcscpy(szName, lpszName); }
Error is: error C2440: '=' : cannot convert from 'unsigned short [9]' to 'char *'
But function does not give error if function argument LPTSTR is changed to _TCHAR*
What could be the reason ???
Ana

Reply: It's been a long time, but in case anyone still reads this (besides me), I think this is the answer:
I think that Ana defined _UNICODE but forgot to define UNICODE. In an MFC project, this mixture causes some TCHAR stuff to be ANSI and some to be Unicode. You must either define both macros or omit definitions of both.

Unicode mfc dll, domodal throwing exception (kuldeepjangir, 4:14 21 Jun '06)
I had an extension DLL that was working fine in MBCS mode, but I had to convert it to a Unicode DLL, so I added _UNICODE to the project properties and rebuilt it. There are no linking or compiling errors, but when I call DoModal on a dialog it throws an exception. I catch it in catch(...), and GetLastError reports "The operation completed successfully".
kuldeep

Read a unicode text from the file (code66, 1:07 4 Apr '06)
How do I read Unicode text from a file, since Read always takes a char-type parameter?

stat fails with the maximum file length for Japanese characters (Sandeep. Vaidya, 20:18 23 Mar '06)
I am using Japanese Win2K. I have created a file whose path is the maximum length allowed by Windows, using Japanese characters in the file name. If I call stat() on that file, it fails and returns -1.
If I get a file descriptor using the open function and call fstat(), it succeeds.
When I use the same set of characters in a shorter path, stat() succeeds. Please let me know what the problem could be.
Thanks in advance. Sandeep

Reply: Welcome to the world of Microsoft's Japanese products. You have to experiment with every API to find out what works and what doesn't work.
If you want a psychic powered debugging guess about this particular bug, it looks like part of the CRT converts the path name to ANSI with a maximum path length in bytes, which can only hold about half of the characters that the actual path name uses.

stl wstring (llbird, 16:32 20 Dec '05)
vector vec_strings; copy(vec_strings.begin(), vec_strings.end(), ostream_iterator(wcout));
Cannot compile?? (VC++ 6, Service Pack 6) Dislike Unicode!!

Write code to complete conversion (Cupcake38, 10:42 15 Oct '05)
 I need to find out the conversion rates between liters and pints, and liters to gallons. The conversion needs to be in wet liters to wet pints and from wet liters to wet gallons.
Thank you to any one can help!
Cupcake38

Reply: Check this:
http://www.onlineconversion.com/volume.htm
Kind regards, André Laan

define unicode windows (Behzad Bahjat Manesh, 4:05 30 Jul '05)
Please tell me how to define Unicode for Windows in VC6, and how to install the Unicode version of a program.
Bahjat Manesh Ardakan