Who Should Read this Article?
If you create Windows console programs and want to be able to print wide strings properly, this is something for you.
More than the actual proficiency in C++, it is important that you understand what Unicode is and what wide strings are.
It's hard to emphasize enough the importance of making Unicode aware applications.
Novices in C/C++ should be taught from the beginning not to use `printf`, etc. It should be pointed out to them from the beginning that modern Windows systems internally work with 16-bit Unicode (UTF-16), aka wide strings. Therefore `wprintf`, etc. (or even better: the `TCHAR` paradigm) should be used instead.
When new C++ projects are created in Visual Studio, they follow the `TCHAR` paradigm. It means that, instead of the above, `_tprintf`, etc. are used. They are macros that resolve to either the narrow or the wide function, depending on the character set chosen in the project settings. This paradigm was created so that the same code could be built for old (Windows 95, Windows 98) and new versions of Windows (NT, XP and newer). Since programming for these old Windows versions does not make sense any more, we could simply use the wide versions of the functions. Yet, following the `TCHAR` paradigm still makes sense, because it can make the code more portable to operating systems that do not use wide strings, like Linux.
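To make the mechanism concrete, here is a minimal sketch of how such a character-set switch can be built. It is a simplified re-creation, not the contents of `<tchar.h>`: the names `DEMO_UNICODE`, `demo_tchar`, `DEMO_T`, `demo_tcslen` and `GreetingLength` are hypothetical stand-ins (the real header switches on `_UNICODE` and provides `_T`, `_tprintf` and friends).

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Simplified, hypothetical re-creation of the <tchar.h> idea.
// Defining DEMO_UNICODE selects the wide variants, otherwise the narrow ones.
#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;
#define DEMO_T(x) L##x            // turns "Hello" into L"Hello"
#define demo_tcslen std::wcslen   // wide string length
#else
typedef char demo_tchar;
#define DEMO_T(x) x               // leaves "Hello" as it is
#define demo_tcslen std::strlen   // narrow string length
#endif

// The same source line compiles to the narrow or the wide variant.
std::size_t GreetingLength() {
    const demo_tchar* greeting = DEMO_T("Hello");
    return demo_tcslen(greeting);
}
```

Whichever variant is selected, the calling code stays the same, which is the whole point of the paradigm.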
All this works fine. The problem arises when you write a console application. The application can read wide command line arguments properly. I do not know if input of wide strings via standard input works OK because I never needed to use it. But I needed to output them and it did not work. I tried CRT functions like `wprintf` and STL objects like `wcout`. Neither of them worked. I searched for a suitable solution and could not find it.
I set up the cmd window to use the Lucida Console font (and you should do it too, otherwise any attempt to see Unicode characters in it is bound to fail!). I realized that it is possible to print wide strings directly to the console using functions from conio.h (`_tcprintf`, etc.). Very nice!
Yet... When someone is using a console application, she/he expects to be able to redirect its output. That does not work if the output goes directly to the console. It must go to `stdout`.
It seems Microsoft was not consistent in this. While the whole system works with wide strings, the console output does not, and in .NET, the default output code page is UTF-8! But it gave me the idea. I also noticed that text files encoded in UTF-8 can be properly printed to the console (using `type` for example), provided the console code page is set to UTF-8 using the command `chcp 65001`. Now I wanted to use UTF-8 from C++.
Using the Code
Setting and Resetting the Codepage
We must prepare the console for UTF-8. We first store the current console output codepage in a variable:
UINT oldcp = GetConsoleOutputCP();
Then we change the console output codepage to UTF-8 by calling `SetConsoleOutputCP` with `CP_UTF8` (65001), which is the equivalent of `chcp 65001`.
Before exiting the program, we must be polite and bring the console back to the state it was in before, by calling `SetConsoleOutputCP` again with the saved value `oldcp`.
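The save, switch and restore steps can also be bundled into one object, so that the restore cannot be forgotten on an early return. The following is a minimal sketch, not the code from the article's download: the class name `Utf8ConsoleGuard` and its members are hypothetical, and the `#ifdef _WIN32` guards are only there so the sketch still compiles (as a no-op) on other platforms.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical RAII helper (the name is an assumption, not from the article).
// Constructor: remember the current output codepage, then switch to UTF-8.
// Destructor: restore the remembered codepage.
class Utf8ConsoleGuard {
public:
    Utf8ConsoleGuard() : oldcp_(0), switched_(false) {
#ifdef _WIN32
        oldcp_ = GetConsoleOutputCP();  // 0 means the call failed
        // Same effect as `chcp 65001`; returns nonzero on success.
        switched_ = (oldcp_ != 0) && (SetConsoleOutputCP(CP_UTF8) != 0);
#endif
    }

    ~Utf8ConsoleGuard() {
#ifdef _WIN32
        if (switched_) {
            SetConsoleOutputCP(oldcp_);  // put the console back as we found it
        }
#endif
    }

    // Copying would restore the codepage twice, so it is disabled.
    Utf8ConsoleGuard(const Utf8ConsoleGuard&) = delete;
    Utf8ConsoleGuard& operator=(const Utf8ConsoleGuard&) = delete;

    bool switched() const { return switched_; }
    unsigned int previousCodepage() const { return oldcp_; }

private:
    unsigned int oldcp_;
    bool switched_;
};
```

Declaring `Utf8ConsoleGuard guard;` at the top of `main` would then cover the whole program run, under the assumption that nothing else changes the codepage in between.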
Printing Wide Strings from the Program
Suppose we have a wide string containing Unicode characters, say:
const wchar_t* s = L"èéøÞǽлљΣæča";
If you write that in Visual Studio, when you attempt to save the file you will be prompted to save it in some Unicode format. "Unicode - Codepage 1200" will be OK.
We convert it to UTF-8:
First we call `WideCharToMultiByte` with the 6th argument set to zero. That way, the function will tell us how many bytes it is going to need to store the converted string.
int bufferSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
We allocate a buffer:
char* m = new char[bufferSize];
The second call to `WideCharToMultiByte` does the actual conversion:
WideCharToMultiByte(CP_UTF8, 0, s, -1, m, bufferSize, NULL, NULL);
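The two-call pattern above can be wrapped in a small function that returns a `std::string`, which removes the need to release the buffer by hand. This is a sketch under stated assumptions rather than the article's own code: the name `ToUtf8` is hypothetical, and the `#else` branch is a hand-written encoder added only so the sketch compiles on platforms without `WideCharToMultiByte`, where it assumes each `wchar_t` holds a whole code point.

```cpp
#include <cstddef>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: converts a wide string to a UTF-8 encoded std::string.
std::string ToUtf8(const std::wstring& wide) {
    if (wide.empty()) return std::string();
#ifdef _WIN32
    const int wideLen = static_cast<int>(wide.size());
    // First call: output size 0 asks only for the number of bytes needed.
    // With an explicit length, the count does not include a terminator.
    const int bytes = WideCharToMultiByte(CP_UTF8, 0, wide.data(), wideLen,
                                          NULL, 0, NULL, NULL);
    if (bytes <= 0) return std::string();
    std::string utf8(static_cast<std::size_t>(bytes), '\0');
    // Second call: the actual conversion into the string's buffer.
    WideCharToMultiByte(CP_UTF8, 0, wide.data(), wideLen,
                        &utf8[0], bytes, NULL, NULL);
    return utf8;
#else
    // Fallback (assumption: wchar_t is a full code point, as on Linux).
    std::string utf8;
    for (std::size_t i = 0; i < wide.size(); ++i) {
        const unsigned long cp = static_cast<unsigned long>(wide[i]);
        if (cp < 0x80) {                       // 1 byte
            utf8 += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            utf8 += static_cast<char>(0xC0 | (cp >> 6));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            utf8 += static_cast<char>(0xE0 | (cp >> 12));
            utf8 += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            utf8 += static_cast<char>(0xF0 | (cp >> 18));
            utf8 += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            utf8 += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return utf8;
#endif
}
```

A caller could then pass `ToUtf8(s).c_str()` wherever the buffer `m` is used. Note that the byte count is generally larger than the character count: every non-ASCII character in the sample string takes two bytes in UTF-8.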
Print it to `stdout` with `wprintf(L"%S", m)`. Notice the capital `S`. In Microsoft's C runtime, it tells the `wprintf` function to expect a narrow string.
Release the buffer with `delete[] m` once the string has been printed.
Now the output goes to `stdout`. If redirected to a file, the file will be encoded as UTF-8.
This Is It
It is not a big deal and cannot be compared to the articles that require much more work. Yet I hope it can be useful because it tries to solve a problem that is widely neglected. Last time I checked, I could not find the solution for this problem in Java either.
In my example code, I packed everything I spoke about here in small `wostream` overrides. They are not perfect and I'm pretty sure they could be coded better. I would do it if I knew more about `iostream` programming. Yet they can be useful for those who want a solution that works out of the box and is easy to use. But it should be pointed out that they are not thread safe. There are more comments in the code.
This article is completely rewritten, mainly because the comments of Member 2901525 made me understand that the code is not perfect enough to be offered without some more explanation. The article itself was very short, looked sketchy and earned some low marks. I forgot to mention that Lucida Console font must be used in the cmd window. Member 2901525 noticed a weak point in the code and I changed this. Otherwise there are no significant changes in the code.