
Unicode Output to the Windows Console

By swuk, 25 Mar 2009
 

Who Should Read this Article?

If you create Windows console programs and want to be able to print wide strings properly, this is something for you.

More important than proficiency in C++ is that you understand what Unicode is and what wide strings are.

Introduction

It's hard to overstate the importance of making Unicode-aware applications.

Novices in C/C++ should be taught from the beginning that modern Windows systems work internally with 16-bit Unicode (UTF-16), aka wide strings, and that Windows programs should therefore use wmain, wcslen, wprintf, etc. (or even better: the TCHAR paradigm) instead of main, strlen, printf, etc.

When new C++ projects are created in Visual Studio, they follow the TCHAR paradigm. It means that, instead of the above, _tmain, _tcslen, _tprintf, etc. are used. They are macros that expand to the narrow or the wide variant depending on the character set chosen in the project settings. This paradigm was created so that the same code could be built for both old (Windows 95, Windows 98) and new versions of Windows (NT, XP and newer). Since programming for these old Windows versions no longer makes sense, we could simply use the wide versions of the functions. Yet following the TCHAR paradigm can still make sense, because it can make the code easier to port to operating systems that do not use wide strings natively, like Linux.
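To make the mechanism concrete, here is a minimal sketch of how such a switch can be built with the preprocessor. The names `DEMO_UNICODE`, `demo_tchar`, `DEMO_T`, `demo_tcslen` and `demoLength` are made up for this illustration; the real definitions live in tchar.h and are more elaborate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for TCHAR, _T and _tcslen (illustration only).
#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;            // wide build
#define DEMO_T(x) L##x                 // turns "abc" into L"abc"
#define demo_tcslen std::wcslen
#else
typedef char demo_tchar;               // narrow build
#define DEMO_T(x) x                    // leaves "abc" unchanged
#define demo_tcslen std::strlen
#endif

// Returns the number of demo_tchar elements before the terminator,
// whichever character type the build selected.
std::size_t demoLength(const demo_tchar* text)
{
    assert(text != nullptr);
    return demo_tcslen(text);
}
```

The calling code, for example `demoLength(DEMO_T("hello"))`, stays the same in both builds; only the macro definitions decide whether it works on char or wchar_t.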

All this works fine. The problem arises when you write a console application. The application can read wide command line arguments properly. I do not know whether reading wide strings from standard input works, because I never needed it. But I did need to output them, and that did not work. I tried CRT functions like wprintf and STL objects like wcout. Neither of them worked. I searched for a suitable solution and could not find one.

I set up the cmd window to use the Lucida Console font (and you should do it too, otherwise any attempt to see Unicode characters in it is bound to fail!). I realized that it is possible to print wide strings directly to the console using functions from conio.h (_cputts, _tcprintf, etc.). Very nice!

Yet... someone using a console application expects to be able to redirect its output. That does not work if the output goes directly to the console. It must go to stdout or stderr.

It seems Microsoft was not consistent in this. While the whole system works with wide strings, the console output does not, and in .NET, the default output code page is UTF-8! But that gave me an idea. I also noticed that text files encoded in UTF-8 are printed properly to the console (using `type`, for example), provided the console code page is set to UTF-8 with the command `chcp 65001`. So I wanted to use UTF-8 from C++.

Using the Code

Setting and Resetting the Codepage

We must prepare the console for UTF-8. We first store the current console output codepage in a variable:

UINT oldcp = GetConsoleOutputCP();

Then we change the console output codepage to UTF-8, which is the equivalent of `chcp 65001`:

SetConsoleOutputCP(CP_UTF8);

Before exiting the program, we must be polite and restore the console to the state it was in before:

SetConsoleOutputCP(oldcp);
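The save, switch and restore steps can also be tied to a scope, so that the restore cannot be forgotten on an early return. The following is only a sketch: the class name `ConsoleUtf8Guard` and its members are hypothetical and not part of the article's download. The `#else` branch contains stand-ins that merely track a number, so that the sketch also compiles on systems without windows.h; on Windows the real GetConsoleOutputCP and SetConsoleOutputCP are used.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>
#else
// Stand-ins so the sketch compiles off Windows; they only track a number.
typedef unsigned int UINT;
const UINT CP_UTF8 = 65001;
static UINT fakeConsoleCodePage = 437;
inline UINT GetConsoleOutputCP() { return fakeConsoleCodePage; }
inline int SetConsoleOutputCP(UINT cp) { fakeConsoleCodePage = cp; return 1; }
#endif

// Hypothetical helper: switches the console output code page to UTF-8
// and puts the previous code page back when the object goes out of scope.
class ConsoleUtf8Guard {
public:
    ConsoleUtf8Guard()
        : oldCodePage_(GetConsoleOutputCP()),
          changed_(SetConsoleOutputCP(CP_UTF8) != 0) {}

    ~ConsoleUtf8Guard()
    {
        if (changed_) {
            const bool restored = SetConsoleOutputCP(oldCodePage_) != 0;
            assert(restored);   // debug-only check; a destructor cannot return an error
            (void)restored;     // avoids an unused-variable warning in release builds
        }
    }

    ConsoleUtf8Guard(const ConsoleUtf8Guard&) = delete;
    ConsoleUtf8Guard& operator=(const ConsoleUtf8Guard&) = delete;

    bool ok() const { return changed_; }            // false if the switch failed
    UINT previous() const { return oldCodePage_; }  // code page found at construction

private:
    UINT oldCodePage_;  // declared first, so it is read before the switch happens
    bool changed_;
};
```

Declaring one `ConsoleUtf8Guard` at the top of main would cover the whole program run. Like the plain calls above, this changes a setting shared by the whole console, so it does not make concurrent use from several threads any safer.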

Printing Wide Strings

Suppose we have a wide string containing Unicode characters, say:

wchar_t s[] = L"èéøÞǽлљΣæča";

If you write that in Visual Studio, you will be prompted, when you attempt to save the file, to save it in a Unicode format. "Unicode - Codepage 1200" will be OK.

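We convert it to UTF-8 in two steps.

First we call WideCharToMultiByte with the 6th argument (the size of the output buffer) set to zero. That way, the function tells us how many bytes it will need to store the converted string. Because we pass -1 as the input length, that count includes the terminating null character.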

int bufferSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);

We allocate a buffer: 

char* m = new char[bufferSize]; 

The second call to WideCharToMultiByte does the actual conversion:

WideCharToMultiByte(CP_UTF8, 0, s, -1, m, bufferSize, NULL, NULL);

Print it to stdout. Notice the capital S. In Microsoft's CRT, it tells the wprintf function to expect a narrow string:

wprintf(L"%S", m); 

Release the buffer: 

delete[] m; 

Now the output goes to stdout. If redirected to a file, the file will be encoded as UTF-8.
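The two-call pattern above can be wrapped in a small function that owns its buffer and checks both return values. This is a sketch under assumptions, not the code from the article's download: the name `toUtf8` is hypothetical, and it passes an explicit length instead of -1, so the returned string carries no terminating null of its own. The `#else` branch is a stand-in for systems without windows.h and assumes that each wchar_t holds one whole code point, as is common outside Windows.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: returns the UTF-8 bytes for a wide string,
// or an empty string if the conversion fails.
std::string toUtf8(const std::wstring& wide)
{
    // WideCharToMultiByte takes the input length as an int.
    assert(wide.size() <= static_cast<std::size_t>(INT_MAX));
    if (wide.empty()) return std::string();
#ifdef _WIN32
    const int length = static_cast<int>(wide.size());
    // First call: ask only for the required size in bytes.
    const int needed = WideCharToMultiByte(CP_UTF8, 0, wide.data(), length,
                                           nullptr, 0, nullptr, nullptr);
    if (needed <= 0) return std::string();
    std::string utf8(static_cast<std::size_t>(needed), '\0');
    // Second call: the actual conversion, checked against the size query.
    const int written = WideCharToMultiByte(CP_UTF8, 0, wide.data(), length,
                                            &utf8[0], needed, nullptr, nullptr);
    return written == needed ? utf8 : std::string();
#else
    // Stand-in: encode each code point by hand (no validation).
    std::string utf8;
    for (std::size_t i = 0; i < wide.size(); ++i) {
        const unsigned long cp = static_cast<unsigned long>(wide[i]);
        if (cp < 0x80) {
            utf8 += static_cast<char>(cp);
        } else if (cp < 0x800) {
            utf8 += static_cast<char>(0xC0 | (cp >> 6));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            utf8 += static_cast<char>(0xE0 | (cp >> 12));
            utf8 += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            utf8 += static_cast<char>(0xF0 | (cp >> 18));
            utf8 += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            utf8 += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return utf8;
#endif
}
```

With such a helper, the printing step could be written as `wprintf(L"%S", toUtf8(s).c_str());`. Because the bytes live in a std::string, nothing has to be released by hand, even if an exception is thrown between the conversion and the output.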

This Is It 

It is not a big deal and cannot be compared to the articles that require much more work. Yet I hope it can be useful because it tries to solve a problem that is widely neglected. Last time I checked, I could not find the solution for this problem in Java either.

In my example code, I packed everything I spoke about here in small ostream and wostream overrides. They are not perfect and I'm pretty sure they could be coded better. I would do it if I knew more about iostream programming. Yet they can be useful for those who want the solution out of the box and easy to use. But it should be pointed out that they are not thread safe. There are more comments in the code.

History 

This article was completely rewritten, mainly because the comments of Member 2901525 made me understand that the code is not polished enough to be offered without more explanation. The earlier version of the article was very short, looked sketchy and earned some low marks. I had also forgotten to mention that the Lucida Console font must be used in the cmd window. Member 2901525 noticed a weak point in the code, and I changed it. Otherwise there are no significant changes in the code.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

swuk
Croatia


Comments and Discussions

 
Re: Another approach (swuk, 27 Mar '09 - 11:35)
> First of all, don't be so pedantic. I know they are written in C++, and so do you.
Of course I do ;)
 
> Disagree totally. Sorry, but I believe in taking the long view as I've explained.
I understand your point ;) And I'm glad you will localize to German. I very much like German.

Re: Another approach (swuk, 27 Mar '09 - 11:28)
Hey! I just looked at your homepage and found:
"Despite not having a Christian background, I came to Faith in Spring 2001"
The same is true for me, during the years around 2000!
:) :)

[My vote of 1] Vote of 1 (Member 2901525, 23 Mar '09 - 22:11)
So you have shown us how to use SetConsoleOutputCP() and how to overload operator<<. C'mon, is it big deal?
 
Besides the code is far from being perfect. There are some points that maybe require improvement:
 
1. Performance. Every time you call operator<< you set and re-set current codepage. That can be expensive, right?
2. Thread safety. Imagine two threads are trying to call those operators more or less simultaneously. One thread can re-set the codepage just before the second thread is printing a string.
3. Resource handling. Your code allocates buffer for the converted string. Do you see that throw statement between the allocation and the release?
4. Error handling. What happens if SetConsoleOutputCP returns FALSE? Why not to check return value of the second call to WideCharToMultiByte ?

Re: [My vote of 1] Vote of 1 [modified] (swuk, 24 Mar '09 - 0:05)
Thank you for your comment :) I do not mind a bad mark when it comes with such useful feedback!
 
1) You are right, but the actual printing to the console is so much slower than changing the codepage that there is no noticeable slowdown.
2) I'm very grateful for this feedback. I simply forgot about that.
3) Same as 2). This was intended to make the application fail if someone tries to use the ostream overrides with something other than (w)cout or (w)cerr. But true, if someone had decided to catch the exception, this could have caused a memory leak.
4) If SetConsoleOutputCP returns FALSE, nothing terrible happens. It would probably print garbage. WideCharToMultiByte returns the number of bytes written to the buffer. That is important if a fixed buffer is used. Then, if the return value is the same as the buffer size, it means the buffer is full and we could reasonably suspect that some characters were truncated. I used this value when I used a fixed buffer. When I switched to dynamic allocation I thought it was not necessary any more. Now, of course, true, in a really bulletproof program everything should be checked. For example, whether the allocation was successful. Hmmm...
 
modified on Wednesday, March 25, 2009 1:18 PM

My vote of 1 (blackbird, 17 Mar '09 - 2:09)
This is not an article c'mon.

Re: My vote of 1 [modified] (swuk, 17 Mar '09 - 3:50)
May I ask you to explain what you mean?
Have you seen the code?
The article is so short because I wrote these header files (ostream_wide.h, wostream_wide.h)... and they turned out to be so simple to use ... that even I was pleasantly surprised. All we need to properly print Unicode on the Windows console is to include them in our projects. We barely have to change any existing code because this approach works seamlessly. And the code uses WideCharToMultiByte, GetConsoleOutputCP and SetConsoleOutputCP, so it can help someone learn about these functions.
Do you know how many Windows console applications have problems with Unicode? I tried hard to find a solution for that for my applications. Now I'm offering it to the community.
 
modified on Tuesday, March 17, 2009 10:39 AM

Re: My vote of 1 (Alexandre GRANVAUD, 17 Mar '09 - 5:08)
maybe you should explain how your snippet is built etc...

Re: My vote of 1 [modified] (swuk, 17 Mar '09 - 5:21)
Yes, maybe.
This is my first article. I personally do not like to see an article full of code stripped of its context. I thought it would be easier to understand inside a working project, provided it is commented.
But I'll see how people react. Maybe I will add some code to the article body.
 
modified on Tuesday, March 17, 2009 12:34 PM

Re: My vote of 1 (swuk, 17 Mar '09 - 5:28)
I mean, when I see a lot of code in the article I sometimes think: Ouch, this is complicated!
I wanted to emphasize the simplicity of usage.
 
I see you had the same problem with your article (an interesting article :) It might be just what I need!), and that you have added source to the article body.
 
But then again, if it was not there I would not miss it - if the source package is nicely written, so that it leads me all the way. Commented, of course.
What I miss is something else: to be able to know as quickly as possible what it can do and what it cannot. I have the class CPostavke that contains simple members but also one member that is an instance of another class, which in turn has a member of type CAtlArray. Can I fill my CPostavke using your code? I'll eventually find out, and it will make no big difference whether the code was embedded in the article or not. But it would be much easier if you explained that in your article.
 
I don't mean to criticize you, I just want to explain why I did things the way I did - to everybody who is reading.
And I would like you to answer that question about your code that bothers me.
:)

Re: My vote of 1 (Anna-Jayne Metcalfe, 23 Mar '09 - 12:27)
FWIW I always use the quality of the article as a guide to the likely quality of the code. If the article is poor or sketchy, I'm unlikely to even look at the code.
 
In that I suspect I'm not alone, so I'd suggest you seriously think about adding more detail to the article (e.g a bit of background, workarounds you've tried and rejected, how you came to the solution you did, and which challenges you overcame in developing it).
 
Anna
 

Last Updated 25 Mar 2009
Article Copyright 2009 by swuk