Read Japanese text from UTF-8 text file

Question

I have created a text file with UTF-8 encoding and written some Japanese characters in it. Now I want to read this text file, display its content on the console, and store the data in another file.

Posted 12-Nov-14 22:39pm

Member 10168792

Comments

Jochen Arndt 13-Nov-14 4:53am

The answer to this question is OS-dependent, because you need to call system or external library functions to convert the text to the encodings used by your application and by the console.

Member 10168792 13-Nov-14 5:41am

Thanks,
But I am not familiar with MultiByteToWideChar or WideCharToMultiByte. Can you show this with a small example?

Member 10168792 13-Nov-14 4:56am

Currently I am using Windows 7 Enterprise.

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Jochen Arndt · Accepted Answer · 2014-11-12T23:33:00

The Microsoft SDK provides two functions to convert between character encodings: MultiByteToWideChar and WideCharToMultiByte.

To simplify the code of your app, you should build it as a Unicode application (which is the default with recent Visual Studio versions).

Use MultiByteToWideChar to convert a UTF-8 string to wide chars. To print this to the console, it may be necessary to convert the string to the encoding used by the console (call GetConsoleOutputCP to query it). If the code page used by the console cannot represent your Japanese characters, you may change the code page using SetConsoleOutputCP. In all cases you must ensure that the font used by the console contains the characters you want to print.
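As a minimal sketch of the console step, assuming a Windows build: the helper name print_wide_on_console is made up for this illustration and is not part of the SDK. It queries the console code page with GetConsoleOutputCP, converts the wide string with two calls to WideCharToMultiByte, and writes the resulting bytes with WriteFile. On other platforms it only compiles as a stub that reports failure.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper (name is an assumption): prints a wide string on the
// Windows console using the console's current output code page.
bool print_wide_on_console(const std::wstring& text) {
    if (text.empty()) return true;            // nothing to print
#ifdef _WIN32
    const UINT cp = GetConsoleOutputCP();     // code page the console uses now
    if (cp == 0) return false;                // no console attached
    const int chars = static_cast<int>(text.size());

    // First call: ask how many bytes the converted text needs.
    const int needed = WideCharToMultiByte(cp, 0, text.data(), chars,
                                           NULL, 0, NULL, NULL);
    if (needed <= 0) return false;

    // Second call: convert into a buffer of exactly that size.
    std::string bytes(static_cast<std::size_t>(needed), '\0');
    if (WideCharToMultiByte(cp, 0, text.data(), chars,
                            &bytes[0], needed, NULL, NULL) != needed)
        return false;

    // Characters the code page cannot represent have been replaced by a
    // default character at this point.
    DWORD written = 0;
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    return WriteFile(out, bytes.data(), static_cast<DWORD>(bytes.size()),
                     &written, NULL) != 0;
#else
    return false;                             // Win32 console API unavailable
#endif
}
```

If the current code page cannot represent Japanese, calling SetConsoleOutputCP before this helper (for example with CP_UTF8) is one option. Whether the text displays correctly still depends on the console font.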

For output to a file, you are free to use any encoding. The choice depends mainly on the applications that will open the file.
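If the second file should stay in UTF-8, no conversion is needed at all: the bytes read from the input can be written unchanged. A minimal sketch, assuming standard C++11; save_utf8 is a hypothetical helper name chosen for this example.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: stores UTF-8 text in a file byte for byte.
// Binary mode keeps the runtime from translating line endings.
bool save_utf8(const std::string& path, const std::string& utf8) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;                   // could not open the file
    out.write(utf8.data(), static_cast<std::streamsize>(utf8.size()));
    out.close();                              // flush so write errors surface
    return !out.fail();
}
```

For any other target encoding, the wide string would first have to be converted with WideCharToMultiByte and the matching code page.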

[EDIT according to the comment posted above]
You may have a look at the tip "Handling simple text files in C/C++" for an example.
The general process is:

Get the size of the UTF-8 file
Allocate a buffer for the UTF-8 text
Open the file, read the content into the buffer, close the file
Call MultiByteToWideChar with CP_UTF8, lpMultiByteStr = input buffer, cbMultiByte = file size, lpWideCharStr = NULL, and cchWideChar = 0 to get the required length of the wide char buffer (in characters)
Allocate the wide char buffer using the value returned by the above call
Call MultiByteToWideChar again, now passing the output buffer and its size
Do something with the wide string like printing to console
Delete the buffers if no longer needed
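
The steps above can be sketched roughly as follows, assuming a C++11 compiler. The helper names read_file_bytes and utf8_to_wide are made up for this illustration; only MultiByteToWideChar and CP_UTF8 come from the SDK. The conversion part is compiled only on Windows. The standard containers release their memory automatically, so there is no explicit delete step.

```cpp
#include <climits>
#include <fstream>
#include <string>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// Steps 1 to 3: get the file size, allocate the buffer, read the content.
// The stream closes the file when it goes out of scope.
bool read_file_bytes(const std::string& path, std::vector<char>& bytes) {
    std::ifstream in(path, std::ios::binary | std::ios::ate); // open at end
    if (!in) return false;
    const std::streamoff size = in.tellg();   // position at end = file size
    if (size < 0) return false;
    bytes.resize(static_cast<std::size_t>(size));
    in.seekg(0, std::ios::beg);
    if (size > 0) in.read(bytes.data(), static_cast<std::streamsize>(size));
    return static_cast<bool>(in);
}

#ifdef _WIN32
// Steps 4 to 6: query the required length, allocate, then convert.
bool utf8_to_wide(const std::vector<char>& utf8, std::wstring& wide) {
    wide.clear();
    if (utf8.empty()) return true;            // the API fails for length 0
    if (utf8.size() > static_cast<std::size_t>(INT_MAX)) return false;
    const int bytes = static_cast<int>(utf8.size());

    // First call: no output buffer, returns the length in wide characters.
    const int needed = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), bytes,
                                           NULL, 0);
    if (needed <= 0) return false;

    // Second call: pass the output buffer and its size.
    wide.resize(static_cast<std::size_t>(needed));
    return MultiByteToWideChar(CP_UTF8, 0, utf8.data(), bytes,
                               &wide[0], needed) == needed;
}
#endif
```

A UTF-8 file may start with a byte order mark (the bytes EF BB BF). It converts to the single character U+FEFF, which you may want to skip before printing.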

If you also want to use the UTF-8 file content for other purposes (for example, as a C string), you must allocate one extra byte and set it to zero. This is not necessary when you only use MultiByteToWideChar and pass the correct size.
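
A minimal sketch of the zero-terminated variant, assuming standard C++11; read_file_terminated is a hypothetical helper name. The buffer is one byte larger than the file and is filled with zeros first, so the extra byte is already the terminator.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads a file into a buffer of file size + 1 bytes.
// The last byte stays zero, so buffer.data() can be used as a C string.
bool read_file_terminated(const std::string& path, std::vector<char>& buffer) {
    std::ifstream in(path, std::ios::binary | std::ios::ate); // open at end
    if (!in) return false;
    const std::streamoff size = in.tellg();   // position at end = file size
    if (size < 0) return false;
    buffer.assign(static_cast<std::size_t>(size) + 1, '\0');  // zero filled
    in.seekg(0, std::ios::beg);
    if (size > 0) in.read(buffer.data(), static_cast<std::streamsize>(size));
    return static_cast<bool>(in);
}
```

When this buffer is passed to MultiByteToWideChar, cbMultiByte should still be the file size (buffer.size() - 1), unless the terminating zero is meant to be converted as well.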