How to store UNICODE value to char*

Question

I have an application in which I assign a Unicode character (a smiley, '\u263B') together with some message text to a CString variable. I can see the smiley and the message text while they are in the CString, but I need to send the string to another application, which requires it as a char*. On the receiving side I do not get the whole message: I only see the first two characters (the smiley image and the first character of the message).
Please let me know the best way to send a smiley over the network.

Posted 28-Mar-13 21:16pm

SNI

3 solutions

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

nv3 · Answer 1 · 2013-03-29T00:16:00

Your question shows that you do not yet have a clear understanding of the concepts of
- Unicode
- UTF-8 / UTF-16
- ANSI character strings
- ASCII character strings
and the programming concepts
- char and wchar_t arrays
- CString
- std::string.
To me it seems very important to get at least a basic understanding of these things in order to survive in the C++ world. So instead of answering your question directly, I will try to give you a start on them in a nutshell (for more, see "What are TCHAR, WCHAR, LPSTR, LPWSTR, LPCTSTR (etc.)?"):

ASCII was one of the first encoding schemes, using the lower 7 bits of each byte to represent one of the letters, digits, and special symbols. It was a simple scheme and was sufficient for English texts.

ANSI code pages: This is actually a misnomer; it denotes extended encoding schemes in which the upper 128 codes of a byte are used for symbol sets needed in particular regions. For example, code page 1252 defines the special symbols for most Western European languages. Windows offers functions for converting between ANSI code page x and Unicode.

Unicode is an international standard for encoding characters (symbols) and is very comprehensive. It assigns every symbol a number, called a code point, in the range U+0000 to U+10FFFF. The original design fitted into 16 bits per symbol, but the standard has long since grown beyond that, which allows all symbols used in today's world to be represented in a single coding scheme.

UTF-8 and UTF-16 are two representation forms of Unicode that use 8-bit and 16-bit building blocks. In UTF-16, every character of the Basic Multilingual Plane (which includes your smiley, U+263B) occupies one 2-byte cell; characters above U+FFFF need two cells, a so-called surrogate pair. UTF-8 is a variable-length encoding scheme. ASCII characters fit into a single byte, while other characters need 2, 3, or 4 bytes. UTF-8 has become very popular because it still uses a byte as its elementary cell but is capable of representing the complete Unicode set.
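To make the variable-length point concrete, here is a minimal sketch that reports how many UTF-8 bytes and how many UTF-16 cells a single code point needs. The function names utf8_length and utf16_units are made up for this illustration and do not come from any library.

```cpp
#include <cassert>

// Hypothetical helper: number of bytes a code point occupies in UTF-8.
int utf8_length(char32_t cp) {
    assert(cp <= 0x10FFFF);      // Unicode ends at U+10FFFF
    if (cp < 0x80)    return 1;  // ASCII range
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;  // rest of the Basic Multilingual Plane
    return 4;                    // supplementary planes
}

// Hypothetical helper: number of 16-bit cells a code point occupies in UTF-16.
int utf16_units(char32_t cp) {
    assert(cp <= 0x10FFFF);
    return cp < 0x10000 ? 1 : 2; // above U+FFFF a surrogate pair is needed
}
```

For the smiley U+263B this gives three UTF-8 bytes but a single UTF-16 cell, while the letter 'A' needs one byte and one cell.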

char and wchar_t arrays: These are the most elementary storage mechanisms for character strings. You tell the compiler to set aside a fixed number of char or wchar_t cells. Usually you set aside more space than necessary, because you don't yet know the length of the strings you are going to store. Zero-termination is one way of denoting the length of a string. Keeping the length in a separate int variable is another. Note that the length is usually counted in character cells, not in bytes. On Windows, where wchar_t is 16 bits wide, a wchar_t array occupies twice as many bytes as it has cells; on most Unix-like systems wchar_t is 32 bits wide.
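Zero-termination also offers one plausible explanation for the truncated message in the question. When 16-bit cells are read through a char*, the high byte of every ASCII letter is zero, so a byte-oriented reader stops almost immediately. The sketch below uses char16_t to stay portable; visible_bytes_as_char is a name invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: how many bytes strlen() sees when UTF-16 text
// is misread as a zero-terminated char string.
std::size_t visible_bytes_as_char(const char16_t* text) {
    assert(text != nullptr);
    // Inspecting any object through a char pointer is permitted.
    return std::strlen(reinterpret_cast<const char*>(text));
}
```

For the text u"\u263BHello" the array holds 12 bytes of characters plus a 2-byte terminator, yet strlen stops after 3 bytes on a little-endian machine (after 2 on a big-endian one), because the first ASCII letter brings a zero byte with it.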

CString is an MFC (and ATL) class that makes string handling a lot easier. It keeps track of the string length and allocates its storage automatically from the heap, so you don't have to tell it the maximum length of the strings you are going to store. It also provides conversions between char and wchar_t strings, and many useful string operators. Becoming familiar with CString is an absolute must in an MFC environment (and useful in other environments as well).

std::string is the C++ standard library counterpart of CString (with std::wstring for wchar_t cells). It came later than CString but is not necessarily better; it is just the standard library's way of doing things.

So there are always two things to consider when talking about strings: (a) what the character cells are made of (8 bits or 16 bits, for example) and (b) how the storage management for an array of those cells is done.

Now back to your question: how can you send a string with some unusual symbols from one application to another? I assume you are talking about a transfer via a file (the same reasoning applies to a network connection). One way of doing this is to write the string into the file as UTF-8 and read it as UTF-8 on the other side. Both applications must be aware that they are dealing with UTF-8 and not with plain ASCII or an ANSI code page. If your destination application is written for pure ASCII, or even ANSI code page 1252, you don't have a chance of getting the smiley across. You could try code page 437, the original IBM PC character set, which does contain the smiley, but that also requires that your reader application expects it.
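Once the text is in UTF-8, it behaves like any other byte string: it contains no embedded zero bytes, so char* and std::string carry it unchanged. The following sketch passes UTF-8 bytes through a stream and reads them back. The names send_utf8, receive_utf8 and round_trip are invented for this example, and the in-memory std::stringstream merely stands in for the real file or socket.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>

// Writes the UTF-8 bytes exactly as they are. With a real file,
// open the stream in binary mode so no bytes get translated.
void send_utf8(std::ostream& out, const std::string& utf8) {
    out.write(utf8.data(), static_cast<std::streamsize>(utf8.size()));
}

// Reads every remaining byte from the stream without interpreting it.
std::string receive_utf8(std::istream& in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Demo only: the stringstream plays the role of the transport.
std::string round_trip(const std::string& utf8) {
    std::stringstream channel;
    send_utf8(channel, utf8);
    std::string received = receive_utf8(channel);
    assert(received.size() == utf8.size());
    return received;
}
```

The message "\xE2\x98\xBB Hello" (the smiley in UTF-8 followed by " Hello") is nine bytes long and arrives byte for byte. Whether the smiley is displayed correctly afterwards still depends on the receiver treating those bytes as UTF-8.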

Ian A Davidson · Answer 2 · 2013-03-28T23:24:00

Solution 2

You probably want to convert it to a multi-byte string. Look up information on WideCharToMultiByte.
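As a rough sketch of that suggestion, the function below converts a wide string to UTF-8 with the usual two-call pattern: the first call asks for the required buffer size, the second performs the conversion. The name wide_to_utf8 is made up for this example. The non-Windows branch is only a stand-in so the sketch can be compiled elsewhere; it assumes that each wchar_t holds a whole code point, as it typically does on Linux and macOS.

```cpp
#include <cstddef>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: wide string in, UTF-8 bytes out.
std::string wide_to_utf8(const std::wstring& wide) {
    if (wide.empty()) return std::string();
#ifdef _WIN32
    const int units = static_cast<int>(wide.size());
    // First call: ask how many bytes the UTF-8 form needs.
    const int bytes = WideCharToMultiByte(CP_UTF8, 0, wide.data(), units,
                                          nullptr, 0, nullptr, nullptr);
    if (bytes <= 0) return std::string();
    std::string utf8(static_cast<std::size_t>(bytes), '\0');
    // Second call: convert into the buffer that was just sized.
    WideCharToMultiByte(CP_UTF8, 0, wide.data(), units,
                        &utf8[0], bytes, nullptr, nullptr);
    return utf8;
#else
    // Assumption: wchar_t is 32 bits and holds one code point per cell.
    std::string utf8;
    for (wchar_t wc : wide) {
        const unsigned long cp = static_cast<unsigned long>(wc);
        if (cp < 0x80) {
            utf8 += static_cast<char>(cp);
        } else if (cp < 0x800) {
            utf8 += static_cast<char>(0xC0 | (cp >> 6));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            utf8 += static_cast<char>(0xE0 | (cp >> 12));
            utf8 += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            utf8 += static_cast<char>(0xF0 | (cp >> 18));
            utf8 += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            utf8 += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return utf8;
#endif
}
```

In a Unicode build of an MFC application, a CString could be passed in as std::wstring(text.GetString()), and the c_str() of the result is then the char* to hand to the other application, which in turn has to interpret it as UTF-8.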

Regards,
Ian.


Richard MacCutchan · Answer 3 · 2013-03-28T23:07:00

Solution 1

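A char* points to an array of single-byte characters (ASCII or an ANSI code page in a typical Windows application), so you cannot use it to access a UTF-16 Unicode string directly. The string first has to be converted to a multi-byte encoding such as UTF-8.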
