I have an application in which I assign a Unicode character (a smiley, '\u263B') followed by some message text to a CString variable. I can see both the smiley and the message text while they are in the CString, but I need to send the string to another application, which requires it as a char*. On the receiving side I do not get the whole message: I only see the first two characters (the smiley and the first character of the message).
Please let me know the best way to send a smiley over the network.

Your question shows that you do not yet have a clear understanding of the concepts of
- Unicode
- UTF8 / UTF16
- ANSI character string
- ASCII character string
and the programming concepts
- char and wchar_t arrays
- CString
- std::string.
To me it seems very important to get at least a basic understanding of these things in order to survive in the C++ world. So instead of answering your question directly, I will try to give you a start on them in a nutshell (for more, see What are TCHAR, WCHAR, LPSTR, LPWSTR, LPCTSTR (etc.)?):

ASCII was one of the first encoding schemes. It uses the lower 7 bits of each byte to represent one of the letters, digits, and special symbols. It is a simple scheme and was sufficient for English texts.

ANSI code pages: This is actually a misnomer. It denotes a family of extended coding schemes in which the upper 128 codes of a byte are used for symbol sets needed in certain regions. For example, code page 1252 defines the special symbols for most Western European languages. Windows offers conversion functions from ANSI code page x to Unicode.

Unicode is an international standard for encoding characters (symbols) and is very comprehensive. It assigns every character a numeric code point in the range U+0000 to U+10FFFF. The early versions fit into 16 bits per character, but the standard has long since grown beyond that, so that all symbols used in today's world can be represented in a single coding scheme.

UTF-8 and UTF-16 are two encoding forms of Unicode that use 8-bit and 16-bit building blocks, respectively. In UTF-16, most characters in common use (including the smiley U+263B) occupy a single 2-byte cell; characters above U+FFFF need two cells, a so-called surrogate pair. UTF-8 is a variable-length encoding scheme: ASCII characters fit into a single byte, while all other characters need 2, 3, or 4 bytes. UTF-8 has become very popular because it still uses the byte as its elementary cell, yet is capable of representing the complete Unicode set.
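To make the variable-length idea concrete, here is a minimal sketch of how a single code point is turned into UTF-8 bytes. The function name encode_utf8 is made up for this illustration; in a real Windows program you would normally let a library routine do this conversion.

```cpp
#include <cstdint>
#include <string>

// Illustrative helper (hypothetical name): encode one Unicode code point
// as UTF-8. Returns an empty string for surrogates and out-of-range values.
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp <= 0x7F) {                        // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {                // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {               // 3 bytes
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return out;                      // surrogates are not characters
        }
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {             // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For the smiley U+263B this yields the three bytes E2 98 BB, while a letter such as 'M' stays a single byte.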

char and wchar_t arrays: These are the most elementary storage mechanisms for character strings. You tell the compiler to set aside a fixed number of char or wchar_t cells. Usually you set aside more space than necessary, because you don't yet know the length of the strings you are going to store. Zero-termination is one way of denoting the length of a string. Keeping the length in a separate int variable is another. Note that the length is usually counted in cells (characters), not in bytes. On Windows, where wchar_t is 16 bits wide, a wchar_t array occupies twice as many bytes as it has cells.
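The difference between cells and bytes is probably what causes the truncation described in the question. The following sketch uses char16_t as a portable stand-in for the 16-bit wchar_t of Windows (an assumption made so that it also compiles elsewhere); the function names are invented for this illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Size of the string data in bytes: two bytes per 16-bit cell.
std::size_t utf16_byte_size(const std::u16string& text) {
    return text.size() * sizeof(char16_t);
}

// What a receiver sees when it treats the raw UTF-16 memory as a
// zero-terminated char string: strlen stops at the first zero byte.
std::size_t bytes_seen_as_c_string(const std::u16string& text) {
    const char* raw = reinterpret_cast<const char*>(text.c_str());
    return std::strlen(raw);
}
```

On a little-endian machine, u"\u263BMy Text" is stored as the bytes 3B 26 4D 00 and so on. The upper byte of 'M' is zero, so a char* reader stops after three of the sixteen bytes, which matches seeing only the smiley and the first letter.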

CString is an MFC class that makes string handling a lot easier. It keeps track of the string length and allocates its storage from the heap automatically, so you don't have to specify the maximum length of the strings you are going to store. It also provides conversions between char and wchar_t strings and many useful string operations. Becoming familiar with CString is an absolute must in an MFC environment (and useful in other environments as well).

std::string is the counterpart of CString in the C++ standard library (STL). It came later than CString, but is not necessarily better; it's just the STL way of doing things.

So there are always two things to consider when talking about strings: (a) what the character cells are made of (8 bits or 16 bits, for example), and (b) how the storage for an array of those character cells is managed.

Now back to your question: how can you send a string containing some unusual symbols from one application to another? I assume you are talking about a transfer via a file. One way of doing this is to write the string into the file as UTF-8 and read it as UTF-8 on the other side. Both applications must be aware that they are dealing with UTF-8 and not with plain ASCII or an ANSI code page. If your destination application is written for pure ASCII, or even for ANSI code page 1252, you have no chance of getting the smiley across. You could try code page 437, the original IBM PC character set, which does contain the smiley, but that also requires that your reader application expects it.
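As a rough sketch of the UTF-8 approach, the functions below write the bytes unchanged to a stream and read them back. All names are invented for this illustration, and a std::stringstream stands in for the file; a std::ofstream and std::ifstream opened with std::ios::binary should behave the same way.

```cpp
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>

// Writer side: emit the UTF-8 bytes exactly as they are.
void write_utf8(std::ostream& out, const std::string& utf8) {
    out.write(utf8.data(), static_cast<std::streamsize>(utf8.size()));
}

// Reader side: take every byte up to the end of the stream and
// interpret the result as UTF-8, not as ASCII or an ANSI code page.
std::string read_utf8(std::istream& in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Demonstration with an in-memory stream standing in for the file.
std::string round_trip(const std::string& utf8) {
    std::stringstream channel(std::ios::in | std::ios::out | std::ios::binary);
    write_utf8(channel, utf8);
    return read_utf8(channel);
}
```

The point of the design is that neither side alters or re-interprets the bytes in transit; only the final reader decodes them, and it has to know that the encoding is UTF-8.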
 
 
Comments
nv3 29-Mar-13 7:03am    
[Pasted on behalf of OP]
Thanks for your reply.
I did the following...

CString l_strSmiley = _T("\u263B");
l_strSmiley.Append(_T("My Text"));

// Convert to UTF-8. Note: tag is only valid while pszUTF8 is in scope.
CT2A pszUTF8(l_strSmiley, CP_UTF8);
char* tag = pszUTF8;

// Convert back from UTF-8 to the CString character type.
CA2T pszT(tag, CP_UTF8);
strtext.Format(_T("%s"), static_cast<LPCTSTR>(pszT));


This works fine within a single application on one machine. But when I send the data (char* tag) to another user, who receives it as a const char*, and I try to convert it back into the original text, the smiley is replaced by a square box concatenated with the text. Please let me know how to convert it at the other end.
nv3 29-Mar-13 7:06am    
Have you checked that the UTF-8 string was sent correctly? (Print out the contents in hex, byte for byte, on both sides to check that.)
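The byte-for-byte check could look something like the sketch below. The helper name to_hex is made up for this illustration; call it on the buffer just before sending and again right after receiving, then compare the two outputs.

```cpp
#include <string>

// Illustrative helper (hypothetical name): render every byte of a buffer
// as two lowercase hex digits, separated by single spaces.
std::string to_hex(const std::string& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char b : bytes) {
        if (!out.empty()) {
            out += ' ';
        }
        out += digits[b >> 4];    // high nibble
        out += digits[b & 0x0F];  // low nibble
    }
    return out;
}
```

A correctly transmitted UTF-8 message that starts with the smiley followed by "My" should begin with e2 98 bb 4d 79 on both ends. If the receiving side shows different bytes, the data was changed or cut off in transit; if the bytes match, the problem is in the conversion back or in the display.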
You probably want to convert it to a multi-byte string. Look up information on WideCharToMultiByte.

Regards,
Ian.
 
 
A char* points to an array of bytes, conventionally ASCII or ANSI characters; you cannot use it to access a wchar_t (UTF-16) string directly without converting the string first.
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


