Hi,

I need to encode a Chinese string using C#. I know that the ASCIIEncoding class can encode 8-bit characters (normal English text), but I think a Chinese character takes 2 bytes. Please help me.

Thanks in advance

This might help to solve your problem:

Converting chinese character to unicode[^]
 
Comments
Sergey Alexandrovich Kryukov 18-Jul-11 16:25pm    
It may not be relevant at all. .NET characters are already Unicode; there is nothing to convert. There are different Unicode encodings. Please see my solution.
--SA
 
Comments
Sergey Alexandrovich Kryukov 18-Jul-11 16:27pm    
It may or may not be relevant. However, this is useful info, my 5. .NET characters are already Unicode; there is nothing to convert. There are different Unicode encodings. Please see my solution.
--SA
First of all, .NET natively supports Unicode: strings are held in memory as UTF-16, which covers all code points. A code point is encoded with 2 or 4 bytes, so UTF-16 covers not only the BMP (Basic Multilingual Plane, 0 to 0xFFFF) but all characters above it. The commonly used Chinese characters sit in the BMP; rarer ones (CJK Extension B and later) lie above it and take 4 bytes in UTF-16. The other UTFs, UTF-8 and UTF-32, also cover all of Unicode. So you don't need anything to "encode" Chinese; it is already supported.

You need an encoding only when you read or write data to a stream. Prefer UTF-8, which is the de facto standard for most applications, including the Web. Use System.Text.Encoding.UTF8 and System.Text.UTF8Encoding, see http://msdn.microsoft.com/en-us/library/system.text.encoding.aspx[^].
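
As a minimal sketch of that advice (the class and method names below are my own, made up for illustration; only the System.Text and System.IO calls come from the framework), this is roughly what converting a Chinese string to UTF-8 bytes and back could look like, both directly and through a stream:

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class Utf8RoundTrip
{
    // Serialize an in-memory (UTF-16) string to UTF-8 bytes.
    public static byte[] ToUtf8(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }

    // Decode UTF-8 bytes back into a .NET string.
    public static string FromUtf8(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    // Write the text to a stream as UTF-8, then read it back.
    public static string WriteThenRead(string text)
    {
        byte[] written;
        using (var stream = new MemoryStream())
        {
            // UTF8Encoding(false) writes no byte order mark.
            using (var writer = new StreamWriter(stream, new UTF8Encoding(false)))
            {
                writer.Write(text);
            } // Disposing the writer flushes it and closes the stream.

            // ToArray still works on a closed MemoryStream.
            written = stream.ToArray();
        }

        using (var reader = new StreamReader(new MemoryStream(written), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

For example, `Utf8RoundTrip.ToUtf8("\u4E2D\u6587")` (the two characters 中文) gives 6 bytes, 3 per character, and both `FromUtf8` and `WriteThenRead` return the original string. By contrast, `Encoding.ASCII.GetBytes` on the same string gives two `?` bytes, which is why ASCIIEncoding is the wrong tool here.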

You might need a better understanding of Unicode. It is not a 16-bit code! It standardizes the mapping between characters, understood as cultural entities regardless of concrete glyphs, and integer values, understood in their abstract mathematical meaning regardless of how data is represented in bits. The code points go well above 0xFFFF. On top of this, there are the UTFs, which define how code points are serialized to bytes.
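
To illustrate the difference between code points and 16-bit char values, here is a small sketch (the class and method names are hypothetical, chosen just for this example) that counts code points in a string by treating each surrogate pair as one:

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class CodePointDemo
{
    // Count Unicode code points, not UTF-16 code units.
    public static int CountCodePoints(string text)
    {
        int count = 0;
        for (int i = 0; i < text.Length; i++)
        {
            // A high surrogate followed by a low surrogate is one code point.
            if (char.IsHighSurrogate(text[i])
                && i + 1 < text.Length
                && char.IsLowSurrogate(text[i + 1]))
            {
                i++; // Skip the second half of the pair.
            }
            count++;
        }
        return count;
    }
}
```

With `string s = char.ConvertFromUtf32(0x20000);` (a CJK Extension B ideograph), `s.Length` is 2 because the string holds a surrogate pair, while `CodePointDemo.CountCodePoints(s)` is 1. `Encoding.UTF8.GetByteCount(s)`, `Encoding.Unicode.GetByteCount(s)` and `Encoding.UTF32.GetByteCount(s)` all report 4 bytes for this character.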

See:
http://unicode.org/[^],
http://unicode.org/faq/utf_bom.html[^].

--SA
 
Comments
Espen Harlinn 18-Jul-11 18:12pm    
Good comprehensive reply, my 5
Sergey Alexandrovich Kryukov 18-Jul-11 18:16pm    
Thank you, Espen.
--SA
thatraja 18-Jul-11 22:35pm    
5!
Sergey Alexandrovich Kryukov 18-Jul-11 23:24pm    
Thank you, Raja.
--SA
You can use UnicodeEncoding[^]
 
Comments
Sergey Alexandrovich Kryukov 18-Jul-11 16:26pm    
This is kind of pointless. .NET characters are already Unicode; there is nothing to convert. There are different Unicode encodings, but the UnicodeEncoding is UTF-16, and not very practical. Please see my solution.
--SA

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


