All versions of .NET are based on Unicode.
This goes beyond merely supporting Unicode: in a way, non-Unicode strings and characters are not directly supported at all. If you write a .NET application using the standard string and char types and then save data in a non-Unicode encoding (such as ASCII), nothing guarantees that the data is saved without loss unless you take special care. This can happen because your user, by default, is not limited to the character repertoire of ASCII and can always enter Unicode characters whose code points cannot be represented in ASCII. (ASCII is only an example.) By default, every such character is replaced with '?', which of course means loss of data.
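To illustrate the loss described above, here is a minimal sketch. The class and method names (AsciiLossDemo, RoundTrip, CanEncode) are made up for this example; only the System.Text calls are real .NET API. It round-trips a string through an encoding, and shows one possible form of "special care": a strict encoder that throws instead of silently substituting '?'.

```csharp
using System;
using System.Text;

// Hypothetical demo class; the names are illustrative only.
public static class AsciiLossDemo
{
    // Encodes the text to bytes and decodes it back with the same encoding.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    // Returns true if every character of the text can be represented
    // in the given encoding, false otherwise.
    public static bool CanEncode(string text, Encoding encoding)
    {
        // A strict variant of the encoding: it throws on unrepresentable
        // characters instead of replacing them with '?'.
        Encoding strict = Encoding.GetEncoding(
            encoding.WebName,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strict.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }

    public static void Main()
    {
        string input = "na\u00EFve"; // contains U+00EF, which is not in ASCII

        Console.WriteLine(RoundTrip(input, Encoding.ASCII));         // na?ve
        Console.WriteLine(RoundTrip(input, Encoding.UTF8) == input); // True
        Console.WriteLine(CanEncode(input, Encoding.ASCII));         // False
    }
}
```

Whether to reject such input, warn the user, or simply save in a UTF is a design decision; the sketch only shows how the loss can be detected.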
Therefore, any .NET application (even a console application) should be written on the assumption that text can be entered from any part of the Unicode repertoire. You should not try to support non-UTF encodings, except in special cases such as utilities that convert between encodings or analyze texts and their encodings.
Internally, in memory, .NET represents Unicode characters as UTF-16LE, but programs should never rely on this fact directly. Treat character and string data as abstract Unicode data, independent of its memory representation. To obtain the data in a particular UTF (as a byte[] array), use serialization based on the System.Text.Encoding classes:
http://msdn.microsoft.com/en-us/library/system.text.encoding%28v=vs.110%29.aspx.
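As a rough sketch of that serialization step: the same abstract string yields different byte sequences depending on which System.Text.Encoding instance is used. The class and method names (EncodingDemo, Serialize, Deserialize) are invented for this example; the Encoding properties and BitConverter.ToString are standard .NET.

```csharp
using System;
using System.Text;

// Hypothetical demo class; the names are illustrative only.
public static class EncodingDemo
{
    // Serializes abstract Unicode text into bytes of the chosen encoding.
    // GetBytes does not prepend a byte order mark.
    public static byte[] Serialize(string text, Encoding encoding)
    {
        return encoding.GetBytes(text);
    }

    // Restores the text from bytes produced with the same encoding.
    public static string Deserialize(byte[] data, Encoding encoding)
    {
        return encoding.GetString(data);
    }

    public static void Main()
    {
        string text = "\u03A9"; // GREEK CAPITAL LETTER OMEGA

        Console.WriteLine(BitConverter.ToString(Serialize(text, Encoding.UTF8)));             // CE-A9
        Console.WriteLine(BitConverter.ToString(Serialize(text, Encoding.Unicode)));          // A9-03 (UTF-16LE)
        Console.WriteLine(BitConverter.ToString(Serialize(text, Encoding.BigEndianUnicode))); // 03-A9 (UTF-16BE)
        Console.WriteLine(BitConverter.ToString(Serialize(text, Encoding.UTF32)));            // A9-03-00-00 (UTF-32LE)

        // Any UTF round-trips the text without loss.
        byte[] utf8 = Serialize(text, Encoding.UTF8);
        Console.WriteLine(Deserialize(utf8, Encoding.UTF8) == text);                          // True
    }
}
```

Note that the code never looks at how the string is laid out in memory; it only asks an Encoding for the byte form it needs.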
Some reading:
http://en.wikipedia.org/wiki/Unicode,
http://www.unicode.org/,
http://en.wikipedia.org/wiki/Unicode_Transformation_Format,
http://www.unicode.org/faq/utf_bom.html.
—SA