C#

 
Re: Convert char* to string (Judah Gabriel Himango, 11-May-04 3:49)
Re: Convert char* to string (peraonline, 11-May-04 20:32)
Re: Convert char* to string (Heath Stewart, 12-May-04 2:55)
Re: Convert char* to string (peraonline, 12-May-04 3:10)
Re: Convert char* to string (Heath Stewart, 12-May-04 3:57)
Re: Convert char* to string (peraonline, 12-May-04 4:03)

Character Encoding (gUrM33T, 11-May-04 1:45)
Re: Character Encoding (Mike Dimmick, 11-May-04 3:33)

StreamReader has several constructors, some of which take an Encoding, a Boolean (detectEncodingFromByteOrderMarks) indicating whether the encoding should be detected from the data, or both.

A bit of poking around in Reflector reveals that if you don't provide an encoding, it uses UTF-8, and unless you say otherwise, it first tries to detect the actual encoding rather than simply assuming that UTF-8 default.
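A minimal C# sketch of the overload that takes both arguments, decoding in-memory bytes so you can compare detection on and off. The class and method names are hypothetical; StreamReader(Stream, Encoding, bool) and CurrentEncoding are the real .NET members. ISO-8859-1 is used as the "wrong" encoding only because every .NET runtime has it built in.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for experimenting with StreamReader's encoding handling.
static class StreamReaderEncodingDemo
{
    // Decodes 'bytes' through StreamReader(Stream, Encoding, bool) and returns
    // the text plus the encoding the reader actually ended up using.
    // new StreamReader(stream) with no encoding behaves like (Encoding.UTF8, true).
    public static Tuple<string, Encoding> Read(byte[] bytes, Encoding encoding, bool detectFromBom)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes), encoding, detectFromBom))
        {
            string text = reader.ReadToEnd();   // BOM detection happens on the first read
            return Tuple.Create(text, reader.CurrentEncoding);
        }
    }
}
```

With the UTF-16LE bytes FF FE 68 00 69 00, reading with (Encoding.UTF8, true) gives "hi" and a CurrentEncoding with code page 1200, because the BOM overrides the UTF-8 default. Reading the same bytes with ISO-8859-1 and detection off treats the BOM as ordinary data, so the text starts with "ÿþ".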

When trying to detect an encoding, it looks for a Byte Order Mark (BOM). The Unicode standard allows this character, U+FEFF, to appear at the beginning of the text to identify the encoding (and, for UTF-16, the byte order) in use. In UTF-16 little-endian, it becomes the byte sequence 0xFF 0xFE; in UTF-8, it's 0xEF 0xBB 0xBF. .NET can also detect UTF-16BE, or big-endian, where the two bytes of each UTF-16 code unit are the other way round, so the mark is 0xFE 0xFF. If there's no Byte Order Mark, the reader uses the encoding you passed to the constructor or, if you didn't pass one, UTF-8.
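To make the byte sequences concrete, here is a hand-rolled BOM check covering just the three marks described above. It is a hypothetical helper for illustration, not how StreamReader is implemented internally; in particular it does not separate UTF-32LE (FF FE 00 00) from UTF-16LE.

```csharp
using System;
using System.Text;

// Hypothetical helper: identifies the UTF-8, UTF-16LE and UTF-16BE byte order marks.
static class BomSniffer
{
    // Returns the encoding signalled by a BOM at the start of 'data',
    // or null when none of the three marks is present.
    public static Encoding DetectBom(byte[] data)
    {
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
            return Encoding.UTF8;              // U+FEFF encoded as UTF-8
        if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
            return Encoding.Unicode;           // UTF-16 little-endian (may also be UTF-32LE)
        if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
            return Encoding.BigEndianUnicode;  // UTF-16 big-endian
        return null;                           // no BOM: the caller has to decide
    }
}
```

Each Encoding's GetPreamble() returns exactly these bytes, so DetectBom(Encoding.BigEndianUnicode.GetPreamble()) yields the big-endian UTF-16 encoding (code page 1201).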

If you use File.OpenText or FileInfo.OpenText, you don't get to specify an encoding, so you always get the default behaviour: UTF-8 unless a BOM says otherwise.

Unfortunately, very few of us have files encoded as UTF-8; they're far more likely to be encoded using our default code page. For most Western European and North American users, this is going to be Windows-1252 (Windows Western). You can get an Encoding for the user's configured ANSI code page from Encoding.Default.
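Since File.OpenText has no encoding parameter, one way round it is to construct the StreamReader yourself with StreamReader(string path, Encoding, bool). A minimal sketch (class and method names are hypothetical). One caveat worth knowing: Encoding.Default is the ANSI code page only on .NET Framework; on .NET Core and .NET 5+ it always returns UTF-8, and code page 1252 is available there only after registering CodePagesEncodingProvider.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: reads a whole text file in a caller-chosen encoding.
static class AnsiTextReader
{
    // 'encoding' is used when the file has no BOM; passing true for
    // detectEncodingFromByteOrderMarks still lets a BOM override it.
    public static string ReadAll(string path, Encoding encoding)
    {
        using (var reader = new StreamReader(path, encoding, true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Usage on .NET Framework would be something like AnsiTextReader.ReadAll("notes.txt", Encoding.Default), where "notes.txt" is a placeholder. A file containing the bytes 41 A3 reads back as "A£" with Latin-1 or 1252, but as "A" followed by U+FFFD (the replacement character) if you read it as UTF-8.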

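Western users, particularly in the UK, US and Canada, may not notice the mismatch at first. Unicode code points U+0000 to U+007F are encoded in UTF-8 as single bytes with the same values as in ASCII (ISO-646-US), ISO Latin-1 and Windows-1252, so plain English text decodes identically either way. (The first 256 code points of Unicode are also the same as ISO Latin-1, which differs from Windows-1252 only in the range 0x80 to 0x9F, where 1252 has printable characters such as € and curly quotes instead of control codes.) The trouble starts above 127: in UTF-8, any byte greater than 127 is part of a multi-byte sequence, in which a lead byte indicates how many continuation bytes follow to make up the full character. So a £ sign stored as the single byte 0xA3 in Windows-1252 is not valid UTF-8 on its own.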
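A small sketch that prints the bytes each encoding produces, plus a simplified classifier for UTF-8 bytes (both helpers are hypothetical). It uses ISO-8859-1 because Windows-1252 needs an extra provider on .NET Core; the two agree that £ is 0xA3. The pound sign is written as the escape \u00A3 so the example doesn't depend on the source file's own encoding.

```csharp
using System;
using System.Text;

// Hypothetical helpers for comparing single-byte and UTF-8 encodings.
static class EncodingComparison
{
    // Formats the bytes 'text' encodes to as space-separated hex, e.g. "C2 A3".
    public static string Hex(Encoding encoding, string text) =>
        BitConverter.ToString(encoding.GetBytes(text)).Replace('-', ' ');

    // Classifies one UTF-8 byte by its high bits (simplified: C0, C1 and
    // F5-FF never occur in valid UTF-8 but are reported as "lead" here).
    public static string Classify(byte b)
    {
        if (b < 0x80) return "single";        // 0xxxxxxx: ASCII, one byte
        if (b < 0xC0) return "continuation";  // 10xxxxxx: follows a lead byte
        return "lead";                        // 11xxxxxx: starts a multi-byte sequence
    }
}
```

Hex(Encoding.UTF8, "Price") and Hex(Encoding.GetEncoding("iso-8859-1"), "Price") both return "50 72 69 63 65", while "\u00A3" comes out as "A3" in Latin-1 but "C2 A3" in UTF-8: a lead byte followed by one continuation byte.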

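There's no reliable way to tell which code page was used for an arbitrary sample of text in a byte-oriented character stream. UTF-8 is the partial exception, because its multi-byte sequences follow a strict pattern that text in other encodings rarely matches by accident. The concept of Byte Order Marks is relatively new, so most existing files don't carry one. You either have to know the encoding or ask the user.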
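Because well-formed UTF-8 is the one pattern you can check for, a common heuristic is: try a strict UTF-8 decode, and fall back to a known code page if it fails. This is a sketch under that assumption (names are hypothetical), and it is a hint rather than proof: pure ASCII always passes, and short code-page text can pass by accident. UTF8Encoding(false, true) is the real constructor that makes the decoder throw DecoderFallbackException instead of substituting U+FFFD.

```csharp
using System;
using System.Text;

// Hypothetical helper: guesses whether bytes are UTF-8 before falling back.
static class Utf8Heuristic
{
    // No BOM on output; throw on malformed input instead of inserting U+FFFD.
    private static readonly Encoding StrictUtf8 = new UTF8Encoding(false, true);

    // True if 'data' is well-formed UTF-8.
    public static bool LooksLikeUtf8(byte[] data)
    {
        try
        {
            StrictUtf8.GetString(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    // Decodes as UTF-8 when valid, otherwise with the code page the caller
    // already knows (or asked the user for).
    public static string Decode(byte[] data, Encoding fallback) =>
        LooksLikeUtf8(data) ? StrictUtf8.GetString(data) : fallback.GetString(data);
}
```

With a Latin-1 fallback, both the UTF-8 bytes 41 C2 A3 and the single-byte 41 A3 decode to "A£", since the lone 0xA3 fails the strict UTF-8 check.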

More information:

Microsoft Global Development Portal
Code Page reference tables
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
http://www.unicode.org/
Character Sets (from my blog)

Stability. What an interesting concept. -- Chris Maunder

Re: Character Encoding (Paul Watson, 11-May-04 4:11)
Re: Character Encoding (Heath Stewart, 11-May-04 4:25)

beginInvoke() (sreejith ss nair, 11-May-04 1:32)
Re: beginInvoke() (Heath Stewart, 11-May-04 4:14)
Re: beginInvoke() (Paul Watson, 11-May-04 4:14)

HTTP request,response (dcronje, 11-May-04 1:05)
Re: HTTP request,response (Heath Stewart, 11-May-04 4:05)

help on wmi and registry (chettu, 11-May-04 0:54)
Re: help on wmi and registry (Heath Stewart, 11-May-04 3:59)
Re: help on wmi and registry (chettu, 11-May-04 20:18)
Re: help on wmi and registry (Heath Stewart, 12-May-04 2:52)
Re: help on wmi and registry (chettu, 12-May-04 3:21)

block the keyboard (cristina_tudor, 11-May-04 0:36)
Re: block the keyboard (Corinna John, 11-May-04 0:50)

Returning string from unmanaged dll (Mikke_x, 10-May-04 23:50)
Re: Returning string from unmanaged dll (Heath Stewart, 11-May-04 3:31)
Re: Returning string from unmanaged dll (Mikke_x, 11-May-04 4:06)
