Tags: C#, VB
Hi guys,
 
I'm converting an old VB application into C#, and part of the system requires me to convert certain special characters to their ASCII equivalent.
 
In VB, the code is:
 
 
sValue = Asc("œ")  'which gives 156
 
sValue = Asc("°")  'which gives 176
 
sValue = Asc("£")  'which gives 163
 
 
These are the correct values according to http://www.ascii-code.com/.
 

But when doing the same conversion in C#, the first of these values gives a strange answer.
 
Here is the code:
 
 
As ints:
 
int i1 = (int)Convert.ToChar("œ");    // which gives 339

int i2 = (int)Convert.ToChar("°");    // which gives 176

int i3 = (int)Convert.ToChar("£");    // which gives 163

 
As bytes:
 
byte i1 = (byte)Convert.ToChar("œ");    // which gives 83

byte i2 = (byte)Convert.ToChar("°");    // which gives 176

byte i3 = (byte)Convert.ToChar("£");    // which gives 163

 
 
What gives?! :( I suspect it's something to do with the sign bit, but I can't see what.
 
Many thanks
Posted 27-Dec-12 5:35am
Comments
Sergey Alexandrovich Kryukov at 27-Dec-12 22:23pm
   
Who told you it should be ASCII? ASCII won't work for you...
—SA

Solution 4

Hello Nick,
 
What you refer to as being ASCII is *not* ASCII (see http://en.wikipedia.org/wiki/ASCII).
Only the original 7-bit ASCII encoding (codes 0-127) is unambiguously defined.
 
There exist several 8-bit extensions to the original 7-bit encoding.
 
Your page claims to list œ as being part of Latin-1. But reading carefully, the page says:
 
[...] The extended ASCII codes (character code 128-255)
There are several different variations of the 8-bit ASCII table. The table below is according to ISO 8859-1, also called ISO Latin-1. Codes 129-159 contain the Microsoft® Windows Latin-1 extended characters. [...]

 
Microsoft decided some years ago to "modify" the standard to fit their needs. See http://www.cs.tut.fi/~jkorpela/chars.html or, more specifically, http://www.cs.tut.fi/~jkorpela/chars.html#win.
 
Standard Latin-1 does *not* contain œ. It is included in Latin-9 (also known as ISO/IEC 8859-15); see also ISO Latin 9 as compared with ISO Latin 1 and http://en.wikipedia.org/wiki/ISO/IEC_8859-15.
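To make the Latin-1 / Latin-9 / Windows-1252 difference concrete, here is a minimal sketch that looks up œ in each of the three code pages. It assumes .NET Core 3.0 or later with C# 9 top-level statements, where CodePagesEncodingProvider ships with the runtime; the helper names Strict and CodeOf are hypothetical. 28591, 28605 and 1252 are the Windows code page numbers for ISO 8859-1, ISO 8859-15 and Windows-1252.

```csharp
using System;
using System.Text;

// Code pages 28605 and 1252 are not built into .NET Core / .NET 5+;
// registering this provider makes them available (assumes .NET Core 3.0 or later).
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Hypothetical helper: throw on unmappable characters instead of silently writing '?'.
Encoding Strict(int codePage) =>
    Encoding.GetEncoding(codePage, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);

// Hypothetical helper: the single-byte code of s in the given code page, or null if it has none.
int? CodeOf(string s, int codePage)
{
    try
    {
        return Strict(codePage).GetBytes(s)[0];
    }
    catch (EncoderFallbackException)
    {
        return null; // the character has no slot in this code page
    }
}

int? latin1 = CodeOf("œ", 28591);   // ISO 8859-1 (Latin-1)
int? latin9 = CodeOf("œ", 28605);   // ISO 8859-15 (Latin-9)
int? cp1252 = CodeOf("œ", 1252);    // Windows-1252

string latin1Text = latin1.HasValue ? latin1.Value.ToString() : "not representable";
Console.WriteLine("Latin-1:      " + latin1Text);
Console.WriteLine("Latin-9:      " + latin9);
Console.WriteLine("Windows-1252: " + cp1252);
```

Run as is, this should report that Latin-1 has no code for œ, Latin-9 puts it at 189 (0xBD), and only Windows-1252 gives the 156 (0x9C) that the VB code returned.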
 
Now, how to solve your issue?
Neither Latin-1 nor Latin-9 matches what Windows uses here.
You need Encoding.GetEncoding(1252), which on .NET Framework running on a Western European Windows system happens to give the same result as Encoding.Default (as ProgramFOX described in Solution #3).
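A minimal sketch of that fix for the three characters in the question, assuming .NET Core 3.0 or later with C# 9 top-level statements (on .NET Core, code page 1252 has to be registered through CodePagesEncodingProvider before GetEncoding(1252) will find it):

```csharp
using System;
using System.Text;

// Needed on .NET Core / .NET 5+, where code page 1252 is not built in
// (assumes .NET Core 3.0 or later, where CodePagesEncodingProvider ships with the runtime).
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

Encoding cp1252 = Encoding.GetEncoding(1252);

// Every character here has a single-byte slot in Windows-1252,
// so GetBytes returns exactly one byte per character.
byte[] codes = cp1252.GetBytes("œ°£");

Console.WriteLine(string.Join(", ", codes));
```

This should print 156, 176, 163, the same values VB's Asc produced. Naming the code page explicitly, rather than relying on whatever the machine's default is, keeps the result the same on every system.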
 
Cheers
Andi
Comments
Sergey Alexandrovich Kryukov at 27-Dec-12 22:19pm
   
Exactly. This is some legacy trash called "extended ASCII". Practically, none of the modern systems support it, for a good reason.
Unicode representation of these characters should be used, that's it.
My 5.
—SA
Andreas Gieriet at 27-Dec-12 22:43pm
   
Hello Sergey,
thanks for your 5!
Cheers
Andi
Nick Fisher (Consultant) at 28-Dec-12 5:45am
   
Excellent answer, thanks. Nick
Andreas Gieriet at 28-Dec-12 8:01am
   
You are welcome!
Andi
Espen Harlinn at 28-Dec-12 7:40am
   
Good guess, a 5 :-D
Andreas Gieriet at 28-Dec-12 8:01am
   
Hello Espen,
thanks for your 5!
Cheers
Andi

Solution 3

Richard is right. To get the same bytes in C# as the bytes in VB, use this:
byte i1 = Encoding.Default.GetBytes("œ")[0];
The GetBytes method returns a byte array; Encoding.Default.GetBytes("œ")[0] takes its first element.
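One caveat worth checking before relying on Encoding.Default: on .NET Framework it is the system ANSI code page (1252 on Western European Windows), but on .NET Core and .NET 5+ it is always UTF-8, where œ (U+0153) is encoded as two bytes and [0] returns 197 rather than 156. A small sketch (C# 9 top-level statements assumed) to see which case applies on a given machine:

```csharp
using System;
using System.Text;

// The result of this line depends on the runtime: the ANSI code page on
// .NET Framework, UTF-8 on .NET Core / .NET 5+.
byte first = Encoding.Default.GetBytes("œ")[0];
Console.WriteLine("Encoding.Default is " + Encoding.Default.WebName + ", first byte = " + first);

// In UTF-8, U+0153 needs two bytes: 0xC5 0x93.
byte[] utf8 = Encoding.UTF8.GetBytes("œ");
Console.WriteLine(string.Join(", ", utf8)); // 197, 147
```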
 
Hope this helps.
Comments
Nick Fisher (Consultant) at 28-Dec-12 5:45am
   
Yes, this works now. Many thanks. Nick
ProgramFOX at 28-Dec-12 7:45am
   
You're welcome!

Solution 2

Use the GetBytes method of the Encoding.ASCII encoding to convert the characters to ASCII.
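For the characters in the question, it helps to see what Encoding.ASCII actually returns. A minimal sketch (C# 9 top-level statements assumed): Encoding.ASCII only covers U+0000 to U+007F, and with its default replacement fallback every other character is encoded as '?' (byte 63), so none of the VB values come back.

```csharp
using System;
using System.Text;

// œ, ° and £ are all above U+007F, so each one is replaced by '?' (63).
byte[] ascii = Encoding.ASCII.GetBytes("œ°£");
Console.WriteLine(string.Join(", ", ascii)); // 63, 63, 63
```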
 
Best regards
Espen Harlinn
Comments
Andreas Gieriet at 27-Dec-12 21:24pm
   
Hello Espen,
this would remove diacritics by mapping the Windows code page 1252 characters to 7-bit ASCII instead of converting to Unicode encoding. See also Solution #3 and #4.
Cheers
Andi
Espen Harlinn at 28-Dec-12 7:39am
   
OP asked for ASCII, repeatedly ...
 
And as you wrote in your answer - you're doing a conversion to code page 1252, which is what OP actually needed, but it wasn't what he asked for.
Andreas Gieriet at 28-Dec-12 8:06am
   
Hello Espen,
I focussed more on his example code and felt that asking for ASCII is wrong...
It's interesting though, that converting to ASCII results in removing diacritics (œ --> o) - that was new to me.
Cheers
Andi
Sergey Alexandrovich Kryukov at 27-Dec-12 22:21pm
   
Sorry, but this won't work in this case. You probably answered formally but did not look at the characters themselves. Please see the correct Solution #4 and my comments.
(I did not vote this time.)
—SA

Solution 1

C# uses Unicode rather than ASCII to represent characters and strings.
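To connect this to the numbers in the question: a C# char is a UTF-16 code unit, so casting it to int gives the Unicode code point. œ is U+0153 (339), which is the 339 the OP saw, and a cast to byte simply keeps the low 8 bits: 339 - 256 = 83. No sign bit is involved. ° (U+00B0) and £ (U+00A3) only looked right because the first 256 Unicode code points coincide with Latin-1. A minimal sketch (C# 9 top-level statements assumed):

```csharp
using System;

// Converting a char to int gives its Unicode code point,
// not a Windows-1252 or "extended ASCII" code.
foreach (char c in "œ°£")
{
    int code = c;
    // A byte cast keeps only the low 8 bits (code & 0xFF); the high byte is discarded.
    byte low = unchecked((byte)c);
    Console.WriteLine($"{c}: U+{code:X4} = {code}, (byte) cast = {low}");
}

int oeCode = 'œ';                      // 339 = 0x153
byte oeByte = unchecked((byte)'œ');    // 0x53 = 83
```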

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 27 Dec 2012
Copyright © CodeProject, 1999-2014
All Rights Reserved.