I would very much appreciate it if someone could give me a solution for converting
"HEX" to its equivalent "UTF-8".

For example :
I need to convert "&#x2019;" into "’". Here, &#x2019; is the hex entity of "’", which is the right single quotation mark.



Waiting for your valuable response.
Updated 15-Apr-15 0:01am
Comments
Sinisa Hajnal 15-Apr-15 6:21am    
You don't have to convert it. Its numeric value is the same (assuming the same culture); it's just 0x20 (hex space) vs 32 (decimal space), or FF vs 255. Just convert the number into the decimal number system.
Er.Nikhil 15-Apr-15 6:32am    
I think you are not entirely clear about what I asked for help with. In the file I have a hexadecimal entity (i.e. 0x2019), which I need to convert to its equivalent UTF-8 character (i.e. ’).
I need to change that entity to UTF-8 characters.
Kindly suggest a solution for this conversion.
Sergey Alexandrovich Kryukov 15-Apr-15 9:25am    
Is it an XML/HTML character entity or what? Is it a Unicode code point?
Can it be a Unicode code point beyond the BMP (more than two bytes of data)?
—SA
Sergey Alexandrovich Kryukov 15-Apr-15 9:30am    
No, that's not true. You are talking only about low values of Unicode code points. Above 127 (beyond the ASCII range), the UTF-8 encoding of a character differs from other encodings.
—SA

1 solution

If this is a Unicode code point, you can rely on the fact that its integer value, in the usual integer representation on a little-endian machine, is the same as the UTF-32LE representation of the given character. So you can use this function: https://msdn.microsoft.com/en-us/library/system.char.convertfromutf32%28v=vs.110%29.aspx.
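To make that equivalence concrete, here is a minimal sketch. The class and method names (CodePointDemo, MatchesUtf32LE) are hypothetical, chosen only for illustration; the byte order is normalized so the comparison also holds on a big-endian machine.

```csharp
using System;
using System.Linq;
using System.Text;

static class CodePointDemo
{
    // Returns true when the little-endian bytes of the integer equal
    // the UTF-32LE bytes of the character with that code point.
    public static bool MatchesUtf32LE(int codePoint)
    {
        byte[] intBytes = BitConverter.GetBytes(codePoint);
        if (!BitConverter.IsLittleEndian)
            Array.Reverse(intBytes); // normalize to little-endian order

        // Encoding.UTF32 is the little-endian form; GetBytes emits no BOM.
        byte[] utf32Bytes = Encoding.UTF32.GetBytes(char.ConvertFromUtf32(codePoint));
        return intBytes.SequenceEqual(utf32Bytes);
    }
}
```

For the code point from the question, CodePointDemo.MatchesUtf32LE(0x2019) should return true, since both byte sequences are 19 20 00 00.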

Interestingly, the function returns not a character but a string. This follows from how .NET represents strings in memory: internally, UTF-16 is used, and characters beyond the BMP are represented as surrogate pairs. Formally, .NET treats a surrogate pair as two characters (so manipulating such a string "manually" can produce an invalid string; never do it), even though, from the Unicode standpoint, this 4-byte fragment of the string is really one character. Be careful with such cases, and never manipulate such a string byte by byte.
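The surrogate-pair behavior can be illustrated with a small sketch. The helper name CountCodePoints is hypothetical, and U+1F600 is just an arbitrary sample code point beyond the BMP, not something taken from the question.

```csharp
using System;

static class SurrogateDemo
{
    // Counts Unicode code points rather than UTF-16 code units (char values).
    public static int CountCodePoints(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            // A high surrogate followed by a low surrogate forms one code point.
            if (char.IsSurrogatePair(s, i))
                i++; // skip the low surrogate
            count++;
        }
        return count;
    }
}
```

With string s = char.ConvertFromUtf32(0x1F600), s.Length is 2 (two UTF-16 code units) while SurrogateDemo.CountCodePoints(s) is 1. For the quote character from the question, char.ConvertFromUtf32(0x2019), both values are 1.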

After you get the string (let's call it someString), you can use it directly or represent it as UTF-8, which is always an array of bytes:
C#
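int codePoint = 0x2019; // sample value only: the code point from the question
// Build the string for this code point (two chars if it lies beyond the BMP)
string someString = char.ConvertFromUtf32(codePoint);
// Encode the string as UTF-8 bytes
byte[] utf8 = System.Text.Encoding.UTF8.GetBytes(someString);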
Please see:
https://msdn.microsoft.com/en-us/library/system.text.encoding.utf8(v=vs.110).aspx,
https://msdn.microsoft.com/en-us/library/ds4kkd55(v=vs.110).aspx,
https://msdn.microsoft.com/en-us/library/system.text.encoding%28v=vs.110%29.aspx.
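If the file really contains hexadecimal character references embedded in text, the same idea can be applied to each match. The following is only a sketch under assumptions: it assumes the entities have the form &#xHHHH; (the thread does not confirm the exact syntax), and the names HexEntityDecoder, Decode and DecodeToUtf8 are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

static class HexEntityDecoder
{
    // Assumed entity syntax: &#x followed by 1 to 6 hex digits and a semicolon.
    private static readonly Regex HexEntity =
        new Regex(@"&#[xX]([0-9A-Fa-f]{1,6});");

    // Replaces each hex entity with the character it denotes.
    public static string Decode(string input)
    {
        return HexEntity.Replace(input, m =>
        {
            int codePoint = int.Parse(m.Groups[1].Value,
                NumberStyles.HexNumber, CultureInfo.InvariantCulture);

            // ConvertFromUtf32 rejects surrogates and values above 0x10FFFF,
            // so leave such entities untouched instead of throwing.
            bool valid = codePoint <= 0x10FFFF &&
                         (codePoint < 0xD800 || codePoint > 0xDFFF);
            return valid ? char.ConvertFromUtf32(codePoint) : m.Value;
        });
    }

    // Decodes the entities, then encodes the whole text as UTF-8 bytes.
    public static byte[] DecodeToUtf8(string input)
    {
        return Encoding.UTF8.GetBytes(Decode(input));
    }
}
```

For example, HexEntityDecoder.Decode("It&#x2019;s") gives "It’s", and HexEntityDecoder.DecodeToUtf8("&#x2019;") gives the three bytes E2 80 99. If the entities are standard HTML ones, System.Net.WebUtility.HtmlDecode may already cover this case and would be worth trying first.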

—SA
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


