I want to find the hex number of each char in a string.

I tried two methods (Substring and an array).

Here is the problem:
string = "ههه"
as substrings = "ه" + "ه" + "ه" ?????
and they all have the same hex number. What can I do to find each char's hex number?
Updated 20-Jun-12 5:14am
Comments
Sergey Alexandrovich Kryukov 20-Jun-12 11:17am    
I can easily find it out, if you confirm: did you look at this page properly after you made this post? Does your text look correct? This page is in UTF-8...
--SA
red_ostad 20-Jun-12 11:55am    
Yes, it's correct.
Sergey Alexandrovich Kryukov 20-Jun-12 11:42am    
Anyway, I answered... please make sure you can use my instructions to find such things by yourself next time...
--SA
red_ostad 20-Jun-12 11:56am    
link plz

1 solution

This is U+0647, 'ه', Arabic Letter "Heh".

This is the trick of the Arabic writing system, fully implemented in Windows and, to the best of my knowledge, in nearly all modern OSes: if you put some Arabic letters together, they change their glyphs to form proper connections with each other. Look: "ههه". This is the same Arabic "Heh" repeated three times, even though this string looks like three different letters.
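To check this in code rather than by eye, here is a minimal C# sketch that lists the code point of every character in a string. The class and method names (CodePointDump, Describe) are made up for this illustration; only char.ConvertToUtf32 and char.IsHighSurrogate come from .NET.

```csharp
using System;
using System.Collections.Generic;

public static class CodePointDump
{
    // Returns every code point of the text as "U+XXXX", in logical (typing) order.
    public static string Describe(string text)
    {
        var parts = new List<string>();
        for (int i = 0; i < text.Length; i++)
        {
            // Combines a surrogate pair into one code point when needed.
            int codePoint = char.ConvertToUtf32(text, i);
            if (char.IsHighSurrogate(text[i]))
            {
                i++; // skip the low surrogate that was just consumed
            }
            parts.Add("U+" + codePoint.ToString("X4"));
        }
        return string.Join(" ", parts);
    }
}
```

Calling CodePointDump.Describe("\u0647\u0647\u0647") should return "U+0647 U+0647 U+0647": three identical code points, no matter how the string is drawn on screen.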

Now, you may ask: how do you find this out? After all, you are not going to ask me about every single character, are you?

To do this, you should understand how Unicode and the UTFs work. Unicode is nothing like a 16-bit encoding, as many mistakenly think; rather, it is a standard defining a one-to-one correspondence between "characters", understood as cultural entities abstracted from their glyph forms, and integer numbers, understood as abstract mathematical integers, without any concern about their bit size or computer representation. Those numbers are called "code points". So, core Unicode does not define an encoding. Encodings can be different and are defined by the UTFs. Now, this page is in UTF-8, which is a byte-oriented code with a variable size per character. So, if I simply tried to read your UTF-8 text in binary form, it would be hard to recognize the code points. The only straightforward encoding with a one-to-one correspondence between the encoding words and code points is UTF-32 (UTF-32LE or UTF-32BE). However, I knew for sure that the letters of the Arabic alphabet lie in the BMP (Basic Multilingual Plane), with code points that fit in 16 bits, so in UTF-16 each of them takes exactly one 16-bit word. Sixteen more planes of the same size exist, but the Arabic language is way too popular for that; the extra planes are used for much more exotic writing systems.
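To make the difference between a code point and its encodings concrete, here is a small C# sketch that prints the bytes of the same text in several UTFs. The names EncodingDemo and HexBytes are invented for this example; Encoding.UTF8, Encoding.Unicode (which is UTF-16LE in .NET), Encoding.UTF32 (UTF-32LE) and BitConverter.ToString are standard .NET.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Returns the encoded bytes of the text as hex pairs separated by '-'.
    // GetBytes does not emit a byte order mark.
    public static string HexBytes(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }
}
```

For the single letter "\u0647", HexBytes should give "D9-87" with Encoding.UTF8, "47-06" with Encoding.Unicode and "47-06-00-00" with Encoding.UTF32. The code point U+0647 is directly readable in the last two (little-endian, so the bytes are swapped), but not in the UTF-8 bytes.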

So, I copied your text into a text file and saved it as UTF-16LE (in Windows jargon such files are called "Unicode files", but in fact this is UTF-16LE), opened it in a binary editor and recognized 4 identical Arabic code points U+0647. To find out what it is, I used the Windows application Character Map (Charmap.EXE), bundled with every version of Windows. It provided me with the information on the subset ("Unicode subrange") of the Arabic writing system (using code points U+06XX) and the information on this character.

Learn about it:

http://en.wikipedia.org/wiki/Unicode,
http://unicode.org/;

http://en.wikipedia.org/wiki/Code_point,
http://en.wikipedia.org/wiki/Basic_Multilingual_Plane#Basic_Multilingual_Plane,
http://en.wikipedia.org/wiki/UTF,
http://unicode.org/faq/utf_bom.html.

—SA
 
Comments
red_ostad 20-Jun-12 11:50am    
Thanks, but I want a quick way to find the Unicode number of each one.
I want to make a script that reverses Arabic characters, for use in a program without right-to-left support.
Could I find any code for the changed glyph character? I mean, if it changed, it has a new Unicode character in Character Map. I used it before I asked this question.
Sergey Alexandrovich Kryukov 20-Jun-12 12:34pm    
What do you mean, "but"? I did not give you an exact recipe; I explained how it works. If you have any questions, don't hesitate to ask.
There is no such thing as a "glyph character"!
When the glyph changes, the code point does not change. You don't actually have a "new Unicode char"; can you finally get it or not? In other words, the glyph shown in the text depends on both the code point and the position of the character in the text. I clearly demonstrated it to you! You can use the browser's search to see that all 4 characters are the same. After all, read it all more thoroughly. It's really not so simple and takes some effort to understand.

If you can explain what would be the exact effect you want to achieve, I will be able to help you more; but don't forget to explain why.

--SA
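One possible source of the confusion, offered here as an assumption about what Character Map showed: besides the ordinary letters at U+06XX, Unicode also contains compatibility characters in the Arabic Presentation Forms-B block (U+FE70 to U+FEFF), one per positional shape. Typed text uses the U+06XX letters and the renderer picks the shape; the U+FExx characters are separate code points with a fixed shape. A minimal C# sketch (the names PresentationForms and ToBaseLetters are hypothetical) shows that compatibility normalization maps them back to the ordinary letters:

```csharp
using System;
using System.Text;

public static class PresentationForms
{
    // NFKC normalization replaces compatibility characters, including the
    // Arabic presentation forms, with their ordinary equivalents.
    public static string ToBaseLetters(string text)
    {
        return text.Normalize(NormalizationForm.FormKC);
    }
}
```

For example, ToBaseLetters("\uFEEB\uFEEC\uFEEA") (Heh in its initial, medial and final forms) should return "\u0647\u0647\u0647". Going the other way, from ordinary letters to positional forms, is the shaping step that the renderer normally performs and is not covered by this sketch.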
VJ Reddy 20-Jun-12 13:16pm    
Good answer with references. 5!
Sergey Alexandrovich Kryukov 20-Jun-12 13:31pm    
Thank you, VJ.
Do you have any idea how to finally explain those not-so-trivial things to OP?
--SA
red_ostad 20-Jun-12 20:29pm    
Hello again, and thanks for your patience.
I won't hesitate to ask. Let me explain exactly what I want to do.
I wrote code that reverses the string of one text box into another text box on a form.
For example, you type "abcdefg" and it is converted to "gfedcba".
But I found a problem: with Arabic characters, when I reverse the string,
the result is wrong. For example, the input is "سلام" and the output is "مالس",
but I want to see "ﻡﺎﻟﺳ" (I typed it with Character Map).
You say (You don't actually have "new Unicode char"),
but I see different hex codes and Unicode values here. I know that I don't know,
and this is my attempt. Don't be angry with a noob guy like me, even if I don't pay
enough attention, and my English is not pro :D.
Finally, I am trying to use Persian subtitles in video editors without right-to-left support, because if you type in reverse, it works!
I have used this code until now. Best regards.

using System;
using System.Windows.Forms;

namespace unicodex
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void textBox1_TextChanged(object sender, EventArgs e)
        {
            textBox2.Text = Reverse(textBox1.Text);
            label1.Text = hexacod(textBox1.Text);
        }

        // Reverses the string one UTF-16 code unit at a time.
        public string Reverse(string str)
        {
            char[] arr = str.ToCharArray();
            Array.Reverse(arr);
            return new string(arr);
        }

        // Lists the hex code of every UTF-16 code unit, last character first.
        public string hexacod(string str)
        {
            string[] codes = new string[str.Length];
            for (int i = 0; i < str.Length; i++)
            {
                codes[i] = ((int)str[str.Length - 1 - i]).ToString("X4");
            }
            return string.Join(" , ", codes);
        }

        private void textBox2_TextChanged(object sender, EventArgs e)
        {
            label2.Text = hexacod(textBox2.Text);
        }
    }
}
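
A note on the reversal itself: reversing char by char can separate a combining mark (such as an Arabic vowel sign) from its base letter, or split a surrogate pair. Here is a hedged sketch of a safer reversal that works on text elements instead; TextReverser and ReverseTextElements are illustrative names, while StringInfo.GetTextElementEnumerator is the standard .NET API.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class TextReverser
{
    // Reverses the order of text elements (a base character plus any
    // combining marks, or a surrogate pair) instead of single chars.
    public static string ReverseTextElements(string text)
    {
        var elements = new List<string>();
        TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(text);
        while (enumerator.MoveNext())
        {
            elements.Add(enumerator.GetTextElement());
        }
        elements.Reverse();
        return string.Concat(elements);
    }
}
```

This only changes the order of the code points. It does not choose positional glyph forms, so in a program without Arabic shaping support the reversed letters may still appear in their isolated shapes.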

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


