Working with Strings with Combining Characters

30 Mar 2010, CPL, 3 min read
Handling combining characters, such as the diacritics of Arabic and Hebrew.

This article was previously published on my blog, Just Like a Magic.

Introduction

In some languages, like Arabic and Hebrew, base characters are combined with combining characters according to how the word is pronounced. Combining characters (diacritics, for example) attach to a base character to change its pronunciation, a process sometimes called vocalization. Some examples of combining characters used as diacritics:

# | Case                         | Base Character             | Combining Character(s)                           | Result
1 | Combining a single character | Arabic Letter Teh (0x062A) | Arabic Damma (0x064F)                            | Letter Teh + Damma
2 | Combining two characters     | Arabic Letter Teh (0x062A) | Arabic Shadda (0x0651), Arabic Fathatan (0x064B) | Letter Teh + Shadda + Fathatan

When you combine a base character with one combining character, the string still holds two characters, although they are displayed as one. When you combine two combining characters with a base one, you end up with three characters displayed as one, and so on.
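The difference between stored characters and displayed characters can be checked in code. The following is a minimal sketch, assuming a console application; the class name and the CountTextElements helper are hypothetical and only serve the illustration. It builds the two table rows from their code points and compares String.Length with StringInfo.LengthInTextElements.

```csharp
using System;
using System.Globalization;

public static class CombiningLengthDemo
{
    // Hypothetical helper: number of displayed units (text elements) in a string.
    public static int CountTextElements(string text)
    {
        return new StringInfo(text).LengthInTextElements;
    }

    static void Main()
    {
        // Teh (U+062A) followed by Damma (U+064F): one combining character.
        string tehDamma = "\u062A\u064F";
        // Teh followed by Shadda (U+0651) and Fathatan (U+064B): two combining characters.
        string tehShaddaFathatan = "\u062A\u0651\u064B";

        // Length counts char values; CountTextElements counts displayed units.
        Console.WriteLine("{0} chars, {1} text element(s)",
            tehDamma.Length, CountTextElements(tehDamma));
        Console.WriteLine("{0} chars, {1} text element(s)",
            tehShaddaFathatan.Length, CountTextElements(tehShaddaFathatan));
    }
}
```

The first line reports 2 chars and the second 3 chars, while both strings contain a single text element.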

Enumerating a String with Base Characters

Now let's try an example. This example uses a simple word, Muhammad (Mohammad, the name of the prophet of Islam), written with its diacritics.

[Image: details of the word Muhammad]

This word (with the diacritics) consists of 9 characters, sequentially as follows:

  1. Meem
  2. Damma (a combining character combined with the previous Meem)
  3. Kashida
  4. Hah
  5. Meem
  6. Shadda (a combining character)
  7. Fatha (a combining character; both Shadda and Fatha are combined with the Meem)
  8. Kashida
  9. Dal

After characters are combined with their bases, we end up with 6 characters sequentially as follows:

  1. Meem (with a Damma above)
  2. Kashida
  3. Hah
  4. Meem (with a Shadda and a Fatha above)
  5. Kashida
  6. Dal

The following code simply enumerates the string and displays a message box with each character along with its index:  

C#
string someName = "مُـحمَّـد";

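// Requires: using System.Windows.Forms;
// Show each char of the string together with its index.
for (int i = 0; i < someName.Length; i++)
    MessageBox.Show(string.Format("{0}\t{1}", i, someName[i]));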

What do we get? Enumerating the string this way yields each of its 9 characters separately, so every combining character is shown detached from its base character.
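To see exactly which characters the loop visits, it can help to print code points instead of glyphs. This is a minimal console sketch, not the article's original code; the CharDump class and its Describe helper are hypothetical, and the word is spelled with \u escapes in the order listed earlier (an assumption about how the string was typed).

```csharp
using System;
using System.Collections.Generic;

public static class CharDump
{
    // Hypothetical helper: one "index<TAB>U+XXXX" line per char in the string.
    public static List<string> Describe(string text)
    {
        var lines = new List<string>();
        for (int i = 0; i < text.Length; i++)
            lines.Add(string.Format("{0}\tU+{1:X4}", i, (int)text[i]));
        return lines;
    }

    static void Main()
    {
        // Meem, Damma, Kashida, Hah, Meem, Shadda, Fatha, Kashida, Dal
        string someName = "\u0645\u064F\u0640\u062D\u0645\u0651\u064E\u0640\u062F";

        // Prints nine lines: the combining characters appear as separate entries.
        foreach (string line in Describe(someName))
            Console.WriteLine(line);
    }
}
```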

Enumerating a String with Combining Characters

The .NET Framework provides a way to enumerate strings together with their combining characters: the TextElementEnumerator and StringInfo types, both of which reside in the System.Globalization namespace. The following code demonstrates how to enumerate a string so that each base character is returned along with its combining characters:

C#
string someName = "مُـحمَّـد";

TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(someName);

// Each text element is a base character plus the combining characters that follow it.
while (enumerator.MoveNext())
    MessageBox.Show(string.Format("{0}\t{1}", enumerator.ElementIndex, enumerator.Current));
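The same enumeration can be tried without message boxes. The sketch below is a console variant under the same assumption about the spelling of the word; the TextElementDemo class and the GetTextElements helper are hypothetical. It also uses StringInfo.ParseCombiningCharacters, which returns the starting index of every text element.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class TextElementDemo
{
    // Hypothetical helper: collects every text element of the string.
    public static List<string> GetTextElements(string text)
    {
        var elements = new List<string>();
        TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(text);
        while (enumerator.MoveNext())
            elements.Add(enumerator.GetTextElement());
        return elements;
    }

    static void Main()
    {
        // Meem, Damma, Kashida, Hah, Meem, Shadda, Fatha, Kashida, Dal
        string someName = "\u0645\u064F\u0640\u062D\u0645\u0651\u064E\u0640\u062F";

        // Six text elements, matching the six combined characters listed earlier.
        Console.WriteLine(GetTextElements(someName).Count);

        // Starting index of each text element within the string.
        int[] starts = StringInfo.ParseCombiningCharacters(someName);
        Console.WriteLine(string.Join(", ", starts)); // 0, 2, 3, 4, 7, 8
    }
}
```

The indexes skip 1, 5, and 6 because those positions hold combining characters that belong to the preceding Meem.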

Comparing Strings

Sometimes you need to compare two strings that are identical except for their diacritics (combining characters). If you compare them the usual way (using String.Compare, for instance), they are reported as different because of the combining characters.

To overcome this, you need to use a special overload of the String.Compare method:

C#
string withCombiningChars = "مُـحمَّـد";
string withoutCombiningChars = "محمد";

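// Requires: using System.Globalization; using System.Threading;

// The usual comparison, which takes the combining characters into account
Console.WriteLine(string.Compare(withCombiningChars,
    withoutCombiningChars) == 0 ? "Both strings are the same." : "The strings are different!");

// The overload that accepts a culture and CompareOptions
if (string.Compare(withCombiningChars,
    withoutCombiningChars, Thread.CurrentThread.CurrentCulture, CompareOptions.IgnoreSymbols) == 0)
    Console.WriteLine("Both strings are the same.");
else
    Console.WriteLine("The strings are different!");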

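The Kashida (ـ) is not a letter of the Arabic alphabet; it only stretches the connection between letters and behaves much like a space. That is why the CompareOptions.IgnoreSymbols option excludes it from the comparison. The documented option for ignoring nonspacing combining characters such as diacritics is CompareOptions.IgnoreNonSpace, so depending on the platform you may need to combine the two options.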
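Because culture-sensitive comparison results can vary between platforms and .NET versions, a more explicit alternative is to strip the marks yourself and compare what is left. The following is a sketch of that swapped-in technique, not the method used in this article; the DiacriticInsensitive class and the StripMarksAndKashida helper are hypothetical, and the strings are spelled with \u escapes as an assumption about their contents.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DiacriticInsensitive
{
    // Hypothetical helper: drops nonspacing marks (Arabic diacritics among them)
    // and the Kashida (U+0640), keeping every other character.
    public static string StripMarksAndKashida(string text)
    {
        var builder = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            bool isMark = CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark;
            if (!isMark && c != '\u0640')
                builder.Append(c);
        }
        return builder.ToString();
    }

    static void Main()
    {
        string withCombiningChars = "\u0645\u064F\u0640\u062D\u0645\u0651\u064E\u0640\u062F";
        string withoutCombiningChars = "\u0645\u062D\u0645\u062F";

        // Ordinal comparison of the stripped strings does not depend on the culture.
        bool same = string.Equals(StripMarksAndKashida(withCombiningChars),
            StripMarksAndKashida(withoutCombiningChars), StringComparison.Ordinal);
        Console.WriteLine(same ? "Both strings are the same." : "The strings are different!");

        // Culture-sensitive variant combining both options; its result may vary by platform.
        int result = CultureInfo.InvariantCulture.CompareInfo.Compare(withCombiningChars,
            withoutCombiningChars, CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreSymbols);
        Console.WriteLine(result);
    }
}
```

Precomposed letters such as Á carry their diacritic inside a single character, so they would first need to be decomposed with text.Normalize(NormalizationForm.FormD) before this kind of stripping has any effect.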

Writing Arabic Diacritics

The following table summarizes the Arabic diacritics and the keyboard shortcut for each character:

Unicode Representation | Name     | Shortcut
0x064B                 | Fathatan | Shift + W
0x064C                 | Dammatan | Shift + R
0x064D                 | Kasratan | Shift + S
0x064E                 | Fatha    | Shift + Q
0x064F                 | Damma    | Shift + E
0x0650                 | Kasra    | Shift + A
0x0651                 | Shadda   | Shift + ~
0x0652                 | Sukun    | Shift + X

Using the Character Map Application

Microsoft Windows comes with an application, called Character Map, that lets you browse the characters that a font supports.

You can open it by typing charmap.exe into the Run dialog, or by going to Start->Programs->Accessories->System Tools->Character Map.

Try It Out!

Code examples for the reader to discover:

C#
A.

string someName = "مُـحمَّـد";

MessageBox.Show(StringInfo.GetNextTextElement(someName, 2));


B.

string a = "Adam";
string b = "Ádam";

Console.WriteLine(string.Compare(a, b) == 0 ? "They are the same." : "No, They are different.");

// Also try changing the CultureInfo object
if (string.Compare(a, b, Thread.CurrentThread.CurrentCulture, CompareOptions.IgnoreNonSpace) == 0)
    Console.WriteLine("They are the same.");
else
    Console.WriteLine("No, They are different.");

License

This article, along with any associated source code and files, is licensed under The Common Public License Version 1.0 (CPL)


Written By
Technical Lead
Egypt
Mohammad Elsheimy is a developer, trainer, and technical writer currently employed as a technical lead by one of the leading fintech companies in the Middle East.

Mohammad is an MCP, MCTS, MCPD, MCSA, MCSE, and MCT with expertise in Microsoft technologies, data management, analytics, Azure and DevOps solutions. He is also a Project Management Professional (PMP) and a graduate of the Quranic Readings college (Al-Azhar), specialized in Quranic readings, Islamic legislation, and the Arabic language.

Mohammad was born in Egypt. He loves his machine and his code more than anything else!

Currently, Mohammad runs two blogs: "Just Like [a] Magic" (http://JustLikeAMagic.com) and "مع الدوت نت" (http://WithdDotNet.net), both dedicated to programming and Microsoft technologies.

You can reach Mohammad at elsheimy[at]live[dot]com
