See more: C#
Hey friends!
 
I'm trying to create a function that adds a diacritic to a character:
private static char AddDiacritics(char Letter, char Character)
{
   return (modified letter);
}
 
For example, if I call the function AddDiacritics('e', '\"'); I want the function to return the character ë.
If I call the function AddDiacritics('u', '^'); I want the function to return û
 
Is there some kind of support in the framework for this?
 
Thanks,
 
Eduard
Posted 31-Mar-11 5:52am
Edited 31-Mar-11 7:07am by Dalek Dave
Comments
Dalek Dave at 31-Mar-11 12:07pm
   
Edited for Readability

Solution 1

AFAIK, there is no built-in support for this.
 
Normally, you would expect the user's keyboard to take care of it when the text is entered. If you have to apply the diacritic afterwards, it becomes quite fiddly, as you have to take account of which characters can take which diacritic.
Comments
Olivier Levrey at 31-Mar-11 11:34am
   
You are right. There is no support for this, and there is no algorithm able to deal with all cases. The only option is to deal with all the possibilities one by one...
Dalek Dave at 31-Mar-11 12:08pm
   
True
Eduard Keilholz at 1-Apr-11 2:37am
   
Thanks for your answer, my 5

Solution 2

I'm not aware of an inbuilt solution for this. Normally such a thing would be done by the IME as Griff says.
 
You probably need to create a lookup table, which is logically a two-dimensional map and which you probably would implement as Dictionary<char, Dictionary<char,char>>. Populate it in a static constructor like so:
// The field must be static so that the static constructor can populate it.
static Dictionary<char, Dictionary<char,char>> lookupTable = new Dictionary<char, Dictionary<char,char>>();
static MyClass(){
 lookupTable['a'] = new Dictionary<char,char>();
 lookupTable['a']['"'] = 'ä';
 lookupTable['a']['o'] = 'å';
 // ... etc
}
 
I'm not sure there is even a standard set of diacritical marks within the ASCII set, and there's not much point using the combining marks in Unicode, since if the user can enter those, they can enter the character they want directly. The ones you need for ANSI (Western Europe) are grave (è), acute (é), umlaut (ä), circumflex (â), ring (å), slash (ø), caron (š), tilde (ñ) and cedilla (ç); sensible characters to mark them would be `, ', ", ^, o, /, v, ~ and c. The ANSI set also contains eth and thorn, which are not really diacritic forms of a normal character; the œ and æ ligatures; and the German ß, to think about.
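To make the lookup-table idea concrete, here is a minimal sketch of a complete AddDiacritics built on the nested dictionary. The class name DiacriticTable, the handful of entries, and the choice to return the letter unchanged for an unknown combination are my own assumptions, not part of the answer above; a real table would need every letter and marker you care about.

```csharp
using System.Collections.Generic;

public static class DiacriticTable
{
    // Hypothetical table: base letter -> (marker character -> accented letter).
    // Only a few entries are shown; extend it for the letters you need.
    private static readonly Dictionary<char, Dictionary<char, char>> Lookup =
        new Dictionary<char, Dictionary<char, char>>
        {
            { 'a', new Dictionary<char, char> { { '`', 'à' }, { '"', 'ä' }, { '^', 'â' }, { 'o', 'å' } } },
            { 'e', new Dictionary<char, char> { { '`', 'è' }, { '\'', 'é' }, { '"', 'ë' }, { '^', 'ê' } } },
            { 'u', new Dictionary<char, char> { { '`', 'ù' }, { '"', 'ü' }, { '^', 'û' } } },
        };

    // Returns the accented letter, or the letter unchanged when the
    // letter/marker combination is not in the table (an assumed policy).
    public static char AddDiacritics(char letter, char marker)
    {
        Dictionary<char, char> byMarker;
        char result;
        if (Lookup.TryGetValue(letter, out byMarker) && byMarker.TryGetValue(marker, out result))
            return result;
        return letter;
    }
}
```

With this table, AddDiacritics('e', '"') and AddDiacritics('u', '^') give the two characters asked for in the question, and anything not listed falls through untouched rather than throwing.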
Comments
Olivier Levrey at 31-Mar-11 11:55am
   
Have my 5. I was writing my solution at the same time but you were faster than me ;)
Dalek Dave at 31-Mar-11 12:08pm
   
Good answer
Eduard Keilholz at 1-Apr-11 2:35am
   
Thanks for helping, my 5
Björn Ranft at 6-Mar-12 7:22am
   
Elegant solution, my 5!

Solution 3

OriginalGriff is right: there is no algorithm able to deal with all cases. The only option is to deal with all the possibilities one by one.
 
You could use an enum to list all diacritics, and dictionaries for each supported letter. For example:
 
enum Diacritic
{
    //"é"
    Acute,
    //"è"
    Grave,
    //"ê"
    Circonflexe,
    // ... (other diacritics)
}
 
//supported letters for e
static Dictionary<Diacritic, char> letterE;
//supported letters for a
static Dictionary<Diacritic, char> letterA;
//supported letters
static Dictionary<char, Dictionary<Diacritic, char>> letters;
 
//call this function to initialize the dictionaries,
//or put its code in a static constructor
static void Init()
{
    //all possibilities for "e"
    letterE = new Dictionary<Diacritic, char>();
    letterE[Diacritic.Acute] = 'é';
    letterE[Diacritic.Grave] = 'è';
    letterE[Diacritic.Circonflexe] = 'ê';
    // ...
 
    //all possibilities for "a"
    letterA = new Dictionary<Diacritic, char>();
    letterA[Diacritic.Grave] = 'à';
    letterA[Diacritic.Circonflexe] = 'â';
    // ...
 
    //supported letters
    letters = new Dictionary<char, Dictionary<Diacritic, char>>();
    letters.Add('a', letterA);
    letters.Add('e', letterE);
    // ...
}
 
//will throw a KeyNotFoundException if the requested character doesn't exist
static char AddDiacritics(char letter, Diacritic diacritic)
{
    return letters[letter][diacritic];
}
 
This code is not very elegant but I am not sure if a more elegant solution exists for that problem...
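If the KeyNotFoundException is unwelcome, one possible variation is a Try-style method that reports failure instead of throwing. This is only a sketch under my own assumptions: the names Mark, SafeLookup and TryAddDiacritic are hypothetical, and the table holds just a few entries for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical enum, standing in for the Diacritic enum above.
public enum Mark { Acute, Grave, Circumflex }

public static class SafeLookup
{
    // Illustrative entries only.
    private static readonly Dictionary<char, Dictionary<Mark, char>> Letters =
        new Dictionary<char, Dictionary<Mark, char>>
        {
            { 'e', new Dictionary<Mark, char> { { Mark.Acute, 'é' }, { Mark.Grave, 'è' }, { Mark.Circumflex, 'ê' } } },
            { 'a', new Dictionary<Mark, char> { { Mark.Grave, 'à' }, { Mark.Circumflex, 'â' } } },
        };

    // Returns true and sets result when the combination exists;
    // otherwise returns false and sets result to the unchanged letter.
    public static bool TryAddDiacritic(char letter, Mark mark, out char result)
    {
        Dictionary<Mark, char> marks;
        if (Letters.TryGetValue(letter, out marks) && marks.TryGetValue(mark, out result))
            return true;
        result = letter;
        return false;
    }
}
```

The caller can then decide what a missing combination means, for example keeping the plain letter or showing an error.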
Comments
Dalek Dave at 31-Mar-11 12:08pm
   
Good Call.
Eduard Keilholz at 1-Apr-11 2:35am
   
Thanks for helping, my 5

Solution 4

As OriginalGriff said, there is no built-in support. As others said, you can do it with a lookup table. I got this handy class from the net:
 
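using System.Collections.Generic;
using System.Linq;

public class DiacritMerger
{
    // Maps an ASCII marker character to the matching Unicode combining mark.
    static readonly Dictionary<char, char> _lookup = new Dictionary<char, char>
                     {
                         {'\'', '\u0301'}, // combining acute accent
                         {'"', '\u0308'},  // combining diaeresis (umlaut)
                         {'^', '\u0302'}   // combining circumflex accent
                     };
    // Pairs each character of asciiBase with the marker at the same index in diacrits.
    // Zip stops at the end of the shorter string.
    public static string AddDiacritics(string asciiBase, string diacrits)
    {
        var combined = asciiBase.Zip(diacrits, (ascii, diacrit) => DiacritVersion(diacrit, ascii));
        return new string(combined.ToArray());
    }
    private static char DiacritVersion(char diacrit, char character)
    {
        char combine;
        // Normalize() defaults to Form C, which merges letter + mark into a single character
        // when a precomposed form exists; if none exists, [0] is just the bare letter.
        return _lookup.TryGetValue(diacrit, out combine) ? new string(new[] { character, combine }).Normalize()[0] : character;
    }
}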
 

Then you can use it like this:
 
MessageBox.Show(DiacritMerger.AddDiacritics("u", "^"));
MessageBox.Show(DiacritMerger.AddDiacritics("e", "\""));
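For the char-in, char-out signature from the question, the same normalization trick can be sketched as below. The class name DiacriticComposer, the extra grave and tilde entries, and the decision to return the letter unchanged when no single precomposed character exists are my assumptions, not part of the class above.

```csharp
using System.Collections.Generic;
using System.Text;

public static class DiacriticComposer
{
    // Marker character -> Unicode combining mark (illustrative selection).
    private static readonly Dictionary<char, char> CombiningMarks = new Dictionary<char, char>
    {
        { '`',  '\u0300' }, // combining grave accent
        { '\'', '\u0301' }, // combining acute accent
        { '^',  '\u0302' }, // combining circumflex accent
        { '~',  '\u0303' }, // combining tilde
        { '"',  '\u0308' }, // combining diaeresis
    };

    public static char AddDiacritics(char letter, char marker)
    {
        char mark;
        if (!CombiningMarks.TryGetValue(marker, out mark))
            return letter; // unknown marker: leave the letter alone

        // Form C composes "letter + combining mark" into one character when Unicode defines one.
        string composed = new string(new[] { letter, mark }).Normalize(NormalizationForm.FormC);

        // If the result is still two characters, there is no precomposed form.
        return composed.Length == 1 ? composed[0] : letter;
    }
}
```

Checking the length of the normalized string makes the "no such character" case explicit instead of relying on [0] silently dropping the mark.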
Comments
Dalek Dave at 31-Mar-11 12:09pm
   
good Answer.
Albin Abel at 31-Mar-11 12:24pm
   
Thanks Dalek Dave
Eduard Keilholz at 1-Apr-11 2:37am
   
Thanks a lot! My 5
Albin Abel at 1-Apr-11 3:37am
   
You are welcome, Thanks too Eduard Keilholz
Eduard Keilholz at 4-Apr-11 8:10am
   
Wow, I implemented this code, it works like a charm! Thanks once more!
Albin Abel at 4-Apr-11 9:44am
   
You are most welcome, Eduard Keilholz. Let the credit go to the original author :)

Solution 5

Although this is not really a solution to the problem, it might be useful to know and might even help build the table used by the other solutions.
 
A string can be normalized to different forms. In my case, I want to do the opposite (remove diacritics), and I have used:
 
string stFormD = fileName.Normalize(NormalizationForm.FormD);
for (int ich = 0; ich < stFormD.Length; ich++)
{
    char currentChar = stFormD[ich];
 
    System.Globalization.UnicodeCategory uc =
        System.Globalization.CharUnicodeInfo.GetUnicodeCategory(currentChar);
 
    if (uc != System.Globalization.UnicodeCategory.NonSpacingMark)
    {
        //...
    }
}
 
I haven't tried it, but it may be possible to take each candidate Char, normalize the corresponding one-character string to the "D" form, and build the lookup information from the result.
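As a rough sketch of that idea, the following scans a range of characters and records each one whose Form D decomposition is exactly a base letter followed by one non-spacing mark. The class and method names are hypothetical, and the code assumes the scanned range contains only assigned, non-surrogate characters (Latin-1 Supplement and Latin Extended-A in the usage note).

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class DecompositionTable
{
    // Key: base letter followed by its combining mark (the Form D string).
    // Value: the precomposed character that decomposes to that key.
    public static Dictionary<string, char> Build(char first, char last)
    {
        var table = new Dictionary<string, char>();
        for (int code = first; code <= last; code++)
        {
            char c = (char)code;
            if (char.IsSurrogate(c))
                continue; // a lone surrogate cannot be normalized

            string decomposed = c.ToString().Normalize(NormalizationForm.FormD);
            if (decomposed.Length == 2 &&
                CharUnicodeInfo.GetUnicodeCategory(decomposed[1]) == UnicodeCategory.NonSpacingMark)
            {
                table[decomposed] = c;
            }
        }
        return table;
    }
}
```

For example, Build('\u00C0', '\u017F') yields an entry mapping "e" plus U+0308 to ë, while characters without a canonical decomposition, such as ø, do not appear in the table.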
 
However, I haven't found a way to split a character like œ into oe, so in my case I handle a few such characters manually.
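Putting the two parts together, a complete removal routine could look like the sketch below. The loop body is my assumption about what goes in the elided part of the code above (append every character that is not a non-spacing mark), and the manual table is an illustrative, non-exhaustive list rather than the one actually used.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class DiacriticRemover
{
    // Characters with no canonical decomposition, replaced by hand.
    // Hypothetical list; extend it as needed.
    private static readonly Dictionary<char, string> Manual = new Dictionary<char, string>
    {
        { 'œ', "oe" }, { 'Œ', "OE" }, { 'æ', "ae" }, { 'Æ', "AE" },
        { 'ß', "ss" }, { 'ø', "o" }, { 'Ø', "O" },
    };

    public static string RemoveDiacritics(string text)
    {
        var result = new StringBuilder(text.Length);

        // Form D splits each accented letter into base letter + combining marks.
        foreach (char c in text.Normalize(NormalizationForm.FormD))
        {
            string replacement;
            if (Manual.TryGetValue(c, out replacement))
                result.Append(replacement);
            else if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                result.Append(c); // keep everything except the combining marks
        }
        return result.ToString();
    }
}
```

Called on "cœur été", this returns "coeur ete".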
Comments
Eduard Keilholz at 19-Jul-11 6:00am
   
Good contribution! I like people actually reading the forum's history, finding a way to their solution and, even more important, sharing it!
 
Thanks a lot!

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 18 Jul 2011
Copyright © CodeProject, 1999-2014