See more: C#
Hey friends!

I'm trying to create a function that adds diacritics to a character:
private static char AddDiacritics(char Letter, char Character)
{
   return (modified letter);
}

For example, if I call the function AddDiacritics('e', '\"'); I want the function to return the character ë.
If I call the function AddDiacritics('u', '^'); I want the function to return û

Is there some kind of support in the framework for this?

Thanks,

Eduard
Posted 31-Mar-11 4:52am
Updated 31-Mar-11 6:07am
Comments
Dalek Dave 31-Mar-11 12:07pm
   
Edited for Readability

Solution 1

AFAIK, there is no built-in support for this.

Normally, you would expect the user's keyboard to take care of it when the text is entered. If you have to apply diacritics after the fact, it becomes quite fiddly, as you have to take account of which characters can have which diacritic added.
Comments
Olivier Levrey 31-Mar-11 11:34am
   
You are right. There is no support for this, and there is no algorithm able to deal with all cases. The only option is to deal with all the possibilities one by one...
Dalek Dave 31-Mar-11 12:08pm
   
True
Eduard Keilholz 1-Apr-11 2:37am
   
Thanks for your answer, my 5

Solution 2

I'm not aware of an inbuilt solution for this. Normally such a thing would be done by the IME as Griff says.

You probably need to create a lookup table, which is logically a two-dimensional map and which you probably would implement as Dictionary<char, Dictionary<char,char>>. Populate it in a static constructor like so:
static readonly Dictionary<char, Dictionary<char,char>> lookupTable = new Dictionary<char, Dictionary<char,char>>();
static MyClass(){
 lookupTable['a'] = new Dictionary<char,char>();
 lookupTable['a']['"'] = 'ä';
 lookupTable['a']['o'] = 'å';
 // ... etc
}

I'm not sure there is even a standard set of diacritical marks within ASCII, and there's not much point using the ones in Unicode, since a user who can enter those can enter the character they want directly. The ones you need for ANSI (Western Europe) are grave (è), acute (é), umlaut (ä), circumflex (â), ring (å), slash (ø), caron (š), tilde (ñ) and cedilla (ç); sensible characters to use to mark them would be `, ', ", ^, o, /, v, ~ and c. The ANSI set also contains eth and thorn, which are not really diacritics of a normal character; the oe and ae diphthongs; and the German ß, to think about.
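As a rough sketch of how the lookup table could be wired to the signature from the question: the class name `DiacriticTable`, the choice of marker characters, and the fallback of returning the letter unchanged are all assumptions for illustration, and only a few letters are filled in.

```csharp
using System;
using System.Collections.Generic;

public static class DiacriticTable
{
    // Outer key: base letter; inner key: ASCII marker; value: accented letter.
    // Only a handful of entries are shown; a real table would need many more.
    private static readonly Dictionary<char, Dictionary<char, char>> Lookup =
        new Dictionary<char, Dictionary<char, char>>
        {
            { 'a', new Dictionary<char, char> { { '`', 'à' }, { '\'', 'á' }, { '"', 'ä' }, { '^', 'â' }, { 'o', 'å' }, { '~', 'ã' } } },
            { 'e', new Dictionary<char, char> { { '`', 'è' }, { '\'', 'é' }, { '"', 'ë' }, { '^', 'ê' } } },
            { 'u', new Dictionary<char, char> { { '`', 'ù' }, { '\'', 'ú' }, { '"', 'ü' }, { '^', 'û' } } },
        };

    // Returns the accented letter, or the letter unchanged when the
    // (letter, marker) pair is not in the table (an assumed fallback).
    public static char AddDiacritics(char letter, char marker)
    {
        Dictionary<char, char> row;
        char result;
        if (Lookup.TryGetValue(letter, out row) && row.TryGetValue(marker, out result))
            return result;
        return letter;
    }
}
```

With this table, `DiacriticTable.AddDiacritics('e', '"')` returns 'ë' and `DiacriticTable.AddDiacritics('u', '^')` returns 'û'. Using `TryGetValue` instead of the indexer avoids a `KeyNotFoundException` for unsupported combinations.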
Comments
Olivier Levrey 31-Mar-11 11:55am
   
Have my 5. I was writing my solution at the same time but you were faster than me ;)
Dalek Dave 31-Mar-11 12:08pm
   
Good answer
Eduard Keilholz 1-Apr-11 2:35am
   
Thanks for helping, my 5
Björn Ranft 6-Mar-12 7:22am
   
Elegant solution, my 5!

Solution 3

OriginalGriff is right: there is no algorithm able to deal with all cases. The only option is to deal with all the possibilities one by one.

You could use an enum to list all diacritics, and dictionaries for each supported letter. For example:

enum Diacritic
{
    //"é"
    Acute,
    //"è"
    Grave,
    //"ê"
    Circonflexe,
    // ...
}
 
//supported diacritics for e
static Dictionary<Diacritic, char> letterE;
//supported diacritics for a
static Dictionary<Diacritic, char> letterA;
//supported letters
static Dictionary<char, Dictionary<Diacritic, char>> letters;
 
//call this function to initialize the dictionaries,
//or put its code in a static constructor
static void Init()
{
    //all possibilities for "e"
    letterE = new Dictionary<Diacritic, char>();
    letterE[Diacritic.Acute] = 'é';
    letterE[Diacritic.Grave] = 'è';
    letterE[Diacritic.Circonflexe] = 'ê';
    // ...
 
    //all possibilities for "a"
    letterA = new Dictionary<Diacritic, char>();
    letterA[Diacritic.Grave] = 'à';
    letterA[Diacritic.Circonflexe] = 'â';
    // ...
 
    //supported letters
    letters = new Dictionary<char, Dictionary<Diacritic, char>>();
    letters.Add('a', letterA);
    letters.Add('e', letterE);
    // ...
}
 
//will throw a KeyNotFoundException if the requested character doesn't exist
static char AddDiacritics(char letter, Diacritic diacritic)
{
    return letters[letter][diacritic];
}

This code is not very elegant but I am not sure if a more elegant solution exists for that problem...
Comments
Dalek Dave 31-Mar-11 12:08pm
   
Good Call.
Eduard Keilholz 1-Apr-11 2:35am
   
Thanks for helping, my 5

Solution 4

As OriginalGriff said, there is no built-in support. As others said, you can do it with a lookup table. I found this handy class on the net:

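using System.Collections.Generic;
using System.Linq;

public class DiacritMerger
{
    // Maps an ASCII marker to the matching Unicode combining mark.
    static readonly Dictionary<char, char> _lookup = new Dictionary<char, char>
                     {
                         {'\'', '\u0301'}, // combining acute accent
                         {'"', '\u0308'},  // combining diaeresis
                         {'^', '\u0302'}   // combining circumflex accent
                     };
    // Pairs each base character with the marker at the same index.
    public static string AddDiacritics(string asciiBase, string diacrits)
    {
        var combined = asciiBase.Zip(diacrits, (ascii, diacrit) => DiacritVersion(diacrit, ascii));
        return new string(combined.ToArray());
    }
    // Appends the combining mark, normalizes (Form C by default) and keeps the first char.
    private static char DiacritVersion(char diacrit, char character)
    {
        char combine;
        return _lookup.TryGetValue(diacrit, out combine) ? new string(new[] { character, combine }).Normalize()[0] : character;
    }
}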


Then you can use it like this:

MessageBox.Show(DiacritMerger.AddDiacritics("u", "^"));
MessageBox.Show(DiacritMerger.AddDiacritics("e", "\""));
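For readers wondering why this works: the class leans on Unicode normalization. A base letter followed by a combining mark is a valid two-char sequence, and `String.Normalize` with Form C replaces it with the single precomposed character when Unicode defines one. The helper below is only an illustration of that step; the names `ComposeDemo` and `Compose` are made up for this sketch.

```csharp
using System;
using System.Text;

public static class ComposeDemo
{
    // Builds "base letter + combining mark" and asks the framework to
    // compose the pair (Normalization Form C).
    public static string Compose(char baseLetter, char combiningMark)
    {
        string decomposed = new string(new[] { baseLetter, combiningMark });
        return decomposed.Normalize(NormalizationForm.FormC);
    }
}
```

`ComposeDemo.Compose('u', '\u0302')` yields a one-char string holding U+00FB (û). When no precomposed character exists, the result keeps both chars, which is why taking only index `[0]`, as the class above does, silently drops the mark in that case.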
Comments
Dalek Dave 31-Mar-11 12:09pm
   
good Answer.
Albin Abel 31-Mar-11 12:24pm
   
Thanks Dalek Dave
Eduard Keilholz 1-Apr-11 2:37am
   
Thanks a lot! My 5
Albin Abel 1-Apr-11 3:37am
   
You are welcome, Thanks too Eduard Keilholz
Eduard Keilholz 4-Apr-11 8:10am
   
Wow, I implemented this code, it works like a charm! Thanks once more!
Albin Abel 4-Apr-11 9:44am
   
You are most welcome, Eduard Keilholz. Let the credit go to the original author :)
Antonio Barros 26-Mar-16 10:42am
   
Hi, Albin Abel.
Thank you for your post. It was useful for understanding how to solve this problem. I tried to extend your class to put a tilde above a "w", but I couldn't get the result and I don't know why. I only added the last line of the following:

static readonly Dictionary<char, char> _lookup = new Dictionary<char, char>
{
{'\'', '\u0301'},
{'"', '\u0308'},
{'^', '\u0302'},
{'˜', '\u02DC'}
};
and to test it I added the last line of the following:
MessageBox.Show(DiacriticMerger.AddDiacritics("u", "^"));
MessageBox.Show(DiacriticMerger.AddDiacritics("e", "\""));
MessageBox.Show(DiacriticMerger.AddDiacritics("w", "˜"));
But the message only displays "w" without the tilde.
What could be the problem? Thank you for all.

António Barros
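A possible explanation, offered as a sketch rather than a verified diagnosis of António's setup: U+02DC is SMALL TILDE, a spacing character, whereas the combining tilde is U+0303. In addition, Unicode has no precomposed "w with tilde", so even with U+0303 the `Normalize()[0]` step in the class above would keep only the 'w'. One way around both points is to return a string instead of a char, so the combining mark survives when no single precomposed character exists. The names `DiacriticComposer` and `AddDiacritic`, and the use of ASCII '~' as the marker, are assumptions made for this example.

```csharp
using System.Collections.Generic;
using System.Text;

public static class DiacriticComposer
{
    // ASCII marker -> Unicode combining mark (not the spacing look-alikes).
    private static readonly Dictionary<char, char> CombiningMarks = new Dictionary<char, char>
    {
        { '\'', '\u0301' }, // combining acute accent
        { '"',  '\u0308' }, // combining diaeresis
        { '^',  '\u0302' }, // combining circumflex accent
        { '~',  '\u0303' }, // combining tilde
    };

    // Returns one char when a precomposed form exists,
    // otherwise the base letter followed by the combining mark.
    public static string AddDiacritic(char letter, char marker)
    {
        char mark;
        if (!CombiningMarks.TryGetValue(marker, out mark))
            return letter.ToString();
        return new string(new[] { letter, mark }).Normalize(NormalizationForm.FormC);
    }
}
```

Here `AddDiacritic('n', '~')` gives the single character ñ, while `AddDiacritic('w', '~')` gives the two-char sequence 'w' + U+0303. Whether that sequence is drawn as a w with a tilde above it depends on the font and the text renderer.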

Solution 5

Although this is not really a solution to the problem, it might be useful to know and might even help build the table used by the other solutions.

A string can be normalized to different forms. In my case, I want to do the opposite (remove diacritics), and I have used:

string stFormD = fileName.Normalize(NormalizationForm.FormD);
for (int ich = 0; ich < stFormD.Length; ich++)
{
    char currentChar = stFormD[ich];
 
    System.Globalization.UnicodeCategory uc =
        System.Globalization.CharUnicodeInfo.GetUnicodeCategory(currentChar);
 
    if (uc != System.Globalization.UnicodeCategory.NonSpacingMark)
    {
        //...
    }
}

I haven't tried it, but it might be possible to normalize each possible char (as a one-character string) to form D and then build the lookup information from the result.

However, I haven't found a way to split a character like œ into oe, so in my case I handle a few characters manually.
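The table-building idea could look roughly like this. It is an untested-in-production sketch: the class name `DecompositionTable`, the string-keyed dictionary, and the decision to scan a caller-supplied range are all assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class DecompositionTable
{
    // Maps "base letter + combining mark" (a 2-char string) to the
    // precomposed char, for every char in [first, last] that decomposes
    // into exactly one base char and one non-spacing mark.
    public static Dictionary<string, char> Build(char first, char last)
    {
        var table = new Dictionary<string, char>();
        for (int code = first; code <= last; code++)
        {
            char c = (char)code;
            // Lone surrogates are not valid text and make Normalize throw.
            if (char.IsSurrogate(c)) continue;
            string decomposed = c.ToString().Normalize(NormalizationForm.FormD);
            if (decomposed.Length == 2 &&
                CharUnicodeInfo.GetUnicodeCategory(decomposed[1]) == UnicodeCategory.NonSpacingMark)
            {
                // If two chars share a decomposition, the later one wins.
                table[decomposed] = c;
            }
        }
        return table;
    }
}
```

For example, `DecompositionTable.Build('\u00C0', '\u017F')` covers the accented letters of Latin-1 and Latin Extended-A, and looking up `"e\u0308"` in the result gives ë. Characters such as œ have no decomposition of this kind, so they would still need to be handled manually, as noted above.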
Comments
Eduard Keilholz 19-Jul-11 6:00am
   
Good contribution! I like people actually reading the forum's history, finding a way to their solution and, even more important, sharing!

Thanks a lot!

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 18 Jul 2011
Copyright © CodeProject, 1999-2016