Convert Accented Characters to Simple Characters

12 Jul 2010
How to convert accented characters to simple characters

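I recently needed a way to replace accented characters with simple English ones to make friendly URLs more readable. I'm sure there are plenty of Danes out there who are sick of seeing their language butchered by UrlEncode. After a bit of reading up, it turns out that .NET 2.0 does most of the heavy lifting for you: normalizing a string to Unicode Form D splits each accented character into its base letter followed by one or more combining marks, so stripping the marks leaves the plain letter. Only letters that have no decomposition, such as æ, need a manual lookup: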
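To see what Form D normalization actually produces, here is a small sketch that lists each UTF-16 code unit of a normalized string along with its Unicode category. `DecompositionDemo` and `Describe` are hypothetical names used only for illustration; the calls themselves (`String.Normalize`, `CharUnicodeInfo.GetUnicodeCategory`) are the same ones the article's method relies on.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper for illustration: shows how Form D splits an accented
// character into a base letter followed by a combining (non-spacing) mark.
public static class DecompositionDemo
{
    // Returns each code unit of the Form D string as "U+XXXX (Category)".
    public static string[] Describe(string input)
    {
        string formD = input.Normalize(NormalizationForm.FormD);
        var parts = new string[formD.Length];

        for (int i = 0; i < formD.Length; i++)
        {
            char c = formD[i];
            parts[i] = string.Format("U+{0:X4} ({1})",
                (int)c, CharUnicodeInfo.GetUnicodeCategory(c));
        }
        return parts;
    }
}
```

For example, `Describe("\u00E9")` (é) returns `"U+0065 (LowercaseLetter)"` followed by `"U+0301 (NonSpacingMark)"`, so dropping the non-spacing mark leaves a plain `e`. By contrast, `Describe("\u00E6")` (æ) returns the single entry `"U+00E6 (LowercaseLetter)"`: æ has no decomposition, which is why it needs an entry in the lookup table.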

C#
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Extension methods must live in a non-generic static class; the class name is arbitrary.
public static class StringExtensions
{
    /// <summary>
    /// Replaces accented characters with their closest unaccented equivalents.
    /// </summary>
    /// <param name="original">The string to simplify.</param>
    /// <returns>The simplified string, or <see cref="string.Empty"/> if
    /// <paramref name="original"/> is null or empty.</returns>
    /// <remarks>Based on code from:
    /// http://blogs.msdn.com/b/michkap/archive/2007/05/14/2629747.aspx</remarks>
    public static string ToSimpleCharacters(this string original)
    {
        if (string.IsNullOrEmpty(original)) return string.Empty;

        // Form D splits each accented character into a base letter plus combining marks.
        string stFormD = original.Normalize(NormalizationForm.FormD);
        StringBuilder sb = new StringBuilder(stFormD.Length);

        foreach (char c in stFormD)
        {
            // Skip the combining marks (accents, cedillas, rings and so on).
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;

            // Letters that Form D does not decompose are substituted from the lookup table.
            string replacement;
            if (Lookup.TryGetValue(c, out replacement))
                sb.Append(replacement);
            else
                sb.Append(c);
        }

        // Recompose characters that Form D split apart but that were kept,
        // such as Hangul syllables decomposed into jamo.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    // Characters with no canonical decomposition, mapped by hand.
    // The static field initializer runs once and is thread-safe, unlike a lazy null check.
    private static readonly Dictionary<char, string> Lookup = new Dictionary<char, string>
    {
        { '\u00E6', "ae" }, // æ
        { '\u00C6', "Ae" }, // Æ
        { '\u00F0', "d" },  // ð
        { '\u00D0', "D" },  // Ð (upper-case counterpart of ð)
    };
}
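Since the original goal was friendly URLs, here is one way the same technique might feed a URL slug. This is a hedged sketch rather than the author's code: `SlugDemo.ToUrlSlug` is a hypothetical name, and lower-casing, collapsing separators into single hyphens, and treating anything still non-ASCII as a separator are assumptions of this example. It repeats the three lookup entries so that it compiles on its own.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper, not part of the article: turns a title into a
// lower-case, hyphen-separated URL segment using the Form D technique.
public static class SlugDemo
{
    // Same idea as the article's Lookup: letters Form D does not decompose.
    private static readonly Dictionary<char, string> Extra = new Dictionary<char, string>
    {
        { '\u00E6', "ae" }, // æ
        { '\u00C6', "Ae" }, // Æ
        { '\u00F0', "d" },  // ð
    };

    public static string ToUrlSlug(string title)
    {
        if (string.IsNullOrEmpty(title)) return string.Empty;

        var sb = new StringBuilder();
        bool pendingHyphen = false;

        foreach (char c in title.Normalize(NormalizationForm.FormD))
        {
            // Drop the combining marks that Form D split off.
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;

            string mapped;
            string piece = Extra.TryGetValue(c, out mapped) ? mapped : c.ToString();

            foreach (char p in piece)
            {
                if (p < 128 && char.IsLetterOrDigit(p))
                {
                    // Collapse any run of separators into one hyphen; never lead with one.
                    if (pendingHyphen && sb.Length > 0) sb.Append('-');
                    pendingHyphen = false;
                    sb.Append(char.ToLowerInvariant(p));
                }
                else
                {
                    // Spaces, punctuation and anything still non-ASCII act as separators.
                    pendingHyphen = true;
                }
            }
        }
        return sb.ToString();
    }
}
```

With this sketch, `ToUrlSlug("Cr\u00E8me Br\u00FBl\u00E9e & \u00C6bleskiver")` returns `"creme-brulee-aebleskiver"`. Letters that neither Form D nor the table handles, such as ø, are treated as separators here, so the table would need extending before relying on it for Danish text.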

I'm sure a few substitutions still slip through this code: any letter that Unicode does not decompose needs its own entry in Lookup. If you've found one, just drop me a line!
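One way to hunt for those missing substitutions is to run sample text through the same mark-stripping step and report whatever is still non-ASCII. The sketch below does that; `UncaughtFinder.FindUncaught` is a hypothetical name, and it deliberately does not apply the article's Lookup, so it shows what Form D alone leaves behind.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical diagnostic, not from the article: strips combining marks the same
// way ToSimpleCharacters does, then reports each distinct character that is still
// non-ASCII. Those characters are candidates for extra Lookup entries.
public static class UncaughtFinder
{
    public static List<char> FindUncaught(string sample)
    {
        var result = new List<char>();
        if (string.IsNullOrEmpty(sample)) return result;

        foreach (char c in sample.Normalize(NormalizationForm.FormD))
        {
            // Accent split off by Form D: removed, nothing to map.
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;

            // Survived normalization: needs a manual substitution.
            if (c > 127 && !result.Contains(c))
                result.Add(c);
        }
        return result;
    }
}
```

Running it on "Søren Ærø Łódź Straße" (`"S\u00F8ren \u00C6r\u00F8 \u0141\u00F3d\u017A Stra\u00DFe"`) reports ø, Æ, Ł and ß, in that order, while ó and ź disappear because Form D decomposes them. Æ is already in the article's table; the others would need entries, and the replacement is a judgment call (ø is commonly written as "o" or "oe", ß as "ss").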

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).


Written By
Software Developer (Senior) Freestyle Interactive Ltd
United Kingdom
I'm a lead developer for Freestyle Interactive Ltd, where we create many wonderful websites built on Microsoft's ASP.NET and Ektron CMS.

I've been developing .NET applications (both Windows and Web) since 2002.

Comments and Discussions

General: My vote of 5 (RogReis2010, 20-Feb-13 21:04)
Question: Strange replacement (JKos, 12-Jul-10 3:07)
Answer: Re: Strange replacement (Martin Jarvis, 12-Jul-10 3:13)
General: Re: Strange replacement (JKos, 12-Jul-10 3:21)
General: Re: Strange replacement (Martin Jarvis, 12-Jul-10 3:28)
I did not know that! I've updated the article to substitute in a 'd' now.

Thanks

General: My kludgy approach (supercat9, 29-Jun-10 5:09)
General: Good but.. (Anshul R, 28-Jun-10 15:15)
General: Re: Good but.. (Martin Jarvis, 28-Jun-10 19:37)
General: Re: Good but.. (Anshul R, 29-Jun-10 1:58)