Convert Accented Characters to Simple Characters

By Martin Jarvis, 12 Jul 2010
 

I recently needed a way to replace accented characters with simple English ones to allow more readable friendly URLs. I'm sure there are plenty of Danes out there who are sick of seeing their language butchered by UrlEncode. After a bit of reading up, it seems .NET 2.0 does 99% of the heavy lifting for you:

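 // Required namespaces:
 using System.Collections.Generic; // Dictionary
 using System.Globalization;       // CharUnicodeInfo, UnicodeCategory
 using System.Text;                // NormalizationForm, StringBuilder
  
 // ToSimpleCharacters is an extension method, so the members below must live in a static class.
  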
 /// <summary>
 /// Replaces Accented Characters with Closest Equivalents
 /// </summary>
 /// <param name="original">The original.</param>
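 /// <returns>The input with combining accents removed and the characters in Lookup substituted;
 /// an empty string for null or empty input.</returns>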
 /// <remarks>Based on code from: 
 /// http://blogs.msdn.com/b/michkap/archive/2007/05/14/2629747.aspx</remarks>
 public static string ToSimpleCharacters(this string original)
 {
     if (string.IsNullOrEmpty(original)) return string.Empty;
     string stFormD = original.Normalize(NormalizationForm.FormD);
     StringBuilder sb = new StringBuilder();
  
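     foreach (char c in stFormD)
     {
         // FormD splits an accented letter into a base letter plus combining marks;
         // dropping the marks leaves the plain base letter.
         if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark) continue;
  
         // Characters with no canonical decomposition are replaced from the lookup table.
         string replacement;
         if (Lookup.TryGetValue(c, out replacement))
         {
             sb.Append(replacement);
         }
         else
         {
             sb.Append(c);
         }
     }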
  
     return (sb.ToString().Normalize(NormalizationForm.FormC));
 }
  
 // Characters that FormD does not decompose and so need an explicit substitute.
 // The table is built once by the static initializer, which the runtime runs
 // thread-safely. The original lazy getter assigned the dictionary before filling
 // it, so a second thread could briefly see an incomplete table.
 private static readonly Dictionary<char, string> Lookup = new Dictionary<char, string>
 {
     { '\u00E6', "ae" }, // æ
     { '\u00C6', "Ae" }, // Æ
     { '\u00F0', "d" }   // ð
 };
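
To see why the method only has to skip NonSpacingMark characters, it can help to look at what FormD actually produces. The following is a small illustrative sketch, not part of the original code; the class name DecompositionDemo and the method Describe are made up for this example. It prints each UTF-16 code unit of the decomposed string together with its Unicode category.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class DecompositionDemo
{
    // Lists each UTF-16 code unit of the FormD (canonically decomposed) text
    // as "U+XXXX:Category", separated by single spaces.
    public static string Describe(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        StringBuilder sb = new StringBuilder();
        foreach (char c in decomposed)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.AppendFormat("U+{0:X4}:{1}", (int)c, CharUnicodeInfo.GetUnicodeCategory(c));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // é splits into a plain 'e' followed by a combining acute accent.
        Console.WriteLine(Describe("\u00E9")); // U+0065:LowercaseLetter U+0301:NonSpacingMark

        // æ has no canonical decomposition, so it comes through unchanged.
        Console.WriteLine(Describe("\u00E6")); // U+00E6:LowercaseLetter
    }
}
```

The second line of output is the reason the lookup table exists: characters such as æ are single letters as far as Unicode normalization is concerned, so removing combining marks never touches them.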

I’m sure that there must be a few substitutions that don’t get caught by this code. If you’ve got one, just drop me a line!
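
As a starting point for such missed substitutions, here is a hedged sketch of a few more characters that, as far as I know, have no canonical decomposition and therefore pass through the FormD step untouched (the Danish ø among them). The class ExtendedLookupSketch, the method Simplify, and the replacement strings are all assumptions made for illustration; they are not from the original article, and some languages prefer other transliterations (for example "oe" rather than "o" for ø). The article's own three entries are left out for brevity and would be merged into the same table.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical sketch: same technique as ToSimpleCharacters, with extra table entries.
public static class ExtendedLookupSketch
{
    // Assumed substitutions; adjust to the conventions of the target language.
    private static readonly Dictionary<char, string> Extra = new Dictionary<char, string>
    {
        { '\u00F8', "o" },  // ø
        { '\u00D8', "O" },  // Ø
        { '\u00DF', "ss" }, // ß
        { '\u00FE', "th" }, // þ
        { '\u00DE', "Th" }, // Þ
        { '\u00D0', "D" },  // Ð (capital eth)
        { '\u0142', "l" },  // ł
        { '\u0141', "L" }   // Ł
    };

    public static string Simplify(string original)
    {
        if (string.IsNullOrEmpty(original)) return string.Empty;
        StringBuilder sb = new StringBuilder();
        foreach (char c in original.Normalize(NormalizationForm.FormD))
        {
            // Drop the combining marks that FormD split off.
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark) continue;

            // Substitute characters that normalization cannot break apart.
            string replacement;
            if (Extra.TryGetValue(c, out replacement)) sb.Append(replacement);
            else sb.Append(c);
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    public static void Main()
    {
        Console.WriteLine(Simplify("Sm\u00F8rrebr\u00F8d"));          // Smorrebrod
        Console.WriteLine(Simplify("Stra\u00DFe"));                   // Strasse
        Console.WriteLine(Simplify("cr\u00E8me br\u00FBl\u00E9e"));   // creme brulee
    }
}
```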

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Martin Jarvis
Software Developer (Senior) Freestyle Interactive Ltd
United Kingdom
I'm a lead developer for Freestyle Interactive Ltd, where we create many wonderful websites built on Microsoft's ASP.NET and Ektron CMS.
 
I've been developing .NET applications (both Windows and Web) since 2002.


Comments and Discussions

 
Re: Strange replacement (JKos, 12 Jul '10 - 3:21)
Aha, you're doing that visually.
 
Actually, it's much more like d visually, too.
 
In any case, if I believe the wiki article I mentioned earlier,
the Danes and Norwegians actually write "d" in place of eth (ð) nowadays.
 
Just my 5 cents. I hope I didn't steal too much of your time ;-)
Last Updated 12 Jul 2010
Article Copyright 2010 by Martin Jarvis