
Convert Accented Characters to Simple Characters

Martin Jarvis, 12 Jul 2010

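I recently needed a way to replace accented characters with simple English ones to produce more readable, friendly URLs. I'm sure there are plenty of Danes out there who are sick of seeing their language butchered by UrlEncode. After a bit of reading up, it seems .NET 2.0 does most of the heavy lifting for you: String.Normalize with NormalizationForm.FormD splits each accented letter into its base letter followed by separate combining accent marks, and those marks can then be filtered out by their Unicode category. A small lookup table covers the few letters that have no decomposition: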

 using System.Collections.Generic;
 using System.Globalization;
 using System.Text;

 public static class StringExtensions
 {
     /// <summary>
     /// Replaces accented characters with their closest unaccented equivalents.
     /// </summary>
     /// <param name="original">The string to simplify.</param>
     /// <returns>The simplified string, or an empty string if original is null or empty.</returns>
     /// <remarks>Based on code from:
     /// http://blogs.msdn.com/b/michkap/archive/2007/05/14/2629747.aspx</remarks>
     public static string ToSimpleCharacters(this string original)
     {
         if (string.IsNullOrEmpty(original)) return string.Empty;

         // Form D splits a precomposed character such as 'é' into its base
         // letter 'e' followed by a combining accent (U+0301).
         string stFormD = original.Normalize(NormalizationForm.FormD);
         StringBuilder sb = new StringBuilder(stFormD.Length);

         foreach (char c in stFormD)
         {
             // Drop the combining accents; keep everything else.
             if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                 continue;

             // Some letters (e.g. 'æ', 'ð') have no decomposition, so map them explicitly.
             string replacement;
             if (Lookup.TryGetValue(c, out replacement))
                 sb.Append(replacement);
             else
                 sb.Append(c);
         }

         // Recompose whatever remains into canonical Form C.
         return sb.ToString().Normalize(NormalizationForm.FormC);
     }

     // Letters that Form D normalization does not decompose.
     // A static readonly field is initialized once and is thread-safe to read.
     private static readonly Dictionary<char, string> Lookup = new Dictionary<char, string>
     {
         { '\u00E6', "ae" }, // æ
         { '\u00C6', "Ae" }, // Æ
         { '\u00F0', "d" },  // ð
     };
 }
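To see what the Form D step actually produces, the standalone sketch below reports each UTF-16 code unit of a normalized string together with its Unicode category. The class DecompositionDemo and its Describe method are hypothetical names made up for this illustration; they are not part of the original tip.

```csharp
using System.Globalization;
using System.Text;

public static class DecompositionDemo
{
    // Normalizes the input to Form D and returns "U+XXXX Category"
    // for every UTF-16 code unit in the result.
    public static string[] Describe(string input)
    {
        string formD = input.Normalize(NormalizationForm.FormD);
        var parts = new string[formD.Length];
        for (int i = 0; i < formD.Length; i++)
        {
            char c = formD[i];
            parts[i] = string.Format("U+{0:X4} {1}", (int)c, CharUnicodeInfo.GetUnicodeCategory(c));
        }
        return parts;
    }
}
```

Calling DecompositionDemo.Describe("\u00E9") (the single character "é") yields two entries, "U+0065 LowercaseLetter" and "U+0301 NonSpacingMark". The second one is exactly what ToSimpleCharacters skips.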

I’m sure that there must be a few substitutions that don’t get caught by this code. If you’ve got one, just drop me a line!
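Several other Latin letters have no Unicode decomposition and so pass through the Form D step unchanged, for example ø/Ø, ß, đ/Đ, ł/Ł, œ/Œ and þ/Þ. The sketch below is a self-contained variant of ToSimpleCharacters with those extra entries. The class name ExtendedSimplifier and the chosen transliterations (such as ß to "ss" and þ to "th") are assumptions; other conventions, like Danish ø to "oe", may suit a particular language better.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class ExtendedSimplifier
{
    // Hypothetical extended table: letters that Form D leaves intact.
    private static readonly Dictionary<char, string> Lookup = new Dictionary<char, string>
    {
        { '\u00E6', "ae" }, { '\u00C6', "Ae" }, // æ Æ
        { '\u00F0', "d" },  { '\u00D0', "D" },  // ð Ð
        { '\u00F8', "o" },  { '\u00D8', "O" },  // ø Ø
        { '\u00DF', "ss" },                     // ß
        { '\u0111', "d" },  { '\u0110', "D" },  // đ Đ
        { '\u0142', "l" },  { '\u0141', "L" },  // ł Ł
        { '\u0153', "oe" }, { '\u0152', "Oe" }, // œ Œ
        { '\u00FE', "th" }, { '\u00DE', "Th" }, // þ Þ
    };

    public static string ToSimpleCharacters(string original)
    {
        if (string.IsNullOrEmpty(original)) return string.Empty;

        string formD = original.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(formD.Length);
        foreach (char c in formD)
        {
            // Skip the combining accents produced by the decomposition.
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;

            // Substitute letters from the table; copy everything else as-is.
            string replacement;
            sb.Append(Lookup.TryGetValue(c, out replacement) ? replacement : c.ToString());
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

With this table, "Søren" becomes "Soren", "Łódź" becomes "Lodz" (the acute accents are stripped by normalization, the Ł by the lookup) and "Straße" becomes "Strasse".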

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Martin Jarvis
Software Developer (Senior), Freestyle Interactive Ltd
United Kingdom
I'm a lead developer for Freestyle Interactive Ltd, where we create many wonderful websites built on Microsoft's ASP.NET and Ektron CMS.

I've been developing .NET applications (both Windows and Web) since 2002.

Comments and Discussions

 
GeneralMy vote of 5 PinmemberRogReis201020-Feb-13 21:04 
QuestionStrange replacement PinmemberJKos12-Jul-10 3:07 
AnswerRe: Strange replacement PinmemberMartin Jarvis12-Jul-10 3:13 
GeneralRe: Strange replacement PinmemberJKos12-Jul-10 3:21 
GeneralRe: Strange replacement PinmemberMartin Jarvis12-Jul-10 3:28 
GeneralMy kludgy approach Pinmembersupercat929-Jun-10 5:09 
GeneralGood but.. Pinmemberpranav9528-Jun-10 15:15 
Re: Good but.. (Martin Jarvis, 28-Jun-10 19:37)
Thanks for the comment.

To be honest, I don't understand the ins and outs of the .NET string normalization features that the example uses. During my research, this approach was the only one I found that didn't require reams of explicit character mappings.

A more nitty-gritty explanation of the .NET character-set normalization process is available in the original article.

Unfortunately, not all of the characters required in my case were converted, so I added a very basic lookup table to catch the three cases I know about.
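One way to discover which characters still need a lookup entry is to run sample text through the same decompose-and-strip step and report anything left outside the ASCII range. The helper below is a hypothetical diagnostic (LookupAudit and FindUnmappedCharacters are invented names), not part of the original tip; it assumes that anything above U+007F after stripping is a candidate for the table.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class LookupAudit
{
    // Returns the distinct non-ASCII characters that survive Form D
    // decomposition and removal of non-spacing marks, in first-seen order.
    public static List<char> FindUnmappedCharacters(string sample)
    {
        var found = new List<char>();
        string formD = sample.Normalize(NormalizationForm.FormD);
        foreach (char c in formD)
        {
            // Accents split off by Form D are handled already; ignore them.
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;

            if (c > 0x7F && !found.Contains(c))
                found.Add(c);
        }
        return found;
    }
}
```

For the Danish word "blåbærgrød" this reports 'æ' and 'ø': the 'å' decomposes into 'a' plus a combining ring, so normalization alone handles it, while the other two need table entries.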

Last Updated 12 Jul 2010
Article Copyright 2010 by Martin Jarvis