Posted 28 Nov 2010

Remove Diacritical Marks in a Unicode String

Alain Rist, 29 Nov 2010
With a helper CharMap class using VC2010 C++0x implementation

This contribution comes from a forum question and my inefficient answer to it.

So we want to remove some diacritical marks in a Unicode string, for instance change occurrences of àáảãạăằắẳẵặâầấẩẫậ to plain a, with the help of C++0x as implemented in VC2010.

For that let's define a C array of const wchar_t* with the first character being the replacement character and the next ones being the characters to replace:

// This CODE cannot get formatted by the CP editor
const wchar_t* pchangers[] =
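As an illustration of this layout, here is a minimal sketch of such a table. The name `example_changers` and the `e` and `d` rows are assumptions made for the example; only the `a` row uses the characters listed above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sample table: in each string the first character is the
// replacement and every following character is one to be replaced.
const wchar_t* const example_changers[] =
{
    L"aàáảãạăằắẳẵặâầấẩẫậ",  // the 'a' variants mentioned in the text
    L"eèéẻẽẹêềếểễệ",        // assumed row for 'e'
    L"dđ",                  // assumed row for 'd'
};

// Number of rows, derived from the array size.
const std::size_t example_changer_count =
    sizeof example_changers / sizeof example_changers[0];
```

A real table would need one row per base letter, in both lower and upper case, for every accented character expected in the input.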

The following CharMap class is constructed from a std::vector<std::wstring> of such strings and uses it to populate its std::map<wchar_t, wchar_t> charmap member: for each string, every character after the first becomes a key, and the first character is its value:
#include <map>
#include <vector>
#include <string>
#include <algorithm>
#include <iterator>
#include <utility>

class CharMap
{
    // Maps each character to replace to its replacement character.
    std::map<wchar_t, wchar_t> charmap;
public:
    CharMap(const std::vector<std::wstring>& changers)
    {
        std::for_each(changers.begin(), changers.end(), [&](const std::wstring& changer){
            if (changer.empty()) return;  // nothing to map, and begin() + 1 would be invalid
            std::transform(changer.begin() + 1, changer.end(), std::inserter(charmap, charmap.end()), [&](wchar_t wc){
                return std::make_pair(wc, changer[0]);});
        });
    }
    // Returns a copy of in with every mapped character replaced.
    std::wstring operator()(const std::wstring& in) const
    {
        std::wstring out(in.length(), L'\0');
        std::transform(in.begin(), in.end(), out.begin(), [&](wchar_t wc) -> wchar_t {
            auto it = charmap.find(wc);
            return it == charmap.end() ? wc : it->second;});
        return out;
    }
};  // class CharMap

CharMap::operator()(const std::wstring& in) builds a std::wstring out of the same length as in: each character of in that has an entry in charmap is replaced by its mapped character, every other character is copied unchanged, and out is returned.
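The same two steps, building the map and then looking up each character, can also be sketched with plain loops. This is only an illustration, not the class itself: the free functions `build_charmap` and `strip_marks` are hypothetical names, and the range-based for loops assume a compiler newer than VC2010, which does not support them.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Builds the lookup table: every character after the first maps to the first.
std::map<wchar_t, wchar_t> build_charmap(const std::vector<std::wstring>& changers)
{
    std::map<wchar_t, wchar_t> charmap;
    for (const std::wstring& changer : changers)
        for (std::size_t i = 1; i < changer.size(); ++i)
            charmap.insert(std::make_pair(changer[i], changer[0]));
    return charmap;
}

// Copies in, replacing each character that has an entry in charmap.
std::wstring strip_marks(const std::map<wchar_t, wchar_t>& charmap, const std::wstring& in)
{
    std::wstring out;
    out.reserve(in.size());
    for (wchar_t wc : in)
    {
        auto it = charmap.find(wc);
        out += (it == charmap.end()) ? wc : it->second;
    }
    return out;
}
```

Because std::map::insert keeps the first value stored for a key, a character listed in two rows maps to the replacement of the row that comes first.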

Now let's put it to work:

#include <iostream>
// Build the vector of replacement strings from the C array.
std::vector<std::wstring> changers(pchangers, pchangers + sizeof pchangers / sizeof (wchar_t*));
int main()
{
// This CODE cannot get formatted by the CP editor
    std::wcout << CharMap(changers)(L" người mình.mp3 ") << std::endl;
// END unformatted CODE
    return 0;
}

Quite a demonstration of the power of C++0x, isn't it?

If you have pasting problems with Unicode strings, download the full code (1 KB).
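Another way around pasting problems, sketched here as a suggestion rather than something taken from the downloadable code, is to write the accented characters as universal character names (`\uXXXX`), which do not depend on the encoding of the source file. The name `escaped_row` is hypothetical.

```cpp
#include <cassert>
#include <cwchar>

// 'a' followed by à (U+00E0), á (U+00E1), â (U+00E2) and ã (U+00E3),
// written with escapes so the source file stays plain ASCII.
const wchar_t* const escaped_row = L"a\u00E0\u00E1\u00E2\u00E3";
```

Such a row can be used in the table exactly like one written with literal accented characters.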



This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Alain Rist
France


Comments and Discussions

It really works. thanks!
Randy Walles, 29-Jun-11 22:30
The sample pchangers does not include all possible cases. If...
Alain Rist, 28-Jun-11 23:12
I've downloaded full code archive, but it didn't help me. Ma...
Randy Walles, 28-Jun-11 21:50
Re: The sample pchangers does not include all possible cases. If...
Alain Rist, 28-Jun-11 23:13
Message Removed
_beauw_, 29-Nov-10 18:13
Re: Very Useful
Alain Rist, 29-Nov-10 23:58

Last Updated 30 Nov 2010
Article Copyright 2010 by Alain Rist