UTF16 to UTF8 to UTF16 simple CString based conversion

By John Paul Pirau, 16 May 2008
 

Introduction

To convert strings between UTF-8 and UTF-16 (as well as other encodings), Microsoft provides the MultiByteToWideChar and WideCharToMultiByte functions. These functions operate on raw char and wchar_t buffers, typically null-terminated strings. Working with such buffers requires some manual memory management, and if you call the functions extensively, your code can quickly become cluttered. That's why I decided to wrap these two functions for use with the more coder-friendly CString types.

The conversion functions

UTF16toUTF8

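CStringA UTF16toUTF8(const CStringW& utf16)
{
   CStringA utf8;
   // First call: query the required size in bytes, including the terminating null.
   int len = WideCharToMultiByte(CP_UTF8, 0, utf16, -1, NULL, 0, NULL, NULL);
   if (len>1) // 0 means failure, 1 means only the terminator (empty input)
   { 
      // GetBuffer(n) provides room for n characters plus the terminator.
      char *ptr = utf8.GetBuffer(len-1);
      if (ptr) WideCharToMultiByte(CP_UTF8, 0, utf16, -1, ptr, len, NULL, NULL);
      utf8.ReleaseBuffer(); // recompute the length from the null terminator
   }
   return utf8;
}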
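To make it clearer what the CP_UTF8 conversion produces, here is a minimal, portable sketch of UTF-16 to UTF-8 encoding using only the standard library. It is an illustration, not a replacement for WideCharToMultiByte: the name Utf16ToUtf8Portable is hypothetical, and replacing unpaired surrogates with U+FFFD is an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper for illustration only; not part of the article's code.
// Encodes UTF-16 code units as UTF-8 bytes.
std::string Utf16ToUtf8Portable(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF)
        {
            // High surrogate: combine with the following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            }
            else
                cp = 0xFFFD; // assumption: unpaired surrogate becomes U+FFFD
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
            cp = 0xFFFD;     // stray low surrogate

        if (cp < 0x80)       // 1 byte: ASCII
            out += static_cast<char>(cp);
        else if (cp < 0x800) // 2 bytes
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000) // 3 bytes
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else                 // 4 bytes: code points above U+FFFF
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

One practical consequence is that the lengths of the two strings differ: an accented Latin letter such as ò is one UTF-16 code unit but two UTF-8 bytes, so a CStringA holding UTF-8 generally reports a larger GetLength() than the CStringW it came from.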

UTF8toUTF16

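CStringW UTF8toUTF16(const CStringA& utf8)
{
   CStringW utf16;
   // First call: query the required size in wide characters, including the terminating null.
   int len = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, NULL, 0);
   if (len>1) // 0 means failure, 1 means only the terminator (empty input)
   { 
      // GetBuffer(n) provides room for n characters plus the terminator.
      wchar_t *ptr = utf16.GetBuffer(len-1);
      if (ptr) MultiByteToWideChar(CP_UTF8, 0, utf8, -1, ptr, len);
      utf16.ReleaseBuffer(); // recompute the length from the null terminator
   }
   return utf16;
}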
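For the opposite direction, the following portable sketch shows roughly what decoding UTF-8 into UTF-16 involves. Again, this is only an illustration under stated assumptions: Utf8ToUtf16Portable is a hypothetical name, malformed or truncated sequences are replaced with U+FFFD, and overlong encodings are not rejected, which a production decoder (or MultiByteToWideChar itself) would handle more strictly.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper for illustration only; not part of the article's code.
// Decodes UTF-8 bytes into UTF-16 code units.
std::u16string Utf8ToUtf16Portable(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0xFFFD; // default for an invalid lead byte
        std::size_t extra = 0;     // number of continuation bytes expected

        if (b0 < 0x80)                { cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
        ++i;

        // Each continuation byte has the form 10xxxxxx and adds 6 bits.
        for (std::size_t k = 0; k < extra; ++k)
        {
            if (i >= in.size() ||
                (static_cast<unsigned char>(in[i]) & 0xC0) != 0x80)
            {
                cp = 0xFFFD; // truncated or malformed sequence
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
            ++i;
        }

        // Surrogate values and anything above U+10FFFF are not valid code points.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            cp = 0xFFFD;

        if (cp < 0x10000)
            out += static_cast<char16_t>(cp);
        else
        {
            // Code points above U+FFFF need a surrogate pair in UTF-16.
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

Note that a four-byte UTF-8 sequence turns into two UTF-16 code units, so counting wchar_t elements in the result is not the same as counting characters.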

Using the code

Using the two helper functions is straightforward. Note, however, that they are only useful if your project is set to use the UNICODE character set. The functions also require Visual Studio 7.1 or later; Visual Studio 6.0 cannot compile them because it lacks CStringA and CStringW. The following snippet shows a usage example:

CStringW utf16(L"òèçùà12345"); // wide literal, so the text does not pass through the ANSI code page
CStringA utf8 = UTF16toUTF8(utf16);
CStringW utf16_2 = UTF8toUTF16(utf8);

History

After a comment by Ivo Beltchev, I decided to change the functions as he suggested. Initially, I designed the functions like this:

CStringA UTF16toUTF8(const CStringW& utf16)
{
  LPSTR pszUtf8 = NULL;
  CStringA utf8("");

  if (utf16.IsEmpty()) 
    return utf8; //empty input string

  size_t nLen16 = utf16.GetLength();
  size_t nLen8 = 0;

  if ((nLen8 = WideCharToMultiByte (CP_UTF8, 0, utf16, nLen16, 
                                    NULL, 0, 0, 0) + 2) == 2)
    return utf8; //conversion error!

  pszUtf8 = new char [nLen8];
  if (pszUtf8)
  {
    memset (pszUtf8, 0x00, nLen8);
    WideCharToMultiByte(CP_UTF8, 0, utf16, nLen16, pszUtf8, nLen8, 0, 0);
    utf8 = CStringA(pszUtf8);
  }

  delete [] pszUtf8;
  return utf8; //utf8 encoded string
}

CStringW UTF8toUTF16(const CStringA& utf8)
{
  LPWSTR pszUtf16 = NULL;
  CStringW utf16("");
  
  if (utf8.IsEmpty()) 
    return utf16; //empty input string

  size_t nLen8 = utf8.GetLength();
  size_t nLen16 = 0;

  if ((nLen16 = MultiByteToWideChar (CP_UTF8, 0, utf8, nLen8, NULL, 0)) == 0)
    return utf16; //conversion error!

  pszUtf16 = new wchar_t[nLen16 + 1]; //one extra element for the terminating null
  if (pszUtf16)
  {
    wmemset (pszUtf16, 0x00, nLen16 + 1);
    MultiByteToWideChar (CP_UTF8, 0, utf8, nLen8, pszUtf16, nLen16);
    utf16 = CStringW(pszUtf16);
  }

  delete [] pszUtf16;
  return utf16; //utf16 encoded string
}

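These functions work just as well, but the current versions shown at the top of this article are shorter and a bit more efficient, because they write directly into the CString buffer instead of allocating a temporary buffer and copying from it. Thanks to Ivo for the observation!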

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).

About the Author

John Paul Pirau
Software Developer (Senior)
Romania


Last Updated 16 May 2008
Article Copyright 2008 by John Paul Pirau