Tags: C++, STL

The problem is that the utf16_codecvt methods never get called and, therefore, the result is wrong. I have searched the net, but all I can find are examples of what is supposed to work, and unfortunately none of them has worked. I have also seen other posters with the same problem, but no one gave them an answer.

I have tested to make sure that the locale has the facet (utf16_codecvt), and it does. So I see no reason why its virtual methods are never called; instead, the std::codecvt<wchar_t, char, std::mbstate_t> methods keep getting called.

Any ideas?

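#include <cwchar>   // std::mbstate_t
#include <fstream>  // std::wifstream
#include <locale>   // std::locale, std::codecvt

class utf16_codecvt : public std::codecvt<char16_t, char16_t, std::mbstate_t>
{
    // ... //
};

void MyTestFunc()
{
    // ... //
    std::wifstream myFile;
    std::locale myLoc = std::locale(myFile.getloc(), new utf16_codecvt);
    myFile.imbue(myLoc);
    myFile.open(/* ... */, std::ios::in | std::ios::binary);
    // ... //
}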
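A quick way to see what is going on is to inspect the locale directly. The sketch below is hypothetical and assumes a conforming C++11-or-later library. Because std::codecvt<char16_t, char16_t, std::mbstate_t> is not a standard specialization, it uses the standard std::codecvt<char16_t, char, std::mbstate_t> as the base of an empty utf16_codecvt. A facet is stored under its `id`, which a derived class inherits from its base, so installing utf16_codecvt replaces only the slot for that base. std::wifstream converts through std::codecvt<wchar_t, char, std::mbstate_t>, which is a different slot, so a `has_facet` check on the custom facet can succeed while the stream never calls it. The helper names `make_utf16_locale` and `slot_holds_utf16_codecvt` are made up for illustration.

```cpp
#include <cwchar>   // std::mbstate_t
#include <locale>   // std::locale, std::codecvt, std::use_facet

// Hypothetical facet for illustration: no overrides, only a base class.
// (std::codecvt<char16_t, char, std::mbstate_t> is deprecated in C++20
// but still provided.)
class utf16_codecvt : public std::codecvt<char16_t, char, std::mbstate_t> {};

// Builds a locale containing utf16_codecvt. The slot it lands in is chosen
// by the inherited id, i.e. std::codecvt<char16_t, char, std::mbstate_t>::id.
std::locale make_utf16_locale() {
    return std::locale(std::locale::classic(), new utf16_codecvt);
}

// True if the facet stored in loc under Facet::id is our utf16_codecvt.
template <class Facet>
bool slot_holds_utf16_codecvt(const std::locale& loc) {
    const Facet& f = std::use_facet<Facet>(loc);
    return dynamic_cast<const utf16_codecvt*>(&f) != nullptr;
}
```

With `loc = make_utf16_locale()`, the check should be true for std::codecvt<char16_t, char, std::mbstate_t> and false for std::codecvt<wchar_t, char, std::mbstate_t>, the slot a std::wifstream actually reads.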

The following link gives an example of the kind of thing I am trying to do:
April 01, 1999 - Unicode Files - P.J. Plauger
Posted 27-Jun-09 9:35am
Updated 26-Nov-09 10:56am


Solution 1

From what I can tell, the C++ stream system presumes that files are sequences of bytes, not characters, even when you use wide streams. The "wide" part of a wide stream (AFAICT) describes how the stream object interacts with your C++ code, not the underlying file or whatever else it is attached to. Thus, the external side of your codecvt facet has to take in chars (bytes).

By changing the declaration of your codecvt facet to the one shown below, I was able to get breakpoints in the replacement facet hit.

#include <cstddef>  // std::size_t
#include <cwchar>   // std::mbstate_t
#include <locale>   // std::codecvt

// Each override just forwards to the base class, so breakpoints can be set here.
class utf16_codecvt : public std::codecvt<char16_t, char, std::mbstate_t>
{
   typedef std::codecvt<char16_t, char, std::mbstate_t> Base;
   typedef char16_t ElemT;   // internal (program-side) character type
   typedef char ByteT;       // external (file-side) unit type

protected:
   // Convert bytes [from, from_end) to characters [to, to_end).
   result do_in(std::mbstate_t& s,
      const ByteT* from, const ByteT* from_end, const ByteT*& from_next,
      ElemT* to, ElemT* to_end, ElemT*& to_next) const override
   {
      return Base::do_in(s, from, from_end, from_next, to, to_end, to_next);
   }

   // Convert characters [from, from_end) to bytes [to, to_end).
   result do_out(std::mbstate_t& s,
      const ElemT* from, const ElemT* from_end, const ElemT*& from_next,
      ByteT* to, ByteT* to_end, ByteT*& to_next) const override
   {
      return Base::do_out(s, from, from_end, from_next, to, to_end, to_next);
   }

   // Generate bytes to return to the default shift state.
   result do_unshift(std::mbstate_t& s,
      ByteT* to, ByteT* to_end, ByteT*& to_next) const override
   {
      return Base::do_unshift(s, to, to_end, to_next);
   }

   // Return how many bytes of [from, from_end) convert to at most max_chars
   // characters. Since C++11 the state parameter is non-const.
   int do_length(std::mbstate_t& s, const ByteT* from,
      const ByteT* from_end, std::size_t max_chars) const override
   {
      return Base::do_length(s, from, from_end, max_chars);
   }
};

So your replacement facet will have to know that it needs to read two bytes for every character (and vice versa, obviously). The best reference for this sort of information is probably Standard C++ IOStreams and Locales by Angelika Langer and Klaus Kreft, but even then, locales and facets are heavy going in C++ :(
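For comparison, here is a minimal sketch of a facet that does the two-bytes-per-character conversion and is consulted by a std::wifstream. Several assumptions are not from the original post: it derives from std::codecvt<wchar_t, char, std::mbstate_t>, which is the specialization std::basic_filebuf<wchar_t> converts through in standard C++; the file is UTF-16LE with no BOM; each 16-bit code unit becomes one wchar_t, so surrogate pairs are not combined. The names `utf16le_codecvt` and `read_utf16le` are hypothetical.

```cpp
#include <cstddef>  // std::size_t
#include <cwchar>   // std::mbstate_t
#include <fstream>  // std::wifstream
#include <locale>   // std::locale, std::codecvt
#include <string>   // std::wstring

// Hypothetical facet: UTF-16LE bytes in the file <-> wchar_t in the program.
// Assumes no BOM; each 16-bit code unit maps to one wchar_t (no surrogate pairing).
class utf16le_codecvt : public std::codecvt<wchar_t, char, std::mbstate_t> {
protected:
    // Consume whole 2-byte units while there is room for output.
    result do_in(std::mbstate_t&, const char* from, const char* from_end,
                 const char*& from_next, wchar_t* to, wchar_t* to_end,
                 wchar_t*& to_next) const override {
        while (from_end - from >= 2 && to != to_end) {
            unsigned lo = static_cast<unsigned char>(from[0]);
            unsigned hi = static_cast<unsigned char>(from[1]);
            *to++ = static_cast<wchar_t>(lo | (hi << 8));
            from += 2;
        }
        from_next = from;
        to_next = to;
        // Leftover input (an odd trailing byte or a full output buffer) is partial.
        return from == from_end ? ok : partial;
    }

    // Emit each character as two bytes, low byte first.
    result do_out(std::mbstate_t&, const wchar_t* from, const wchar_t* from_end,
                  const wchar_t*& from_next, char* to, char* to_end,
                  char*& to_next) const override {
        while (from != from_end && to_end - to >= 2) {
            unsigned unit = static_cast<unsigned>(*from++) & 0xFFFFu;
            *to++ = static_cast<char>(unit & 0xFFu);
            *to++ = static_cast<char>(unit >> 8);
        }
        from_next = from;
        to_next = to;
        return from == from_end ? ok : partial;
    }

    // Stateless encoding: nothing to emit when returning to the initial state.
    result do_unshift(std::mbstate_t&, char* to, char*, char*& to_next) const override {
        to_next = to;
        return noconv;
    }

    // Bytes of [from, from_end) that convert to at most max_chars characters.
    int do_length(std::mbstate_t&, const char* from, const char* from_end,
                  std::size_t max_chars) const override {
        std::size_t units = static_cast<std::size_t>(from_end - from) / 2;
        return static_cast<int>(2 * (units < max_chars ? units : max_chars));
    }

    int do_encoding() const noexcept override { return 2; }  // fixed width: 2 bytes
    int do_max_length() const noexcept override { return 2; }
    bool do_always_noconv() const noexcept override { return false; }
};

// Hypothetical helper: reads a whole UTF-16LE file into a std::wstring.
std::wstring read_utf16le(const char* path) {
    std::wifstream in;
    // Imbue before opening so the filebuf converts with the new facet.
    in.imbue(std::locale(in.getloc(), new utf16le_codecvt));
    in.open(path, std::ios::in | std::ios::binary);
    std::wstring text;
    wchar_t ch;
    while (in.get(ch)) text.push_back(ch);
    return text;
}
```

Deriving from the wchar_t specialization matters because the stream looks its converter up by that specialization's `id`; a facet derived from another codecvt specialization sits in a different slot. The standard library also shipped std::codecvt_utf16 in <codecvt> (C++11, deprecated since C++17) for the same job.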

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
