Tags: C++, STL

The problem is that the utf16_codecvt methods never get called and, therefore, the result is wrong. I have searched the net, but all I can find are examples of what is supposed to work. Unfortunately, none of them has worked. I have also seen other posters on the net with the same problem, but no one gave them an answer.

I have tested to make sure that the locale has the facet (utf16_codecvt), and it does. So I see no reason why its virtual methods are never called. Instead, the stream keeps calling the codecvt<wchar_t, char, mbstate_t> methods.

Any ideas?

class utf16_codecvt : public std::codecvt<char16_t, char16_t, std::mbstate_t>
{
    ...//
};

void MyTestFunc()
{
    ... //
    std::wifstream myFile;
    std::locale myLoc = std::locale(myFile.getloc(), new utf16_codecvt);
    myFile.imbue(myLoc);
    myFile.open(pFileName, std::ios::in | std::ios::binary);
    ... //
    myFile.read(bom_buffer, 1);
    ... //
}

The following link gives an example of the types of things I am trying to do:
P.J. Plauger, "Unicode Files" (April 1, 1999): http://www.ddj.com/cpp/184403638?pgno=1
Posted 27-Jun-09 9:35am
Edited 26-Nov-09 10:56am

1 solution


Solution 1

From what I can tell, the C++ stream system presumes that files are sequences of bytes, not characters, even when you use wide streams. The 'wide' part of a wide stream (AFAICT) describes how the stream object interacts with your C++ code, not the underlying file. Thus, your codecvt facet has to take in bytes: its external type must be char.
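To make that concrete: the standard says a file stream's buffer, std::basic_filebuf<CharT, Traits>, converts with the facet std::codecvt<CharT, char, Traits::state_type> taken from its locale. For the std::wifstream in the question, that is std::codecvt<wchar_t, char, std::mbstate_t>, which matches the methods you saw being called. A facet of any other codecvt type has its own locale::id, so imbuing it succeeds but the stream never looks it up. The sketch below checks this at compile time; stream_codecvt and has_stream_codecvt are hypothetical helper names for illustration, not standard ones.

```cpp
#include <cassert>
#include <cwchar>       // std::mbstate_t
#include <fstream>
#include <locale>
#include <type_traits>

// Hypothetical helper alias (not a standard name): the facet type that a file
// stream's basic_filebuf<CharT, Traits> looks up in its imbued locale.
// The external type is char (bytes) for every file stream, narrow or wide.
template <class Stream>
using stream_codecvt =
    std::codecvt<typename Stream::char_type, char,
                 typename Stream::traits_type::state_type>;

// A std::wifstream converts between wchar_t and bytes ...
static_assert(std::is_same<stream_codecvt<std::wifstream>,
                           std::codecvt<wchar_t, char, std::mbstate_t>>::value,
              "wifstream uses codecvt<wchar_t, char, mbstate_t>");

// ... so a facet derived from another codecvt specialization (for example
// codecvt<char16_t, char, mbstate_t>) is a different facet with a different
// locale::id, and the wide stream never asks the locale for it.
static_assert(!std::is_same<stream_codecvt<std::wifstream>,
                            std::codecvt<char16_t, char, std::mbstate_t>>::value,
              "char16_t facets are not consulted by wifstream");

// Hypothetical helper: does the locale supply the facet Stream will use?
template <class Stream>
bool has_stream_codecvt(const std::locale& loc)
{
    return std::has_facet<stream_codecvt<Stream>>(loc);
}
```

For example, has_stream_codecvt<std::wifstream>(std::locale::classic()) should be true, because every locale is required to contain the wchar_t-to-char codecvt facet.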

By changing the declaration of your codecvt facet to the one shown below, I was able to get breakpoints in the replacement facet to be hit.

#include <cstddef>   // std::size_t
#include <cwchar>    // std::mbstate_t
#include <locale>

class utf16_codecvt : public std::codecvt<char16_t, char, std::mbstate_t>
{
   typedef std::codecvt<char16_t, char, std::mbstate_t> Base;
   typedef char16_t ElemT;   // internal type: what the program sees
   typedef char ByteT;       // external type: the bytes in the file

protected:
   // Each override just forwards to the base class, which is enough to put a
   // breakpoint in it. The base converts UTF-8 bytes, so the actual UTF-16
   // byte handling still has to be written in these bodies.
   result do_in(std::mbstate_t& s,
      const ByteT* from, const ByteT* from_end, const ByteT*& from_next,
      ElemT* to, ElemT* to_end, ElemT*& to_next) const override
   {  // convert bytes [from, from_end) to characters in [to, to_end)
      return Base::do_in(s, from, from_end, from_next, to, to_end, to_next);
   }

   result do_out(std::mbstate_t& s,
      const ElemT* from, const ElemT* from_end, const ElemT*& from_next,
      ByteT* to, ByteT* to_end, ByteT*& to_next) const override
   {  // convert characters [from, from_end) to bytes in [to, to_end)
      return Base::do_out(s, from, from_end, from_next, to, to_end, to_next);
   }

   result do_unshift(std::mbstate_t& s,
      ByteT* to, ByteT* to_end, ByteT*& to_next) const override
   {  // generate bytes to return to the initial shift state
      return Base::do_unshift(s, to, to_end, to_next);
   }

   int do_length(std::mbstate_t& s, const ByteT* from,
      const ByteT* from_end, std::size_t max) const override
   {  // number of bytes in [from, from_end) that convert to at most max characters
      return Base::do_length(s, from, from_end, max);
   }
};


So your replacement facet will have to know that it needs to read two bytes for every char16_t (and write two bytes for every char16_t in the other direction). The best reference for that sort of information is probably Standard C++ IOStreams and Locales by Angelika Langer and Klaus Kreft, but even then, locales and facets are heavy going in C++ :(
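For illustration, here is a minimal sketch of a facet that does that two-bytes-per-unit conversion, under stated assumptions. It derives from std::codecvt<wchar_t, char, std::mbstate_t> so that a std::wifstream like the one in the question will actually call it. It assumes little-endian UTF-16 (UTF-16LE), does no BOM handling (a BOM comes back as the character U+FEFF), and maps each 16-bit code unit to one wchar_t, which suits a 16-bit wchar_t (as on Windows) but does not combine surrogate pairs where wchar_t is 32 bits. The names utf16le_codecvt, read_utf16le_file and write_utf16le_file are hypothetical.

```cpp
#include <algorithm>   // std::min
#include <cassert>
#include <cstddef>     // std::size_t
#include <cwchar>      // std::mbstate_t
#include <fstream>
#include <locale>
#include <string>

// Sketch only (assumption: UTF-16LE, one code unit per wchar_t, no
// surrogate-pair handling). Derives from the facet wifstream/wofstream use.
class utf16le_codecvt : public std::codecvt<wchar_t, char, std::mbstate_t>
{
protected:
    // File -> program: every two bytes (low byte first) become one wchar_t.
    result do_in(std::mbstate_t&, const char* from, const char* from_end,
                 const char*& from_next, wchar_t* to, wchar_t* to_end,
                 wchar_t*& to_next) const override
    {
        while (from_end - from >= 2 && to != to_end) {
            unsigned lo = static_cast<unsigned char>(from[0]);
            unsigned hi = static_cast<unsigned char>(from[1]);
            *to++ = static_cast<wchar_t>(lo | (hi << 8));
            from += 2;
        }
        from_next = from;
        to_next = to;
        // partial: output is full, or a lone trailing byte needs more input.
        return from == from_end ? ok : partial;
    }

    // Program -> file: every wchar_t becomes two bytes, low byte first.
    result do_out(std::mbstate_t&, const wchar_t* from, const wchar_t* from_end,
                  const wchar_t*& from_next, char* to, char* to_end,
                  char*& to_next) const override
    {
        while (from != from_end && to_end - to >= 2) {
            unsigned u = static_cast<unsigned>(*from++) & 0xFFFFu;
            *to++ = static_cast<char>(u & 0xFFu);
            *to++ = static_cast<char>(u >> 8);
        }
        from_next = from;
        to_next = to;
        return from == from_end ? ok : partial;
    }

    // Stateless encoding: there is no shift sequence to emit.
    result do_unshift(std::mbstate_t&, char* to, char*, char*& to_next) const override
    {
        to_next = to;
        return noconv;
    }

    int do_encoding() const noexcept override { return 2; }  // fixed width
    bool do_always_noconv() const noexcept override { return false; }
    int do_max_length() const noexcept override { return 2; }

    // Bytes in [from, from_end) needed to produce at most max wchar_t values.
    int do_length(std::mbstate_t&, const char* from, const char* from_end,
                  std::size_t max) const override
    {
        std::size_t units = std::min(max, static_cast<std::size_t>(from_end - from) / 2);
        return static_cast<int>(units * 2);
    }
};

// Hypothetical helper: read a whole UTF-16LE file into a std::wstring.
std::wstring read_utf16le_file(const char* path)
{
    std::wifstream in;
    in.imbue(std::locale(in.getloc(), new utf16le_codecvt));  // before any I/O
    in.open(path, std::ios::in | std::ios::binary);
    std::wstring text;
    for (wchar_t ch; in.get(ch); )
        text.push_back(ch);
    return text;
}

// Hypothetical helper: write a std::wstring as UTF-16LE (no BOM).
void write_utf16le_file(const char* path, const std::wstring& text)
{
    std::wofstream out;
    out.imbue(std::locale(out.getloc(), new utf16le_codecvt));
    out.open(path, std::ios::out | std::ios::binary);
    out << text;
}  // the destructor flushes through do_out and closes the file
```

As in the question, the facet is imbued before the file is opened and before any I/O, and the file is opened in binary mode so the runtime does not apply text-mode translations to the raw bytes.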

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 26 Nov 2009
Copyright © CodeProject, 1999-2014. All Rights Reserved.