I read some data from a data file; the text in it is encoded as UTF-16 (shown here as hex). How can I convert it into readable characters, such as Chinese and English?
For example: 0626 2D4E 8765 2000 6500 6E00 6700 6C00 6900 7300 6800 should become ☆中文 english
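To make the expected mapping concrete, here is a minimal sketch that turns the hex groups above into UTF-16 code units. It assumes the hex is how a hex viewer displays the raw file bytes, two bytes per group in file order. decodeHexUtf16le is a hypothetical helper written in standard C++, not part of MFC.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: "0626 2D4E ..." -> UTF-16 code units.
// Each 4-digit group is two bytes in file order. UTF-16LE stores the
// low-order byte first, so "0626" means byte 0x06 then 0x26, i.e. U+2606.
std::u16string decodeHexUtf16le(const std::string& hex)
{
    std::istringstream in(hex);
    std::string group;
    std::u16string text;
    while (in >> group) {
        assert(group.size() == 4);  // two bytes per group
        unsigned long lo = std::stoul(group.substr(0, 2), nullptr, 16);  // first byte in the file
        unsigned long hi = std::stoul(group.substr(2, 2), nullptr, 16);  // second byte in the file
        text.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
    return text;
}
```

On the real file you would skip the hex text and work directly on the bytes returned by CFile::Read; the byte-order rule is the same.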
Here is the relevant part of my code (the project uses the Unicode character set):
C++
CFile file(_T("bin.data"),CFile::modeRead);
    BYTE *pBuf;
    int nIndex=0;
    long position=0x328418;
    for(int k=0;k<1835;k++)
    {
        file.Seek(position, CFile::begin);
        pBuf=new BYTE[33];
        pBuf[32]=0;
        memset(pBuf,0,32);
        file.Read(pBuf,32);
        CString sResult=_T("");
        int numByte=20;
        int i;
        for(i=0;i<2;i+=2)
        {
            char temp[3];
            temp[2]='\0';
            CString s=_T("");
            sscanf(temp,"%02x%02x",pBuf[i+1],pBuf[i]);
            //s.Format(_T("%02x%02x"),pBuf[i+1],pBuf[i]);
            s=temp;
            sResult+=s;
        }
        m_comboxItemName.InsertString(nIndex,sResult);//a variable related CComboBox
        nIndex++;
        position+=0x450;
        sResult.Empty();
    }
    file.Close();
    UpdateData(false);
}

The result shown in the combo box is garbled.
Posted; updated 7-Apr-11 19:13pm
Comments
Sandeep Mewara 8-Apr-11 1:13am    
Always use PRE tags to format the code part; it makes the question readable.

Each four-digit group in your example is one UTF-16LE code unit, stored low byte first. So it is right to treat the first two hex digits as the low-order byte and the last two as the high-order byte: 0626 is U+2606 (☆) and 2D4E is U+4E2D (中).
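The byte-order rule can be checked with a little arithmetic. This sketch (the function names are hypothetical) builds a code unit from the two bytes of a group in both orders; only the little-endian reading gives the expected characters, while the big-endian reading of 06 26 gives U+0626, an Arabic letter.

```cpp
// b0 is the byte that comes first in the file (first two hex digits of a group),
// b1 is the byte that follows it.
constexpr char16_t littleEndianUnit(unsigned char b0, unsigned char b1)
{
    return static_cast<char16_t>(b0 | (b1 << 8));  // UTF-16LE: low byte first
}

constexpr char16_t bigEndianUnit(unsigned char b0, unsigned char b1)
{
    return static_cast<char16_t>((b0 << 8) | b1);  // the wrong order for this file
}

// Worked examples from the question.
static_assert(littleEndianUnit(0x06, 0x26) == u'\u2606', "white star");
static_assert(littleEndianUnit(0x2D, 0x4E) == u'\u4E2D', "CJK 'middle'");
static_assert(littleEndianUnit(0x20, 0x00) == u' ', "space");
```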
But you store the "bytes" in a char[] and then append that to a CString.

Now here is the "problem". CString can be either:
- a manager of a sequence of char (in ANSI projects), or
- a manager of a wchar_t sequence (in Unicode projects), which interprets a char* (or char[]) as "characters forming an ANSI string".

So, if your project is not set up as Unicode, you are dealing with what the system considers an ANSI string (one that you, but not the system, know to contain UTF-16LE). If your project is set up as Unicode, then when you give the char[] to the CString (with += as you did), CString attempts an ANSI-to-Unicode conversion, assuming that the ANSI code page is your system's (Simplified Chinese?).
Neither option is coherent.
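To see why the ANSI route garbles the text, the sketch below imitates handing the raw bytes to a narrow-string conversion. It is only an approximation: it assumes a single-byte code page in which byte 0xNN becomes U+00NN (as in Latin-1), whereas the real conversion uses the system's ANSI code page and may pair some bytes into double-byte characters. misreadAsAnsi is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Illustration only: each byte becomes its own character (Latin-1-style
// assumption), and, like any char* string, reading stops at the first 0x00.
std::u16string misreadAsAnsi(const char* bytes)
{
    assert(bytes != nullptr);
    std::u16string out;
    for (const char* p = bytes; *p != '\0'; ++p)
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(*p)));
    return out;
}
```

With the example bytes, 06 26 2D 4E become four unrelated characters (a control character, '&', '-', 'N'), and the 00 that follows the space (20 00) ends the string, so "english" never arrives at all.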

To get away from this mess, try the following:
- Ensure your project is set up as Unicode (so that CString manages wchar_t).
- Before giving the char[] containing the bytes (which you already know are UTF-16) to the CString, cast it to (wchar_t*) so that no other conversion is done.

Now the CString holds a real UTF-16 string that can be passed to any Windows API for display.
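The cast works only if the buffer really holds 16-bit units and ends with a 16-bit zero (two zero bytes), because CString finds the end of a wchar_t* by looking for L'\0'. As a portable sketch of the same "no conversion" idea, the hypothetical fromUtf16le below assembles each code unit from its two bytes, low byte first, instead of casting. This avoids depending on wchar_t being 16 bits (true on Windows, usually not on Linux or macOS), on host byte order, and on buffer alignment. It stops at the first zero unit, assuming the records are zero-padded.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Interprets byteCount bytes as UTF-16LE with no code-page conversion.
// Stops at the first zero code unit, the same place a wchar_t* cast would stop.
std::u16string fromUtf16le(const unsigned char* bytes, std::size_t byteCount)
{
    assert(byteCount % 2 == 0);  // UTF-16 data comes in 2-byte units
    std::u16string text;
    for (std::size_t i = 0; i < byteCount; i += 2) {
        char16_t unit = static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8));
        if (unit == 0)
            break;               // terminator or padding
        text.push_back(unit);
    }
    return text;
}
```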
 
Comments
Sansan Fang 8-Apr-11 7:33am    
Thanks for your answer! I understand, and I have modified the code as follows. I tested it in VS2010 and it works: it now shows what I want. I don't know whether it still has mistakes; I hope you can correct them to help me.

C++
CFile file(_T("elements.data"),CFile::modeRead);
wchar_t *pBuf;
int nIndex=0;
long position=0x328418;
for(int k=0;k<1835;k++)
{
    file.Seek(position, CFile::begin);
    pBuf=new TCHAR[20];
    pBuf[20]=NULL;
    memset(pBuf,NULL,20);
    file.Read(pBuf,20);
    CString sResult=_T("");
    sResult+=pBuf;
    m_comboxItemName.InsertString(nIndex,sResult);
    nIndex++;
    position+=0x450;
    sResult.Empty();
}
file.Close();
UpdateData(false);
 
Comments
Emilio Garavaglia 8-Apr-11 15:07pm    
Your code leaks memory: you allocate 20 TCHARs on every loop iteration and never delete them.

Also, memset(...) is wrong: 20 TCHARs are not 20 bytes long. You should write 20*sizeof(TCHAR) instead.

But if you need 20 TCHARs, why not just use a CString: get a 20-character buffer from it (GetBuffer), write into it, and release it (ReleaseBuffer)?
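A sketch of the same record loop without any manual new/delete, written in standard C++ rather than MFC so it can be tried anywhere. It assumes the file has already been loaded into memory (image); readNames and its parameter names are hypothetical, and the modified code above would correspond to first = 0x328418, stride = 0x450 and count = 1835. Every size handed to memcpy is in bytes, units * sizeof(char16_t), which is the point of the 20*sizeof(TCHAR) remark. Note also that pBuf[20]=NULL in the modified code writes one element past the end of a 20-element array. The raw copy into char16_t assumes a little-endian host, as on Windows x86/x64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Reads `count` fixed-size UTF-16LE names from an in-memory file image.
// Record k starts at first + k * stride and holds `units` code units.
// ASSUMPTION: little-endian host, so a raw byte copy yields correct char16_t values.
std::vector<std::u16string> readNames(const std::vector<unsigned char>& image,
                                      std::size_t first, std::size_t stride,
                                      std::size_t count, std::size_t units)
{
    std::vector<std::u16string> names;
    std::vector<char16_t> buf(units + 1);                // one extra unit for a terminator
    const std::size_t bytes = units * sizeof(char16_t);  // sizes are in BYTES, not units
    for (std::size_t k = 0; k < count; ++k) {
        const std::size_t pos = first + k * stride;
        assert(pos + bytes <= image.size());             // record must lie inside the image
        std::memcpy(buf.data(), image.data() + pos, bytes);
        buf[units] = 0;                                  // terminated even if the record is full
        names.emplace_back(buf.data());                  // copies up to the first zero unit
    }
    return names;                                        // vectors free their own memory
}
```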

See the methods of CString here: http://msdn.microsoft.com/en-us/library/aa300688%28v=vs.60%29.aspx
Sansan Fang 8-Apr-11 23:22pm    
Thank you so much. So the error was in how I initialized pBuf; I should write:
pBuf1=strRead.GetBuffer(20);
C++
void CItemCodeGeneratorDlg::OnDropDownComboItem1()
{

    CFile file(_T("bin.data"),CFile::modeRead);
    wchar_t *pBuf1;
    CString strRead;
    int nMaxIndex1;
    long position1;
    long nGetCurSel=m_comboxItem0.GetCurSel();
    switch(nGetCurSel)
    {
    case -1: return;   // nothing selected
    case 0:
    case 1:
    case 2: position1=nGetCurSel*0x48*3+0x327f44;nMaxIndex1=3;break;
    case 3: position1=0x48*9+0x327f44;nMaxIndex1=2;break;
    case 4: position1=0x48*11+0x327f44;nMaxIndex1=4;break;
    case 5: position1=0x48*13+0x327f44;nMaxIndex1=1;break;
    case 6: position1=0x48*15+0x327f44;nMaxIndex1=2;break;
    default: return;   // unknown index: position1/nMaxIndex1 would be uninitialized
    }
    m_comboxItem1.ResetContent();//Eliminate all existing content
    for(int k=0;k<nMaxIndex1;k++) 
    {

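        file.Seek(position1, CFile::begin);
        pBuf1=strRead.GetBuffer(20);          // room for 20 wchar_t plus a terminator
        UINT nRead=file.Read(pBuf1,16);       // 16 bytes = 8 UTF-16 code units
        pBuf1[nRead/sizeof(wchar_t)]=L'\0';   // Read() does not terminate the text
        strRead.ReleaseBuffer();              // cannot be omitted; recomputes the length
        m_comboxItem1.InsertString(k,strRead);
        position1+=0x48;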
    }
    file.Close();
}
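The offsets in the switch follow one pattern, position1 = 0x327f44 + 0x48 * (index of the first record), so they could also live in a table. A minimal sketch with hypothetical names; the base offset, record size and (first record, count) pairs are copied from the switch.

```cpp
// Hypothetical table-driven replacement for the switch in OnDropDownComboItem1.
struct RecordGroup
{
    long first;  // index of the first record: position1 = base + size * first
    int count;   // how many records to load (nMaxIndex1)
};

constexpr long kBaseOffset = 0x327f44;  // value taken from the switch
constexpr long kRecordSize = 0x48;      // value taken from the switch
constexpr RecordGroup kGroups[] = {
    {0, 3}, {3, 3}, {6, 3}, {9, 2}, {11, 4}, {13, 1}, {15, 2}
};
constexpr long kGroupCount = sizeof(kGroups) / sizeof(kGroups[0]);

// Rejects -1 (nothing selected) and any index past the end of the table.
constexpr bool validSelection(long sel) { return sel >= 0 && sel < kGroupCount; }

// Only call these after validSelection(sel) returned true.
constexpr long groupOffset(long sel) { return kBaseOffset + kRecordSize * kGroups[sel].first; }
constexpr int groupCount(long sel) { return kGroups[sel].count; }

static_assert(groupOffset(0) == 0x327f44, "first group starts at the base offset");
```

In the handler this would read roughly: if (!validSelection(nGetCurSel)) return; then position1 = groupOffset(nGetCurSel); nMaxIndex1 = groupCount(nGetCurSel);. Adding a group then means adding one table entry instead of another case.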
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


