Easy text document conversion - ANSI/Unicode and Unicode/ANSI

31 Oct 2004
An article on direct ANSI to Unicode text document conversion from the source code.

Sample Project

Introduction

This article is about ANSI to Unicode and Unicode to ANSI document conversion. With the presented code, you will be able to simply load or save a text document from your project in either ANSI or Unicode format.

Background

You can read Chris Maunder's article on enabling Unicode compilation for a project. David Pritchard also has an interesting article on extending the CStdioFile class to support Unicode when reading from and writing to a file.

Using the code

You can use code fragments from this article and adjust them to your needs, or you can download the sample project and see how it deals with Unicode and ANSI files. The one thing you need to know is that your project must define the _UNICODE preprocessor symbol to enable Unicode compilation. See the articles mentioned above for an explanation.

Loading Unicode or ANSI text document

Loading the most important part of a Unicode text file, the byte-order mark, looks like this:

//   strFile is the name of the file to open; you have to supply it.

   // Reading buffer
   _TCHAR buffer[1024];

   // The byte-order mark goes at the beginning of a Unicode file
   _TCHAR bom = 0;

   // A stack object avoids leaking a CFile allocated with new
   CFile file;
   if ( file.Open( strFile, CFile::modeRead ) )
   {
      file.Read( &bom, sizeof(_TCHAR) );
      file.Close();
   }

If there is a byte-order mark at the beginning of the text file and its value is 0xFEFF, you have a Unicode (little-endian UTF-16) text document on your hands. So the question is: how do you read it into a simple CString object?

Proceed as follows:

//   As before, you have to supply the file name (strFile)
//   and also a CString object (strText) 
//   where you will save text from the file.

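   // If we are reading a Unicode file
   if ( bom == _TCHAR(0xFEFF) )
   {
      CFile file;
      if ( file.Open( strFile, CFile::modeRead ) )
      {
         // Skip the byte-order mark
         file.Read( &bom, sizeof(_TCHAR) );

         // Read() takes and returns a byte count, so leave room for the
         // terminator and convert the count to characters before using it
         UINT ret = file.Read( buffer, sizeof(buffer) - sizeof(_TCHAR) );
         buffer[ret / sizeof(_TCHAR)] = _T('\0');
         file.Close();

         strText = buffer;
      }
   }

Now you have your file in a CString object. Note that CFile::Read works in bytes, not characters: because of the double-byte encoding of Unicode text in the file, the returned count must be divided by sizeof(_TCHAR) before it is used as an index into the buffer. Otherwise the terminator lands past the end of the data and extra characters appear at the end of the string. Also note that this fragment reads at most 1023 characters; for longer files, read in a loop and append each chunk to strText.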
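Every character in such a file occupies two bytes, so a byte count is always twice the character count. As a minimal sketch of that relationship, here is the same decoding step in standard C++ rather than MFC. The helper name DecodeUtf16Le is hypothetical and not part of the sample project; it assumes the whole file has already been read into a byte vector and that the file is little-endian UTF-16.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the article's sample project):
// turns the raw bytes of a little-endian UTF-16 file into code units.
std::u16string DecodeUtf16Le(const std::vector<unsigned char>& bytes)
{
    std::u16string text;
    std::size_t i = 0;

    // Skip the byte-order mark (stored on disk as FF FE) if present.
    if (bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        i = 2;

    // Two bytes make one code unit; a trailing odd byte is ignored.
    text.reserve((bytes.size() - i) / 2);
    for (; i + 1 < bytes.size(); i += 2)
        text.push_back(static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8)));

    return text;
}
```

Given the six bytes FF FE 48 00 69 00, the function returns a two-character string: the mark is dropped and each remaining byte pair becomes one character.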

But what if your file isn't a Unicode file, that is, if the byte-order mark is not equal to 0xFEFF? Then you may be dealing with an ANSI file. I say "may" because a missing 0xFEFF does not prove that the file is ANSI; it may be encoded in some other way (UTF-8, big-endian Unicode, or something else).
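One way to tell these cases apart is to look at the first bytes of the file. The following is a sketch in standard C++, not the article's sample code; TextEncoding and DetectBom are hypothetical names, and UTF-32 signatures are deliberately not handled.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names, introduced only for this sketch.
enum class TextEncoding { Utf16Le, Utf16Be, Utf8, NoBom };

// Classifies a file by its first bytes. NoBom covers ANSI, but also
// UTF-8 saved without a signature, so it is a hint rather than proof.
TextEncoding DetectBom(const unsigned char* data, std::size_t size)
{
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return TextEncoding::Utf8;     // UTF-8 signature EF BB BF
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return TextEncoding::Utf16Le;  // 0xFEFF stored low byte first
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return TextEncoding::Utf16Be;  // 0xFEFF stored high byte first
    return TextEncoding::NoBom;
}
```

A caller would read the first three bytes of the file, pass them to DetectBom, and only fall back to the ANSI path when the result is NoBom.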

But if the text file is ANSI encoded, then you should do the following:

//   As before, you have to supply the file name (strFile)
//   and also a CString object (strText) 
//   where you will save text from the file.

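   // If we are reading an ANSI file
   {
      CStdioFile file;
      if ( file.Open( strFile, CFile::modeRead ) )
      {
         // ReadString() returns one line at a time, without the newline
         CString strLine;
         strText.Empty();
         while ( file.ReadString( strLine ) )
            strText += strLine + _T("\n");
         file.Close();
      }
   }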

As a result, the ANSI text file is loaded into a CString object.

Saving Unicode or ANSI text document

Saving a Unicode text file goes like this:

//
//   As before, you have to supply the file name (strFile)
//   and also a CString object (strText) 
//   where you hold text to be saved in the file.

   // The byte-order mark goes at the beginning of a Unicode file
   _TCHAR bom = (_TCHAR)0xFEFF;

   CFile file;
   if ( file.Open( strFile, CFile::modeCreate | CFile::modeWrite ) )
   {
      file.Write( &bom, sizeof(_TCHAR) );
      file.Write( LPCTSTR(strText), strText.GetLength()*sizeof(_TCHAR) );
      file.Close();
   }
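For comparison, the same file layout can be produced without MFC. This is only a sketch, under the assumption that the text is already held as UTF-16 code units in a std::u16string; SaveUtf16Le is a hypothetical name. It writes each code unit low byte first, so the output is little-endian regardless of the host machine.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: writes text as little-endian UTF-16 with a
// byte-order mark, one byte at a time.
bool SaveUtf16Le(const std::string& path, const std::u16string& text)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out)
        return false;

    // Byte-order mark 0xFEFF, low byte first.
    out.put('\xFF').put('\xFE');

    for (char16_t unit : text)
    {
        out.put(static_cast<char>(unit & 0xFF));         // low byte
        out.put(static_cast<char>((unit >> 8) & 0xFF));  // high byte
    }
    return static_cast<bool>(out);
}
```

The file size is always 2 bytes for the mark plus 2 bytes per character, which matches the two Write calls in the MFC fragment.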

If you would like to save the file as ANSI, do the following:

//
//   As before, you have to supply the file name (strFile)
//   and also a CString object (strText) 
//   where you hold text to be saved in the file.

   CStdioFile file;
   if ( file.Open( strFile, CFile::modeCreate | CFile::modeWrite ) )
   {
      file.WriteString( strText );
      file.Close();
   }

What to do with loaded text?

You can use this CString object further in your source, for example to display it on the screen (you will see the exact characters you typed, as in Microsoft Word). To do this, use the TextOut method of the CDC class, passing the CString object and the number of characters (that is, the length of the string). Be aware, however, that the text will not display correctly with just any font installed on your system: the font you use must contain glyphs for the Unicode characters you want to display.

This is how I would do it in the OnDraw method:

CFont font;
font.CreateFont( 15, 8, 0, 0, FW_BOLD, FALSE, FALSE, FALSE, DEFAULT_CHARSET,
             OUT_DEFAULT_PRECIS, CLIP_DEFAULT_PRECIS, DEFAULT_QUALITY,
             DEFAULT_PITCH | FF_DONTCARE, _T("Times New Roman") );
CFont* pOldFont = pDC->SelectObject( &font );
pDC->TextOut( 100, 100, strText, strText.GetLength() );
pDC->SelectObject( pOldFont );
font.DeleteObject();

Points of Interest

While analyzing the bytes of text documents written in Notepad, I found that the Unicode, Unicode BIG ENDIAN, and UTF-8 encodings differ from one another, but a simple and universal text document reader/writer is probably not far from this point.
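As an illustration of one of those differences: a "Unicode BIG ENDIAN" file stores the two bytes of each 16-bit unit in the opposite order, so its byte-order mark reads as 0xFFFE instead of 0xFEFF when loaded with the code above. A minimal sketch of the conversion follows, in standard C++ and with the hypothetical name SwapUtf16ByteOrder.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: swaps the two bytes of every 16-bit code unit,
// converting big-endian UTF-16 to little-endian or the other way round.
std::u16string SwapUtf16ByteOrder(std::u16string text)
{
    for (char16_t& unit : text)
        unit = static_cast<char16_t>((unit << 8) | (unit >> 8));
    return text;
}
```

Applying the function twice returns the original text, so the same helper serves both for reading and for writing big-endian files.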

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


Written By
Software Developer (Senior), Elektromehanika d.o.o. Nis
Serbia
He has a master's degree in Computer Science from the Faculty of Electronics in Nis (Serbia), and has worked as a C++/C# application developer for Windows platforms since 2001. He likes traveling, reading and meeting new people and cultures.

Comments and Discussions

 
Answer: In an ANSI app on Chinese XP, WriteFile will be binary for LPCWSTR.
xinkmt7-Oct-11 23:03
General: i need this ... but in exec.
kiss.andrei22-Sep-08 23:12
General: Thanks alot
mezik6-May-08 6:22
General: Nice article
rp_suman15-Apr-07 4:30
General: perfect
liuxisheng_shizi26-Jan-07 15:04
General: Does not work
sanjjull21-Aug-06 20:10
I have tested this program with a file containing Japanese characters, but it does not display them properly.
e.g. Wr场駻浄iteFi焅十le will be displayed as Wr场teFle.

sanjay
General: two problems i think
BrandonBrandon31-Aug-05 11:19
Answer: Re: two problems i think
Member 43510333-Mar-08 23:11
General: Cannot support MBCS format text file
_DESOLATED_19-Dec-04 21:36
General: Re: Cannot support MBCS format text file
Anonymous26-Dec-04 19:03
General: Nice
poiut9-Nov-04 6:47
General: Doesn't work in win XP
mpancewicz4-Nov-04 4:36
General: Re: Doesn't work in win XP
darkoman4-Nov-04 19:16
General: Re: Doesn't work in win XP
Anonymous4-Nov-04 19:39
General: Re: Doesn't work in win XP
darkoman5-Nov-04 2:24