
Easy text document conversion - ANSI/Unicode and Unicode/ANSI

By darkoman, 31 Oct 2004

Sample Project

Introduction

This article is about ANSI to Unicode and Unicode to ANSI document conversion. With the presented code, you will be able to simply load or save a text document from your project in either ANSI or Unicode format.

Background

You can read Chris Maunder's article on enabling Unicode compilation for a project. David Pritchard also has an interesting article on extending the CStdioFile class to support Unicode when reading from and writing to a file.

Using the code

You can use code fragments from this article and adjust them to your needs, or you can download the sample project and see how it deals with Unicode and ANSI files. The only important requirement is that your project must define the _UNICODE preprocessor symbol (and UNICODE, which the Windows headers use) to enable Unicode compilation. See the articles above for an explanation.

Loading Unicode or ANSI text document

Reading the most important part of a Unicode text file, its byte-order mark (BOM), looks like this:

//   strFile is the file name that you have to supply.

   // Reading buffer
   _TCHAR buffer[1024];

   // Byte-order mark goes at the beginning of the UNICODE file;
   // in a _UNICODE build _TCHAR is 2 bytes wide, the size of the mark
   _TCHAR bom = 0;

   CFile file;
   if ( file.Open( strFile, CFile::modeRead ) )
   {
      // If the file is shorter than two bytes, bom stays 0
      if ( file.Read( &bom, sizeof(_TCHAR) ) != sizeof(_TCHAR) )
         bom = 0;
      file.Close();
   }

If the file starts with a byte-order mark whose value is 0xFEFF (the bytes FF FE, read as a little-endian 16-bit value), you almost certainly have a UTF-16 little-endian text document, which is what Notepad calls "Unicode". So the question is: how do you read it into a simple CString object?

Here is how:

//   As before, you have to supply the file name (strFile)
//   and also a CString object (strText) 
//   where you will save text from the file.

   // If we are reading UNICODE file
   if ( bom == _TCHAR(0xFEFF) )
   {
      CFile file;
      if ( file.Open( strFile, CFile::modeRead ) )
      {
         strText.Empty();

         // Skip the byte-order mark
         file.Seek( sizeof(_TCHAR), CFile::begin );

         // Read the rest of the file in chunks, leaving room in the
         // buffer for a terminating null character
         const UINT nMaxChars = sizeof(buffer)/sizeof(_TCHAR) - 1;
         UINT nBytes;
         while ( (nBytes = file.Read( buffer,
                                      nMaxChars*sizeof(_TCHAR) )) > 0 )
         {
            // Read() returns a byte count; convert it to a character index
            buffer[nBytes/sizeof(_TCHAR)] = _T('\0');
            strText += buffer;
         }
         file.Close();
      }
   }

Now you have your file in a CString object. Keep in mind that CFile::Read() returns the number of bytes read, not the number of characters: in a Unicode file every character occupies two bytes, so the byte count has to be divided by sizeof(_TCHAR) before it is used to place the terminating null character. Otherwise the string picks up extra garbage characters after the real text.
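The two-bytes-per-character layout of such a file can be illustrated without MFC. The following is a minimal, portable C++ sketch using only the standard library; the function name DecodeUtf16LE is hypothetical and not part of MFC. It turns raw UTF-16 little-endian bytes into characters, which shows why a byte count has to be halved to get a character count.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode raw file bytes holding UTF-16 little-endian
// text (what Notepad calls "Unicode") into a std::u16string.
// A leading FF FE byte-order mark is skipped. Each character takes two
// bytes, low byte first, so the character count is the byte count / 2.
std::u16string DecodeUtf16LE(const std::vector<unsigned char>& bytes)
{
    std::size_t pos = 0;
    if (bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        pos = 2;  // skip the byte-order mark

    std::u16string text;
    text.reserve((bytes.size() - pos) / 2);

    // Stop before a trailing odd byte, which cannot form a whole character
    for (; pos + 1 < bytes.size(); pos += 2)
        text.push_back(static_cast<char16_t>(bytes[pos] | (bytes[pos + 1] << 8)));
    return text;
}
```

For example, the six bytes FF FE 48 00 69 00 decode to the two characters "Hi": the BOM is dropped and each remaining pair becomes one character.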

But what if your file isn't a Unicode file, that is, if the byte-order mark is not equal to 0xFEFF? Then you may be dealing with an ANSI file. I say "may" because the file could also be encoded in some other way: as UTF-8 (often starting with the bytes EF BB BF), as big-endian Unicode (starting with the bytes FE FF, which read as 0xFFFE on a little-endian machine), or as something else.
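Distinguishing these cases comes down to comparing the first few bytes of the file. A minimal sketch in portable C++ follows; the enum TextEncoding and the function DetectEncoding are hypothetical names. It only recognizes the three byte-order marks mentioned above, and a file without a mark is reported as unknown rather than assumed to be ANSI.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical classification of a text file by its leading bytes.
enum class TextEncoding { Utf16LE, Utf16BE, Utf8, Unknown };

TextEncoding DetectEncoding(const unsigned char* data, std::size_t size)
{
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return TextEncoding::Utf8;      // UTF-8 signature EF BB BF
    // Note: FF FE 00 00 (UTF-32 little-endian) also matches the next test;
    // this sketch does not distinguish it.
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return TextEncoding::Utf16LE;   // reads as 0xFEFF on little-endian
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return TextEncoding::Utf16BE;   // reads as 0xFFFE on little-endian
    // No byte-order mark: ANSI, BOM-less UTF-8, or something else
    return TextEncoding::Unknown;
}
```

The UTF-8 check comes first only because it needs three bytes; the three marks cannot be confused with each other.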

If the text file is ANSI-encoded, do the following:

//   As before, you have to supply the file name (strFile)
//   and also a CString object (strText) 
//   where you will save text from the file.

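   // If we are reading ANSI file
   {
      CStdioFile file;
      if ( file.Open( strFile, CFile::modeRead | CFile::typeText ) )
      {
         // ReadString() returns one line at a time, without the
         // newline character, and FALSE at the end of the file
         CString strLine;
         strText.Empty();
         while ( file.ReadString( strLine ) )
         {
            strText += strLine;
            strText += _T("\n");
         }
         file.Close();
      }
   }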

As a result, the ANSI text file is loaded into a CString object.
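CStdioFile::ReadString returns a single line per call with the newline removed, so reading a whole document takes a loop that puts the line breaks back. The same pattern in portable standard C++ might look like the sketch below; ReadAllLines is a hypothetical name, and it works on raw bytes without any code-page conversion.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: read a whole text stream line by line.
// std::getline, like ReadString, strips the newline from each line,
// so it is appended again after every line (including the last one).
std::string ReadAllLines(std::istream& in)
{
    std::string text, line;
    while (std::getline(in, line))
    {
        text += line;
        text += '\n';
    }
    return text;
}
```

A consequence of this design is that a file whose last line lacks a newline comes back with one added; if that matters, the newline can be inserted before each line except the first instead.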

Saving Unicode or ANSI text document

Saving a Unicode text file goes like this:

//
//   As before, you have to supply the file name (strFile)
//   and also a CString object (strText) 
//   where you hold text to be saved in the file.

   // Byte-order mark goes at the beginning of the UNICODE file
   _TCHAR bom = (_TCHAR)0xFEFF;

   CFile file;
   if ( file.Open( strFile, CFile::modeCreate | CFile::modeWrite ) )
   {
      file.Write( &bom, sizeof(_TCHAR) );
      // Write() takes a byte count: characters times sizeof(_TCHAR)
      file.Write( LPCTSTR(strText), strText.GetLength()*sizeof(_TCHAR) );
      file.Close();
   }
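On x86 Windows the 16-bit values above are stored low byte first, so the file begins with the bytes FF FE. The byte stream that results can be sketched in portable C++ as follows; EncodeUtf16LE is a hypothetical helper, and it spells out the little-endian byte order explicitly instead of relying on the machine's native order as the MFC code does.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: build the bytes of a UTF-16 little-endian file,
// i.e. an FF FE byte-order mark followed by each 16-bit code unit,
// low byte first.
std::vector<unsigned char> EncodeUtf16LE(const std::u16string& text)
{
    std::vector<unsigned char> bytes = {0xFF, 0xFE};  // BOM 0xFEFF, low byte first
    bytes.reserve(2 + text.size() * 2);
    for (char16_t ch : text)
    {
        bytes.push_back(static_cast<unsigned char>(ch & 0xFF));  // low byte
        bytes.push_back(static_cast<unsigned char>(ch >> 8));    // high byte
    }
    return bytes;
}
```

Encoding u"Hi" gives FF FE 48 00 69 00: two bytes of BOM plus two bytes per character, matching the GetLength()*sizeof(_TCHAR) byte count above.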

If you would like to save the file as ANSI, do the following:

//
//   As before, you have to supply the file name (strFile)
//   and also a CString object (strText) 
//   where you hold text to be saved in the file.

   CStdioFile file;
   if ( file.Open( strFile,
                   CFile::modeCreate | CFile::modeWrite | CFile::typeText ) )
   {
      // In text mode, each "\n" is written as "\r\n"
      file.WriteString( strText );
      file.Close();
   }
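Saving as ANSI is only lossless for characters that exist in the target single-byte code page. As a simplified illustration, the sketch below assumes ISO-8859-1 (Latin-1) as the code page, because its 256 characters map one-to-one onto U+0000 to U+00FF; the real Windows ANSI code page (for example 1252) differs, notably in the 0x80 to 0x9F range. The helper NarrowToLatin1 is hypothetical, and replacing unmappable characters with '?' is a choice of this sketch, not a statement about what CStdioFile does.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: narrow UTF-16 text to single bytes, assuming
// ISO-8859-1 (Latin-1) as the target code page. Code units above U+00FF
// have no Latin-1 equivalent and become the replacement character, so the
// conversion is lossy. Each code unit is handled on its own, so a
// surrogate pair turns into two replacement characters.
std::string NarrowToLatin1(const std::u16string& text, char replacement = '?')
{
    std::string out;
    out.reserve(text.size());
    for (char16_t ch : text)
        out.push_back(ch <= 0xFF ? static_cast<char>(ch) : replacement);
    return out;
}
```

Round-tripping Cyrillic or CJK text through such a conversion cannot restore the original characters, which is why a Unicode file is the safer choice whenever the text may contain them.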

What to do with the loaded text?

You can use the CString object elsewhere in your code, for example to display it on the screen, where you will see exactly the characters you typed, as in Microsoft Word. To do this, use the TextOut method of the CDC class, passing the CString object and the number of characters (the length of the string). Be aware, however, that not every font installed on your system will display the text correctly: the selected font must contain glyphs for the Unicode characters you want to show.

This is how I would do it in the OnDraw method:

   CFont font;
   font.CreateFont( 15, 8, 0, 0, FW_BOLD, FALSE, FALSE, FALSE, DEFAULT_CHARSET,
                OUT_DEFAULT_PRECIS, CLIP_DEFAULT_PRECIS, DEFAULT_QUALITY,
                DEFAULT_PITCH | FF_DONTCARE, _T("Times New Roman") );
   CFont* pOldFont = pDC->SelectObject( &font );
   pDC->TextOut( 100, 100, strText, strText.GetLength() );
   pDC->SelectObject( pOldFont );
   font.DeleteObject();

Points of Interest

While analyzing the bytes of text documents written in Notepad, I found that Unicode, Unicode big endian, and UTF-8 files differ from one another, starting with their byte-order marks, but a simple, universal text document reader/writer should not be far from this point.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

darkoman
Software Developer (Senior) Elektromehanika d.o.o. Nis
Serbia
He holds a master's degree in Computer Science from the Faculty of Electronics in Nis (Serbia) and has worked as a C++/C# application developer for Windows platforms since 2001. He likes traveling, reading, and meeting new people and cultures.

Last Updated 1 Nov 2004
Article Copyright 2004 by darkoman
Everything else Copyright © CodeProject, 1999-2014