Posted 9 Feb 2005

A UTF-16 Class for Reading and Writing Unicode Files

A UTF-16 class derived from CStdioFile for reading and writing Unicode files


As Unicode becomes more popular, programmers will find themselves performing more file-based operations using Unicode. Currently, familiar MFC classes such as CFile and CStdioFile do not properly handle reading and writing of a Unicode file. The class presented here addresses the need to read and write files as UTF-16 Unicode files.

Using the Code

During construction, or when Open() is called, the class examines the first two bytes of the file after appropriate size checking. A byte order mark (BOM) with the value 0xFEFF, or its byte-swapped form 0xFFFE, indicates the file is UTF-16 encoded. If either is found, m_bIsUnicode is set to TRUE. If no BOM is present, the class performs a CStdioFile::Seek( 0, CFile::begin ) to rewind past the two bytes it consumed.

CStdioFile::Read( &wcBOM, sizeof( WCHAR ) );

// BOM in native byte order - no swapping needed
if( wcBOM == UNICODE_BOM ) {

    m_bIsUnicode   = TRUE;
    m_bByteSwapped = FALSE;
}
// BOM in reversed byte order - swap each WCHAR when reading
else if( wcBOM == UNICODE_RBOM ) {

    m_bIsUnicode   = TRUE;
    m_bByteSwapped = TRUE;
}

// Not a BOM mark - treat it as an ANSI file
//   and defer to CStdioFile...
if( FALSE == m_bIsUnicode ) {

    CStdioFile::Seek( 0, CFile::begin );
}
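
For readers who want to try the BOM check outside MFC, the following is a minimal sketch in standard C++. It compares the two bytes in the order they appear on disk, so the result does not depend on the byte order of the machine running it. The names Utf16Bom and DetectUtf16Bom are hypothetical and are not part of CUTF16File.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names for this sketch; they are not part of CUTF16File.
enum class Utf16Bom { None, LittleEndian, BigEndian };

// Inspect the first two bytes of a buffer, in the order they appear on disk.
// Comparing individual bytes (rather than a 16-bit value) keeps the check
// independent of the host machine's byte order.
inline Utf16Bom DetectUtf16Bom(const std::uint8_t* data, std::size_t size)
{
    if (data == nullptr || size < 2) {
        return Utf16Bom::None;          // too short to hold a BOM
    }
    if (data[0] == 0xFF && data[1] == 0xFE) {
        return Utf16Bom::LittleEndian;  // U+FEFF stored low byte first
    }
    if (data[0] == 0xFE && data[1] == 0xFF) {
        return Utf16Bom::BigEndian;     // U+FEFF stored high byte first
    }
    return Utf16Bom::None;              // no BOM: treat the data as ANSI
}
```

A caller would read the first two bytes of the file into a buffer, pass them to DetectUtf16Bom, and seek back to the start of the file when the result is Utf16Bom::None.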

ReadString(...) works as follows: if m_bIsUnicode is FALSE, the class defers to the corresponding CStdioFile::ReadString(...) overload. If the file is UTF-16 encoded, CUTF16File::ReadString( CString& rString ) draws from an internal accumulator until a "\r" or "\n" is encountered. The CUTF16File::ReadString( LPWSTR lpsz, UINT nMax ) overload duplicates the behavior of CStdioFile::ReadString(); see the documentation for fgets(), on which that behavior is based.

The read described above is accomplished through an accumulator, which is an STL list of WCHARs. When filling the accumulator, byte swapping occurs if a big-endian stream is encountered (BOM bytes 0xFE, 0xFF on disk, which read as the WCHAR 0xFFFE on a little-endian machine).
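
The accumulator idea can be sketched in standard C++ as follows. This is an illustration under assumptions, not the class's actual code: FillAccumulator and ReadLine are hypothetical helpers, char16_t stands in for WCHAR so the sketch is portable, and treating "\r\n" as a single line break is a choice made here for simplicity.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <string>

// Hypothetical helpers for this sketch; CUTF16File's own members may differ.

// Append raw file bytes to the accumulator as 16-bit code units.
// bigEndian selects which byte of each pair is the high byte, which is
// where the byte swapping happens. A trailing odd byte is ignored.
inline void FillAccumulator(std::list<char16_t>& acc,
                            const std::uint8_t* bytes, std::size_t size,
                            bool bigEndian)
{
    for (std::size_t i = 0; i + 1 < size; i += 2) {
        const unsigned lo = bigEndian ? bytes[i + 1] : bytes[i];
        const unsigned hi = bigEndian ? bytes[i] : bytes[i + 1];
        acc.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
}

// Pop code units from the front of the accumulator until a line break.
// Returns false only when the accumulator was already empty.
// Assumption: "\r\n", "\r" and "\n" each count as one break, and the
// break is not included in the returned line.
inline bool ReadLine(std::list<char16_t>& acc, std::u16string& line)
{
    line.clear();
    if (acc.empty()) {
        return false;
    }
    while (!acc.empty()) {
        const char16_t ch = acc.front();
        acc.pop_front();
        if (ch == u'\r') {
            if (!acc.empty() && acc.front() == u'\n') {
                acc.pop_front();   // consume the LF of a CRLF pair
            }
            return true;
        }
        if (ch == u'\n') {
            return true;
        }
        line.push_back(ch);
    }
    return true;                   // final line without a terminator
}
```

A list makes removal from the front cheap, which suits this read pattern; a std::deque would serve equally well in a sketch like this.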

Writing is accomplished by extending the normal function as WriteString( LPCTSTR lpsz, BOOL bAsUnicode ). If bAsUnicode is FALSE, CUTF16File simply yields to CStdioFile, which handles the ANSI conversion internally. If bAsUnicode is TRUE, the class writes the BOM (if the file position is 0) and then calls CFile::Write(...).

The driver program opens two files on the hard drive, writes out both Unicode and ANSI text files, then reads the files back in. It then uses OutputDebugString(...) to write messages to the debugger's output window.

CUTF16File output1( L"unicode_write.txt", CFile::modeWrite |
CFile::modeCreate );
output1.WriteString( L"Hello World from Unicode land!", TRUE );


CString szInput;
CUTF16File input1( L"unicode_write.txt", CFile::modeRead );
input1.ReadString( szInput );

Figure 1 is the result of writing a test file with the provided driver program. Notice that the BOM bytes appear swapped on disk: on a little-endian machine, the BOM value 0xFEFF is stored low byte first, as 0xFF, 0xFE.

Figure 1: Result of test program.
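
The byte order seen in the file can be reproduced with a few lines of standard C++. This is only an illustrative sketch; BytesInMemory is a hypothetical helper, not part of the article's source code.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Copy a 16-bit value's in-memory representation into a byte array.
// This is the order in which the bytes reach the disk when the value
// is written as-is, without any byte swapping.
inline std::array<std::uint8_t, 2> BytesInMemory(std::uint16_t value)
{
    std::array<std::uint8_t, 2> bytes{};
    std::memcpy(bytes.data(), &value, sizeof value);
    return bytes;
}
```

On a little-endian machine, BytesInMemory(0xFEFF) yields 0xFF followed by 0xFE, which matches the swapped appearance of the BOM in a hex dump.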

Figure 2 shows a similar file created with Notepad on Windows 2000 and saved as Unicode.

Figure 2: A Unicode sample created in Notepad.

Additional Reading

  • International Programming for Microsoft Windows by D. Schmitt, ISBN 1-57231-956-9
  • Programming Windows with MFC by J. Prosise, ISBN 1-57231-695-0
  • Programming Server-Side Applications for Microsoft Windows 2000 by J. Richter and J. Clark, ISBN 0-73560-753-2


History

  • 10 Feb 2005 Original release
  • 23 Dec 2006 Added Jordan Walters' improvements and bug fixes
  • 23 Dec 2006 Added Jordan Walters as an author
  • 17 Sep 2008 Fixed long-standing bug in 2nd constructor
  • 13 Jul 2009 Corrected handling of Unicode characters. If the UNICODE/_UNICODE project settings are specified, writing ANSI still produces a Unicode output file.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Authors

Jeffrey Walton
Systems / Hardware Administrator
United States
No Biography provided

Jordan Walters
Software Developer (Senior)
United Kingdom
Ok, it's about time I updated this profile. I still live near 'Beastly' Eastleigh in Hampshire, England. However, I have recently been granted a permanent migration visa to Australia - so if you're a potential employer from down under and like the look of me, please get in touch.
Still married - just, still with just a son and daughter. But they are now 8 and 7 respectively, and when together they have the energy of a nuclear bomb.
I worked at Teleca UK for over 8.5 years (but have now moved to TikitTFB) and have done loads of different things. Heavily involved with MFC, SQL, and C#. The latest is ASP.NET with C# and JavaScript. Moving away from Trolltech Qt3 and 4.



Comments and Discussions

UNICODE compile flag may not be needed.
Jordan Walters, 27-Apr-05 5:52
I have made the class work so that it does not need the UNICODE pre-processor flag to be specified in order to work. I have not checked that it will work in a UNICODE build, but I can confirm that it works in a non-Unicode build. I make use of TCHARs and the conversion macros (T2W etc.) to allow the code to remain unchanged regardless of whether the UNICODE pre-processor flag is specified or not.
I invite the author to contact me so that I can send him my source code, so that he can a) verify that it works in both types of build, and b) if it does work, update the source code download available to other developers.

I have also incorporated a fix for the Seek function so that whenever a Unicode string is read and a CRLF encountered, the file position is set to the last characters of the string returned to the ReadString caller (including the CRLF which should NOT appear appended on the string BTW).

I also removed the call to LoadAccumulator in the Seek function as this was causing no end of problems. When the file is first opened and the GetLength<2 check made, the call to GetLength calls CUTF16File::Seek which used to do a LoadAccumulator. When one then tried to read the first two bytes to check for Unicode the file pointer was no longer pointing to the start of the file (thanks to the LoadAccumulator call). Even calling SeekToBegin did not help. (Don't know why SeekToBegin did not sort it).



Last Updated 15 Jul 2009
Article Copyright 2005 by Jeffrey Walton, Jordan Walters