Simple class to read from and write to a UTF-8 encoded file

8 Jul 2004
A class derived from CStdioFile to read from and write to a UTF-8 encoded file.

Disclaimer

This will certainly look like a very thin article, but it's all in the source :^)

I have looked around quite a bit, both here at Code Project and elsewhere, since I thought that someone must have posted such a class already. Well, I couldn't find one, so here is my own quick hack to solve the problem.

Background

The CStdioFile_UTF8 class was initially done as a step towards making Dan Goodson's excellent TodoList program support Unicode. I'm posting it here in case someone else finds it useful.

Using the code

Use the class as a drop-in replacement for the MFC class CStdioFile. It overrides the ReadString and WriteString functions to perform the encoding conversion. It also provides the functions ReadBOM and WriteBOM to handle an optional byte order mark (BOM) at the start of the file.

If _UNICODE is defined, strings are converted between the UTF-16 representation used in memory and the UTF-8 representation used in the file. If the symbol is not defined, the class behaves exactly like its parent class CStdioFile.
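To make the conversion concrete, here is a minimal portable sketch of the file-to-memory direction, decoding UTF-8 bytes into UTF-16 code units. It is not the class's actual code (a Windows build would more likely call MultiByteToWideChar with CP_UTF8); the function name Utf8ToUtf16 and the choice to replace malformed input with U+FFFD are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-8 into UTF-16 code units.
// Malformed or truncated sequences become U+FFFD, one per offending byte.
std::u16string Utf8ToUtf16(const std::string& in)
{
    // Smallest code point that legitimately needs a sequence of each length.
    static const char32_t kMin[5] = { 0, 0, 0x80, 0x800, 0x10000 };

    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 1;
        bool ok = true;

        if (b0 < 0x80)                { cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else                          { ok = false; }  // stray continuation or invalid lead

        if (ok && i + len > in.size()) ok = false;     // sequence cut off at end of input
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;        // not a continuation byte
            else cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong forms, surrogate code points, and values past U+10FFFF.
        if (ok && (cp < kMin[len] || cp > 0x10FFFF ||
                   (cp >= 0xD800 && cp <= 0xDFFF))) ok = false;
        if (!ok) { cp = 0xFFFD; len = 1; }
        i += len;

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;                             // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

The write direction is the mirror image: each UTF-16 code unit (or surrogate pair) is turned back into one to four UTF-8 bytes.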

History

  • 9-Jul-2004

    Initial version.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


Written By
Web Developer
Sweden

Comments and Discussions

Compiler Error (anderslundsgard, 29-Aug-04 23:10)
Re: Compiler Error (Uwe Sedlack, 11-Mar-05 6:12)

CStdioFileEx (David Pritchard, 14-Jul-04 11:51)

half-baked conversion (umeca74, 13-Jul-04 19:51)
Your ReadString method converts from UTF-8 to Windows text when building with Unicode, but in an ANSI build it just leaves the multibyte text as is, which isn't much use if you want, e.g., to paste the text into an edit control.

Your use of MultiByteToWideChar needs to be revised too. You read the file in chunks of nMax characters, and a chunk boundary may happen to fall in the middle of a multibyte sequence. As your code stands now, you will lose all such characters.

I'm glad to see UTF-8 interest, since the world is much bigger than the English-speaking countries!
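
One common way to avoid the chunk-boundary problem raised in this comment is to convert only the part of each chunk that ends on a complete UTF-8 sequence and carry the leftover bytes into the next read. The sketch below shows the boundary check in portable C++; the function name CompleteUtf8Prefix is hypothetical, and this is not a description of how the class was later changed.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns how many leading bytes of `chunk` form
// complete UTF-8 sequences. Any remaining trailing bytes (at most 3) are
// the start of a sequence that continues in the next chunk and should be
// prepended to it before converting.
std::size_t CompleteUtf8Prefix(const std::string& chunk)
{
    const std::size_t n = chunk.size();
    // The lead byte of the last sequence is at most 3 bytes from the end
    // if that sequence is incomplete.
    for (std::size_t back = 1; back <= 3 && back <= n; ++back) {
        const unsigned char b = static_cast<unsigned char>(chunk[n - back]);
        if ((b & 0xC0) == 0x80) continue;      // continuation byte, keep looking

        std::size_t need = 1;                  // total length announced by the lead byte
        if      ((b & 0xE0) == 0xC0) need = 2;
        else if ((b & 0xF0) == 0xE0) need = 3;
        else if ((b & 0xF8) == 0xF0) need = 4;

        // If the sequence needs more bytes than are present, cut before it.
        return (need > back) ? n - back : n;
    }
    // No lead byte within reach: either the chunk ends on a complete
    // 4-byte sequence or the input is malformed; leave that to the decoder.
    return n;
}
```

A reader loop would convert the first CompleteUtf8Prefix(chunk) bytes and keep the remainder in a small buffer that is placed in front of the next chunk.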
Re: half-baked conversion (Sven Axelsson, 16-Jul-04 0:42)
Re: half-baked conversion (umeca74, 19-Jul-04 3:01)
