I am trying to read a Unicode (UTF-8) file one character at a time, but I don't know how to read a single character. Can you tell me the easiest way to read a single character?
Posted 6 Jan '12 - 15:34
Edited 9 Jan '12 - 2:09

Comments
johny10151981 - 9 Jan '12 - 8:09
Correcting Title

4 solutions

Because UTF-8 encoded characters have a variable length, you have to check each byte you read. A possible solution (using a file handle opened in binary mode) would be:
 
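#include <cstdio>

typedef struct {
    int nLen;
    unsigned char cByte[6];
} utf8char_t;

// Read one UTF-8 encoded character into the struct.
// Returns the number of bytes read (0 upon EOF before the first byte,
// -1 upon invalid or truncated sequences).
int read_utf8_char(FILE *f, utf8char_t& tChar)
{
    tChar.nLen = 0;
    // fgetc() returns an int so that EOF can be told apart from a data byte;
    // feof() only becomes true after a read has already failed.
    int ch = fgetc(f);
    if (ch == EOF)
        return 0;
    unsigned char c = tChar.cByte[0] = static_cast<unsigned char>(ch);
    if ((c & 0x80) == 0)
        return tChar.nLen = 1;      // single-byte (ASCII) character
    // The number of leading 1 bits in the first byte is the sequence length.
    int nLen = 0;
    while (c & 0x80)
    {
        ++nLen;
        c <<= 1;
    }
    // A count of 1 is a stray continuation byte; UTF-8 sequences are
    // at most 4 bytes long (RFC 3629).
    if (nLen < 2 || nLen > 4)
        return -1;
    for (int i = 1; i < nLen; ++i)
    {
        ch = fgetc(f);
        // Every following byte must look like 10xxxxxx.
        if (ch == EOF || (ch & 0xC0) != 0x80)
            return -1;
        tChar.cByte[i] = static_cast<unsigned char>(ch);
    }
    return tChar.nLen = nLen;
}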
 
Please note that this example does not check for all possible invalid UTF-8 sequences.
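As a rough sketch of what some of those additional checks could look like, the function below decodes an already-read byte sequence into a code point and rejects overlong encodings, UTF-16 surrogate values, and values above U+10FFFF. The name decode_utf8 and its signature are hypothetical and not part of the answer above; treat it as one possible approach, not a complete validator.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (assumption, not from the original answer):
// decode a UTF-8 sequence of `len` bytes stored in `b` into a code point.
// Returns true and writes `cp` only if the sequence is well formed.
bool decode_utf8(const unsigned char* b, int len, std::uint32_t& cp)
{
    // Index = sequence length in bytes (index 0 is unused).
    static const unsigned char  kTagMask[5]  = { 0, 0x80, 0xE0, 0xF0, 0xF8 };
    static const unsigned char  kTag[5]      = { 0, 0x00, 0xC0, 0xE0, 0xF0 };
    static const unsigned char  kDataMask[5] = { 0, 0x7F, 0x1F, 0x0F, 0x07 };
    // Smallest code point that really needs this many bytes.
    static const std::uint32_t  kMin[5]      = { 0, 0x0, 0x80, 0x800, 0x10000 };

    if (len < 1 || len > 4)
        return false;
    // The lead byte must announce exactly `len` bytes.
    if ((b[0] & kTagMask[len]) != kTag[len])
        return false;

    std::uint32_t v = b[0] & kDataMask[len];
    for (int i = 1; i < len; ++i)
    {
        if ((b[i] & 0xC0) != 0x80)      // continuation bytes are 10xxxxxx
            return false;
        v = (v << 6) | (b[i] & 0x3F);   // append 6 payload bits
    }

    if (v < kMin[len])                  // overlong encoding
        return false;
    if (v >= 0xD800 && v <= 0xDFFF)     // surrogates are not valid in UTF-8
        return false;
    if (v > 0x10FFFF)                   // beyond the Unicode range
        return false;

    cp = v;
    return true;
}
```

With the struct from the answer, a call could look like decode_utf8(tChar.cByte, tChar.nLen, cp) after read_utf8_char has returned a positive length.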
Comments
johny10151981 - 8 Jan '12 - 22:24
Dude, the OP said Unicode. Unicode is 2 bytes long; UTF-8 is variable length.
Jochen Arndt - 9 Jan '12 - 4:03
He said Unicode in the title and stated more precisely UTF-8 in the question.
After reading a good article referenced above by DrBones69, you can also use sample code from this thread: Read unicode file into wstring
Maybe this article will get you started in the right direction.
Comments
Emilio Garavaglia - 7 Jan '12 - 13:16
:-O
There are several options, depending on the type of stream you're using: fgetc, ReadFile, fstream's operator>>, etc.
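For the C++ stream case, a minimal sketch could read the bytes with std::istream::get() instead of operator>>, since operator>> skips whitespace by default. The function name next_utf8_char is hypothetical, and the sketch assumes the stream was opened in binary mode; it only checks the byte pattern, not overlong encodings or surrogates.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (assumption, not from the thread): read the bytes of
// one UTF-8 character from any std::istream (std::ifstream, std::cin, ...).
// Returns an empty string at end of input or on a malformed sequence.
std::string next_utf8_char(std::istream& in)
{
    const int kEof = std::char_traits<char>::eof();

    int ch = in.get();                  // returns the byte as 0..255, or eof
    if (ch == kEof)
        return std::string();

    // Derive the sequence length from the lead byte.
    int len;
    if      ((ch & 0x80) == 0x00) len = 1;   // 0xxxxxxx
    else if ((ch & 0xE0) == 0xC0) len = 2;   // 110xxxxx
    else if ((ch & 0xF0) == 0xE0) len = 3;   // 1110xxxx
    else if ((ch & 0xF8) == 0xF0) len = 4;   // 11110xxx
    else
        return std::string();           // stray continuation or invalid lead

    std::string out(1, static_cast<char>(ch));
    for (int i = 1; i < len; ++i)
    {
        ch = in.get();
        // Each continuation byte must look like 10xxxxxx.
        if (ch == kEof || (ch & 0xC0) != 0x80)
            return std::string();
        out.push_back(static_cast<char>(ch));
    }
    return out;
}
```

The same function works on a std::istringstream, which is convenient for trying it out without a file; with a real file, something like std::ifstream in("file.txt", std::ios::binary) would be passed instead (the file name here is only a placeholder).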

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
