
I am reading a .csv file that contains Japanese characters.
I am using a wifstream to open the file and getline() to read each line.

Because the file contains Japanese characters, it is saved in a multibyte character set (MBCS) encoding.
As a result, 3 special characters are written at the beginning of the file, and they act as a header that denotes the file format.

Now, how can I read this file with its Japanese characters while making sure I do not read those 3 special characters?

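So far I am using this:

#include <fstream>
#include <iostream>
#include <string>

const char *filename = "F:\\GPDM.csv";
std::wstring temp;
std::wifstream fstr;
fstr.open(filename, std::ios::in);
if (!fstr.is_open())
    std::cout << "File Open Error\n";
else
    std::cout << "File Open Success\n";
std::getline(fstr, temp);   // reads the first line, including the 3 header characters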

Here, temp still contains those 3 special characters.

Please help me out with this.
Posted 22-Feb-13 1:27am
Updated 22-Feb-13 23:21pm
Jochen Arndt
jeron1 22-Feb-13 17:54pm
Isn't there some sort of seekg() function or equivalent to move the read pointer past the 3 characters?
Dilip K Sharma 22-Feb-13 22:59pm

One more thing I should mention: the presence of Japanese characters is not certain; the file may or may not contain Japanese characters.

So if I use seekg() and the file doesn't contain such characters, I will lose 3 characters from the file.
Jochen Arndt 23-Feb-13 5:21am
Check the first three characters of your temp string and, if they are your special ones, remove them using the erase() function: temp.erase(0, 3).
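A minimal sketch of the erase() approach, assuming the first line is read as raw bytes into a std::string (for example with std::ifstream and std::getline) rather than into a std::wstring; with a std::wifstream, how the header bytes show up depends on the stream's locale. The helper name StripUtf8Bom is hypothetical. Because it only erases when the three bytes actually match EF BB BF (the UTF-8 byte order mark), a file without the header keeps all of its characters.

```cpp
#include <string>

// Hypothetical helper: removes a leading UTF-8 byte order mark (EF BB BF)
// from a line that was read as raw bytes into a narrow std::string,
// so each BOM byte occupies exactly one char.
void StripUtf8Bom(std::string& line)
{
    static const std::string kBom = "\xEF\xBB\xBF";
    // compare() returns 0 only if the first kBom.size() chars equal kBom;
    // a line shorter than 3 chars simply compares unequal.
    if (line.compare(0, kBom.size(), kBom) == 0)
        line.erase(0, kBom.size());   // equivalent to temp.erase(0, 3)
}
```

Call it once, on the first line only, right after the first getline().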

Solution 1

You should parse the first bytes of the file according to the byte order mark (BOM) rules and act accordingly.
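As a rough illustration of those BOM rules, the sketch below compares the first few bytes of a file against the standard Unicode signatures and reports how many bytes to skip. The struct BomInfo and the function DetectBom are hypothetical names; the byte values are the standard BOM signatures for UTF-8, UTF-16 and UTF-32.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: the encoding a BOM indicates and its length in bytes.
struct BomInfo {
    std::string encoding;   // "UTF-8", "UTF-16LE", ... or "none"
    std::size_t length;     // bytes to skip before the real content
};

// Inspects up to the first four bytes of a file (p points at them, n is how
// many were actually read). UTF-32 is tested before UTF-16 because the
// UTF-32LE mark (FF FE 00 00) begins with the UTF-16LE mark (FF FE).
BomInfo DetectBom(const unsigned char* p, std::size_t n)
{
    if (n >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00)
        return {"UTF-32LE", 4};
    if (n >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF)
        return {"UTF-32BE", 4};
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return {"UTF-8", 3};
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return {"UTF-16LE", 2};
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return {"UTF-16BE", 2};
    return {"none", 0};   // no BOM: start reading at byte 0
}
```

One known ambiguity: a UTF-16LE file whose first character is U+0000 looks like a UTF-32LE BOM, so this ordering trades that rare case for correct UTF-32 detection.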

Solution 2

Assuming you already have the file open (f is an MFC CFile here):

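// f is an already-open MFC CFile
char a, b, c;
f.Read(&a, 1);
f.Read(&b, 1);
f.Read(&c, 1);
// EF BB BF is the UTF-8 byte order mark (BOM)
if (a == (char)0xEF && b == (char)0xBB && c == (char)0xBF)
    f.Seek(3, CFile::begin);   // BOM found: position just past it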


Then do your next operation (Read, ReadString, or the like).
Of course, you can always translate the code above to whatever way of using files suits you.

Maybe a little dirty, but it works for me.
Steve44 22-Mar-13 19:38pm
In addition to the code above, I would recommend adding f.Seek(0, CFile::begin); in an else branch for when the 3 chars do not match. This way you rewind and read the non-MBCS file from the beginning.
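Translated to standard C++ streams, Solution 2 combined with Steve44's rewind might look like the sketch below. It assumes std::ifstream instead of MFC's CFile, and OpenSkippingUtf8Bom is a hypothetical helper name. The stream stays past the first three bytes only when they really are EF BB BF; otherwise it is rewound, which addresses the concern about losing 3 characters from files without the header.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: opens path and leaves the stream positioned just past
// a UTF-8 BOM if one is present, or at the very first byte if not.
bool OpenSkippingUtf8Bom(std::ifstream& in, const std::string& path)
{
    in.open(path);   // default mode is std::ios::in
    if (!in.is_open())
        return false;

    char bom[3] = {0, 0, 0};
    in.read(bom, 3);
    const bool hasBom = in.gcount() == 3 &&
                        bom[0] == static_cast<char>(0xEF) &&
                        bom[1] == static_cast<char>(0xBB) &&
                        bom[2] == static_cast<char>(0xBF);
    if (!hasBom) {
        in.clear();                    // a file shorter than 3 bytes sets eofbit/failbit
        in.seekg(0, std::ios::beg);    // rewind so no content is lost
    }
    return true;
}
```

After the call, read lines with std::getline(in, line) as usual. If the file is UTF-8, each line then holds UTF-8 bytes that still need converting if you want wide characters.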

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 22 Mar 2013
Copyright © CodeProject, 1999-2017
All Rights Reserved.