Hi,

I am reading a .csv file that contains Japanese characters.
I am using ifstream to parse the file and getline() to read each line of the file.

Because the file contains Japanese characters, its format is a multibyte character set (MBCS).
As a result, 3 special characters are added at the beginning of the file, acting as a header that marks the file format as MBCS.

How can I read this file with Japanese characters while making sure that I do not read those 3 special characters?

So far I am using this:

C++
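const char *filename = "F:\\GPDM.csv";  // narrow path string

std::wstring temp;

std::wifstream fstr;

fstr.open(filename, std::ios::in);

if (fstr.fail())
{
	std::cout << "File Open Error\n";
}
else
{
	std::cout << "File Open Success\n";
}

// Reads the first line, including any special characters at the start of the file
std::getline(fstr, temp);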



Here "temp" still contains those 3 special characters.

Please help me out with this.
Updated 22-Feb-13 23:21pm
Comments
jeron1 22-Feb-13 17:54pm    
Isn't there some sort of seekg() function or equivalent to move the read pointer past the 3 characters?
Dilip K Sharma 22-Feb-13 22:59pm    
Hi,

One more thing I should mention: the presence of Japanese characters is not certain. The file may or may not contain Japanese characters.

So if I use seekg() and the file doesn't contain such characters, I will lose the first 3 characters of the file.
Jochen Arndt 23-Feb-13 5:21am    
Check the first three characters of your temp string and, if they are your special ones, remove them using the erase() function: temp.erase(0, 3).

You should parse the first characters of the file according to BOM (byte order mark) rules and act appropriately.
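
The check-and-erase idea from the comment above could be sketched roughly as follows. This assumes the line was read with a narrow std::ifstream into a std::string and that the 3 special characters are the UTF-8 byte order mark (bytes EF BB BF). The helper name strip_utf8_bom is hypothetical and not part of the original code.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the original post): removes a leading
// UTF-8 BOM (bytes EF BB BF) from a line read with std::getline.
// Returns true if a BOM was found and erased, false otherwise.
bool strip_utf8_bom(std::string& line)
{
    static const char bom[] = "\xEF\xBB\xBF";

    // compare() only matches when the first 3 bytes equal the BOM;
    // lines shorter than 3 bytes simply compare unequal.
    if (line.compare(0, 3, bom) == 0) {
        line.erase(0, 3);
        return true;
    }
    return false;
}
```

Calling strip_utf8_bom(temp) once, right after the first getline(), leaves a file without a BOM untouched, so no characters are lost when the Japanese text is absent.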
 
 
Assuming you have an open file already:

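C++
// f is assumed to be an open MFC CFile positioned at the start of the file
char a, b, c;
f.Read(&a, 1);
f.Read(&b, 1);
f.Read(&c, 1);

// EF BB BF is the UTF-8 byte order mark
if (a == (char)0xEF && b==(char)0xBB && c==(char)0xBF)
    f.Seek(3, CFile::begin);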

...

Then do your next move (Read, ReadString, or the like).
Of course, you can always translate the code above to whatever way of using files suits you.

Maybe a little dirty, but works for me.
 
Comments
Steve44 22-Mar-13 19:38pm    
In addition to the code above, I would recommend adding f.Seek(0, CFile::begin); in an else branch for when the 3 chars do not match. This way you rewind and read the non-MBCS file from the beginning.
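
For readers using standard streams rather than CFile, the same read, compare, and rewind logic might look like the sketch below. It assumes a seekable stream opened in binary mode and positioned at the start, and it only handles the UTF-8 byte order mark. The helper names skip_utf8_bom and first_line_without_bom are hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: leaves `in` positioned just past a UTF-8 BOM if one
// is present, otherwise rewinds to the start so no data is lost.
// Assumes `in` is seekable, opened in binary mode, and at the beginning.
bool skip_utf8_bom(std::istream& in)
{
    char head[3] = {0, 0, 0};
    in.read(head, 3);

    const bool has_bom = in.gcount() == 3 &&
                         head[0] == '\xEF' &&
                         head[1] == '\xBB' &&
                         head[2] == '\xBF';

    in.clear();  // a file shorter than 3 bytes sets eofbit and failbit
    in.seekg(has_bom ? 3 : 0, std::ios::beg);
    return has_bom;
}

// Hypothetical helper: returns the first line of `path` without any BOM.
// Returns an empty string if the file cannot be opened.
std::string first_line_without_bom(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    std::string line;
    if (in) {
        skip_utf8_bom(in);
        std::getline(in, line);
    }
    return line;
}
```

Because the stream is opened in binary mode, a file with Windows line endings will leave a trailing '\r' on each line returned by getline(); trimming it is left to the caller.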

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


