See more: C++
Hi,
 
I am reading a .csv file which contains Japanese characters.
I am using ifstream to parse the file and getline() to read each line in the given file.
 
Because the file contains Japanese characters, it is saved in a multibyte character set (MBCS) format.
As a result, 3 special characters are added at the beginning of the file, one of which is the header denoting that the file format is MBCS.
 
How can I read this file containing Japanese characters while making sure that I do not read those 3 special characters?
 
So far I am using this:
 
const TCHAR *filename = _T("F:\\GPDM.csv");	// _T() (from <tchar.h>) keeps the literal valid in both ANSI and Unicode builds
	
	wstring temp;
 
	wifstream fstr;
 
	fstr.open(filename, ios::in);
 
	if (fstr.fail())
	{
		std::cout<<"File Open Error\n";
	}
	else
	{
		std::cout<<"File Open Success\n";
	}
 
	getline(fstr, temp);
 

Here "temp" still contains those 3 special characters...
 
Please help me out with this.
Posted 22-Feb-13 2:27am
Edited 23-Feb-13 0:21am
Comments
jeron1 at 22-Feb-13 17:54pm
   
Isn't there some sort of seekg() function or equivalent to move the read pointer past the 3 characters?
Dilip K Sharma at 22-Feb-13 22:59pm
   
Hi,
 
One more thing I should tell you: the presence of Japanese characters is not certain. The file may or may not contain Japanese characters.
 
So if I use seekg() and the file doesn't contain such characters, I will lose the first 3 characters of the file.
Jochen Arndt at 23-Feb-13 5:21am
   
Check the first three characters of your temp string and if they are your special ones remove them using the erase() function: temp.erase(0, 3).
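
The check-and-erase idea in this comment could be sketched roughly as follows. This is only a minimal sketch under assumptions: the helper name strip_bom is hypothetical, and it assumes the three "special characters" are the UTF-8 byte order mark (bytes EF BB BF). How the mark shows up in a wstring depends on the locale imbued on the wifstream, so the sketch checks for both three separately widened bytes and a single decoded U+FEFF character.

```cpp
#include <string>

// Hypothetical helper (not from the original thread): removes a UTF-8 BOM
// from the front of the first line read from the file.
// Returns true if a BOM was found and erased, false if the line was left alone.
bool strip_bom(std::wstring& line)
{
    // Case 1 (assumption): the stream widened each byte on its own, so the
    // BOM appears as the three wide characters 0xEF, 0xBB, 0xBF.
    if (line.size() >= 3 &&
        line[0] == L'\xEF' && line[1] == L'\xBB' && line[2] == L'\xBF')
    {
        line.erase(0, 3);
        return true;
    }

    // Case 2 (assumption): the stream decoded UTF-8, so the BOM appears as
    // the single character U+FEFF.
    if (!line.empty() && line[0] == L'\xFEFF')
    {
        line.erase(0, 1);
        return true;
    }

    // No BOM: nothing is removed, so a file without the mark loses no data.
    return false;
}
```

It would be called once, right after the first getline(fstr, temp), as strip_bom(temp). Later lines do not need the check.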

Solution 1

You should parse the first characters of the file according to the BOM (byte order mark) rules and act appropriately.
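
As a rough illustration of what "parsing according to BOM rules" can mean, the sketch below classifies the leading bytes of a file using the standard Unicode byte order mark signatures. The names Bom, detect_bom and bom_length are hypothetical and not part of any library. The caller is assumed to have read the first few bytes of the file in binary mode into a std::string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical names, introduced only for this sketch.
enum class Bom { None, Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

// Inspects the leading bytes of a file and reports which BOM, if any, is present.
Bom detect_bom(const std::string& head)
{
    // True when `head` begins with the n bytes at `sig` (embedded zeros allowed).
    auto starts_with = [&head](const char* sig, std::size_t n) {
        return head.size() >= n && head.compare(0, n, sig, n) == 0;
    };

    if (starts_with("\xEF\xBB\xBF", 3))     return Bom::Utf8;
    // UTF-32 LE shares its first two bytes with UTF-16 LE, so test it first.
    if (starts_with("\xFF\xFE\x00\x00", 4)) return Bom::Utf32LE;
    if (starts_with("\x00\x00\xFE\xFF", 4)) return Bom::Utf32BE;
    if (starts_with("\xFF\xFE", 2))         return Bom::Utf16LE;
    if (starts_with("\xFE\xFF", 2))         return Bom::Utf16BE;
    return Bom::None;
}

// Number of bytes to skip before the real content starts.
std::size_t bom_length(Bom bom)
{
    switch (bom) {
        case Bom::Utf8:    return 3;
        case Bom::Utf16LE:
        case Bom::Utf16BE: return 2;
        case Bom::Utf32LE:
        case Bom::Utf32BE: return 4;
        default:           return 0;
    }
}
```

For the file in the question, a three-byte prefix most likely corresponds to the Utf8 case, in which case bom_length reports 3 bytes to skip. A file without any mark yields Bom::None and nothing is skipped.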

Solution 2

Assuming you have an open file already:
 
char a, b, c;
f.Read(&a, 1);
f.Read(&b, 1);
f.Read(&c, 1);
 
if (a == (char)0xEF && b==(char)0xBB && c==(char)0xBF)
    f.Seek(3, CFile::begin);
...
 
Do your next move (like Read or ReadString or something of the sort).
Of course you can always translate the code above to whatever way of using files suits you.
 
Maybe a little dirty, but works for me.
Comments
Steve44 at 22-Mar-13 19:38pm
   
In addition to the code above, I would recommend adding f.Seek(0, CFile::begin); in an else branch for the case where the 3 chars do not match. This way you rewind and read the non-MBCS file from the beginning.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 22 Mar 2013
Copyright © CodeProject, 1999-2014
All Rights Reserved. Terms of Service