How to read a CSV file (Google Doc) line by line in C++

Question


Hi All,

I have to read a CSV file, created as a Google Doc, that contains some French accented characters. When I tried to read this file using the fgets() function, the French characters were replaced with garbage values.

I am new to C++ and have no idea how to read a Unicode file.
Please guide me toward a solution. It would be very helpful if you could provide some code.

Thanks a lot in advance.

Rajesh

Posted 27-Feb-11 20:20pm

Member 7660635

6 solutions

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Andrew Phillips · Answer 1 · 2011-02-28T02:03:00

The first thing to do is determine which character set the text file uses. The "garbage values" are how the French characters are encoded in that character set. From your comments I am fairly sure it is not Unicode, but if you posted the values (in hex) of the string you read in, it would be much easier to be sure. I suspect the text uses an MBCS (multi-byte character set), in which case, to display it properly on Windows, you may need to use the correct code page.
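A minimal sketch of how to get those hex values, in portable C++. The helper names toHex, detectBom and dumpFile are hypothetical, not from any library. As a reference point: in UTF-8, é is stored as the two bytes C3 A9, while in Windows-1252 it is the single byte E9. A UTF-8 file may also begin with the byte-order mark EF BB BF, although many UTF-8 files have no BOM at all.

```cpp
#include <cstdio>
#include <string>

// Formats every byte of `line` as two hex digits separated by spaces,
// e.g. the UTF-8 bytes of "é" become "C3 A9".
std::string toHex(const std::string& line)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : line) {
        if (!out.empty())
            out += ' ';
        out += digits[c >> 4];
        out += digits[c & 0x0F];
    }
    return out;
}

// Guesses the encoding from a byte-order mark at the start of the file.
// Returns "unknown" when there is no BOM, which is common both for UTF-8
// and for 8-bit code pages such as Windows-1252.
std::string detectBom(const std::string& firstBytes)
{
    if (firstBytes.compare(0, 3, "\xEF\xBB\xBF") == 0)
        return "UTF-8";
    if (firstBytes.compare(0, 2, "\xFF\xFE") == 0)
        return "UTF-16LE";
    if (firstBytes.compare(0, 2, "\xFE\xFF") == 0)
        return "UTF-16BE";
    return "unknown";
}

// Prints the hex bytes of each line of the file at `path`.
// The file is opened in binary mode so the bytes are shown unchanged.
void dumpFile(const char* path)
{
    FILE* fp = std::fopen(path, "rb");
    if (fp == nullptr)
        return;
    char buf[1000];
    while (std::fgets(buf, sizeof buf, fp))
        std::printf("%s\n", toHex(buf).c_str());
    std::fclose(fp);
}
```

Calling dumpFile("D:\\myfile.csv") and posting the first few lines of output would show directly whether the accented letters are one byte or two.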

BTW, the "r" mode is correct, as fopen defaults to text mode, but you can use "rt" if you want. In any case, reading a text file in text or binary mode makes little difference (except for the way lines are terminated).
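To make that line-ending difference harmless, a small helper can strip either form of terminator. This is a sketch assuming the file uses "\n" or "\r\n" line endings; trimLineEnding is a hypothetical name. On Windows, text mode already turns "\r\n" into "\n", while binary mode (and std::getline on a CRLF file read on other systems) leaves the '\r' in place.

```cpp
#include <string>

// Removes a trailing line terminator left by fgets() or std::getline():
// first a '\n' (if present), then a '\r' (if present), so both
// "\n" and "\r\n" endings are handled.
void trimLineEnding(std::string& line)
{
    if (!line.empty() && line.back() == '\n')
        line.pop_back();
    if (!line.empty() && line.back() == '\r')
        line.pop_back();
}
```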

Olivier Levrey · Answer 2 · 2011-02-28T03:14:00

From your comment:
I am observing the garbage characters inside Visual Studio IDE while debugging. I am using VS6.0.

OK, so I think it is a character-set problem. As Andrew said, your text file is probably not a Unicode text file, and since your computer is not set up for French you can't display the characters properly. (Visual Studio uses the current locale settings and loads the corresponding character set for 8-bit characters.)

I suggest converting your input strings into Unicode strings using these functions:

C++

#include <windows.h>
#include <stdio.h>
#include <tchar.h>

//just to set a max size for the string buffers
#define MAX_SIZE 1000
//change this value to use a different character set:
//1252 is the Windows Western European code page (French);
//use CP_UTF8 for a UTF-8 file, but then pass 0 instead of
//MB_PRECOMPOSED to MultiByteToWideChar (it rejects that flag for UTF-8)
#define CODE_PAGE 1252

//converts an ansi string (8 bits per character)
//into a unicode string (16 bits per character)
//using the code page provided by constant CODE_PAGE
//Note: do not delete or free the returned pointer! It points to a
//static buffer that is overwritten by the next call.
WCHAR* AnsiToUnicode(LPCSTR ansiString)
{
    static WCHAR unicodeString[MAX_SIZE];
    MultiByteToWideChar(
        CODE_PAGE,          // code page
        MB_PRECOMPOSED,     // character-type options
        ansiString,         // address of string to map
        -1,                 // -1: string is null-terminated
        unicodeString,      // address of wide-character buffer
        MAX_SIZE            // size of buffer, in WCHARs
    );
    return unicodeString;
}

//converts a unicode string (16 bits per character)
//into an ansi string (8 bits per character)
//using the code page provided by constant CODE_PAGE
//Note: do not delete or free the returned pointer! It points to a
//static buffer that is overwritten by the next call.
char* UnicodeToAnsi(LPCWSTR unicodeString)
{
    static char ansiString[MAX_SIZE];
    WideCharToMultiByte(
        CODE_PAGE,      // code page
        0,              // performance and mapping flags
        unicodeString,  // address of wide-character string
        -1,             // -1: string is null-terminated
        ansiString,     // address of buffer for new string
        MAX_SIZE,       // size of buffer, in bytes
        NULL,           // address of default for unmappable characters
        NULL            // address of flag set when a default char is used
    );
    return ansiString;
}

//test
void test()
{
    FILE *fp;
    char str[100];
    fp = _tfopen(_T("D:\\myfile.csv"), _T("rt"));
    if (fp == NULL)
        return;     //could not open the file
    while (fgets(str, sizeof(str), fp))
    {
        //convert the string
        WCHAR* wstr = AnsiToUnicode(str);

        //do something......
    }
    fclose(fp);
}

Or you can use cleaner versions of these functions, which write into a buffer supplied by the caller instead of a static one:

C++

//writes the converted string into the caller's buffer;
//returns the number of WCHARs written (including the terminating
//null), or 0 on failure
int AnsiToUnicode(LPCSTR ansiString, LPWSTR unicodeString, int maxSize)
{
    return MultiByteToWideChar(
        CODE_PAGE,          // code page
        MB_PRECOMPOSED,     // character-type options
        ansiString,         // address of string to map
        -1,                 // -1: string is null-terminated
        unicodeString,      // address of wide-character buffer
        maxSize             // size of buffer, in WCHARs
    );
}

//writes the converted string into the caller's buffer;
//returns the number of bytes written (including the terminating
//null), or 0 on failure
int UnicodeToAnsi(LPCWSTR unicodeString, char* ansiString, int maxSize)
{
    return WideCharToMultiByte(
        CODE_PAGE,      // code page
        0,              // performance and mapping flags
        unicodeString,  // address of wide-character string
        -1,             // -1: string is null-terminated
        ansiString,     // address of buffer for new string
        maxSize,        // size of buffer, in bytes
        NULL,           // address of default for unmappable characters
        NULL            // address of flag set when a default char is used
    );
}
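AnsiToUnicode and UnicodeToAnsi rely on the Win32 API. For comparison, here is a minimal portable sketch of what such a conversion amounts to for French text, assuming the file is in ISO-8859-1 (Latin-1) or Windows-1252. The hypothetical latin1ToUtf8 below produces UTF-8 rather than the UTF-16 WCHAR strings that MultiByteToWideChar returns, and it maps each byte as a Latin-1 code point, so it is not a full Windows-1252 converter.

```cpp
#include <string>

// Converts 8-bit text to UTF-8, treating every byte as an ISO-8859-1
// (Latin-1) code point. For bytes 0xA0-0xFF, Latin-1 and Windows-1252
// agree, so French letters such as é (0xE9), è (0xE8), à (0xE0) and
// ç (0xE7) convert correctly. Bytes 0x80-0x9F differ: in Windows-1252
// they hold characters such as œ (0x9C) and € (0x80), which this sketch
// does NOT map correctly.
std::string latin1ToUtf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char c : in) {
        if (c < 0x80) {
            out += static_cast<char>(c);  // ASCII is unchanged
        } else {
            // Code points 0x80-0xFF need two UTF-8 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```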

And don't forget to enable Unicode string display in Visual Studio:
To set your debugger options to display Unicode strings, click the Tools menu, click Options, click Debug, then check the Display Unicode Strings check box.

Olivier Levrey · Answer 3 · 2011-02-27T22:19:00

Solution 1

This function should work properly (it does for me!).
Are you perhaps trying to read a "French" file on a non-French version of Windows?

Try changing the locale before reading the file:

C

//required for the locale function
#include <locale.h>

void yourFunction()
{
    //change the locale settings to French (this affects the whole process, not only the current thread)
    setlocale(LC_ALL, "French");
    //then open and read your text file
    //...
}
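setlocale() returns a null pointer when it does not recognise the requested name, and the locale is then left unchanged, so it is worth checking the result. "French" is a name the Microsoft C runtime accepts; on Linux the equivalent is usually something like "fr_FR.UTF-8", and only if that locale is installed. Also note that the locale only affects locale-dependent conversions such as mbstowcs(); fgets() into a char buffer copies the bytes unchanged whatever the locale, which may explain why this alone does not help. A minimal sketch with a hypothetical helper, trySetLocale:

```cpp
#include <clocale>

// Tries each locale name in turn and returns the string setlocale()
// reports for the first name it accepts, or nullptr if none was
// recognised. setlocale() returns a null pointer for unknown names
// and leaves the current locale unchanged in that case.
const char* trySetLocale(const char* const names[], int count)
{
    for (int i = 0; i < count; ++i) {
        const char* result = std::setlocale(LC_ALL, names[i]);
        if (result != nullptr)
            return result;
    }
    return nullptr;
}
```

For example, `const char* names[] = {"French", "fr_FR.UTF-8"};` followed by `if (trySetLocale(names, 2) == nullptr) { /* no French locale available */ }`.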

Posted 27-Feb-11 22:19pm

Olivier Levrey

Comments

Member 7660635 28-Feb-11 5:08am

Thanks for your response. I tried this but it did not work. Here is my code:
//required for the locale function
#include <locale.h>
#include <stdio.h>

void yourFunction()
{
    char str[1000];
    //change the locale settings to French
    setlocale(LC_ALL, "French");
    FILE *fp = fopen("D:\\myfile.csv", "r");
    if (fp == NULL)
        return;
    while (fgets(str, 1000, fp))
    {
        // do something
    }
    fclose(fp);
}

"str" still contains garbage values.

Olivier Levrey 28-Feb-11 5:19am

Use fopen with "rt" and not "r", otherwise you will get binary data.

ThatsAlok · Answer 4 · 2011-02-27T22:30:00

Solution 2

I believe you should use the Unicode version of fgets().

Posted 27-Feb-11 22:30pm

ThatsAlok

Hans Dietrich · Answer 5 · 2011-02-27T22:32:00

Solution 3

fgets() reads the file as ANSI (8-bit) text. If you are using TCHARs, you should use _fgetts(). Otherwise, to read a Unicode file, you should use fgetws(), and the character buffer should be WCHAR, not char.
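The underlying point is that the bytes have to be decoded according to the file's encoding, not just read. A Google Docs CSV export is, as far as I know, usually UTF-8; that is an assumption worth confirming by looking at the bytes, as Andrew suggests. If the file is UTF-8, here is a minimal portable sketch of the decoding step (decodeUtf8 is a hypothetical name). It yields Unicode code points in a std::u32string; converting those to UTF-16 WCHAR strings for Windows APIs would be a further step not shown here.

```cpp
#include <cstddef>
#include <string>

// Decodes UTF-8 into Unicode code points.
// Invalid or truncated sequences become U+FFFD (replacement character).
// Overlong encodings and surrogate code points are not rejected.
std::u32string decodeUtf8(const std::string& in)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        char32_t cp;
        int extra;  // number of continuation bytes that must follow
        if (c < 0x80)            { cp = c;        extra = 0; }
        else if ((c >> 5) == 0x6)  { cp = c & 0x1F; extra = 1; }
        else if ((c >> 4) == 0xE)  { cp = c & 0x0F; extra = 2; }
        else if ((c >> 3) == 0x1E) { cp = c & 0x07; extra = 3; }
        else { out += U'\uFFFD'; ++i; continue; }  // invalid lead byte

        bool ok = i + extra < in.size();  // enough bytes left?
        for (int k = 1; ok && k <= extra; ++k) {
            unsigned char cc = static_cast<unsigned char>(in[i + k]);
            if ((cc & 0xC0) != 0x80)
                ok = false;  // not a continuation byte (10xxxxxx)
            else
                cp = (cp << 6) | (cc & 0x3F);
        }
        if (ok) {
            out += cp;
            i += extra + 1;
        } else {
            out += U'\uFFFD';
            ++i;  // skip only the lead byte and resynchronise
        }
    }
    return out;
}
```

A single Windows-1252 byte such as E9 is not valid UTF-8 on its own, so it decodes to U+FFFD; seeing many replacement characters is a hint that the file is not UTF-8 after all.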

Posted 27-Feb-11 22:32pm

Hans Dietrich

Comments

Member 7660635 28-Feb-11 5:10am

I tried both _fgetts() and fgetws(), but neither resolved the problem.

Hans Dietrich 28-Feb-11 5:17am

Show us your code.

Member 7660635 28-Feb-11 8:13am

Please find the code below:

FILE *fp;
TCHAR str[100];
fp = _tfopen(_T("D:\\myfile.csv"), _T("rb"));
while (_fgetts(str, 100, fp))
{
    //do something......
}
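The "do something" step is where each line would be split into fields, which none of the posts shows. As a hedged sketch in standard C++ (not the TCHAR code above), a file can be read line by line with std::getline and each line split on commas. The names splitCsvLine, readCsv and readCsvText are hypothetical, and the parser assumes RFC 4180-style quoting (a field may be wrapped in double quotes, "" inside quotes stands for one quote) with no line breaks inside quoted fields.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Splits one CSV line into fields, honouring double-quoted fields.
std::vector<std::string> splitCsvLine(const std::string& line)
{
    std::vector<std::string> fields;
    std::string field;
    bool inQuotes = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        char c = line[i];
        if (inQuotes) {
            if (c == '"' && i + 1 < line.size() && line[i + 1] == '"') {
                field += '"';       // "" is an escaped quote
                ++i;
            } else if (c == '"') {
                inQuotes = false;   // closing quote
            } else {
                field += c;         // commas inside quotes are kept
            }
        } else if (c == '"') {
            inQuotes = true;        // opening quote
        } else if (c == ',') {
            fields.push_back(field);
            field.clear();
        } else {
            field += c;
        }
    }
    fields.push_back(field);
    return fields;
}

// Reads a CSV stream line by line and returns all rows.
// A trailing '\r' (CRLF line ending read in binary mode) is removed.
std::vector<std::vector<std::string>> readCsv(std::istream& in)
{
    std::vector<std::vector<std::string>> rows;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();
        rows.push_back(splitCsvLine(line));
    }
    return rows;
}

// Convenience wrapper for CSV text already held in memory.
std::vector<std::vector<std::string>> readCsvText(const std::string& text)
{
    std::istringstream in(text);
    return readCsv(in);
}
```

To read the actual file, open it with `std::ifstream file("D:\\myfile.csv", std::ios::binary);` (from `<fstream>`) and pass it to readCsv. The fields come back as the file's raw bytes, so the accented characters still need to be decoded according to the file's encoding.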

Olivier Levrey · Answer 6 · 2011-02-27T23:24:00

Solution 4

In the code you sent me, you use fopen with "r". If you want to read text, you should use "rt".

Posted 27-Feb-11 23:24pm

Olivier Levrey

Comments

Member 7660635 28-Feb-11 5:53am

Thanks for your quick response.
I tried this, but it did not resolve the problem.

Olivier Levrey 28-Feb-11 6:10am

With all the answers you have there, it should work.

Where are you observing the garbage characters? Inside Visual Studio IDE while debugging? Inside a dialog box you created? Inside another text file you wrote?
Provide your full code or tell us more details about where you see the problem, because it SHOULD work.

Member 7660635 28-Feb-11 8:07am

I am observing the garbage characters inside Visual Studio IDE while debugging. I am using VS6.0.
