Comments and Discussions
Someone has drawn my attention to a probable bug. In ggets.cpp,
delete buffer
should be
delete[] buffer
I'll give it a test tomorrow. Yeah, I know, I have to update the demo project.
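For anyone unsure why the form of delete matters, here is a minimal sketch. CopyAndMeasure is a hypothetical helper written for illustration, not code from ggets.cpp; it only shows that memory obtained with new[] has to be released with delete[].

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not taken from ggets.cpp): copies a C string into a
// heap buffer allocated with new[] and releases it again with delete[].
std::size_t CopyAndMeasure(const char* text)
{
    assert(text != nullptr);

    const std::size_t length = std::strlen(text);
    char* buffer = new char[length + 1];     // array form of new
    std::memcpy(buffer, text, length + 1);   // copy including the terminator

    const std::size_t copied = std::strlen(buffer);

    // Array form of delete. Writing "delete buffer;" here would be
    // undefined behaviour, because the memory came from new[].
    delete[] buffer;
    return copied;
}
```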
- Pfft. Coddled kids. In my day, we used to telnet to port 80, then render the page with pencil and paper-- and that's the way we liked it!
- Pshaw! Youngster. Your UID barely fits inside 16 bits. In _my_ day we had to whistle the 1's and 0's through an acoustic coupler!
Tools that support "all of UTF-8 as long as it starts with ASCII" and tools that cannot handle these three [BOM] bytes at all are not really supporting UTF-8.
- Michael Kaplan
First, yes it's a bug, but second, it's code that never gets executed in StdioFileEx because an initial fixed buffer is always passed to ggets.
Thanks for the great class!
Please just update the demo project to the latest version as already mentioned in some previous comment.
I have the need to read a file, 1 "character" at a time. This file is stored in a format which is only known at runtime; it could be in Unicode OR Multi-Byte, and could have any code-page. All of this is specified by the user at run-time.
The StdioFileEx class seems to be the closest I've found to reading files like this (as long as I specify the code-page), but I can't seem to find an API in the StdioFileEx class to just "read the next character." Do you have any suggestions?
Hi David Pritchard,
I'm using your StdioFileEx and it seems to be working fine.
But I think you should update the demo, which still uses version 1.5.
With the new version 1.6, I spent a whole day working out how to use it,
because I didn't know that ggets.cpp shouldn't be in the project's file list.
Alternatively, a "readme" could be included with the new version.
Thanks for your hard work!
For the others that had linker errors - Don't include the ggets.cpp under the Project Source Files.
Thanks.
I've just now discovered this bug, so I thought I'd warn you all.
Using the constructor with parameters (the one that takes the filename and flags) is NOT a good idea: because it calls "Open" directly, the override of "Open" is guaranteed never to be called (overrides in derived classes won't be called from base class constructors -- it's too early!!!).
This will have detrimental effects, not least the fact that Unicode files will never be detected as such.
Instead, use the default constructor and the "Open" function.
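The effect can be reproduced without MFC. The sketch below uses two hypothetical classes, BaseFile and DerivedFile, as stand-ins for CStdioFile and CStdioFileEx; the member names are made up for illustration. It records which Open ran, so you can see that the derived override is skipped when the base constructor makes the call.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for CStdioFile.
class BaseFile {
public:
    BaseFile() {}
    // While this constructor runs, the object is still only a BaseFile,
    // so the call below resolves to BaseFile::Open, never to an override.
    explicit BaseFile(const char* name) { Open(name); }
    virtual ~BaseFile() {}

    virtual bool Open(const char* name)
    {
        assert(name != nullptr);
        m_openedBy = "BaseFile";
        return true;
    }

    const std::string& OpenedBy() const { return m_openedBy; }

protected:
    std::string m_openedBy;
};

// Hypothetical stand-in for CStdioFileEx.
class DerivedFile : public BaseFile {
public:
    DerivedFile() {}
    explicit DerivedFile(const char* name) : BaseFile(name) {}

    // Extra work (in the real class, the Unicode detection) would live here.
    bool Open(const char* name) override
    {
        BaseFile::Open(name);
        m_openedBy = "DerivedFile";
        return true;
    }
};
```

Constructing DerivedFile with a file name leaves OpenedBy() at "BaseFile", whereas default-constructing it and then calling Open gives "DerivedFile", which is why the two-step approach is the safe one.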
You saved my life. You are a gentleman.
Why thank you!!
Hello David,
First of all, thanks very much for your implementation. It seems incredible that MFC's CStdioFile doesn't support this kind of thing natively.
I have just added your class to my free software Perfils, which draws height profiles from GPS track data, to allow reading and writing of UTF-8 GPX tracks (http://www.amigosdelciclismo.com/perfils/). Over the last few weeks I have migrated the whole program from multibyte to Unicode and added support for UTF-8 using your class (I made some minor changes, removing some commented-out parts in the reading code).
But I have discovered a strange performance issue when writing a file over a local network:
- Using the standard CStdioFile class, writing to or reading from a local disk drive takes more or less the same time as writing to a network drive: for example, less than 0.5 seconds for a 500 KB file.
- Using your CStdioFileEx, reading or writing a file on a local disk drive also performs well (less than 0.5 seconds), and so does reading a file over the network. But when I write the same file to a network drive, performance is very bad: 6, 10 or 20 seconds, depending on the network, to write the same file.
Do you know why this might be happening?
Thanks in advance.
Ruben
Hi Ruben
Sorry, I haven't been paying much attention to my emails recently, as you can see.
I'm not sure what the cause might be. Have you been able to follow what the code is doing in debug? I can try to do some tests and get back to you.
Are you writing to Unicode? Unicode to Unicode doesn't require conversion, so it's a very simple case. I can't see why that would be slow (in fact, in that case it just calls CFile::Write and so shouldn't take any more time than a standard CStdioFile write).
Regards,
David
It did not work for my Unicode file.
Hi,
Could you give me your file to test?
Hi David,
When I write or read a "" string using the CStdioFileEx class, I get an assertion on the line
ASSERT(nCharsWritten > 0);
But when I change it to
ASSERT(nCharsWritten > -1);
I don't get it any more.
So am I going about this the right way, or is there an alternative solution for reading or writing a "" (empty) string?
And best wishes for your recovery from the tennis elbow injury.
Hi,
I'd need to take a closer look at it, but you'd need to make sure the conversion functions return 0 when they correctly write an empty string, and -1 when they fail. So you might need to modify them to stop removing the \0 from the count of characters written. But then you would need to check what else the return value is used for!
I'll need to take a closer look at the code and follow all the paths.
Thanks for pointing this out.
And thank you, my tennis elbow is much better now. Ice, exercises and cream work wonders. Anybody else with the same problem should see a physio and follow the same regimen.
Regards,
David
I take it that the string is not strictly empty, since it contains a line break, right? In my test suite (BTW, it seems that the test suite is not included in the current source code on CodeProject, not sure what happened there), clearly there are blank lines included, and no assertions occur. Or is it just a blank string without line breaks?
Are you using any particular code page?
Thanks very much for the reply, but
have you tried
CStdioFileEx myFile;
myFile.WriteString("");
that is, writing an empty string, and vice versa?
Hi,
OK, the reason I never hit this problem is that I always compile in Unicode.
What if you just remove the ASSERT? It's not appropriate for this case. Alternatively, you could ASSERT only if the input string is not blank.
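A minimal sketch of the second option, asserting only when there was something to write. ConvertForWrite and WriteStringChecked are hypothetical names invented for this illustration (they are not part of CStdioFileEx), and the standard assert stands in for MFC's ASSERT so the snippet compiles anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the conversion step: copies the input and
// returns the number of characters produced (0 for an empty input).
std::size_t ConvertForWrite(const std::string& input, std::string& output)
{
    output = input;
    return output.size();
}

// Hypothetical write routine: appends the converted text to an in-memory sink.
std::size_t WriteStringChecked(const std::string& input, std::string& sink)
{
    std::string converted;
    const std::size_t nCharsWritten = ConvertForWrite(input, converted);

    // A zero count is only suspicious when the input was not empty.
    assert(input.empty() || nCharsWritten > 0);

    sink += converted;
    return nCharsWritten;
}
```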
I'm recovering from tennis elbow (should that be mouse elbow??) right now, so I won't be answering messages or writing any code for a little while.
Hi David,
First things first - the class works just dandy, thank-you, mixing Russian, Hebrew and Chinese in the same file.
However, it's a good deal slower reading a file than using the CFile equivalent (reading the file in one hit).
Any thoughts/suggestions on speeding it up?
Hi Dan!
Yeah, someone has mentioned this before, and it is sort of inevitable that reading character by character will be slower. Ideally it should have the option to read either the whole file into one huge buffer (or array), or to read data until a buffer of size x is exhausted. These optimisations could be used even for line-by-line reading too, providing you know that the user intends to read the whole file, or a large chunk of it. I'll look into it.
Cheers!
David
David Pritchard wrote: Ideally it should have the option to read either the whole file into one huge buffer (or array)
Off the top of my head, reading and processing the whole file ought to be fairly simple.
1. Read the entire file as a byte-stream.
2. If the file is Unicode and _UNICODE is defined, remove the BOM and cast to WCHAR*.
3. If the file is Unicode and _UNICODE is NOT defined, remove the BOM and convert to a multi-byte string (WideCharToMultiByte).
4. If the file is NOT Unicode and _UNICODE is defined, convert to a wide string (MultiByteToWideChar).
5. If the file is NOT Unicode and _UNICODE is NOT defined, we're done already.
We could easily put these steps in a separate function called ReadAll(), rather than overloading an existing function name, so that it cannot be called accidentally.
I'll give this a go and see if it achieves my goal. Any other suggestions (esp. if my suggestion is flawed) are welcomed.
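As a rough, portable illustration of steps 1 and 2, the sketch below takes the whole file as a byte buffer, checks for a UTF-16 little-endian BOM (FF FE), strips it and reassembles the 16-bit code units. RawText and SplitByBom are hypothetical names, and the sketch assumes little-endian UTF-16 only. The conversions in steps 3 and 4 would go through the Win32 functions WideCharToMultiByte and MultiByteToWideChar and are not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type for this sketch.
struct RawText {
    bool isUtf16LE;        // true if the buffer started with FF FE
    std::u16string wide;   // filled when isUtf16LE is true
    std::string narrow;    // filled otherwise; bytes stay in the file's code page
};

// Hypothetical helper: inspects the whole file contents held in memory.
RawText SplitByBom(const std::vector<unsigned char>& bytes)
{
    RawText result;
    result.isUtf16LE =
        bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE;

    if (result.isUtf16LE) {
        // Skip the two BOM bytes, then combine each byte pair (low byte first).
        for (std::size_t i = 2; i + 1 < bytes.size(); i += 2) {
            result.wide.push_back(
                static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8)));
        }
        // Sanity check: one code unit per complete byte pair after the BOM.
        assert(result.wide.size() == (bytes.size() - 2) / 2);
    } else {
        // No BOM: keep the bytes untouched for a later code-page conversion.
        result.narrow.assign(bytes.begin(), bytes.end());
    }
    return result;
}
```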
Sure, that's pretty much it. I was planning to allow reading n bytes from the file as a byte stream, where n could be -1 (the whole file), or just some number of characters, and use the bytes read as a cache from which lines could be read in the normal way. That would take care of intermediate cases where you didn't need or want to read the whole thing but wanted to speed things up.
In any case, if you need this now and you have more time than me, please go ahead and do it. Whatever you do, I'll try to incorporate it into a new version.
Regards,
David
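The caching idea can be sketched with standard streams. CachedLineReader is a hypothetical class written for this illustration, not the planned CStdioFileEx code: it reads the stream in chunks of chunkSize bytes and serves lines out of that cache, refilling only when the cache runs dry. It works on raw bytes and splits on '\n' only, so encoding conversion is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: serves lines from a byte cache that is refilled
// from the stream chunkSize bytes at a time.
class CachedLineReader {
public:
    CachedLineReader(std::istream& in, std::size_t chunkSize)
        : m_in(in), m_chunkSize(chunkSize), m_pos(0)
    {
        assert(chunkSize > 0);
    }

    // Reads the next line (without its '\n'). Returns false at end of data.
    bool ReadLine(std::string& line)
    {
        line.clear();
        bool gotAny = false;
        for (;;) {
            if (m_pos == m_cache.size() && !Refill())
                return gotAny;              // a last line may lack a newline
            gotAny = true;

            const std::size_t eol = m_cache.find('\n', m_pos);
            if (eol != std::string::npos) {
                line.append(m_cache, m_pos, eol - m_pos);
                m_pos = eol + 1;
                return true;
            }
            // The line continues in the next chunk: take what is cached.
            line.append(m_cache, m_pos, std::string::npos);
            m_pos = m_cache.size();
        }
    }

private:
    // Loads the next chunk; returns false when no more bytes are available.
    bool Refill()
    {
        m_cache.resize(m_chunkSize);
        m_in.read(&m_cache[0], static_cast<std::streamsize>(m_chunkSize));
        m_cache.resize(static_cast<std::size_t>(m_in.gcount()));
        m_pos = 0;
        return !m_cache.empty();
    }

    std::istream& m_in;
    std::size_t m_chunkSize;
    std::string m_cache;
    std::size_t m_pos;
};
```

A std::istringstream is a convenient way to try it out; with a real file, a larger chunk size means fewer reads, which is where the speed-up would come from.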
Could you be a little more specific?
Hi
Can you please tell me how I can write a file in UTF-8 format?
thanks in advance.
It's such a fantastic class.
Hi,
This is such a nice class. I'm using it to write Unicode .txt and HTML files and it's working fine, but when I try to write a Unicode CSV file with this class, it writes all the data but not in the correct format; that is, when I open the file, the data is not displayed correctly.
Please tell me how I can write a Unicode CSV file in the correct format with your class.
thanks in advance.
Thank you for uploading this excellent class. I have one question: is the constructor that takes lpszFileName and nOpenFlags not supported?
CStdioFileEx::CStdioFileEx(LPCTSTR lpszFileName, UINT nOpenFlags)
    : m_bCheckFilePos(true),
      m_bIsUnicodeText(false),
      m_nFileCodePage(-1),
      m_cUnicodeFillerChar(sDEFAULT_UNICODE_FILLER_CHAR),
      m_bWriteBOM(true), // By default, write the BOM
      CStdioFile(lpszFileName, nOpenFlags)
{
}
In 1.6, this constructor just hands nOpenFlags to CStdioFile without adding CFile::typeBinary, and does not check nOpenFlags or the BOM the way CStdioFileEx::Open does.
After struggling for hours with CFile/CStdioFile my co-worker found this class as a replacement. Very well done. Not only does your class save a Unicode file correctly, it also takes a unicode string as a parameter and can save it in ANSI. Microsoft needs people like you.
Thanks. I agree, it's a rather big and obvious hole in Microsoft's libraries.
An excellent replacement for the standard class.
Very much appreciated.
When exporting to Excel or a .txt file, I'm getting junk for Arabic characters.
Great, I got it right. We were using your Version 1.5 18; I only added these lines to WriteString (a poor guess):
if (m_nFileCodePage == CP_UTF8)
{
    CFile::Write(UTF8_BOM, sizeof(UTF8_BOM));
}
and got it working.
Many thanks.
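For reference, the UTF-8 byte order mark is the three bytes EF BB BF, and it belongs once at the very start of a file rather than before every string. The sketch below illustrates that with an in-memory "file"; the UTF8_BOM array mirrors the name used above, but its definition here and the AppendUtf8 helper are assumptions made for this example, not the class's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The UTF-8 byte order mark: EF BB BF. (Defined here for the sketch; the
// class's own definition of UTF8_BOM is not shown in this thread.)
const unsigned char UTF8_BOM[] = { 0xEF, 0xBB, 0xBF };

// Hypothetical helper: appends already-encoded UTF-8 text to an in-memory
// "file", writing the BOM only when the file is still empty.
void AppendUtf8(std::string& file, const std::string& utf8Text)
{
    if (file.empty()) {
        file.append(reinterpret_cast<const char*>(UTF8_BOM), sizeof(UTF8_BOM));
        assert(file.size() == sizeof(UTF8_BOM));
    }
    file += utf8Text;
}
```

Calling AppendUtf8 twice produces one BOM followed by both strings, whereas writing the BOM inside every WriteString call would scatter BOM bytes through the file.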
Oops, it's not working with export to Excel...
Hi there,
Sorry, I haven't been paying much attention to my email lately.
Is it working now? Is the problem that no BOM is being written in UTF8?
Thanks Dave.
Arabic export to Excel is still not working with the latest version.
I have to find a way before November.
What exactly do you mean by export to Excel? What's the process you're following?
Thanks for your nice class.
We are using your CStdioFileEx class to write Unicode strings to files (.txt, .xls, .csv, etc.).
It works well with other languages.
With Arabic strings, however, when we try to write to an .xls file, we get junk characters in the file,
whereas writing Arabic strings to a .txt file is fine.
We use the SetCodePage(CP_UTF8) function.
Are you compiling in Unicode? The languages that work, the ones you mentioned in the other message, seem to use the western character set. Have you tried Russian or anything of that sort?
Sorry, actually no change was required.
.txt, .csv and .xls files are coming out fine, for Arabic too.
I was getting junk characters because of a missing language setting (Control Panel) in Windows. Thanks to my leads, who helped me figure out the issue.
Ok, great!
What is the value of UTF8_BOM?
Nice code.
I've just added it to Microplanet Gravity to help me parse registry files. Why MS didn't just make them plain ASCII text is beyond me!
Well there's a straightforward reason for that - a lot of data is now Unicode, not least NTFS file paths, which the registry is full of.
Glad you like the class!
I get two errors:
1) StdioFileEx.h(304): error C2555: 'CStdioFileEx::Seek': the return type of the overriding virtual function differs from 'CStdioFile::Seek' and is not covariant
2) ggets.cpp(31): fatal error C1083: cannot open : 'stdafx.h' : No such file or directory
Of course I have tried rebuilding the project, but with no success. Any ideas? Thanks.
A class, derived from CStdioFile, which transparently reads and writes both Unicode and multibyte files. Version 1.5.
| Type | Article |
| Licence | CPOL |
| First Posted | 12 May 2003 |
| Views | 293,333 |
| Downloads | 7,991 |
| Bookmarked | 95 times |