I want to place what I read from a utf-8 *.txt file (with Unicode characters in it) to a MessageBox.

(This is a further question after I received useful logic from you to the last question at https://www.codeproject.com/Questions/5335822/Save-file-with-unicode-string-in-old-windows-syste )

As I said in the previous question:
Quote:
If I want to update some software for my uncle's old saw mill machine that has a programmable controller with C, and I want to code it in C or C++98 or in C++03, nothing newer, then how do I make the following work for that?

I am doing a test on an old Windows XP Pro 32 bit system with every (at the time of installation) code page installed, and with CodeBlocks 17.12 and GCC of 5.1 . I think that I should be able to code it for the older system with this. [...]

If I can do it all in C then that is OK.

I prefer to do this in C, but if 98 or 03 works that is OK also.


I think that I need to convert from an LPVOID to a const wchar_t* or to a wstring to do this.

For my LPVOID I am using

C++
LPVOID lpBuffer;
    //    An LPCVOID is a 32-bit pointer to a constant of any type.
    //    This type is declared as follows:
    //    typedef const void* LPCVOID;

const char* pString_from_lpBuffer;


I am reading from the file with


C++
BOOL  bErr01 = ReadFile(
                  HANDLE_Of_File_to_READ,       // HANDLE
                  lpBuffer,                     // LPCVOID
                  sizeof(lpBuffer),             // DWORD
                  &nNumberOfBytesToRead,        // LPDWORD,
                  NULL                          // LPOVERLAPPED
                  );


Which I created with


C++
HANDLE HANDLE_Of_File_to_READ = CreateFile(TheFile_to_READ, GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);



Here is where I am:

I am receiving a pointer to a constant of which I do not know the type. I have studied this and have not found what this type is, except that an LPVOID accepts whatever is sent to it. That is helpful in general, but it does not tell me how to convert from an unknown type to a const wchar_t*.

I want to see, in a MessageBox what the text is that I am reading, including the Unicode characters.

Hours later, I am posting this question.

Thank you.

What I have tried:

        // File name
        const wchar_t* TheFile_to_READ;
        TheFile_to_READ = L"utf8_UsingByteOrderMark_C_天堂.txt";

        // Read a file
        //     by using the CreateFile dwCreationDisposition of "OPEN_ALWAYS",
        //     if the old file exists with the same name in the specified directory then append to it,
        //     but if the old file does not exist then create it.
        // This time use "OPEN_ALWAYS".
        HANDLE HANDLE_Of_File_to_READ;

        HANDLE_Of_File_to_READ = CreateFile(TheFile_to_READ, GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
                // A simple error checking example
                    if (HANDLE_Of_File_to_READ == INVALID_HANDLE_VALUE)
                        {
                            MessageBox(nullptr, L"CreateFile(HANDLE_Of_File_to_READ)   utf8_UsingByteOrderMark_C_天堂.txt   failed", L"CreateFile(HANDLE_Of_File_to_READ)   utf8_UsingByteOrderMark_C_天堂.txt   failed", MB_ICONEXCLAMATION | MB_OK);
                            // return; or something else as you decide.
                        }

        LPVOID lpBuffer;
            //    An LPCVOID is a 32-bit pointer to a constant of any type.
            //    This type is declared as follows:
            //    typedef const void* LPCVOID;

        const char* pString_from_lpBuffer;

        DWORD  nNumberOfBytesToRead;

        BOOL bErr01;
        bErr01 = false;


        // In this file read something,

        //    BOOL ReadFile(
        //      [in]                HANDLE       hFile,
        //      [out]               LPVOID       lpBuffer,              A pointer to the buffer that receives the data read from a file or device.
        //      [in]                DWORD        nNumberOfBytesToRead,  The maximum number of bytes to be read.
        //      [out, optional]     LPDWORD      lpNumberOfBytesRead,   A pointer to the variable that receives the number of bytes read when using a synchronous hFile parameter. ReadFile sets this value to zero before doing any work or error checking.
        //                                                              Use NULL for this parameter if this is an asynchronous operation to avoid potentially erroneous results.
        //      [in, out, optional] LPOVERLAPPED lpOverlapped
        //    );

        bErr01 = ReadFile(
                          HANDLE_Of_File_to_READ,       // HANDLE
                          lpBuffer,                     // LPCVOID
                          sizeof(lpBuffer),             // DWORD
                          &nNumberOfBytesToRead,        // LPDWORD,
                          NULL                          // LPOVERLAPPED
                          );

//--------------
// Put the lpBuffer into a message box by converting it to a const wchar_t* .
// lpBuffer to const wchar_t*

// Does NOT work.
//        pString_from_lpBuffer = reinterpret_cast<const char *>(*lpBuffer);

// Does NOT work.
//        const wchar_t* Test_READ;
//
//        Test_READ = lpBuffer;

// Can't even get to this line.
//        MessageBox(nullptr, L"pString_from_lpBuffer", pString_from_lpBuffer, MB_ICONEXCLAMATION | MB_OK);

//--------------

        CloseHandle(HANDLE_Of_File_to_READ);
Posted
Updated 3-Jul-22 13:21pm

Quote:
LPVOID lpBuffer;
// An LPCVOID is a 32-bit pointer to a constant of any type.
// This type is declared as follows:
// typedef const void* LPCVOID;

const char* pString_from_lpBuffer;

DWORD nNumberOfBytesToRead;

BOOL bErr01;
bErr01 = false;


// In this file read something,

// BOOL ReadFile(
// [in] HANDLE hFile,
// [out] LPVOID lpBuffer, A pointer to the buffer that receives the data read from a file or device.
// [in] DWORD nNumberOfBytesToRead, The maximum number of bytes to be read.
// [out, optional] LPDWORD lpNumberOfBytesRead, A pointer to the variable that receives the number of bytes read when using a synchronous hFile parameter. ReadFile sets this value to zero before doing any work or error checking.
// Use NULL for this parameter if this is an asynchronous operation to avoid potentially erroneous results.
// [in, out, optional] LPOVERLAPPED lpOverlapped
// );

bErr01 = ReadFile(
HANDLE_Of_File_to_READ, // HANDLE
lpBuffer, // LPCVOID
sizeof(lpBuffer), // DWORD
&nNumberOfBytesToRead, // LPDWORD,
NULL // LPOVERLAPPED
);

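This is plain wrong: you are passing an invalid (not even initialised) pointer to ReadFile, so there is no memory for it to write into.
(Note: sizeof(lpBuffer) is the size of the pointer itself, 4 bytes in a 32-bit build and 8 in a 64-bit one, not the size of a buffer.)
It should be something similar to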
C++
//..
BYTE buffer[1024];
//..
bErr01 = ReadFile(
                          HANDLE_Of_File_to_READ,       // HANDLE
                          buffer,                       // LPVOID: real storage that receives the data
                          sizeof(buffer),             // DWORD
                          &nNumberOfBytesToRead,        // LPDWORD,
                          NULL                          // LPOVERLAPPED
                          );

Then you should probably use the MultiByteToWideChar function (stringapiset.h) with CP_UTF8 to convert the UTF-8 string for the MessageBox call.
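As a rough sketch of how the two steps above might fit together (not necessarily what the answer's author had in mind): read into a real buffer, skip a UTF-8 byte order mark if present, convert with MultiByteToWideChar, and show the result with MessageBoxW. The names utf8_bom_length and ShowUtf8File are hypothetical helpers made up for this illustration, and the 1024-byte limit is an arbitrary assumption. The Windows part is guarded by _WIN32 so that the BOM helper can also be compiled elsewhere. It uses NULL rather than nullptr to stay within C++98.

```cpp
#include <assert.h>
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: returns 3 if the data starts with the UTF-8
   byte order mark (EF BB BF), otherwise 0. */
size_t utf8_bom_length(const unsigned char* data, size_t size)
{
    assert(data != NULL || size == 0);
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return 3;
    return 0;
}

#ifdef _WIN32
/* Hypothetical helper: shows up to the first 1024 bytes of a UTF-8
   text file in a message box. Returns FALSE on any failure. */
BOOL ShowUtf8File(const wchar_t* path)
{
    /* OPEN_EXISTING and read-only access: we only want to read. */
    HANDLE file = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE)
        return FALSE;

    BYTE  buffer[1024];
    DWORD bytesRead = 0;   /* receives the number of bytes actually read */
    BOOL  ok = ReadFile(file, buffer, sizeof(buffer), &bytesRead, NULL);
    CloseHandle(file);
    if (!ok)
        return FALSE;

    size_t skip = utf8_bom_length(buffer, bytesRead);
    if (bytesRead == skip)
        return FALSE;      /* empty file, or nothing after the BOM */

    /* A UTF-8 sequence never yields more UTF-16 units than it has bytes,
       so 1024 units plus one for the terminator is enough here. */
    wchar_t wide[1024 + 1];
    int count = MultiByteToWideChar(CP_UTF8, 0,
                                    (const char*)(buffer + skip),
                                    (int)(bytesRead - skip),
                                    wide, 1024);
    if (count == 0)
        return FALSE;
    wide[count] = L'\0';   /* an explicit length means no terminator is added */

    MessageBoxW(NULL, wide, L"File contents", MB_OK);
    return TRUE;
}
#endif
```

A call such as ShowUtf8File(L"utf8_UsingByteOrderMark_C_天堂.txt") would be the intended use. A file longer than the buffer would need a loop, and a read that stops in the middle of a multi-byte character would need extra care; both are left out of this sketch.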
   
Comments
Member 15078716 4-Jul-22 1:44am
   
@ CPallini, why did you choose BYTE buffer[1024]; ? I am guessing that you said that so that later I could trim down the buffer to fit: I am currently looking into that.

I added your code then added the following:

const wchar_t* FromTextFile = reinterpret_cast<const wchar_t*>(lpBuffer);
MessageBox(nullptr, FromTextFile, FromTextFile, MB_ICONEXCLAMATION | MB_OK);


Which gave me a message box with the following:
hello - J - こんにちは - abcdefghijklmnopqrstuvwxy[with about 10 more Asian characters and a little r at the end]

I had written in the txt file the following:
hello - J - こんにちは - abcdefghijklmnopqrstuvwxy

I have not yet found documentation on how to read the total character length of the utf-8 file including all possible 17 planes. I think that I might also have to account for some null or 0 characters at the end, maybe. I am looking into this now. Almost got it. Thank you.

Thank you.
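A note on the length question raised in this comment. ReadFile does not write a terminating zero; it only reports, through its fourth argument, how many bytes it actually stored. Extra characters at the end of the message box are consistent with the text running on past that count into whatever was already in memory. The byte count is therefore the length to work with. If the number of characters is also wanted, one possible approach (an assumption of this note, not something from the thread) is to count every byte that is not a UTF-8 continuation byte. That covers all 17 planes, since a code point outside the BMP is simply one lead byte followed by three continuation bytes. The helper name utf8_code_point_count is hypothetical, and the sketch assumes well-formed UTF-8.

```cpp
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the code points in the first `size` bytes,
   assuming the data is well-formed UTF-8. Continuation bytes have the
   bit pattern 10xxxxxx; every other byte starts a new code point. */
size_t utf8_code_point_count(const unsigned char* data, size_t size)
{
    assert(data != NULL || size == 0);
    size_t count = 0;
    for (size_t i = 0; i < size; ++i) {
        if ((data[i] & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

Passing the buffer together with the byte count that ReadFile reported would give the number of code points read. Note that a BOM at the start of the file counts as one code point unless it is skipped first.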
It is possible to do this in standard C, but it depends on whether your compiler supports the UTF-8 character encoding. See the setlocale and mbstowcs functions for details.

Once the data are converted to wide characters, you may use the wcsXXX family of functions to process the data.

If you need to write the data to file in UTF-8 format, see the wcstombs function.
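For reference, a minimal sketch of the setlocale plus mbstowcs route might look like the following. The helper name mb_to_wide_in_locale is hypothetical. The locale name is passed in because the string that selects a UTF-8 locale is implementation-specific ("en_US.UTF-8" is common on Linux, for example). As far as I know, the older Microsoft C runtime that MinGW builds usually link against may not offer a UTF-8 locale at all, in which case setlocale fails and this route is not available.

```cpp
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: converts a zero-terminated multibyte string to wide
   characters using the named locale. Returns the number of wide characters
   written (excluding the terminator), or (size_t)-1 on failure.
   Note: this changes the LC_CTYPE locale of the whole process. */
size_t mb_to_wide_in_locale(const char* locale_name, const char* mb,
                            wchar_t* out, size_t out_cap)
{
    assert(locale_name != NULL && mb != NULL && out != NULL);
    if (out_cap == 0)
        return (size_t)-1;

    /* setlocale returns NULL when the requested locale is not available. */
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return (size_t)-1;

    /* Leave room for the terminator; mbstowcs does not add one when the
       output fills up exactly. */
    size_t n = mbstowcs(out, mb, out_cap - 1);
    if (n == (size_t)-1)
        return (size_t)-1;   /* invalid byte sequence for this locale */

    out[n] = L'\0';
    return n;
}
```

With a working UTF-8 locale, the bytes read from the file (zero-terminated, BOM skipped) would be passed as mb. If the call returns (size_t)-1 for every UTF-8 locale name tried, that suggests the runtime cannot do the conversion and the Windows API route is the one to use.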


A way that is guaranteed to work on all versions of Windows from NT 4.0 on is to convert the UTF-8 bytes to Unicode (UTF-16) using the MultiByteToWideChar API. Use CP_UTF8 as the codepage.

Once you have the data in Unicode format, you may use Windows' Unicode functions (e.g. MessageBoxW, not MessageBox) to process the data. This will work even if your program is not Unicode-enabled.

Lastly, if the data must be written back to file in UTF-8 format, look up the WideCharToMultiByte API. Again, use CP_UTF8 as the codepage.
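For the write-back direction, a sketch along these lines may help. The helper name wide_to_utf8 is hypothetical. On Windows it calls WideCharToMultiByte with CP_UTF8. The non-Windows branch is a hand-written stand-in, added only so the sketch can be compiled and tried elsewhere; it assumes each wchar_t holds one whole code point (UTF-32) and does not validate surrogates.

```cpp
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: encodes a zero-terminated wide string as UTF-8.
   Returns the number of bytes written (excluding the terminator),
   or -1 on failure (for example, when the output buffer is too small). */
int wide_to_utf8(const wchar_t* wide, char* out, int out_cap)
{
    assert(wide != NULL && out != NULL);
    if (out_cap <= 0)
        return -1;

    int len = (int)wcslen(wide);
    if (len == 0) {            /* the API rejects a zero input length */
        out[0] = '\0';
        return 0;
    }
#ifdef _WIN32
    /* Explicit length: the terminator is not converted, so add it below. */
    int n = WideCharToMultiByte(CP_UTF8, 0, wide, len,
                                out, out_cap - 1, NULL, NULL);
    if (n == 0)
        return -1;
    out[n] = '\0';
    return n;
#else
    /* Stand-in for other platforms: assumes wchar_t is one code point. */
    int n = 0;
    for (int i = 0; i < len; ++i) {
        unsigned long cp = (unsigned long)wide[i];
        int need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (cp > 0x10FFFF || n + need > out_cap - 1)
            return -1;
        if (need == 1) {
            out[n++] = (char)cp;
        } else if (need == 2) {
            out[n++] = (char)(0xC0 | (cp >> 6));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[n++] = (char)(0xE0 | (cp >> 12));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[n++] = (char)(0xF0 | (cp >> 18));
            out[n++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
#endif
}
```

The returned byte count is what would be passed to WriteFile. If the file should start with a byte order mark, the three bytes EF BB BF would have to be written first, since the conversion does not add them.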
   
Comments
Member 15078716 4-Jul-22 2:05am
   
@Daniel Pfeffer, I think that I have had difficulty getting this compiler GCC 5.1 to work with WideCharToMultiByte and MultiByteToWideChar. And Microsoft does not seem to have such great compatibility with C++11/Unicode/utf-8 all combined. You said that it is possible to do this in standard C. I am interested in that.

Currently the following seems to be indicating that I am getting the conversions at least close.

const wchar_t* FromTextFile = reinterpret_cast<const wchar_t*>(lpBuffer);

MessageBox(nullptr, FromTextFile, FromTextFile, MB_ICONEXCLAMATION | MB_OK);


Thank you.
Daniel Pfeffer 4-Jul-22 3:15am
   
The MultiByteToWideChar() API is part of the Windows SDK, and requires including windows.h

There is a good example using C only here: https://en.cppreference.com/w/cpp/string/multibyte/mbstowcs

Please note that the string passed to setlocale() is compiler-dependent; you will have to consult your compiler's documentation to see how to set up a UTF-8 locale.

Member 15078716 4-Jul-22 10:22am
   
@Daniel Pfeffer, I have been and I am using windows.h . Thank you for giving that as part of your logic.

Limited to this setup, and not as a general statement, I do not agree that setlocale() is "as" compiler-dependent as it is system-locale dependent. But, my use of GCC 5.1 via this IDE and system has been causing me some problems that I am working around. The books are nice, the documentation is fine, but if the combination being used ignores them, then work around it. That is what I am doing.

.imbue gives this compiler problems.
mbstowcs gives this compiler problems.
I am using this compiler by choice because it fits with the system that I am using and the old hardware that I am coding for. I am almost required to use this setup. Again, I am looking for work-arounds. I have found work-arounds for a lot of other problems that this combination has given me. I, with help, will get this one also.

Thank you. All of the input from all of you is appreciated.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


