
Comments by PotatoSoup (Top 20 by date)

PotatoSoup 10-Dec-20 6:02am
I think that I found part of the answer to the Unicode parsing (which was not part of the original question): "Four bytes are needed for characters in the other planes of Unicode, which include less common CJK characters, various historic scripts, mathematical symbols, and emoji (pictographic symbols)." (https://en.wikipedia.org/wiki/UTF-8). That means, I think, that if I can see and parse Plane 1 Unicode, then I can see and parse Plane 1 through Plane 16, because every plane above Plane 0 uses the same four-byte form in UTF-8. Thus all 17 planes are available to be seen and parsed by my CLI program. I was working on that while I was working on changing the code over to GUI. Multi-tasking: while waiting for my subconscious to catch up with the logic I had studied for one part, I went on to another.
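
A minimal sketch of that reasoning, assuming well-formed UTF-8 input. The helper names `utf8_length`, `decode_at`, and `plane_of` are made up for illustration and are not from my program. The lead byte gives the sequence length, the payload bits give the code point, and the plane is the code point divided by 0x10000:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of bytes in a UTF-8 sequence, judged from its lead byte.
// Returns 0 for a byte that cannot start a sequence (continuation or invalid).
std::size_t utf8_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Decode the code point that starts at s[pos]. Hypothetical helper: it assumes
// well-formed UTF-8 and does not reject overlong sequences or surrogates.
char32_t decode_at(const std::string& s, std::size_t pos) {
    const unsigned char lead = static_cast<unsigned char>(s[pos]);
    const std::size_t n = utf8_length(lead);
    assert(n != 0 && pos + n <= s.size());
    if (n == 1) return lead;
    // Keep only the payload bits of the lead byte,
    // then append 6 bits from each continuation byte.
    char32_t cp = lead & (0xFF >> (n + 1));
    for (std::size_t i = 1; i < n; ++i)
        cp = (cp << 6) | (static_cast<unsigned char>(s[pos + i]) & 0x3F);
    return cp;
}

// The plane number, 0 through 16, is the code point divided by 0x10000.
unsigned plane_of(char32_t cp) { return static_cast<unsigned>(cp >> 16); }
```

With this, the four bytes F0 90 8C 93 decode to U+10313, which is in Plane 1, and the same four-byte path covers every code point up to U+10FFFF in Plane 16.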

Thanks for trying to help. I do appreciate it.

C++ can be used to program a CNC machine as well as it can be used to read and parse Unicode. C++ is not a CNC machine and not part of Unicode. I got that already. But, thank you for offering that logic.

I am using C++ as a tool. I know that C++ is not Unicode, nor the other way around. It is OK. I am using one to read the other.

I want to be able to drag-n-drop or copy-n-paste into a text box in my program using any and all Unicode (planes 0 through 16) and then parse that. That is very important. It is a foundation of what I am trying to learn to do.

Then I want to be able to parse those Unicode symbols both as full words and as separate symbols. I started this question's post asking how to go from a CLI to a GUI. I am still working on that, and over time I am starting to grasp more of the logic of placing the Unicode into a CreateWindowExW [thing] and working with it. (I am extremely new at this.) I can work in CLI and in GUI separately, in Unicode, and I can parse wchar and wchar_t, but I was having problems converting from CLI to GUI.

Placing CLI output into a GUI text box should have been easy. I have done it before. But, the logic seemed to have disappeared. I went to do that and I could not do it. It is simple. I have done it before, but I do not remember how. So I asked here. With the wrong words. I might as well have asked Einstein why one plus one equals two, and watched him faint at the question.

Thanks.

Still working on that part of getting CLI output to a GUI text box. I am getting it though. Almost.

By the way, the Japanese figured out how to use Unicode almost as fast as it was released by the Unicode Consortium. I have attempted to ask the Japanese, but I do not (currently) understand the Japanese language. I asked the Unicode Consortium, but they do not seem to know how to use Unicode via C++, which I have already shown can be done in the previous post, whether that counts as using or abusing C++.

Earlier I was asked, "Why?" I have been giving some background on why. I hope that helps. It is a different subject from going from CLI output to GUI text box output, but it addresses the "Why".


So, I am here. Where giants of code live.

Again, thanks.
PotatoSoup 10-Dec-20 4:23am
I used some Plane 0 Unicode for testing. That was easy to do: "abc123", etc.

I found some examples of Plane 1 Unicode at http://www.i18nguy.com/unicode-plane1-utf8.html, which I used for testing by copying the Unicode characters and strings into my program. I verified via https://onlineunicodetools.com/convert-unicode-to-bytes that the bytes my program reported were correct.

For example:
The Etruscan script "𐌓𐌀𐌔𐌍𐌀", placed between two Plane 0 Unicode symbols "a" and "b" for testing ("a𐌓𐌀𐌔𐌍𐌀b"), gave me this:

My CLI program reported:

Unicode Plane 0 a =01100001
Unicode Plane 1 ≡ =11110000 É=10010000 î=10001100 ô=10010011
Unicode Plane 1 ≡ =11110000 É=10010000 î=10001100 Ç=10000000
Unicode Plane 1 ≡ =11110000 É=10010000 î=10001100 ö=10010100
Unicode Plane 1 ≡ =11110000 É=10010000 î=10001100 ì=10001101
Unicode Plane 1 ≡ =11110000 É=10010000 î=10001100 Ç=10000000
Unicode Plane 0 b =01100010


And the Linear B Syllabary script "𐀶𐀪𐀰", placed between "a" and "b" in the same way, gave me this:

Unicode Plane 0 a =01100001
Unicode Plane 1 ≡ =11110000 É=10010000 Ç=10000000 ╢=10110110
Unicode Plane 1 ≡ =11110000 É=10010000 Ç=10000000 ¬=10101010
Unicode Plane 1 ≡ =11110000 É=10010000 Ç=10000000 ░=10110000
Unicode Plane 0 b =01100010

I checked out my original question's string in my CLI program and found this:

For "I like cats = 猫が好きです": (English text and Japanese text)

Unicode Plane 0 I =01001001
Unicode Plane 0 =00100000 (There is a blank that is not showing)
Unicode Plane 0 l =01101100
Unicode Plane 0 i =01101001
Unicode Plane 0 k =01101011
Unicode Plane 0 e =01100101
Unicode Plane 0 =00100000
Unicode Plane 0 c =01100011
Unicode Plane 0 a =01100001
Unicode Plane 0 t =01110100
Unicode Plane 0 s =01110011
Unicode Plane 0 =00100000
Unicode Plane 0 = =00111101
Unicode Plane 0 =00100000
Unicode Plane 0 τ =11100111 î=10001100 ½=10101011
Unicode Plane 0 π =11100011 ü=10000001 î=10001100
Unicode Plane 0 σ =11100101 Ñ=10100101 ╜=10111101
Unicode Plane 0 π =11100011 ü=10000001 ì=10001101
Unicode Plane 0 π =11100011 ü=10000001 º=10100111
Unicode Plane 0 π =11100011 ü=10000001 Ö=10011001

Which is interesting, but it is all in Plane 0 Unicode.
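
A rough sketch of how a report like the ones above can be produced, assuming well-formed UTF-8 input. The function name `report_utf8` is hypothetical, and this is not the exact code of my CLI program. It leaves out the character column, since the console shows each raw byte as a code-page symbol anyway:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One report line per code point: the plane, then every byte of the
// UTF-8 sequence written out in binary.
std::vector<std::string> report_utf8(const std::string& text) {
    std::vector<std::string> lines;
    std::size_t pos = 0;
    while (pos < text.size()) {
        const unsigned char lead = static_cast<unsigned char>(text[pos]);
        // Sequence length from the lead byte; anything else counts as one byte.
        std::size_t n = 1;
        if ((lead & 0xE0) == 0xC0) n = 2;
        else if ((lead & 0xF0) == 0xE0) n = 3;
        else if ((lead & 0xF8) == 0xF0) n = 4;
        assert(pos + n <= text.size());  // assumption: input is not truncated

        // Rebuild the code point so that the plane can be reported.
        unsigned long cp = (n == 1) ? lead : (lead & (0xFFu >> (n + 1)));
        std::string bits = std::bitset<8>(lead).to_string();
        for (std::size_t i = 1; i < n; ++i) {
            const unsigned char b = static_cast<unsigned char>(text[pos + i]);
            cp = (cp << 6) | (b & 0x3F);
            bits += ' ' + std::bitset<8>(b).to_string();
        }
        lines.push_back("Unicode Plane " + std::to_string(cp >> 16) + " " + bits);
        pos += n;
    }
    return lines;
}
```

For the bytes of "a𐌓b" this gives three lines, and the middle one reads "Unicode Plane 1 11110000 10010000 10001100 10010011", the same bit patterns as in the Etruscan report.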


So, C++11 can work with Plane 0 Unicode and with Plane 1 Unicode. But, I did not find symbols to test it with Plane 2 and above.

I looked for Plane 2 Unicode and above, but I did not find any Unicode characters or strings to use for testing. Much of that space has not been assigned yet by the Unicode Consortium, but some of the later planes might have characters. I am asking here if any of you know where I can find symbols that I can copy and paste into my C++ code to check whether I am getting the correct bytes back.
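
One possible way around hunting for characters to paste is to build the test bytes from a code point number. The sketch below is only that, a sketch: `encode_utf8` is a hypothetical helper name, and it follows the standard UTF-8 bit layout. Code points that should be usable as test values are U+20000 (the start of CJK Unified Ideographs Extension B, Plane 2) and U+E0001 (LANGUAGE TAG, Plane 14):

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point (U+0000..U+10FFFF, surrogates excluded) as UTF-8.
std::string encode_utf8(char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {                       // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: rest of Plane 0
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: Planes 1 through 16
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For instance, `encode_utf8(0x20000)` yields the four bytes F0 A0 80 80. Whether the console can display the character depends on the font, but the bytes can still be checked.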

I am doing this in CLI for now. I will try later to get it into a GUI and text box.

I am still a beginner. There is still lots of stuff that I do not know. I might have used incorrect terms. But at your level I hope you know what I am asking.


I typed this fast, and it is late ~4:37 am, I am tired, and I might have some spelling errors. Trying not to.


Constructive comments please.

Thank you all.
PotatoSoup 7-Dec-20 16:50pm
Thank you Richard MacCutchan and all the rest of you here.

PotatoSoup 7-Dec-20 16:03pm
Thank you.

I am retired from an industry where I was required to learn and comprehend and be able to apply very fast. Within minutes. I was a general industrial repair man, called in to repair or replace mistakes by others that often required me to learn new applications on-site almost immediately and fix or repair quickly at potential great cost to my employer if I failed.

I have found it an incredibly intense strain to succeed at that speed with C++. So far, with C++, I have NOT! This is a new world. A new foreign language with its own intensity and structure.

I worked with others who had the same demands upon them, and we freely offered help to each other quickly, without any judgmental comments. I am not yet used to those two changes. I will get there. Thus, in this new field, I shall adjust.

That all said, I think that I have fallen in love with C++ programming. I might even suggest that this is probably one of the best mental exercises that the retired could do.

I apologize. Your comments were, I hope, friendly teasing among fellows. I see that now. I again apologize. Please forgive me.

Thank you.

Thank you all.

Every one of you.

Thank you.
PotatoSoup 7-Dec-20 15:38pm
Looking at your example. Trying to understand what I have been reading about some of it.

(1)

I have been reading about WinMain vs wWinMain and I am not certain that I understand the best application for each.

Microsoft uses, "PWSTR pCmdLine," and "pCmdLine contains the command-line arguments as a Unicode string." ( https://docs.microsoft.com/en-us/windows/win32/learnwin32/winmain--the-application-entry-point ). I was using, "LPSTR lpCmdLine," because I thought it was better for both CLI and GUI. Now that I read that particular Microsoft page, I think that wWinMain with PWSTR is preferred for CLI and not for GUI. Which is it?



Also,
(2)

Which is better to use, WinMain or wWinMain, if I want my program to be compiled later on 32-bit Windows (like XP 32-bit) and also on 64-bit Windows 10? Or should I use WinMain on one and wWinMain on the other?




I see that you used PWSTR in your wWinMain where I used LPSTR in WinMain. Other than the [...]Main parts:

I am not certain what I am being told by Microsoft for the following.


(3)

Microsoft says, that PWSTR is "A 64-bit unsigned integer." ( https://docs.microsoft.com/en-us/windows/win32/winprog/windows-data-types?redirectedfrom=MSDN ) I think that a DWORD is a 64-bit unsigned integer, which I thought works in the old XP 32 bit operating system. But, I see it here, so I am now not so certain if I correctly understand it. Does that mean that if I want my programs to be backward compatible to 32 bit systems like XP 32 bit, that I should not use this? I am not clear on what Microsoft is saying here.



(4)

I was reading that for "LPWSTR and PWSTR in the Windows Data Types": ( https://social.msdn.microsoft.com/Forums/vstudio/en-US/52ab8d94-f8f8-427f-ad66-5b38db9a61c9/difference-between-lpwstr-and-pwstr ) "Yes, they are the same. The L in LPWSTR stands for "long/far pointer" and it is a leftover from 16 bit when pointers were "far" or "near". Such a distinction no longer exists on 32/64 bit, all pointers have the same size." Does that mean that I should use PWSTR in both 32 bit and 64 bit C++ code?


(5)

Microsoft says, "The LPSTR type and its alias PSTR specify a pointer to an array of 8-bit characters, which MAY be terminated by a null character." ( https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-dtyp/3f6cc0e2-1303-4088-a26b-fb9582f29197 )
and, "The format of the characters MUST be specified by the protocol that uses them. Two common 8-bit formats are ANSI and UTF-8."

Microsoft says, "The LPWSTR type is a 32-bit pointer to a string of 16-bit Unicode characters, which MAY be null-terminated."

If I use in my Unicode string parsing (whether by the actual characters or by the hex(?) number) UTF-8 exclusively, does that mean that for me LPSTR is better?

If, similar to the previous question, I use more than UTF-8 (also UTF-16 and UTF-32), does that mean that LPWSTR is better for me?
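
To make the difference between the two concrete, here is a small sketch of what the same code point looks like in the 16-bit units that an LPWSTR string carries on Windows (UTF-16). It does not say which type is better. The helper name `encode_utf16` is hypothetical, and the code uses the portable `char16_t` as a stand-in for the 16-bit Windows `wchar_t`, which is an assumption made so that it compiles anywhere:

```cpp
#include <cassert>
#include <string>

// Encode one code point as UTF-16 code units.
// Plane 0 fits in one unit; Planes 1 through 16 need a surrogate pair.
std::u16string encode_utf16(char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::u16string out;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        const char32_t v = cp - 0x10000;                      // 20 bits remain
        out += static_cast<char16_t>(0xD800 | (v >> 10));     // high surrogate
        out += static_cast<char16_t>(0xDC00 | (v & 0x3FF));   // low surrogate
    }
    return out;
}
```

So U+732B (猫) is one 16-bit unit, 0x732B, while U+10313 from Plane 1 becomes the two units 0xD800 0xDF13. Parsing by 16-bit units therefore has to handle pairs for Plane 1 and above, much as the UTF-8 parsing handles four-byte sequences.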


Thank you.