I'm seriously struggling to parse a line from a UTF-8 file into an array of strings.
My file has content:
The first name "John" starts at the 10th position. (Here the 2nd character is the Unicode code point U+022F, stored as UTF-8.)
In code I need to do
to get "John", where it should be
with normal characters.
My question is: how do I detect that I need a Substring of 11 instead of 10 in this case?
I tried things like
If Not System.Text.Encoding.UTF8.GetCharCount(System.Text.Encoding.UTF8.GetBytes(LineRead)) = System.Text.Encoding.UTF8.GetByteCount(LineRead) Then
but that condition is also true for "à", which counts as only 1 in String.Length but takes 2 bytes in UTF-8.
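To make the problem with that check concrete, here is a minimal sketch of what it measures for "à" alone. The module name is only for illustration, and the character is built with ChrW so the result does not depend on how the source file is saved.

```vb
Imports System
Imports System.Text

Module ByteCountDemo
    Sub Main()
        ' "à" is the single code point U+00E0.
        Dim s As String = ChrW(&HE0)

        ' Number of UTF-16 code units in the string.
        Console.WriteLine(s.Length)                                          ' 1
        ' Number of bytes needed to encode the string as UTF-8.
        Console.WriteLine(Encoding.UTF8.GetByteCount(s))                     ' 2
        ' Number of chars obtained by decoding those bytes again.
        Console.WriteLine(Encoding.UTF8.GetCharCount(Encoding.UTF8.GetBytes(s)))  ' 1

        ' The check from my attempt: char count versus byte count.
        Dim differs As Boolean =
            Encoding.UTF8.GetCharCount(Encoding.UTF8.GetBytes(s)) <> Encoding.UTF8.GetByteCount(s)
        Console.WriteLine(differs)                                           ' True
    End Sub
End Module
```

So the check fires for any character outside ASCII, which is why it cannot tell me whether the offset has shifted.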
How should common cases like this be handled?
How do I prevent the bytes of one character from being split into several wrong characters? That way I could step through the string character by character and count them.
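One thing I have not ruled out is that the line is being decoded with a single-byte encoding instead of UTF-8. That is only an assumption about my setup, but the sketch below reproduces the same shift from 10 to 11. The sample line is hypothetical (it is not my real file content), and ISO-8859-1 stands in for whatever single-byte encoding might be involved.

```vb
Imports System
Imports System.Text

Module Utf8DecodeDemo
    Sub Main()
        ' Hypothetical line: 10 characters, the 2nd being U+022F, followed by "John".
        Dim original As String = "A" & ChrW(&H22F) & "BCDEFGHI" & "John"
        Dim bytes As Byte() = Encoding.UTF8.GetBytes(original)

        ' Assumption: a single-byte decoder turns each of the two UTF-8 bytes
        ' of U+022F (C8 AF) into its own char, shifting everything after it by one.
        Dim decodedLatin1 As String = Encoding.GetEncoding("iso-8859-1").GetString(bytes)
        ' Decoding as UTF-8 keeps U+022F as a single char.
        Dim decodedUtf8 As String = Encoding.UTF8.GetString(bytes)

        Console.WriteLine(decodedLatin1.Length)         ' 15
        Console.WriteLine(decodedUtf8.Length)           ' 14
        Console.WriteLine(decodedLatin1.Substring(11))  ' John
        Console.WriteLine(decodedUtf8.Substring(10))    ' John
    End Sub
End Module
```

If that is what is happening, the offsets would only line up once the file is read as UTF-8 in the first place, but I would like to confirm that this is the right way to look at it.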
Thanks in advance!