I have some CSV files encoded in GBK (a.k.a. code page 936) and need to load them as strings (or something sufficiently string-like) for further processing. In the old days, I could call a function such as `File.ReadAllText` (or read the file line by line, etc.) and specify CP936 as the encoding. But in .NET 6 I can't: the only valid options are ASCII, Latin-1, and a couple of flavours of UTF.
That sounds unlikely, right? But here is the documentation for the `Encoding` class, and in the big table halfway down the page you can see that almost everything is gone. Almost as if the thinking is now "people should just use UTF-8 or UTF-16 nowadays". If it were up to me, those files would be encoded in UTF-8, but they're just not.
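For reference, this is what the failure looks like: asking `Encoding.GetEncoding` for a legacy code page on .NET 6 throws unless an encoding provider has been registered first. A minimal repro (the exception type should be `NotSupportedException`, but treat the exact message as illustrative):

```csharp
using System;
using System.Text;

class Program
{
    static void Main()
    {
        try
        {
            // Only ASCII, Latin-1 and the UTF encodings ship with .NET 6;
            // code page 936 (GBK) is not among them, so this throws.
            Encoding gbk = Encoding.GetEncoding(936);
            Console.WriteLine(gbk.EncodingName);
        }
        catch (NotSupportedException e)
        {
            Console.WriteLine("GBK not available: " + e.Message);
        }
    }
}
```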
So, right now what I do is this, assuming I've read the file into an array `byte[] raw` and that `int size` bytes were successfully read into it:

```csharp
char[] buffer = new char[size];
fixed (char* bufferptr = buffer)
fixed (byte* rawptr = raw)
    numberOfChars = MultiByteToWideChar(936, 0, rawptr, size, bufferptr, buffer.Length);
```
`MultiByteToWideChar` is called via a `DllImport`. Afterwards I can use `numberOfChars` to create a `Span<char>` of the appropriate length.
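For completeness, the interop declaration behind that call looks roughly like this sketch (the `NativeMethods` class name and attribute choices are mine; the signature follows the Win32 `MultiByteToWideChar` prototype):

```csharp
using System.Runtime.InteropServices;

static unsafe class NativeMethods
{
    // Win32 MultiByteToWideChar: converts bytes in the given code page
    // (936 = GBK) into UTF-16 code units. Returns the number of chars
    // written, or 0 on failure (call Marshal.GetLastWin32Error for details).
    [DllImport("kernel32.dll", ExactSpelling = true, SetLastError = true)]
    public static extern int MultiByteToWideChar(
        uint codePage,
        uint dwFlags,
        byte* lpMultiByteStr,
        int cbMultiByte,
        char* lpWideCharStr,
        int cchWideChar);
}
```

With that in place, `buffer.AsSpan(0, numberOfChars)` gives the converted text without copying. Being a kernel32 import, this of course only works on Windows.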
That works, but it seems like a serious step backwards compared to .NET 4. There also seems to be no reasonable way to read and convert the file in chunks this way, since `MultiByteToWideChar` does not report how many leftover bytes remain unconsumed at the end of a chunk.
Are there any better options?