UTF-8 issue in .rtf function of richtextbox

Question

Hi everyone,

I have a problem passing UTF-8 characters into the .Rtf property of a RichTextBox object.

Example:
------------------
Dim utf8 As String = "the quick brown Ǣ fox jump"
richtextbox.Rtf = utf8
-------------------

In the above scenario, when you view the RichTextBox, the Ǣ is converted into a single question mark.

Can anybody suggest how to render UTF-8 characters correctly when passing them into the .Rtf property of the RichTextBox?

I need to pass the text through .Rtf because I need to keep the emphasis (formatting) of each word.

Thank you,
Chaegie

Posted 20-Jun-11 21:19pm

Chaegie

Comments

Sergey Alexandrovich Kryukov 21-Jun-11 4:31am

For starters, a .NET String is represented in memory not as UTF-8 but as UTF-16. Did you know that?
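To make the comment concrete, here is a small VB.NET sketch (the module name is illustrative) showing that Ǣ (U+01E2) occupies a single UTF-16 code unit inside a .NET String and only becomes two bytes once it is explicitly encoded as UTF-8.

```vbnet
Imports System
Imports System.Text

Module Utf16Demo
    Sub Main()
        ' Ǣ is U+01E2; built from its code point to avoid source-file encoding issues.
        Dim s As String = Char.ConvertFromUtf32(&H1E2)

        ' In memory a .NET String is a sequence of UTF-16 code units.
        Console.WriteLine(s.Length)                              ' 1 code unit
        Console.WriteLine(Convert.ToInt32(s(0)).ToString("X4"))  ' 01E2

        ' UTF-8 bytes only appear when the string is explicitly encoded.
        Console.WriteLine(Encoding.UTF8.GetBytes(s).Length)      ' 2 bytes
    End Sub
End Module
```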
--SA

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Sergey Alexandrovich Kryukov · Accepted Answer · 2011-06-20T22:45:00

I found it: RTF is damn archaic!

It works with Unicode, but internally it converts text into the legacy Windows format based on code pages (as in Windows 95 and earlier): 8 bits per character, with each fragment of text prefixed by its Windows code page. I saw this encoding when I typed Unicode characters (code points above FF) in WordPad. It also supports Unicode escapes like \u1234?. The question mark denotes the fallback for programs that do not support Unicode (they "show ?"). Also, a \u escape holds only a 16-bit value, so RTF has no direct way to write code points beyond the BMP (but all newer versions of Windows do support them via surrogate pairs; as far as I remember, since some service pack of Windows 2000).
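A minimal sketch of the \uN? escaping described above, written in VB.NET to match the question. EscapeForRtf and BuildSampleRtf are hypothetical names, not part of any library; the sketch assumes the default \uc1 (one fallback character after each escape) and does not handle line breaks or font tables.

```vbnet
Imports System
Imports System.Text

Module RtfUnicode

    ' Hypothetical helper: converts a .NET (UTF-16) string into text that can be
    ' placed inside an RTF document. Backslash and braces are escaped, and every
    ' non-ASCII UTF-16 code unit becomes \uN? (N = signed 16-bit decimal value,
    ' "?" = fallback for readers without Unicode support, assuming \uc1).
    ' Characters outside the BMP arrive as two surrogate Chars and therefore
    ' produce two escapes. Line breaks are not converted here.
    Function EscapeForRtf(text As String) As String
        Dim sb As New StringBuilder()
        For Each c As Char In text
            Dim code As Integer = Convert.ToInt32(c)
            If c = "\"c OrElse c = "{"c OrElse c = "}"c Then
                sb.Append("\"c).Append(c)
            ElseIf code < 128 Then
                sb.Append(c)
            Else
                ' RTF expects a signed 16-bit value, so U+8000..U+FFFF become negative.
                Dim n As Integer = If(code > 32767, code - 65536, code)
                sb.Append("\u").Append(n).Append("?"c)
            End If
        Next
        Return sb.ToString()
    End Function

    ' Hypothetical example document: the question's sentence with Ǣ in bold,
    ' showing that per-word emphasis survives alongside the escapes.
    Function BuildSampleRtf() As String
        Return "{\rtf1\ansi the quick brown \b " & EscapeForRtf(Char.ConvertFromUtf32(&H1E2)) &
               "\b0  fox jump}"
    End Function

End Module
```

On the form, the result could then be assigned with richtextbox.Rtf = RtfUnicode.BuildSampleRtf(), assuming richtextbox is the control from the question.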

You can simply type what you need in WordPad, save the file, and open it in a plain-text editor to see exactly how it should be encoded, in every detail. I just did it.
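When checking the file WordPad saved (for example after reading it with File.ReadAllText), it can help to pull out the \uN escapes and see which characters they encode. This is a rough sketch; DecodeUnicodeEscapes is a hypothetical helper, and it ignores \'hh code-page escapes entirely.

```vbnet
Imports System
Imports System.Text
Imports System.Text.RegularExpressions

Module RtfInspect

    ' Hypothetical helper for inspecting saved RTF: collects, in order, the
    ' characters encoded as \uN control words. Fallback characters, \'hh
    ' code-page escapes and an escaped backslash followed by "u" and digits
    ' are not handled.
    Function DecodeUnicodeEscapes(rtf As String) As String
        Dim sb As New StringBuilder()
        ' Matches a literal backslash, "u", then an optionally negative number;
        ' words like \ul or \uc1 do not match because no digit follows "u".
        For Each m As Match In Regex.Matches(rtf, "\\u(-?\d+)")
            Dim n As Integer = Integer.Parse(m.Groups(1).Value)
            ' Undo the signed 16-bit encoding RTF uses for U+8000..U+FFFF.
            If n < 0 Then n += 65536
            sb.Append(Convert.ToChar(n))
        Next
        Return sb.ToString()
    End Function

End Module
```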

See http://en.wikipedia.org/wiki/Rich_Text_Format.

Why use such archaic legacy stuff? What's wrong with HTML? Or PDF, at worst?

—SA