Hello,

I am currently in charset hell.

Basically, I am working on a database-driven site. Many of the site's pages come directly from the database, and a few are static HTML.

The database charset settings are:
Default character set = utf8
Default collation = utf8_general_ci


On the front end:

If I set the charset to utf8_general_ci, content coming from the database displays fine, but the static pages show weird characters in place of hyphens etc.

If I set the charset to utf-8 instead, the static HTML pages work, but content from the database displays weird question mark symbols.

That's not all.

In the CMS:

If I set it to utf8_general_ci, content gets deleted when an editor tries to add an en dash (ndash). Changing it to utf8 sorts this problem out, but there are still a few HTML characters, such as greater-than-or-equal-to, that don't work.

Any ideas what we are doing wrong here?

Thanks
Comments
BillWoodruff 13-Nov-14 8:35am
   
If you are using a CMS, please state which one: that might help.
RedDk 13-Nov-14 13:47pm
   
I'd suggest going a step further into the FAIL state and trying, as you did with the utf-8 default, a more general and more ubiquitous SQL collation/code page such as Latin1_General_CI_AS, or something along those lines. When I get into the hell of T-SQL and can't get out, I usually exercise that "dig in". At least that way, one programs to exhaustion. Sleep on that.

Then possibly refresh your approach tomorrow.

1 solution

Tough to answer, as you haven't stated what the front end is.
But if you look at the settings available in the Web.config file when using ASP.NET, you might get an idea of what you can change.
XML
<configuration>
  <system.web>
    <!-- The encoding and culture settings are attributes of the element, not child content -->
    <globalization
      fileEncoding="utf-8"
      requestEncoding="utf-8"
      responseEncoding="utf-8"
      culture="en-US"
      uiCulture="de-DE" />
  </system.web>
</configuration>
The key here is that you can set the file encoding. Alternatively, you can open the static pages in an editor such as Notepad++ and save them using the same encoding as the database uses.
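To illustrate why the encodings have to match, here is a minimal Python sketch of the two symptoms from the question. It assumes the static pages were saved as Windows-1252 (cp1252), which the question does not confirm; the sample string and the reencode helper are hypothetical and only there for illustration.

```python
# Assumption: the static files are Windows-1252 (cp1252); the real
# encoding of the asker's pages is not stated in the question.
text = "pages \u2013 static"  # sample string containing an en dash

utf8_bytes = text.encode("utf-8")
cp1252_bytes = text.encode("cp1252")

# UTF-8 bytes read as cp1252: the single dash turns into three junk characters.
mojibake = utf8_bytes.decode("cp1252")

# cp1252 bytes read as UTF-8: the dash byte is invalid UTF-8 and is
# replaced by U+FFFD, which browsers show as a question mark symbol.
replaced = cp1252_bytes.decode("utf-8", errors="replace")


def reencode(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Hypothetical helper: convert bytes from source_encoding to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


print(repr(mojibake))
print(repr(replaced))
print(reencode(cp1252_bytes) == utf8_bytes)
```

Re-saving the static pages as UTF-8 (which is what the helper does to a byte string) is the scripted equivalent of the Notepad++ step, so every page can then be served with a single utf-8 charset declaration.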

BTW, you should use utf8_unicode_ci instead of utf8_general_ci, as it's only about 10% slower when sorting but sorts multi-byte characters correctly.
   

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



