Introduction
This sample shows how to store your text intact in an operating system or database that does not support the text's character set.
Background
In web application development, I frequently have to connect to all kinds of old databases and operating systems at customer sites, such as SCO5.05, which does not support UTF-8, GB2312, or other character sets. Storing and retrieving text data without loss has therefore become an important task.
On one project, I needed to store some Chinese words encoded in UTF-8 in a SCO5.05 + Informix 7.3 system. But when I checked the database, I found that every character had actually been changed into "->" (0x7F). I tried many approaches, but none of them solved the problem.
Why?
I found the answer later: the problem is character encoding.
Open the web.config file in the ASP.NET project. The requestEncoding attribute of the globalization element is set to "utf-8", which means the text in each request is treated as UTF-8. Because SCO5.05 does not support UTF-8, the text was changed when it was stored.
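To make the failure concrete, here is a minimal sketch of what happens when text passes through a character set that cannot represent it. It assumes 7-bit ASCII as a stand-in for a system without UTF-8 support; the class LossyStoreDemo is hypothetical, and a real system may substitute a different byte (the Informix database above showed 0x7F) instead of '?'.

```csharp
using System.Text;

// Hypothetical demo: simulates saving text to, and loading it from, a system
// whose character set is 7-bit ASCII. Characters it cannot represent are lost.
public static class LossyStoreDemo
{
    public static string StoreAndLoad(string text)
    {
        // Encoding.ASCII replaces every character above U+007F with '?'.
        byte[] stored = Encoding.ASCII.GetBytes(text);
        // Reading back only yields what was stored: the original characters are gone.
        return Encoding.ASCII.GetString(stored);
    }
}
```

For example, `LossyStoreDemo.StoreAndLoad("\u4f60\u597dPi(\u03a0)")` returns "??Pi(?)". The Latin letters survive, but the two Chinese characters and Π cannot be recovered.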
That was the key. Before saving, the text should be converted from UTF-8 into the Western European character set (ISO 8859-1), which SCO5.05 can handle, and converted back after loading.
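The idea is easiest to see at the byte level. The sketch below (Latin1ByteView and WrappedCodes are hypothetical names) assumes .NET's built-in ISO 8859-1 encoding and prints the code of each character in the converted string. Because ISO 8859-1 assigns a character to every byte value, each UTF-8 byte becomes exactly one character in the range U+0000 to U+00FF.

```csharp
using System.Linq;
using System.Text;

// Hypothetical helper: shows the converted ("wrapped") form of a string,
// i.e. its UTF-8 bytes read back as ISO 8859-1 characters.
public static class Latin1ByteView
{
    static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Returns the code of each character in the wrapped string as two hex digits.
    public static string WrappedCodes(string text)
    {
        string wrapped = Latin1.GetString(Encoding.UTF8.GetBytes(text));
        // Every char is in U+0000..U+00FF, so "X2" prints exactly the original byte.
        return string.Join(" ", wrapped.Select(c => ((int)c).ToString("X2")));
    }
}
```

`WrappedCodes("\u03a0")` returns "CE A0", and the sample message "\u4f60\u597dPi(\u03a0)" becomes "E4 BD A0 E5 A5 BD 50 69 28 CE A0 29". That is twelve single-byte characters, which should fit in a system that supports ISO 8859-1, even though they look like nonsense.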
Solution code
For example, to store the message "你好Pi(\u03a0)", meaning "Hello Pi(Π)", in the "memo" field of the database, use the following code:
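using System.Data;
using System.Data.Odbc;

// Chinese "Hello" (\u4f60\u597d) followed by "Pi(", the Greek capital Pi, and ")"
string unicodeStr = "\u4f60\u597dPi(\u03a0)";
using (OdbcConnection conn = new OdbcConnection("your connection string"))
using (IDbCommand cmd = conn.CreateCommand())
{
    // Convert the text so every character fits in ISO 8859-1, and double any
    // single quotes so they cannot end the SQL string literal early.
    string stored = CEncoding.unicode_iso8859(unicodeStr).Replace("'", "''");
    cmd.CommandText = "INSERT INTO encoding VALUES ('" + stored + "')";
    conn.Open();
    cmd.ExecuteNonQuery();
} // disposing the connection also closes it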
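The code above calls the function unicode_iso8859(), which converts the text from UTF-8 to ISO 8859-1. Strictly speaking, it encodes the string as UTF-8 bytes and then reads each of those bytes back as one ISO 8859-1 character; iso8859_unicode() reverses the process.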
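using System.Text;

public static class CEncoding
{
    // ISO 8859-1 maps every byte 0x00-0xFF to the character U+0000-U+00FF,
    // so any byte sequence survives a round trip through it unchanged.
    private static readonly Encoding Iso = Encoding.GetEncoding("iso-8859-1");

    // Encodes src as UTF-8 bytes, then reads each byte back as one ISO 8859-1 char.
    public static string unicode_iso8859(string src)
    {
        byte[] unicodeBytes = Encoding.UTF8.GetBytes(src);
        return Iso.GetString(unicodeBytes);
    }

    // Reverses unicode_iso8859(): recovers the raw bytes, then decodes them as UTF-8.
    public static string iso8859_unicode(string src)
    {
        byte[] isoBytes = Iso.GetBytes(src);
        return Encoding.UTF8.GetString(isoBytes);
    }
}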
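This round trip is safe because ISO 8859-1 maps all 256 byte values to characters and back without loss. The following sketch (Latin1RoundTripCheck and IsByteTransparent are hypothetical names) checks that property for any .NET Encoding. A code page that leaves some byte values unassigned, or a 7-bit encoding like ASCII, fails the check.

```csharp
using System.Text;

// Hypothetical check: does this encoding turn every single byte into a
// character and back into the same byte?
public static class Latin1RoundTripCheck
{
    public static bool IsByteTransparent(Encoding encoding)
    {
        for (int b = 0; b <= 0xFF; b++)
        {
            byte[] original = { (byte)b };
            // bytes -> string -> bytes; any substitution shows up as a mismatch
            byte[] back = encoding.GetBytes(encoding.GetString(original));
            if (back.Length != 1 || back[0] != original[0])
                return false;
        }
        return true;
    }
}
```

`IsByteTransparent(Encoding.GetEncoding("iso-8859-1"))` returns true, while `IsByteTransparent(Encoding.ASCII)` returns false because bytes above 0x7F do not survive.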
Query your database and take a look: the text has been converted into ISO 8859-1 symbols that you probably won't recognize.
To get the original text back after loading it, use the iso8859_unicode() function. Of course, you can also decode it with other encodings if you need to.
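Reading the data back follows the same pattern in reverse. As a sketch, assuming the memo text is the first column of the query result, the hypothetical helper MemoReader below converts every row from an IDataReader (for example, the OdbcDataReader returned by ExecuteReader()) and leaves NULL values as null. It repeats the iso8859_unicode() conversion inline so the block stands alone.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Text;

// Hypothetical read-side helper: converts the first column of every row
// from its ISO 8859-1 form back to the original text.
public static class MemoReader
{
    static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Same conversion as iso8859_unicode(): recover the bytes, decode as UTF-8.
    public static string Unwrap(string stored) =>
        Encoding.UTF8.GetString(Latin1.GetBytes(stored));

    public static List<string> ReadMemos(IDataReader reader)
    {
        var result = new List<string>();
        while (reader.Read())
        {
            // NULL columns are passed through as null instead of being converted.
            result.Add(reader.IsDBNull(0) ? null : Unwrap(reader.GetString(0)));
        }
        return result;
    }
}
```

With ODBC, this could be called as `MemoReader.ReadMemos(cmd.ExecuteReader())` after setting `cmd.CommandText` to a query such as `SELECT memo FROM encoding` (the column name is an assumption).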
If you use a data adapter to fill a DataSet that is bound to a DataGrid, it is just as easy to convert the data with these two methods. It will cost you extra processing time, though, so whether to use it is your own judgment call.
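using System.Data.Odbc;
using System.IO;
using System.Web.UI.WebControls;

DataSet1 ds = new DataSet1();   // typed DataSet generated in the project
DataGrid grid = new DataGrid();
using (OdbcConnection conn = new OdbcConnection("your connection string"))
using (OdbcDataAdapter adapter = new OdbcDataAdapter("SELECT * FROM encoding", conn)) // example query
{
    adapter.Fill(ds);           // the text still arrives in its ISO 8859-1 form
}
// Convert every value at once: serialize to XML, convert back, and reload.
string xml = ds.GetXml();
ds.Clear();
ds.ReadXml(new StringReader(CEncoding.iso8859_unicode(xml)));
grid.DataSource = ds;
grid.DataBind();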
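Serializing the whole DataSet to XML and parsing it again is the expensive part of the approach above. A different technique, not the XML round trip, is to convert the string columns of a freshly filled DataSet in place. DataSetUnwrapper is a hypothetical name, and the sketch assumes the DataSet has no pending edits, since it calls AcceptChanges() at the end.

```csharp
using System.Data;
using System.Text;

// Hypothetical alternative: convert string values in place instead of
// round-tripping the DataSet through XML.
public static class DataSetUnwrapper
{
    static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    public static void UnwrapStrings(DataSet ds)
    {
        foreach (DataTable table in ds.Tables)
        {
            foreach (DataColumn column in table.Columns)
            {
                // Only writable string columns can hold converted text.
                if (column.DataType != typeof(string) || column.ReadOnly)
                    continue;

                foreach (DataRow row in table.Rows)
                {
                    // Deleted rows cannot be read; DBNull values are not strings and are skipped.
                    if (row.RowState != DataRowState.Deleted && row[column] is string stored)
                        row[column] = Encoding.UTF8.GetString(Latin1.GetBytes(stored));
                }
            }
        }
        // Treat the converted values as the original data so a later
        // adapter.Update() does not write the Unicode text back to the database.
        ds.AcceptChanges();
    }
}
```

In the code above, `DataSetUnwrapper.UnwrapStrings(ds)` would replace the four lines from `ds.GetXml()` to `ds.ReadXml(...)`. Whether it is actually faster depends on your data: it avoids building and parsing XML, but it still converts every string value.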