
Chinese Style Converter & Services

Kent Liu, 6 Dec 2003
A converter for Chinese character style and encoding, including the services built on it

Setup Package (260KB)

Download Source Codes (73.7KB)

Introduction in Chinese (the content differs slightly from the English introduction, especially in the parts specific to Chinese) (47.9KB)


Sample screenshot


Introduction

The Microsoft .NET Framework is very powerful, both in its design and in the functions it provides. China is an old country with a special and beautiful culture. One of its most interesting aspects is its writing. Each character is like a small picture, which makes it very different from alphabetic scripts such as English. Chinese characters are written in two styles, Simplified and Traditional. The .NET Framework handles this too: it provides a conversion function, the Microsoft.VisualBasic.Strings.StrConv method.

This article shows how to use this feature and offers ideas for turning it into a service, since we now live in a world of service-oriented programming. I also give an example of packaging everything into a setup package. I hope you will come along and enjoy it.

Let us start with converting between Chinese styles:

Chinese Style Converting

The two styles of Chinese (Simplified and Traditional) can look very different, although some characters are identical in both.

For example, the sentence "China is a beautiful country." is written as:

中国是一个美丽的国家。 (In Simplified Chinese)

中國是一個美麗的國家。 (In Traditional Chinese)

But they ARE the same sentence. Almost every Chinese reader knows that each pair of corresponding characters above is the same character, even where the shapes differ.

Some Chinese readers are familiar with one style and others with the other, so people working in IT often need to convert text between the two. Many tools already do this. In the .NET world, the easiest free option is the Microsoft.VisualBasic.Strings.StrConv function in the .NET Framework.

We just call it as Microsoft.VisualBasic.Strings.StrConv(string str, Microsoft.VisualBasic.VbStrConv conversion, int localeID). The style of str is changed to the specified target style, and the result is returned as a string. Here is an example:

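using Microsoft.VisualBasic;   // requires a reference to Microsoft.VisualBasic.dll

// ...

string source = "中国是一个美丽的国家。";

// VbStrConv.TraditionalChinese selects the target style; 0 is the default LocaleID.
string target = Strings.StrConv(source, VbStrConv.TraditionalChinese, 0);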

 

The result (target) is then 中國是一個美麗的國家。

To convert to Simplified Chinese instead, simply pass VbStrConv.SimplifiedChinese as the second argument of StrConv.

Easy, right? And that is all there is to converting between Chinese styles.

 

Encoding Converting

We live in a world with many languages, and each language looks different. In the past, many languages had their own encoding systems for turning text into electronic data. Today we have Unicode, which tries to cover almost all languages. Even Unicode comes in several forms: UTF-8, UTF-16 and UTF-7.

This can be a headache, but it is the reality. Even Chinese alone has two main encodings, one per style. Code page 936 (GBK) is the most commonly used encoding for Simplified Chinese, and code page 950 (Big5) is the most commonly used encoding for Traditional Chinese. Other code pages exist for each style as well.

Taking the example sentence from the style section, look at the following tables (bytes in hexadecimal):
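You can see these differences from code. The sketch below is written as a .NET 5+ console program with top-level statements. That is an assumption: the article itself targets the .NET Framework 1.1, where code pages 936 and 950 are built in and the RegisterProvider line is unnecessary. The program encodes the same text with each encoding and prints the bytes in hexadecimal. The helper name ToHex is hypothetical.

```csharp
using System;
using System.Text;

// .NET Core / .NET 5+ only: code pages such as 936 and 950 must be registered
// before use. The .NET Framework has them built in, so skip this line there.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

Encoding gbk = Encoding.GetEncoding(936);   // Simplified Chinese (GBK)
Encoding big5 = Encoding.GetEncoding(950);  // Traditional Chinese (Big5)

// The same two Traditional characters in four encodings.
const string text = "中國";
Console.WriteLine($"936:      {ToHex(gbk, text)}");
Console.WriteLine($"950:      {ToHex(big5, text)}");
Console.WriteLine($"UTF-8:    {ToHex(Encoding.UTF8, text)}");
Console.WriteLine($"UTF-16LE: {ToHex(Encoding.Unicode, text)}");

// Hypothetical helper: encode a string and return its bytes as upper-case hex.
static string ToHex(Encoding encoding, string s) =>
    Convert.ToHexString(encoding.GetBytes(s));
```

Encoding.Unicode in .NET is UTF-16 in little-endian byte order.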

 

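Simplified Chinese: 中国是一个美丽的国家

Encoding | 中 | 国 | 是 | 一 | 个 | 美 | 丽 | 的 | 国 | 家
936 | D6D0 | B9FA | CAC7 | D2BB | B8F6 | C3C0 | C0F6 | B5C4 | B9FA | BCD2
UTF-8 | E4B8AD | E59BBD | E698AF | E4B880 | E4B8AA | E7BE8E | E4B8BD | E79A84 | E59BBD | E5AEB6
UTF-16 (little-endian) | 2D4E | FD56 | 2F66 | 004E | 2A4E | 8E7F | 3D4E | 8476 | FD56 | B65B

Traditional Chinese: 中國是一個美麗的國家

Encoding | 中 | 國 | 是 | 一 | 個 | 美 | 麗 | 的 | 國 | 家
936 | D6D0 | 87F8 | CAC7 | D2BB | 8280 | C3C0 | FB90 | B5C4 | 87F8 | BCD2
950 | A4A4 | B0EA | AC4F | A440 | ADD3 | ACFC | C452 | AABA | B0EA | AE61
UTF-8 | E4B8AD | E59C8B | E698AF | E4B880 | E5808B | E7BE8E | E9BA97 | E79A84 | E59C8B | E5AEB6
UTF-16 (little-endian) | 2D4E | 0B57 | 2F66 | 004E | 0B50 | 8E7F | 979E | 8476 | 0B57 | B65B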

See? Even for a simple Chinese sentence, the bytes differ in almost every encoding. But don't worry: most programmers have already learned how to deal with encodings. In the .NET Framework, converting between encodings is simple. Just call the System.Text.Encoding.Convert method, in the form System.Text.Encoding.Convert(Encoding srcEncoding, Encoding dstEncoding, byte[] bytes). The method is overloaded; see the documentation for details. For example:

using System.Text;

// ...

// Encoding.Unicode is UTF-16 (little-endian)
byte[] source = Encoding.Unicode.GetBytes("中国是一个美丽的国家");

byte[] target = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, source);

 

This gives us the target byte array, converted from Unicode (UTF-16) to UTF-8.
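Here is a slightly larger sketch under the same .NET 5+ assumption. It converts Big5 (code page 950) bytes directly to GBK (code page 936) and to UTF-8 with Encoding.Convert. It then decodes the result to show that only the byte representation changed. The characters are still Traditional Chinese; changing the style is a separate step.

```csharp
using System;
using System.Text;

// .NET Core / .NET 5+ only; the .NET Framework has these code pages built in.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

Encoding big5 = Encoding.GetEncoding(950);
Encoding gbk = Encoding.GetEncoding(936);

// Bytes as they might arrive from a Big5 file or socket.
byte[] big5Bytes = big5.GetBytes("中國");

// Re-encode the same characters as GBK and as UTF-8.
byte[] gbkBytes = Encoding.Convert(big5, gbk, big5Bytes);
byte[] utf8Bytes = Encoding.Convert(big5, Encoding.UTF8, big5Bytes);

Console.WriteLine(Convert.ToHexString(gbkBytes));
Console.WriteLine(Convert.ToHexString(utf8Bytes)); // E4B8ADE59C8B

// Decoding the GBK bytes gives back 中國: the style is unchanged.
Console.WriteLine(gbk.GetString(gbkBytes));
```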

Not too hard, but even for a simple string it took two pages to cover the basics of style and encoding conversion. Now let us put them together.

 

Putting style & encoding converting together

I wanted a single class that lets a programmer convert both the Chinese style and the encoding in one call. The source can come in several forms (string, byte array, stream and file), so I wrote several overloaded methods for the different scenarios. This is a many-to-many relationship: with these overloads, we can convert Chinese text from any of these forms to the target style and encoding, and of course to the specified output form.

I am not going to copy the full code here, only a screenshot; for details, see the file Converter.cs. (I made this class inherit from System.MarshalByRefObject so that it can be hosted in .NET Remoting.)

figure1
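The overloads themselves are shown only in the screenshot. To illustrate what a stream overload might look like, here is a hypothetical StreamToStream method, again as a .NET 5+ console sketch. The name and signature are assumptions, not taken from Converter.cs. It changes only the encoding.

```csharp
using System;
using System.IO;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); // .NET Core / .NET 5+ only

Encoding big5 = Encoding.GetEncoding(950);
Encoding gbk = Encoding.GetEncoding(936);

using var input = new MemoryStream(big5.GetBytes("中國是一個美麗的國家。"));
using var output = new MemoryStream();
StreamToStream(input, big5, output, gbk);
Console.WriteLine(gbk.GetString(output.ToArray()));

// Hypothetical stream overload: decode text from one encoding and re-encode it
// in another, buffer by buffer, so a large file is never held in memory at once.
// A style conversion could be applied per buffer, but a phrase split across two
// buffers would then be missed by the phrase fixes.
static void StreamToStream(Stream source, Encoding sourceEncoding,
                           Stream target, Encoding targetEncoding)
{
    using var reader = new StreamReader(source, sourceEncoding,
        detectEncodingFromByteOrderMarks: false, bufferSize: 4096, leaveOpen: true);
    // StreamWriter writes the encoding's preamble (e.g. a UTF-8 BOM) if it has one.
    using var writer = new StreamWriter(target, targetEncoding, bufferSize: 4096, leaveOpen: true);
    char[] buffer = new char[4096];
    int read;
    while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        writer.Write(buffer, 0, read);
}
```

StreamReader's decoder handles multi-byte characters that are split across read buffers, so the loop does not need to handle those splits.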

Let us take a closer look at the core conversion part:

figure2

If the user wants to change the style, we first convert the source to the target style. For Simplified-to-Traditional conversion, we then adjust the result (described in more detail below). Finally, we convert the Unicode bytes to the target encoding and return the result.
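The flow just described can be sketched as a single hypothetical ConvertChinese method. This is a .NET 5+ console sketch, not the code in Converter.cs. The style conversion is passed in as a delegate so that Strings.StrConv can be plugged in. The demo uses a toy character-by-character stand-in that turns 发 into 發; it is assumed here to produce 頭發 for 头发, which the phrase table then repairs. As in the article, the fix table is meant for the Simplified-to-Traditional direction.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); // .NET Core / .NET 5+ only

Encoding gbk = Encoding.GetEncoding(936);
Encoding big5 = Encoding.GetEncoding(950);

// Toy stand-in for StrConv(..., VbStrConv.TraditionalChinese, ...), demo only:
// it maps each character on its own, so 发 always becomes 發.
Func<string, string> toyToTraditional = s => s.Replace("头", "頭").Replace("发", "發");
var fixTable = new Dictionary<string, string> { ["頭發"] = "頭髮" };

byte[] result = ConvertChinese(gbk.GetBytes("头发"), gbk, big5, toyToTraditional, fixTable);
Console.WriteLine(big5.GetString(result)); // 頭髮

// Hypothetical core pipeline: decode -> change style -> fix phrases -> encode.
// Pass null as styleConverter to change only the encoding.
static byte[] ConvertChinese(byte[] source, Encoding sourceEncoding, Encoding targetEncoding,
                             Func<string, string> styleConverter,
                             IReadOnlyDictionary<string, string> fixes)
{
    string text = sourceEncoding.GetString(source);   // source bytes -> Unicode string
    if (styleConverter != null)
    {
        text = styleConverter(text);                   // e.g. Simplified -> Traditional
        foreach (var pair in fixes)                    // repair context-dependent characters
            text = text.Replace(pair.Key, pair.Value);
    }
    return targetEncoding.GetBytes(text);              // Unicode string -> target bytes
}
```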

Adjusting the conversion from Simplified to Traditional Chinese

There is a catch when converting Simplified Chinese to Traditional Chinese. The same Simplified character can map to different Traditional characters depending on the context, for example:

Simplified Chinese | Traditional Chinese
头发 (hair) | 頭髮
发达 (developed) | 發達
注意 (attention) | 注意
备注 (remark) | 備註

So we need to build a conversion table, save it to an XML file, read it into memory, and do the replacements before the encoding conversion. Yes, this means adding more cases to the XML file over time. Microsoft does not do this for us, so we have to take care of it ourselves.
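Here is a minimal sketch of the replacement step. It assumes the table maps each phrase as it comes out of the style conversion (for example, an assumed 頭發) to the correct Traditional phrase. The helper name ApplyMappings and the table entries are hypothetical; real entries would be read from the XML file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Assumed entries: what the style conversion produced (key) -> correct phrase (value).
var fixes = new Dictionary<string, string>
{
    ["頭發"] = "頭髮",
    ["備注"] = "備註",
};

Console.WriteLine(ApplyMappings("頭發很長，請看備注", fixes)); // 頭髮很長，請看備註

// Hypothetical helper: scan the text once, replacing any phrase found in the table.
// Longer keys are tried first, so a longer phrase wins over a shorter one it contains.
static string ApplyMappings(string text, IReadOnlyDictionary<string, string> table)
{
    var keys = table.Keys.Where(k => k.Length > 0).OrderByDescending(k => k.Length).ToList();
    var result = new StringBuilder(text.Length);
    int i = 0;
    while (i < text.Length)
    {
        int pos = i; // stable copy for the lambda below
        var key = keys.FirstOrDefault(k => pos + k.Length <= text.Length &&
                                           string.CompareOrdinal(text, pos, k, 0, k.Length) == 0);
        if (key != null)
        {
            result.Append(table[key]); // replace the whole phrase
            i += key.Length;           // and continue after it
        }
        else
        {
            result.Append(text[i]);
            i++;
        }
    }
    return result.ToString();
}
```

A single left-to-right scan never rescans replaced text, so one entry's output cannot trigger another entry. Chained string.Replace calls cannot guarantee that.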

While doing the replacement, we also want to handle spaces placed between Chinese characters for layout, such as 头 发. So I wrote the following function to handle this case. It simply moves the pointer forward to check whether the next byte is a space:

figure3
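The author's function works on bytes. The following hypothetical sketch applies the same idea to a .NET string instead. While matching a table key, it steps over layout spaces between the key's characters. Treating both the ASCII space and the full-width space U+3000 as layout spaces is an assumption. When the replacement has the same length as the key, the spaces stay in place.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

var fixes = new Dictionary<string, string> { ["頭發"] = "頭髮" };
Console.WriteLine(ApplyMappingsIgnoringSpaces("頭 發", fixes)); // 頭 髮

// Spaces that may sit between characters for layout (ASCII and full-width).
static bool IsLayoutSpace(char c) => c == ' ' || c == '\u3000';

// Returns the index just past a match of key starting at start, allowing layout
// spaces between the key's characters, or -1 if there is no match.
static int MatchEnd(string text, int start, string key)
{
    int t = start;
    for (int k = 0; k < key.Length; k++)
    {
        if (k > 0)
            while (t < text.Length && IsLayoutSpace(text[t])) t++; // step over spaces
        if (t >= text.Length || text[t] != key[k]) return -1;
        t++;
    }
    return t;
}

// Hypothetical helper: a phrase replacement in which "頭 發" also matches "頭發".
// Keys are assumed to contain no spaces themselves.
static string ApplyMappingsIgnoringSpaces(string text, IReadOnlyDictionary<string, string> table)
{
    var pairs = table.Where(p => p.Key.Length > 0).OrderByDescending(p => p.Key.Length).ToList();
    var result = new StringBuilder(text.Length);
    int i = 0;
    while (i < text.Length)
    {
        bool replaced = false;
        foreach (var pair in pairs)
        {
            int end = MatchEnd(text, i, pair.Key);
            if (end < 0) continue;
            if (pair.Value.Length == pair.Key.Length)
            {
                int v = 0;
                for (int j = i; j < end; j++) // keep spaces, substitute the characters
                    result.Append(IsLayoutSpace(text[j]) ? text[j] : pair.Value[v++]);
            }
            else
            {
                result.Append(pair.Value);    // lengths differ: drop the inner spaces
            }
            i = end;
            replaced = true;
            break;
        }
        if (!replaced)
        {
            result.Append(text[i]);
            i++;
        }
    }
    return result.ToString();
}
```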

The XML file may be modified while the program is running. So I also added a ChineseMappingXMLFilename property, and a settings class that uses a FileSystemWatcher to monitor the file named by that property. The Converter class then picks up changes to the XML file on the fly.
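Here is a sketch of loading the mapping file and reloading it whenever it changes, using System.Xml.Linq and System.IO.FileSystemWatcher. The XML schema (a ChineseMapping root with Pair elements carrying source and target attributes) and the helper names are assumptions; the article does not show the real file format. FileSystemWatcher can raise several Changed events for a single save, so a reload must be harmless to repeat.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Write a small mapping file to the temp folder for the demo.
string path = Path.Combine(Path.GetTempPath(), "ChineseMapping-demo.xml");
File.WriteAllText(path, "<ChineseMapping><Pair source=\"頭發\" target=\"頭髮\" /></ChineseMapping>");

var mappings = LoadMappings(path);
// Swap in the whole new table on change; assigning a reference is atomic,
// so readers see either the old table or the new one, never a mix.
using var watcher = WatchMappings(path, reloaded => mappings = reloaded);
Console.WriteLine(mappings["頭發"]); // 頭髮

// Hypothetical schema: <ChineseMapping><Pair source="..." target="..." /></ChineseMapping>
static Dictionary<string, string> LoadMappings(string file)
{
    var table = new Dictionary<string, string>();
    foreach (XElement pair in XDocument.Load(file).Root.Elements("Pair"))
        table[(string)pair.Attribute("source")] = (string)pair.Attribute("target"); // last entry wins
    return table;
}

// Reloads the table whenever the file is written.
static FileSystemWatcher WatchMappings(string file, Action<Dictionary<string, string>> onReload)
{
    string fullPath = Path.GetFullPath(file);
    var fsw = new FileSystemWatcher(Path.GetDirectoryName(fullPath), Path.GetFileName(fullPath))
    {
        NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.Size,
    };
    fsw.Changed += (sender, e) =>
    {
        try
        {
            onReload(LoadMappings(fullPath));
        }
        catch (Exception ex) when (ex is IOException || ex is XmlException)
        {
            // The editor may still be writing the file; a later Changed event will retry.
        }
    };
    fsw.EnableRaisingEvents = true;
    return fsw;
}
```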

Making the consuming tool

Once we have the Converter class, we want to use it. I built a simple Conversion Pad in two parts: a Windows Forms control library that uses the Converter, and a Windows Forms application that hosts the control. The application can consume the conversion service in three ways: as a local assembly, through .NET Remoting, or through a Web Service.

 

That covers the conversion of Chinese style and encoding. Now let us see how to turn it into a service. There are many ways to provide a service to other programs; I chose to expose it as a COM+ Application, a .NET Remoting service and a Web Service.

COM+ Application

COM+ Applications are very useful in the DCOM world. If the converter is hosted in a COM+ Application, we can consume it from legacy programs written in VB6, VFP or other COM-capable languages, or even from VBScript! Define a COM interface, write a wrapper class that inherits from System.EnterpriseServices.ServicedComponent, and set the other attributes for the COM+ Application. That is all. To make deployment easier, we can also create an Installer class that creates and removes the COM+ Application. It takes only a few lines of code:

figure4

After deploying it, we can use this COM+ Application from a simple script:

dim objConverter

set objConverter  = createobject("ChineseUtilComPlus.ConverterService")

msgbox objConverter.StringToStringEncodingIntegerFormat("中国是一个美丽的国家", 2, 936, 65001, 936)

set objConverter = nothing

.NET Remoting

IIS or a Windows Service is commonly used to host .NET Remoting services; I chose to host the Converter class in a Windows Service. First I built a wrapper class that inherits from System.ServiceProcess.ServiceBase. I also added support for specifying the Chinese mapping XML file in the configuration file. Then I had to listen for requests on HTTP and TCP channels and make both configurable in the configuration file.

figure5

And that is it. I also made an Installer class to make deployment easier. A Windows Service installer is straightforward; following the example code in MSDN is enough.

Web Service

Hosting the Converter class in a Web Service lets any SOAP-compatible program consume it. Before writing the wrapper class, I gave up on making System.Text.Encoding serializable to XML. Instead, the Converter can be called through methods that specify the encoding as an int (code page) or a string (name). These methods then call System.Text.Encoding.GetEncoding to obtain the System.Text.Encoding instance. For the Web Service we also have to give up other things, such as the stream and file conversions, because of serialization and security issues. In the end, I exposed only the conversions for byte arrays and strings.
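The idea of passing the encoding as an int or a string can be sketched as follows. This is a .NET 5+ console sketch; ResolveEncoding and StringToBytes are hypothetical names, not the article's web methods. Encoding.GetEncoding accepts either a code page number or a name such as "big5".

```csharp
using System;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); // .NET Core / .NET 5+ only

Console.WriteLine(ResolveEncoding("936").EncodingName);
Console.WriteLine(Convert.ToHexString(StringToBytes("中國", "big5"))); // A4A4B0EA

// Hypothetical helper: accept either a code page number ("936", "65001") or an
// encoding name ("big5", "utf-8") and return the matching Encoding instance.
static Encoding ResolveEncoding(string nameOrCodePage) =>
    int.TryParse(nameOrCodePage, out int codePage)
        ? Encoding.GetEncoding(codePage)
        : Encoding.GetEncoding(nameOrCodePage);

// A web-method-style signature that uses only XML-serializable types:
// a string goes in, a byte[] comes out (the XML serializer writes it as base64).
static byte[] StringToBytes(string text, string targetEncoding) =>
    ResolveEncoding(targetEncoding).GetBytes(text);
```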

 

Deployment

Building the services is easy, since all we need to do is follow the guidelines in MSDN. For deployment, however, MSDN offers little help with building a Windows Installer setup package in Visual Studio .NET 2003.

It took me a lot of time to study and experiment with VS.NET 2003 before I could build one setup package that deploys all services or only selected ones. My final design has a Merge Module for each service, including my example Conversion Pad. One Setup project then includes all the merge modules and adds a selection dialog. The properties set in the selection dialog filter the merge modules, so only the selected services are installed on the computer.

I did not use any other setup tools, because I wanted the whole solution to use only Microsoft tools. Once you have VS.NET 2003, the rest is quite easy.

 

What more could be done?

Yes, after so many pages about my Chinese Converter utility, you may be bored and tired. But if you want ideas about turning a function into a service, I suggest looking at the code and the setup package source. You will also find that, due to lack of time, I kept the code simple, and many parts of it could be made more elegant.

For example, we could:
add more options to the selection dialog to let the user listen on HTTP, TCP or both;
let the user specify the ports in text boxes;
add asynchronous methods to the Converter class;
add conversion of all files in a folder to the Conversion Pad;
and more.
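Here is a rough sketch of two of these ideas together: a hypothetical asynchronous method that converts every .txt file in a folder from one encoding to another. It is written as a .NET 5+ console program, because File.ReadAllBytesAsync does not exist in the .NET Framework 1.1 the article uses. The method name and the *.txt filter are assumptions.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); // .NET Core / .NET 5+ only

// Demo: one Big5 file in a fresh temporary folder, converted to UTF-8.
Encoding big5 = Encoding.GetEncoding(950);
string inputDir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
string outputDir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(inputDir);
File.WriteAllBytes(Path.Combine(inputDir, "sample.txt"), big5.GetBytes("中國是一個美麗的國家。"));

int converted = await ConvertFolderAsync(inputDir, outputDir, big5, Encoding.UTF8);
Console.WriteLine($"{converted} file(s) converted");

// Hypothetical folder conversion: re-encode every *.txt file asynchronously,
// so a UI thread (such as the Conversion Pad's) stays responsive while it runs.
// Only the encoding changes here; a style step could be added before writing.
static async Task<int> ConvertFolderAsync(string sourceDir, string targetDir,
                                          Encoding sourceEncoding, Encoding targetEncoding)
{
    Directory.CreateDirectory(targetDir);
    int count = 0;
    foreach (string file in Directory.EnumerateFiles(sourceDir, "*.txt"))
    {
        byte[] source = await File.ReadAllBytesAsync(file);
        byte[] target = Encoding.Convert(sourceEncoding, targetEncoding, source);
        await File.WriteAllBytesAsync(Path.Combine(targetDir, Path.GetFileName(file)), target);
        count++;
    }
    return count;
}
```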

 

That is all for now. I may spend more time on this utility in the coming months, so please stay tuned for this free Chinese conversion utility. And have a nice dream after reading my bad and boring English instructions.

 

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



About the Author

Kent Liu
Chief Technology Officer
China
No Biography provided

Comments and Discussions

 
General: Always showing 㼿㼿㼿㼿㼿 (gordon_matt, 24-Dec-10 6:11)
Ni hao (hello) Kent,
 
It seems you have done some really good work here, but I am having some problems getting it to work.
 
I tried to translate this text:
中国是一个美丽的国家
 
but I always get: 㼿㼿㼿㼿㼿
 
It is the same for all Chinese text I try to translate and for all encodings.
 
I found that I can make it work if I change a small thing in the "convertChinese()" method in the ConvertControl.cs file.
 
If I change:
 
result = _converter.StringToString(chineseString,
   getTargetChineseStyle(), getSourceEncoding(), getTargetEncoding(),
   System.Text.Encoding.Default);
 
to:
 
result = _converter.StringToString(chineseString,
   getTargetChineseStyle(), getSourceEncoding(), getTargetEncoding(),
   System.Text.Encoding.Unicode);
 
then it works OK. But I am worried that this may not be correct and that it may stop working again. Can you please help me fix this problem? Is the solution above acceptable?
 
Xie xie ni (thank you) :)
 
Matt
Answer: Re: Always showing 㼿㼿㼿㼿㼿 (Kent Liu, 26-Dec-10 16:10)
General: Re: Always showing 㼿㼿㼿㼿㼿 (gordon_matt, 26-Dec-10 19:54)
General: Convert number to Chinese (Patrick wong, 29-May-09 11:21)
General: Re: Convert number to Chinese (Kent Liu, 25-Oct-10 23:29)
Question: Where can I translate strings into Chinese in .Net C#? (Andrew Praetorian, 17-Aug-07 12:03)
Answer: Re: Where can I translate strings into Chinese in .Net C#? (Kent Liu, 25-Oct-10 23:38)
Question: Converting Big5 code to GB code (makinha, 13-Jul-07 0:39)
Answer: Re: Converting Big5 code to GB code (Kent Liu, 13-Jul-07 16:36)
Question: Converting application to Chinese (Tien Pham, 6-Jun-07 5:29)
Answer: Re: Converting application to Chinese (Kent Liu, 6-Jun-07 16:01)
General: Convert UTF-8 code to Text (KLKO, 14-May-07 20:58)
General: Re: Convert UTF-8 code to Text (Kent Liu, 15-May-07 16:17)
General: Chinese Style Converter Install Error (ethan nobody, 13-Mar-07 9:31)
General: Re: Chinese Style Converter Install Error (Kent Liu, 14-Mar-07 5:22)
General: Print the result after conversion (makinha, 15-Feb-07 21:14)
General: Re: Print the result after conversion (Kent Liu, 16-Feb-07 17:43)
General: SQL 2005 help me please (serhaneker, 14-Jan-07 0:50)
General: Re: SQL 2005 help me please (Kent Liu, 20-Jan-07 18:04)
General: Difference between ChineseMappingStringPairs and FixingBugChineseStrings (makinha, 30-Nov-06 17:32)
Answer: Re: Difference between ChineseMappingStringPairs and FixingBugChineseStrings (Kent Liu, 1-Dec-06 0:04)
General: Re: Difference between ChineseMappingStringPairs and FixingBugChineseStrings (makinha, 3-Dec-06 23:09)
Answer: Re: Difference between ChineseMappingStringPairs and FixingBugChineseStrings (Kent Liu, 5-Dec-06 15:40)
General: Incorrect result under Win2003, but WinXP works fine (makinha, 28-Nov-06 21:53)
General: Re: Incorrect result under Win2003, but WinXP works fine (Kent Liu, 29-Nov-06 14:52)



Last Updated 7 Dec 2003
Article Copyright 2003 by Kent Liu
Everything else Copyright © CodeProject, 1999-2014