
Chinese Style Converter & Services

6 Dec 2003
A converter for Chinese style and encoding, including the services built on it

Setup Package (260KB)

Download Source Code (73.7KB)

Introduction in Chinese (the content differs slightly from the English introduction, especially for Chinese readers) (47.9KB)

Sample screenshot

Introduction

The Microsoft .NET Framework is very powerful, both in its design and in the functionality it provides. China is an old country with a distinctive and beautiful culture. One of its most interesting aspects is its writing system: each character is like a picture, very different from alphabetic scripts such as English. Chinese characters are written in two styles: Simplified and Traditional. The .NET Framework takes care of this situation too, through the Microsoft.VisualBasic.Strings.StrConv method.

This article shows how to use this feature and offers ideas for turning it into a service (since we live in a service-oriented programming world). I also give an example of packaging everything into a setup package. I hope you will come along and enjoy it.

Let us start with converting between Chinese styles:

Chinese Style Converting

The two styles of Chinese (Simplified and Traditional) can look very different, although some characters are identical in both.

For example, the sentence "China is a beautiful country." is written as:

中国是一个美丽的国家。 (In Simplified Chinese)

中國是一個美麗的國家。 (In Traditional Chinese)

But they have exactly the same meaning, and almost every Chinese reader knows that each pair of corresponding characters in the two sentences is the same character, even where they look different.

Some Chinese readers are more familiar with one style, others with the other. So people working in IT need to provide a way to convert characters between the two styles. Many tools already do this, but in the .NET world the easy and free option is the Microsoft.VisualBasic.Strings.StrConv function in the .NET Framework.

We simply call Microsoft.VisualBasic.Strings.StrConv(string str, Microsoft.VisualBasic.VbStrConv conversion, int localeID); it converts the source string to the specified target style and returns the result as a string. Here is an example:

using Microsoft.VisualBasic;

// ...

string source = "中国是一个美丽的国家。";

// Convert to Traditional Chinese; a LocaleID of 0 means the default (system) locale
string target = Strings.StrConv(source, VbStrConv.TraditionalChinese, 0);


The result (target) is then 中國是一個美麗的國家。

To convert to Simplified Chinese instead, simply pass VbStrConv.SimplifiedChinese as the second parameter of StrConv.

It is easy, right? Yes, and that is all there is to converting between Chinese styles.

Encoding Converting

We live in a world with many languages, each with its own script. To store text as electronic data, many languages historically had their own encoding systems. Today we also have Unicode, which tries to cover almost all languages, and even Unicode comes in several encoding forms: UTF-8, UTF-16 and UTF-7.

This can be a headache, but it is reality. Even for Chinese, the two styles have two different common encodings: code page 936 (the most commonly used encoding for Simplified Chinese, although other code pages exist for it) and code page 950 (the most commonly used encoding for Traditional Chinese; again, other code pages exist too).

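Using the previous example, the following tables show the bytes (in hexadecimal) of each character in several encodings. The UTF-16 values are in little-endian byte order, as produced by Encoding.Unicode.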

Simplified Chinese (中国是一个美丽的国家):

Encoding | 中 | 国 | 是 | 一 | 个 | 美 | 丽 | 的 | 国 | 家
936 | D6D0 | B9FA | CAC7 | D2BB | B8F6 | C3C0 | C0F6 | B5C4 | B9FA | BCD2
UTF-8 | E4B8AD | E59BBD | E698AF | E4B880 | E4B8AA | E7BE8E | E4B8BD | E79A84 | E59BBD | E5AEB6
UTF-16 | 2D4E | FD56 | 2F66 | 004E | 2A4E | 8E7F | 3D4E | 8476 | FD56 | B65B

Traditional Chinese (中國是一個美麗的國家):

Encoding | 中 | 國 | 是 | 一 | 個 | 美 | 麗 | 的 | 國 | 家
936 | D6D0 | 87F8 | CAC7 | D2BB | 8280 | C3C0 | FB90 | B5C4 | 87F8 | BCD2
950 | A4A4 | B0EA | AC4F | A440 | ADD3 | ACFC | C452 | AABA | B0EA | AE61
UTF-8 | E4B8AD | E59C8B | E698AF | E4B880 | E5808B | E7BE8E | E9BA97 | E79A84 | E59C8B | E5AEB6
UTF-16 | 2D4E | 0B57 | 2F66 | 004E | 0B50 | 8E7F | 979E | 8476 | 0B57 | B65B
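If you want to check these tables yourself, a few lines of C# are enough. This is a sketch, not code from the article: the class and method names are hypothetical, and it assumes .NET 5 or later, where code pages 936 and 950 must first be registered through CodePagesEncodingProvider (the .NET Framework used in this article supports them natively). It encodes one character at a time, which works for these characters but would split surrogate pairs.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper that reproduces the byte tables above.
public static class EncodingTableSketch
{
    static EncodingTableSketch()
    {
        // On .NET Core / .NET 5+, legacy code pages such as 936 and 950
        // must be registered before Encoding.GetEncoding can return them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Returns the hex bytes of each character in the given code page,
    // e.g. "D6D0" for 中 in code page 936, or "2D4E" in code page 1200 (UTF-16LE).
    public static string[] HexPerCharacter(string text, int codePage)
    {
        Encoding encoding = Encoding.GetEncoding(codePage);
        // Encodes one char at a time, so this assumes no surrogate pairs.
        return text
            .Select(c => BitConverter.ToString(encoding.GetBytes(c.ToString())).Replace("-", ""))
            .ToArray();
    }
}
```

For example, HexPerCharacter("中国", 936) should return D6D0 and B9FA, matching the first two cells of the 936 row in the Simplified table.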

See? Even for this simple sentence, the bytes differ in almost every encoding. But don't worry: most programmers have already learned how to deal with encodings. In the .NET Framework, converting between encodings is simple; just call the System.Text.Encoding.Convert method. Its usage is System.Text.Encoding.Convert(Encoding srcEncoding, Encoding dstEncoding, byte[] bytes) (the method is overloaded; see the documentation for details). For example:

using System.Text;

// ...

// Encoding.Unicode is UTF-16 (little-endian)
byte[] source = Encoding.Unicode.GetBytes("中国是一个美丽的国家");

byte[] target = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, source);


We now have the target byte array, converted from Unicode (UTF-16) to UTF-8.
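A slightly fuller sketch of the same call, this time targeting a legacy code page. The helper name is hypothetical and it assumes .NET 5 or later for CodePagesEncodingProvider. The strict option uses the Encoding.GetEncoding(int, EncoderFallback, DecoderFallback) overload so that characters the target code page cannot represent raise an exception instead of being replaced silently, which can help when a conversion unexpectedly produces "?" characters.

```csharp
using System.Text;

// Hypothetical helper around Encoding.Convert.
public static class ConvertSketch
{
    static ConvertSketch()
    {
        // Needed on .NET Core / .NET 5+ for code pages such as 936 and 950.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Converts UTF-16 (little-endian) bytes to the target code page.
    // With strict = true, characters the target cannot represent throw an
    // EncoderFallbackException instead of being replaced.
    public static byte[] FromUnicode(byte[] unicodeBytes, int targetCodePage, bool strict = false)
    {
        Encoding target = strict
            ? Encoding.GetEncoding(targetCodePage, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback)
            : Encoding.GetEncoding(targetCodePage);
        return Encoding.Convert(Encoding.Unicode, target, unicodeBytes);
    }
}
```

Calling FromUnicode(Encoding.Unicode.GetBytes("中国"), 936) should yield D6 D0 B9 FA, the same bytes as in the 936 row of the Simplified table.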

It is not too hard, but even for a simple string it has taken two pages to describe the basics of style and encoding conversion. Now let us put them together.


Putting Style & Encoding Converting Together

I wanted to create one class that lets a programmer convert both the Chinese style and the encoding in a single call. Since the source can come in many forms (string, byte array, stream and file), I provide several overloaded methods for the different scenarios. This is a many-to-many relationship: with these overloads we can convert Chinese text from any of these forms to the target style and encoding, in the specified output form, of course.

I am not going to copy the full code here, only a screenshot; for the details, please see the code in Converter.cs (I made this class inherit from System.MarshalByRefObject so that it can be hosted in .NET remoting).

figure1


Let us take a deeper look at the core conversion part:

figure2

If the user wants to change the style, we convert the source to the target style and then apply some adjustments to the Simplified-to-Traditional conversion (described in more detail below). Finally, we convert the Unicode bytes to the target encoding and return the result.
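The flow in figure2 boils down to decode, convert style, adjust, encode. The sketch below is a hypothetical reconstruction of that flow, not the code in Converter.cs: the style conversion and mapping adjustment are passed in as a delegate so the sketch runs on any platform, whereas the article performs that step with Strings.StrConv (Windows only) and its XML mapping table.

```csharp
using System;
using System.Text;

// Hypothetical sketch of the decode -> style -> adjust -> encode flow.
// It is not the code in Converter.cs.
public static class PipelineSketch
{
    // styleStep stands in for the style conversion plus the mapping-table
    // adjustment; pass null to leave the style unchanged.
    public static byte[] Run(byte[] sourceBytes, Encoding sourceEncoding,
                             Encoding targetEncoding, Func<string, string> styleStep)
    {
        // 1. Decode the source bytes into a .NET string (UTF-16 internally).
        string text = sourceEncoding.GetString(sourceBytes);

        // 2. Optionally change the style, e.g. on Windows:
        //    s => Strings.StrConv(s, VbStrConv.TraditionalChinese, 0)
        if (styleStep != null)
        {
            text = styleStep(text);
        }

        // 3. Encode the result with the target encoding.
        return targetEncoding.GetBytes(text);
    }
}
```

Keeping the style step as a delegate means the encoding half can be tested on its own, without depending on the locale the machine runs under.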

Adjusting the Conversion from Simplified to Traditional Chinese

There is a catch when converting Simplified Chinese to Traditional Chinese: the same Simplified character can map to different Traditional characters depending on the context. For example:

Simplified Chinese | Traditional Chinese
头发 | 頭髮
发达 | 發達
注意 | 注意
备注 | 備註

So we build a conversion table, save it in an XML file, load it into memory, and apply the replacements before doing the encoding conversion. Yes, this means we have to keep adding cases to the XML file over time, but since Microsoft doesn't do this for us, we have to take care of it ourselves.
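A minimal sketch of loading such a table and applying it, assuming a hypothetical XML layout with one Pair element per entry (the article does not show the format of its XML file). The entry 頭發 → 頭髮 is illustrative only: it assumes the style conversion produced 頭發 for 头发.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sketch of a mapping table loaded from XML.
// Assumed layout (not necessarily the article's file format):
// <Mappings>
//   <Pair from="頭發" to="頭髮" />
// </Mappings>
public static class MappingTableSketch
{
    // Reads all Pair elements; use XDocument.Load(path) to read from a file.
    public static List<KeyValuePair<string, string>> Load(XDocument doc)
    {
        return doc.Descendants("Pair")
            .Select(p => new KeyValuePair<string, string>(
                (string)p.Attribute("from"), (string)p.Attribute("to") ?? string.Empty))
            .Where(p => !string.IsNullOrEmpty(p.Key))
            // Longer keys first, so a longer phrase is replaced before any shorter key it contains.
            .OrderByDescending(p => p.Key.Length)
            .ToList();
    }

    // Applies every replacement in order.
    public static string Apply(string text, IEnumerable<KeyValuePair<string, string>> pairs)
    {
        foreach (var pair in pairs)
        {
            text = text.Replace(pair.Key, pair.Value);
        }
        return text;
    }
}
```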

While doing the replacements, we also need to handle spaces inserted between Chinese characters for layout, such as 头 发. So I wrote the function shown in figure3 to handle this case; it simply moves the pointer forward to check whether the next byte is a space.

figure3
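The figure3 function walks forward through the buffer by hand. As a hypothetical alternative (not the article's approach), a regular expression can allow optional whitespace between the characters of each key; in .NET, \s also matches the full-width space U+3000. Note that this version drops the spaces inside a match; the article does not say whether its function keeps them.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical alternative to the pointer-walking function in figure3.
public static class SpaceTolerantReplaceSketch
{
    // Replaces every occurrence of 'from' in 'text', allowing whitespace
    // (including the full-width space U+3000) between its characters.
    public static string Replace(string text, string from, string to)
    {
        // "頭發" becomes the pattern 頭\s*發
        string pattern = string.Join(@"\s*", from.Select(c => Regex.Escape(c.ToString())));
        // Escape '$' so the replacement text is taken literally.
        return Regex.Replace(text, pattern, to.Replace("$", "$$"));
    }
}
```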

Since we may modify the XML file while the program is running, I also created the ChineseMappingXMLFilename property and a settings class that uses a file watcher to monitor the file named in that property, so the Converter class picks up changes to the XML file on the fly.
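A sketch of the reload-on-change idea with FileSystemWatcher, under stated assumptions: the class name is hypothetical, and the raw file text stands in for the parsed mapping table. Editors often raise several change events per save and may still hold the file open, so the sketch retries the read a few times and keeps the previous content if every attempt fails.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical sketch: keep the mapping file's content up to date on disk changes.
public sealed class MappingFileWatcherSketch : IDisposable
{
    private readonly string _path;
    private readonly FileSystemWatcher _watcher;
    // Stands in for the parsed mapping table; swapped atomically on reload.
    private volatile string _content = string.Empty;

    public MappingFileWatcherSketch(string path)
    {
        _path = Path.GetFullPath(path);
        Reload();
        _watcher = new FileSystemWatcher(Path.GetDirectoryName(_path), Path.GetFileName(_path))
        {
            NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.Size
        };
        _watcher.Changed += (sender, e) => Reload();
        _watcher.EnableRaisingEvents = true;
    }

    public string Content => _content;

    private void Reload()
    {
        // The writer may still hold the file open; retry briefly.
        for (int attempt = 0; attempt < 3; attempt++)
        {
            try
            {
                _content = File.ReadAllText(_path);
                return;
            }
            catch (IOException)
            {
                Thread.Sleep(100);
            }
        }
    }

    public void Dispose() => _watcher.Dispose();
}
```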

Making the Consuming Tool

Once we have the Converter class, we want to use it, so I created a simple Conversion Pad. I built a Windows Forms control library that uses the Converter, and a Windows Forms application that hosts this control. The application can consume the conversion service in three ways: as a local assembly, through .NET remoting, or through a Web Service.


That concludes the introduction to converting Chinese style and encoding. Now let us see how to turn it into a service. There are many ways to provide a service to other programs; I chose to make it a COM+ application, a .NET remoting service, and a Web Service.

COM+ Application

COM+ applications are very useful in the DCOM world. By hosting the converter in a COM+ application, we can consume it from legacy programs (written in VB6, VFP or other COM-capable languages) or even from VBScript! Define a COM interface, write a wrapper class that inherits from System.EnterpriseServices.ServicedComponent, set the other attributes for the COM+ application, and that is it. To make deployment easier, we also want an Installer class that creates and removes the COM+ application; it takes only a few lines of code:

figure4

After deploying it, we can use this COM+ application from a simple script:

dim objConverter

set objConverter  = createobject("ChineseUtilComPlus.ConverterService")

msgbox objConverter.StringToStringEncodingIntegerFormat("中国是一个美丽的国家", 2, 936, 65001, 936)

set objConverter = nothing

.NET Remoting

IIS or a Windows Service is commonly used to host a .NET remoting service; I chose to host the Converter class in a Windows Service. After building a wrapper class that inherits from System.ServiceProcess.ServiceBase and adding the ability to specify the Chinese mapping XML file in the configuration file, I had to do more work to listen for requests on both HTTP and TCP channels and to make those channels configurable in the configuration file.

figure5

And that is it. I also made an Installer class to make deployment easier. The Windows Service installer is straightforward: following the example code in MSDN is enough.

Web Service

Hosting the Converter class in a Web Service makes it available to any SOAP-compatible program. Before writing the wrapper class, I gave up on making System.Text.Encoding serializable to XML. Instead, the Converter can be called through methods that specify the encoding as an int (code page) or a string (name), and then uses System.Text.Encoding.GetEncoding to obtain the System.Text.Encoding instance. For the Web Service we also had to give up more, such as exposing the stream and file conversions, because of serialization and security issues. In the end, I expose only the conversions for byte arrays and strings.
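The int/string approach maps directly onto Encoding.GetEncoding, which has overloads for both a code page number and an encoding name. The helper names below are hypothetical and only illustrate the idea; they assume .NET 5 or later for CodePagesEncodingProvider.

```csharp
using System.Text;

// Hypothetical web-method-style helpers: encodings cross the wire as an int
// (code page) or a string (name) and are resolved with Encoding.GetEncoding.
public static class EncodingResolverSketch
{
    static EncodingResolverSketch()
    {
        // Needed on .NET Core / .NET 5+ for code pages such as 936 and 950.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static byte[] StringToBytes(string text, int codePage) =>
        Encoding.GetEncoding(codePage).GetBytes(text);

    public static byte[] StringToBytes(string text, string encodingName) =>
        Encoding.GetEncoding(encodingName).GetBytes(text);

    public static string BytesToString(byte[] bytes, int codePage) =>
        Encoding.GetEncoding(codePage).GetString(bytes);
}
```

Because only ints, strings and byte arrays appear in these signatures, they serialize to SOAP without any custom work.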


Deployment

Building the services is easy, since you only need to follow the guidelines in MSDN. For deployment, however, MSDN offers little help on using Visual Studio .NET 2003 to build a setup package in Windows Installer format.

It took me a lot of time to study VS.NET 2003 and work out how to build one setup package that can deploy all the services or only the selected ones. In the end, I settled on this design: a merge module for each service (including my example Conversion Pad), and one setup project that includes all the merge modules plus a selection dialog. Using the properties set by the selection dialog, I filter which merge modules get installed, so only the selected services are installed on the computer.

I did not use any other setup tools, because I wanted to keep the whole solution within Microsoft's tools; once you have VS.NET 2003, the rest is quite easy.


What Could Be More?

Yes, after so many pages about my Chinese Converter utility, you might be bored and tired. But I suggest taking a look at the code and the setup package source if you want ideas on turning a function into a service. You will also find that, for lack of time, I wrote only simple code, and many parts of it could be made more elegant.

For example, we could add more selection dialogs to let the user choose HTTP, TCP, or both protocols for listening to requests; let the user specify ports by entering numbers in text boxes; add asynchronous methods to the Converter class; add conversion of all files in a folder to the Conversion Pad; and more.
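Two of these suggestions, asynchronous methods and converting every file in a folder, could look roughly like the sketch below. Everything in it is hypothetical: it assumes .NET Core 2.0 or later for File.ReadAllBytesAsync and File.WriteAllBytesAsync, passes the style conversion in as an optional delegate, and does not strip byte order marks from the source files.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Hypothetical sketch of asynchronous, whole-folder conversion.
public static class FolderConversionSketch
{
    // Converts every file in sourceDir from one encoding to another,
    // writing files with the same names into targetDir. Returns the file count.
    public static async Task<int> ConvertFolderAsync(string sourceDir, string targetDir,
        Encoding from, Encoding to, Func<string, string> styleStep = null)
    {
        Directory.CreateDirectory(targetDir);
        int count = 0;
        foreach (string path in Directory.GetFiles(sourceDir))
        {
            byte[] bytes = await File.ReadAllBytesAsync(path);
            string text = from.GetString(bytes);   // a BOM, if present, is kept as U+FEFF
            if (styleStep != null)
            {
                text = styleStep(text);            // e.g. style conversion + mapping table
            }
            string targetPath = Path.Combine(targetDir, Path.GetFileName(path));
            await File.WriteAllBytesAsync(targetPath, to.GetBytes(text));
            count++;
        }
        return count;
    }
}
```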


That is all for now. I may spend more time on this utility in the coming months, so please stay tuned for this free Chinese conversion utility. And have a nice dream after reading my bad and boring English instructions.

 

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



Written By
Program Manager
China

Comments and Discussions

 
Re: different between ChineseMappingStringPairs and FixingBugChineseStrings (Kent Liu, 30-Nov-06 23:04)
Re: different between ChineseMappingStringPairs and FixingBugChineseStrings (MKHA2, 3-Dec-06 22:09)
Re: different between ChineseMappingStringPairs and FixingBugChineseStrings (Kent Liu, 5-Dec-06 14:40)
incorrect result under win2003, but winXP works fine (MKHA2, 28-Nov-06 20:53)
Re: incorrect result under win2003, but winXP works fine (Kent Liu, 29-Nov-06 13:52)
Re: incorrect result under win2003, but winXP works fine (MKHA2, 29-Nov-06 19:16)
Re: incorrect result under win2003, but winXP works fine (Kent Liu, 30-Nov-06 2:43)
Re: incorrect result under win2003, but winXP works fine (MKHA2, 30-Nov-06 15:29)
So you mean the text was converted to GB code correctly, but since I set the locale to "HK SAR", and the HK SAR encoding (Unicode) has no character to represent the converted Simplified Chinese character, "??" is shown?

Big-5 code --(Chinese Converter Pad)--> GB code --(view on WinXP)--> Unicode

But the problem is, I use the direct file conversion function to load and save a new file without showing it in the Chinese Converter Pad, and I check the converted Simplified Chinese characters in Notepad (text format) under English Windows XP. Even if Simplified Chinese cannot be displayed in Notepad, I should still be able to read the "corrupted" Chinese characters, and every 2 "corrupted" characters should display as 1 Chinese character in MS Word/IE. However, some "??" show up in the converted text file. Also, doesn't ISO-10646 include all characters in Big-5 and GB, plus more? If so, the default WinXP Unicode setting (HK SAR) should have the correct mapping for those "??" characters. (Please correct me if I have a conceptual mistake; Chinese encoding is really complicated :-<)

In addition, I only changed the "Standards and formats" setting, not the "System locale", and "Standards and formats" should only affect formatting, like date format, currency format... so I have no idea why it would affect the conversion.

Standards and formats: English
Location: HK SAR
Language for non-Unicode programs: HK SAR
Re: incorrect result under win2003, but winXP works fine (Kent Liu, 30-Nov-06 22:29)
namespace name 'Converter' does not exist (MKHA2, 27-Nov-06 17:26)
Re: namespace name 'Converter' does not exist (Kent Liu, 28-Nov-06 2:33)
i need help badly (Ahmed.mb, 19-Aug-06 6:42)
Re: i need help badly (Kent Liu, 19-Aug-06 23:19)
Re: i need help badly (Ahmed.mb, 19-Aug-06 23:26)
Re: i need help badly (Kent Liu, 20-Aug-06 15:23)
Re: i need help badly (Ahmed.mb, 21-Aug-06 5:31)
Re: i need help badly (Kent Liu, 21-Aug-06 14:50)
Re: i need help badly (Ahmed.mb, 21-Aug-06 18:15)
Re: i need help badly (Kent Liu, 22-Aug-06 14:54)
Re: i need help badly (Ahmed.mb, 25-Aug-06 3:54)
Re: i need help badly (Kent Liu, 5-Sep-06 16:11)
Re: i need help badly (Ahmed.mb, 13-Sep-06 22:24)
Re: i need help badly (Kent Liu, 29-Nov-06 14:04)
Re: i need help badly (Ahmed.mb, 11-Dec-06 0:02)
No solution approved? (kpchan2, 21-Jun-06 0:58)
