Click here to Skip to main content
Click here to Skip to main content
Go to top

ASCII/Multibyte to Unicode Conversion

, 16 Feb 2007
Rate this:
Please Sign up or sign in to vote.
An article about converting ASCII/Multibyte to Unicode using C# and SQL/CLR integration.

Sample image

Introduction

A common problem when converting legacy database applications is that customers have a lot of data and they don't want to loose it. Data strings are usually encoded in ASCII /Multibyte, but our new system uses NVARCHAR fields, so the problem is that we have to convert old data into Unicode. This is exactly what we'll do here.

I have to highlight that the approach I will present is 'oversized' for most cases: in a common scenario, you will deal with a single codepage; in that case, SQL Server built-in tools are enough. But there are situations that require a more advanced approach, such as when you have a legacy database that stores strings with different codepages.

As a complete example to show you capabilities of SQL-CLR integration, I've also decided to use a Win32 API to perform the conversion so that you can also see how to use P/Invoke from SQL. This can be useful if you have an old DLL and want to use it... but *beware* that it can be really dangerous... if you don't have full knowledge of what you're doing, you can keep down the entire SQL Server process !!!

Just a note: I do not provide a test project since the code I will show here is really simple, you can use copy & paste, and it's faster!

Solution

The solution I want to show is simple and powerful at the same time. It is made of two logical parts: first, we'll build the conversion routine as a standard C# function. Second, we'll integrate it into SQL Server as a function, so you can use with T-SQL.

Clearly, depending on your application scenario, this may not be the best approach, so once you have the conversion routine, you may choose to follow my approach or you may prefer to use it in an external application that performs a batch conversion, or you may think of something else.

Part #1: Create the conversion routine

I've used the following well known Win32 API:

int MultiByteToWideChar(
  UINT CodePage, 
  DWORD dwFlags,         
  LPCSTR lpMultiByteStr, 
  int cbMultiByte,       
  LPWSTR lpWideCharStr,  
  int cchWideChar        
);

Here, there's the C# signature that can be used for P/Invoking:

[DllImport("kernel32.dll")]
private static extern int MultiByteToWideChar(
  uint CodePage,
  uint dwFlags,
  [MarshalAs(UnmanagedType.LPArray)] Byte[] lpMultiByteStr,
  int cbMultiByte,
  [Out, MarshalAs(UnmanagedType.LPArray)] Byte[] lpWideCharStr,
  int cchWideChar);

Now it's quite easy to write a class that performs a conversion:

namespace ConvUtils {

  public static class Unicode {

    public static SqlString ConvToUnicode(SqlInt32 codepage , SqlString multibyteString) {
      byte[] b = (byte[])iConvToMultibyteArray(multibyteString);
      return (SqlString)ToUnicode((uint)(int)codepage, b);
    }

    private static string ToUnicode(uint codepage, Byte[] lpMultiByteStr) { 
      Byte[] lpWideCharStr = new Byte[2*lpMultiByteStr.Length];
      MultiByteToWideChar(codepage, 0, lpMultiByteStr, lpMultiByteStr.Length,
         lpWideCharStr, 2*lpMultiByteStr.Length);
      return System.Text.Encoding.Unicode.GetString(lpWideCharStr);
    }


    private static SqlBinary iConvToMultibyteArray(SqlString multibyteString) {
      byte[] result = multibyteString.GetUnicodeBytes();
      return (SqlBinary)result;
    }
  }
}

The example is quite easy, and does not require any other explanation. The SQL types has been used because I will integrate it into SQL Server, but if you don't need it, you may replace them with strings and byte arrays.

I've also decided to let the codepage be a parameter since the codepage on your PC or on SQL Server can be different from the one needed for the conversion.

Part #2: SQL Server code

OK, now we have a C# code that converts ASCII/Multibyte into Unicode. The next step is to integrate it into SQL Server so that any database user can have access to this conversion routine:

First, create a DLL that can be hosted by SQL Server: all we have to do is to add the [SQLFunction] attribute to ConvToUnicode:

[SQLFunction] 
public static SqlString ConvToUnicode( ...

Finally, build the DLL and integrate it into SQL Server with a script like the following:

use TESTDB 
go 
exec sp_configure "clr enabled", '1' 
go 
reconfigure 
go 
-- for test purpouses only, not recommended in production environments
ALTER DATABASE TESTDB SET TRUSTWORTHY ON 
go 

begin try 
  CREATE ASSEMBLY [asmUni] FROM 'c:\project_output_dir\uniconv.dll'
     with permission_set=UNSAFE 
end try 
begin catch 
  alter assembly [asmUni] FROM 'c:\project_output_dir\uniconv.dll' WITH UNCHECKED DATA 
end catch 
go

if exists(
    select name from sys.objects where name = 
    'csConvToUnicode') drop function [dbo].[csConvToUnicode] 
go 
CREATE FUNCTION [dbo].[csConvToUnicode] ( 
@codepage int, 
@multibytestr nvarchar(max) 
) returns nvarchar(max) 
AS EXTERNAL name [asmUni].[ConvUtils.Unicode].[ConvToUnicode] 
go

That's all!

Now you can use this function like any other, for example, in a classic Select statement, to create a View or to create a Trigger that automatically keeps your data updated.

Here is a final example of how we can use this function in a T-SQL statement (950 is the codepage for traditional Chinese):

select 
  description, 
  dbo.csConvToUnicode(950, description) as converted 
from testtable

description             converted
----------------------- -------------------
¨à µ£ºô¸ô¦w¥þ            兒童網路安全
°ê»Ú¸ê°T °T°T°T°T°T        國際資訊 訊訊訊訊訊
°ê»Ú¸ê°T °T°T°T°T°T        國際資訊 訊訊訊訊訊
a                    a
Áô¨pÅv¬Fµ¦            隱私權政策
test c                test c

Conclusion

I have shown a really simple but powerful way to leverage CLR integration provided by SQL Server to convert old ASCII/Multibyte data into Unicode. I hope that this is a good starting point for your personal solution.

History

  • February 16, 2007 - Added more comments on introduction.
  • February 8, 2007 - First version.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

carlop()
Software Developer (Senior)
Italy Italy
10 years of hard coding... and it's going on !

Comments and Discussions

 
QuestionNot work in my case PinmemberRuohong23-Nov-10 2:11 
AnswerRe: Not work in my case Pinmembercarlop()23-Nov-10 10:08 
QuestionNot work with arabic Pinmemberaishar11-Jan-09 23:27 
AnswerRe: Not work with arabic Pinmembercarlop()12-Jan-09 9:52 
GeneralRe: Not work with arabic Pinmemberaishar12-Jan-09 9:56 
GeneralRe: Not work with arabic Pinmembercarlop()12-Jan-09 10:19 
GeneralIt's just what I want, would u please share your dll here? many thks! Pinmemberfrx21-Dec-08 17:09 
GeneralRe: It's just what I want, would u please share your dll here? many thks! Pinmembercarlop()21-Dec-08 21:01 
GeneralRe: It's just what I want, would u please share your dll here? many thks! Pinmemberfrx22-Dec-08 22:48 
GeneralPerhaps an alternative using System.Text.Encoding PinmemberCyrus Chan26-Mar-07 22:29 
QuestionWhy reinvent the wheel? PinmemberMihai Nita8-Feb-07 8:26 
AnswerRe: Why reinvent the wheel? Pinmembercarlop [on codeproject]8-Feb-07 9:37 
GeneralRe: Why reinvent the wheel? [modified] PinmemberMihai Nita8-Feb-07 12:39 
GeneralRe: Why reinvent the wheel? Pinmembercarlop [on codeproject]8-Feb-07 13:06 
GeneralRe: Why reinvent the wheel? PinmemberMihai Nita8-Feb-07 16:26 
GeneralRe: Why reinvent the wheel? Pinmembercarlop [on codeproject]8-Feb-07 21:08 
GeneralRe: Why reinvent the wheel? PinmemberMihai Nita8-Feb-07 23:24 
GeneralRe: Why reinvent the wheel? Pinmembercarlop()8-Feb-07 23:42 
GeneralRe: Why reinvent the wheel? Pinmemberpemb31-Jul-07 17:10 
AnswerRe: It's still a good example of CLR integration PinmemberJcmorin8-Feb-07 11:26 
GeneralRe: It's still a good example of CLR integration Pinmembercarlop [on codeproject]8-Feb-07 11:33 
GeneralRe: It's still a good example of CLR integration PinmemberMihai Nita8-Feb-07 12:42 
GeneralRe: It's still a good example of CLR integration Pinmembercarlop [on codeproject]8-Feb-07 21:11 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web03 | 2.8.140916.1 | Last Updated 16 Feb 2007
Article Copyright 2007 by carlop()
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid