P/Invoke Tutorial: Passing strings (Part 2)

12 Jun 2012

In the previous tutorial, we passed a single string to a native C/C++ function by using P/Invoke.

This function was defined like this:

C++
// C++
void print_line(const char* str);
C#
// C#
[DllImport("NativeLib.dll")]
private static extern void print_line(string str);

However, there exists a hidden pitfall here:

What happens when the user passes a non-ASCII character to this function?

ASCII and Unicode: A Historical Overview

Historically, there was ASCII, which defined the characters numbered 0 to 127 (i.e., everything that fits into 7 bits). However, these 128 characters contained only the letters used in English. Umlauts (like ä, ö, ü) and other characters were not present. So the 8th bit was used to map these characters, but the mapping was not standardized. Basically, each country had its own mapping of the range 128 – 255. These different mappings were called code pages.

For example, on code page 850 (MS-DOS Latin 1), the character number 154 is Ü (German Umlaut) while on code page 855 (MS-DOS Cyrillic), the very same character number represents џ (Cyrillic small letter DZHE).
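To make the ambiguity concrete, here is a minimal sketch of how the same byte decodes differently depending on the code page. The `CodePage` enum and the `decode_byte()` function are made up for this illustration, and the table covers only the ASCII range and byte 154 from the example above; it is not a real code page implementation.

```cpp
#include <cstdint>

// Hypothetical names, used only for this illustration.
enum class CodePage { Cp850, Cp855 };

// Maps a single byte to a Unicode code point under the given code page.
// Only the ASCII range and byte 154 are covered; everything else yields 0.
char32_t decode_byte(CodePage page, std::uint8_t byte) {
    if (byte < 128) {
        return byte;  // the 7-bit ASCII range is the same on both code pages
    }
    if (byte == 154) {
        return page == CodePage::Cp850
            ? char32_t{0x00DC}   // U+00DC: Ü on code page 850
            : char32_t{0x045F};  // U+045F: џ on code page 855
    }
    return 0;  // not covered by this sketch
}
```

Calling `decode_byte(CodePage::Cp850, 154)` and `decode_byte(CodePage::Cp855, 154)` returns two different code points for the very same byte, which is exactly the information a plain `const char*` does not carry.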

To unify these different mappings, the Unicode standard was established in 1991. The idea was (and is) to give each existing character a unique id. These ids are called code points. So basically, the Unicode standard is “just” a much bigger version of the ASCII standard. The latest version as of writing is Unicode version 6.1 which covers over 110,000 characters.

Along with the Unicode standard, several encodings were developed. Each encoding describes how to convert Unicode code points into bytes. The most famous ones are UTF-8 and UTF-16.

Please note that each of these Unicode encodings (UTF-8, UTF-16, UTF-32) can encode all Unicode code points. They just differ in the way they do this.
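As an illustration of what "converting code points into bytes" means, the following sketch encodes a single code point as UTF-8. The function name `encode_utf8()` is an assumption made for this example, and the sketch does no validation (it assumes a valid code point up to U+10FFFF that is not a surrogate).

```cpp
#include <cstdint>
#include <vector>

// Encodes one Unicode code point as a sequence of UTF-8 bytes.
// Assumes cp is a valid code point (<= 0x10FFFF, not a surrogate).
std::vector<std::uint8_t> encode_utf8(char32_t cp) {
    std::vector<std::uint8_t> out;
    if (cp < 0x80) {
        // 1 byte: identical to ASCII
        out.push_back(static_cast<std::uint8_t>(cp));
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out.push_back(static_cast<std::uint8_t>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<std::uint8_t>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<std::uint8_t>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
    }
    return out;
}
```

For Ü (code point U+00DC), this yields the two bytes 0xC3 0x9C, whereas UTF-16 stores the same code point as the single 16-bit value 0x00DC. The code point is the same; only the byte representation differs.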

If you want to experiment a little bit with Unicode, there is a Unicode Explorer I’ve written. Go ahead and give it a try.

P/Invoke String Conversions

Back to the actual problem. With the parameter of print_line() defined as const char* (and char being 8 bit), it’s not clear which code page to use for the strings passed to this function.

Instead, let’s change the parameter type to a Unicode string (also sometimes referred to as a “wide character” string):

C++
void print_line(const wchar_t* str);

Now, let’s also adapt the C# mapping:

C#
[DllImport("NativeLib.dll", CharSet = CharSet.Unicode)]
private static extern void print_line(string str);

The only difference here is that we specified the CharSet to be Unicode.

With this, C# will pass strings to the C++ function as UTF-16 encoded strings. This matches wchar_t on Windows, where wchar_t is 16 bits wide.

As mentioned above, UTF-16 is an encoding that converts Unicode code points into bytes and back. In UTF-16, each code point is encoded with either one or two WORDs (16-bit values). The most frequently used code points fit into one WORD; the less frequently used code points need two WORDs (called a “surrogate pair”).
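The one-or-two-WORDs rule can be sketched as follows. The function name `encode_utf16()` is an assumption made for this example, and `char16_t` is used instead of `wchar_t` so that the code unit is 16 bits wide on every platform. The sketch assumes a valid code point and does no error checking.

```cpp
#include <vector>

// Encodes one Unicode code point as one or two UTF-16 code units.
// Assumes cp is a valid code point (<= 0x10FFFF, not a surrogate).
std::vector<char16_t> encode_utf16(char32_t cp) {
    if (cp < 0x10000) {
        // Fits into a single 16-bit value.
        return { static_cast<char16_t>(cp) };
    }
    // Beyond 0xFFFF: split the remaining 20 bits into a surrogate pair.
    cp -= 0x10000;
    return {
        static_cast<char16_t>(0xD800 + (cp >> 10)),    // high surrogate: upper 10 bits
        static_cast<char16_t>(0xDC00 + (cp & 0x3FF))   // low surrogate: lower 10 bits
    };
}
```

For example, Ü (U+00DC) stays a single value 0x00DC, while U+1F600 becomes the surrogate pair 0xD83D 0xDE00. A native function receiving such a string therefore cannot assume that one 16-bit value always equals one character.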

Important: There is no ISO C way to print Unicode characters to the console. wprintf() won’t work – at least on Windows.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer University of Stuttgart
Germany
I have studied Software Engineering and am currently working at the University of Stuttgart, Germany.

I have been programming for many years and have a background in C++, C#, Java, Python and web languages (HTML, CSS, JavaScript).
