PInvoke Performance

Dave Kerr, 13 Sep 2011
Performance comparison of P/Invoke versus a C++/CLI wrapper.

Introduction

There is a performance penalty when using P/Invoke to cross the managed/unmanaged boundary. But how serious is this penalty? Can it be reduced by writing a C++/CLI library that exposes the functions of a traditional API, rather than using P/Invoke?

In this article, we will look into the performance of P/Invoke compared to a C++/CLI wrapper.

Background

I am currently working on the new version of SharpGL (http://www.codeproject.com/KB/openGL/sharpgl.aspx), my OpenGL wrapper and scene graph. OpenGL is a very 'talkative' API: its functions are called many thousands of times per second. Whilst working on this library I wondered whether it would be faster to expose the OpenGL functions through a C++/CLI class library or to P/Invoke them directly. A brief Google search suggested that a C++/CLI wrapper would be faster, but I wanted to look into this further.

I have written a tiny API called 'TraditionalAPI' that exposes three functions. The test project calls each of these functions many times, using different calling methods, and times the results.

Part 1: The Traditional API

The traditional API exposes three basic functions:

Test Function 1: IncrementCounter

This is the simplest function I could come up with. Because it does almost no work, timing it should be a good way of measuring the overhead of a P/Invoke call:

//    A global counter.
unsigned int g_uCounter = 0;

TRADITIONALAPI_API void __stdcall TA_IncrementCounter()
{
    g_uCounter++;
}
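Since TA_IncrementCounter does almost no work, a rough per-call overhead can be estimated by comparing a run from managed code with the same number of calls made natively. This is an approximation, assuming both runs make the same N calls and that the loop bookkeeping itself is negligible; T_managed(N) is the time for N calls via P/Invoke or the C++/CLI wrapper, and T_native(N) is the time for the same N calls made inside TraditionalAPI:

```latex
t_{\text{overhead}} \approx \frac{T_{\text{managed}}(N) - T_{\text{native}}(N)}{N}
```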

Test Function 2: Square Root

The second function calculates the square root of a double. No complicated marshalling should be required:

//    A slightly more complex function, find the square root of a double.
TRADITIONALAPI_API double __stdcall TA_CalculateSquareRoot(double dValue)
{
    return ::sqrt(dValue);
}
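The reason no complicated marshalling is needed is that System.Double is a blittable type: it has the same 64-bit IEEE 754 layout in managed and native memory, so P/Invoke can pass it by value without conversion. A minimal native-side sketch of that assumption, checked at compile time; CalculateSquareRoot here is a plain C++ stand-in for TA_CalculateSquareRoot, with the export macro and __stdcall omitted so it builds on any compiler:

```cpp
#include <cmath>
#include <limits>

// The native double must match System.Double (64-bit IEEE 754) for the value
// to cross the managed/unmanaged boundary unchanged.
static_assert(sizeof(double) == 8, "expected a 64-bit double");
static_assert(std::numeric_limits<double>::is_iec559, "expected IEEE 754 doubles");

// Stand-in for TA_CalculateSquareRoot: takes and returns a double by value,
// so nothing has to be copied or converted beyond the value itself.
double CalculateSquareRoot(double dValue)
{
    return std::sqrt(dValue);
}
```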

Test Function 3: Dot Product

The third function calculates the dot product of two three-tuples. It takes two arrays, so on the managed side the memory has to be pinned for the duration of the call:

TRADITIONALAPI_API double __stdcall TA_DotProduct(
               double arThreeTuple1[], double arThreeTuple2[])
{
    return arThreeTuple1[0] * arThreeTuple2[0] + arThreeTuple1[1] * 
           arThreeTuple2[1] + arThreeTuple1[2] * arThreeTuple2[2];
}
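Because array parameters decay to pointers, TA_DotProduct receives only two double pointers and cannot tell how many elements the caller actually passed; it simply trusts that each array holds at least three. A small sketch of that contract, using hypothetical names that are not part of TraditionalAPI: DotProduct3 mirrors the original body with const pointers, and a std::array overload lets native callers have the length checked at compile time:

```cpp
#include <array>

// Mirrors TA_DotProduct: the pointers carry no length, so the caller must
// guarantee that both point to at least three doubles.
double DotProduct3(const double* a, const double* b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Hypothetical safer overload for native callers: the element count is part
// of the type, so passing a two-element array fails to compile.
double DotProduct3(const std::array<double, 3>& a, const std::array<double, 3>& b)
{
    return DotProduct3(a.data(), b.data());
}
```

Managed callers still go through the pointer version, which is why the C++/CLI wrapper pins the arrays and hands their first elements to the native function.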

Part 2: The C++/CLI Wrapper

The second part of the solution is a C++/CLI class library that wraps each of these functions:

Test Function 1 Wrapper

void IncrementCounter()
{
    //    Call the unmanaged function.
    ::TA_IncrementCounter();
}

Nothing special here: because this is a member of a C++/CLI class, we will be able to call IncrementCounter from any other .NET application.

Test Function 2 Wrapper

double CalculateSquareRoot(double value)
{
    //    Call the unmanaged function.
    return ::TA_CalculateSquareRoot(value);
}

Again, nothing complicated is required for this function.

Test Function 3 Wrapper

double DotProduct(array<double>^ threeTuple1, array<double>^ threeTuple2)
{
    //    Pin the arrays.
    pin_ptr<double> p1(&threeTuple1[0]);
    pin_ptr<double> p2(&threeTuple2[0]);
            
    //    Call the unmanaged function.
    return TA_DotProduct(p1, p2);
}

In this case the wrapper has to do some work: it pins the managed arrays so that the garbage collector cannot move them while the unmanaged API reads them directly.

Part 3: The C# Test Application

The final part of the solution is a C# WPF application that runs the tests. The TraditionalAPI DLL can run each test function once, or run it a given number of times in a native loop. Because of this, we can compare the following:

  • The time taken to run x tests directly in TraditionalAPI
  • The time taken to run x tests via the C++/CLI interface
  • The time taken to run x tests via P/Invoke
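The article does not show how TraditionalAPI runs a test a number of times. A minimal sketch of what such a native loop could look like, assuming a hypothetical helper RunIncrementCounterTest that times the loop with std::chrono::steady_clock; IncrementCounter is a plain stand-in for TA_IncrementCounter without the export macro or __stdcall:

```cpp
#include <chrono>
#include <cstdint>

// Stand-in for the DLL's global counter and TA_IncrementCounter.
unsigned int g_uCounter = 0;

void IncrementCounter()
{
    g_uCounter++;
}

// Hypothetical helper: runs the test 'count' times entirely in native code
// and returns the elapsed time in microseconds. Called once from C#, it
// crosses the managed/unmanaged boundary only once for the whole loop.
std::int64_t RunIncrementCounterTest(unsigned int count)
{
    const auto start = std::chrono::steady_clock::now();
    for (unsigned int i = 0; i < count; ++i)
    {
        IncrementCounter();
    }
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```

With optimisations enabled, a compiler may inline IncrementCounter and collapse the loop into a single addition, so a native baseline measured this way can look cheaper than a genuine per-call cost.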

The Results

Below are the results of running each test 10,000 times:

[Graph 1: timings for 10,000 calls per test]

And the results of running each test 100,000 times:

[Graph 2: timings for 100,000 calls per test]

Conclusion

This is certainly not what I expected. Based on my research, I expected the C++/CLI interface to be at least an order of magnitude faster, because it does less error checking than a P/Invoke call. Even in the case of a million function calls, the C++/CLI interface is barely faster than P/Invoke.

As we would expect, calling native functions from a CLI application is very expensive when they are called many times, because every call crosses the managed/unmanaged boundary. For a very talkative API, it may even be worth writing a second C++ API that takes an aggregated set of parameters and makes the many native calls itself, so that the boundary is crossed once per batch rather than once per call.
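The aggregated-API idea can be sketched with a hypothetical entry point, DotProductBatch (not part of TraditionalAPI), that takes many three-tuples packed contiguously and computes all of their dot products in one native call. Under this assumption the managed caller pins two arrays once and crosses the boundary once per batch instead of once per dot product:

```cpp
#include <cstddef>

// Hypothetical batched entry point. tuples1 and tuples2 each hold 'count'
// three-tuples stored back to back (x0, y0, z0, x1, y1, z1, ...); results
// must have room for 'count' doubles.
void DotProductBatch(const double* tuples1, const double* tuples2,
                     double* results, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        const double* a = tuples1 + 3 * i;
        const double* b = tuples2 + 3 * i;
        results[i] = a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }
}
```

The trade-off is that the caller must first gather its data into flat arrays, which only pays off when many calls can be grouped together.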

Further Research

Has anyone found a case where a C++/CLI wrapper really does give a solid performance boost? Is there another way to do this that I have overlooked? Please provide any suggestions and I will update the article as necessary.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Dave Kerr
Software Developer
United Kingdom
Follow my blog at www.dwmkerr.com and find out about my charity at www.childrenshomesnepal.org.

Last Updated 13 Sep 2011
Article Copyright 2011 by Dave Kerr
Everything else Copyright © CodeProject, 1999-2014