PInvoke Performance

By Dave Kerr, 13 Sep 2011
 

Introduction

There is a performance penalty when using P/Invoke to cross the managed/unmanaged boundary. But how serious is this penalty? Can it be reduced by avoiding P/Invoke and instead writing a C++/CLI library that exposes the functions of a traditional API?

In this article, we will look into the performance of P/Invoke compared to a C++/CLI wrapper.

Screenshot1.png

Background

I am currently working on the new version of my OpenGL wrapper and Scene Graph called SharpGL (http://www.codeproject.com/KB/openGL/sharpgl.aspx). OpenGL is a very 'talkative' API - its functions are called many thousands of times per second. Whilst working on this library I wondered: would it be faster to write a C++/CLI class library to expose the OpenGL functions, or to P/Invoke them directly? A brief Google search suggested a C++/CLI wrapper, but I wanted to look into this further.

I have written a tiny API called 'TraditionalAPI' which exposes three functions. The test project invokes each of these functions a number of times in three different ways: directly in the native DLL, via a C++/CLI wrapper, and via P/Invoke.

Part 1: The Traditional API

The traditional API exposes three basic functions:

Test Function 1: IncrementCounter

This is the most basic function I could come up with. Because it does almost no work, timing it should be a good way of measuring the overhead of a P/Invoke call:

//    A global counter.
unsigned int g_uCounter = 0;

TRADITIONALAPI_API void __stdcall TA_IncrementCounter()
{
    g_uCounter++;
}

Test Function 2: Square Root

The second function calculates the square root of a double. No complicated marshalling should be required:

//    A slightly more complex function, find the square root of a double.
TRADITIONALAPI_API double __stdcall TA_CalculateSquareRoot(double dValue)
{
    return ::sqrt(dValue);
}

Test Function 3: Dot Product

The next function calculates the dot product of two three-tuples. This function takes two arrays - so in the managed world we will have to pin memory to marshal this:

TRADITIONALAPI_API double __stdcall TA_DotProduct(
               double arThreeTuple1[], double arThreeTuple2[])
{
    return arThreeTuple1[0] * arThreeTuple2[0] + arThreeTuple1[1] * 
           arThreeTuple2[1] + arThreeTuple1[2] * arThreeTuple2[2];
}
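
For comparison, here is a minimal sketch of how a purely native caller might use this function. `DotProduct` is a portable stand-in for `TA_DotProduct` (the export macro and `__stdcall` are omitted so it builds with any compiler), and `DotProductOfVectors` is a hypothetical helper that is not part of TraditionalAPI. The point is that the array parameters are really just raw `double*` pointers into the caller's memory, which is why a managed caller has to supply an address that stays fixed for the duration of the call:

```cpp
#include <cassert>
#include <vector>

// Portable stand-in for TA_DotProduct (assumption: export macro and
// __stdcall dropped so the sketch compiles outside of Visual C++).
double DotProduct(double arThreeTuple1[], double arThreeTuple2[])
{
    return arThreeTuple1[0] * arThreeTuple2[0]
         + arThreeTuple1[1] * arThreeTuple2[1]
         + arThreeTuple1[2] * arThreeTuple2[2];
}

// Hypothetical native caller: native memory never moves, so the
// vectors' storage can be handed over directly as raw pointers.
double DotProductOfVectors(std::vector<double>& a, std::vector<double>& b)
{
    // The native function reads exactly three elements from each array.
    assert(a.size() >= 3 && b.size() >= 3);
    return DotProduct(a.data(), b.data());
}
```

No copying or marshalling happens on the native side; the callee simply reads three doubles through each pointer.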

Part 2: The C++/CLI Wrapper

The second part of the solution is a C++/CLI wrapper that wraps each function:

Test Function 1 Wrapper

void IncrementCounter()
{
    //    Call the unmanaged function.
    ::TA_IncrementCounter();
}

Nothing special here - this is a C++/CLI class so we will be able to call IncrementCounter from another .NET application.

Test Function 2 Wrapper

double CalculateSquareRoot(double value)
{
    //    Call the unmanaged function.
    return ::TA_CalculateSquareRoot(value);
}

Again, nothing complicated is required for this function.

Test Function 3 Wrapper

double DotProduct(array<double>^ threeTuple1, array<double>^ threeTuple2)
{
    //    Pin the arrays.
    pin_ptr<double> p1(&threeTuple1[0]);
    pin_ptr<double> p2(&threeTuple2[0]);
            
    //    Call the unmanaged function.
    return TA_DotProduct(p1, p2);
}

In this case, we actually have to do some work: the managed arrays are pinned so that the garbage collector cannot move them while the unmanaged API accesses them directly.

Part 3: The C# Test Application

The final part of the solution is a C# WPF application that runs the tests. The TraditionalAPI DLL can run each test function individually or run each test a number of times. Because of this, we can compare the following:

  • The time taken to run x tests directly in TraditionalAPI
  • The time taken to run x tests via the C++/CLI interface
  • The time taken to run x tests via P/Invoke
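
The first of these measurements (x calls made directly on the native side) can be sketched roughly as follows. This is only an illustration, not the code from the test application: `TimeCallsMs` and `PerCallNs` are hypothetical helpers, and `IncrementCounter` is a portable stand-in for `TA_IncrementCounter` with the export macro and `__stdcall` omitted:

```cpp
#include <cassert>
#include <chrono>

// Stand-ins for the article's global counter and TA_IncrementCounter
// (assumption: export macro and __stdcall dropped for portability).
unsigned int g_uCounter = 0;

void IncrementCounter()
{
    g_uCounter++;
}

// Hypothetical helper: calls fn `iterations` times and returns the
// total elapsed wall-clock time in milliseconds.
template <typename Fn>
double TimeCallsMs(Fn fn, unsigned int iterations)
{
    const auto start = std::chrono::steady_clock::now();
    for (unsigned int i = 0; i < iterations; ++i)
    {
        fn();
    }
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Hypothetical helper: average cost of a single call in nanoseconds.
double PerCallNs(double totalMs, unsigned int iterations)
{
    assert(iterations > 0);
    return totalMs * 1.0e6 / iterations;
}
```

A call such as `TimeCallsMs(IncrementCounter, 10000)` gives the native baseline. Timing the same number of calls through the wrapper or through P/Invoke and subtracting that baseline gives an estimate of the interop overhead. In an optimised build the compiler may inline such a trivial function, so the native figure should be read as a lower bound.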

The Results

Below we have the results of running each test 10,000 times:

Graph1.png

And the results of running each test 100,000 times:

Graph2.png

Conclusion

The results are certainly not what I expected. Based on my research, I expected the C++/CLI interface to be at least an order of magnitude faster, as it does less error checking than a P/Invoke call. Even in the case of a million function calls, the C++/CLI interface is barely faster than using P/Invoke.

As we would expect, the cost of calling any native function from a CLI application is very high if we are calling it many times. In the case of a very talkative API, it may even be worth writing a second C++ API that takes an aggregated set of parameters and calls the functions many times itself, because the managed-to-unmanaged boundary is expensive.
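
As a rough sketch of what such an aggregated API could look like for the dot product test: `DotProductBatch` is a hypothetical function that is not part of TraditionalAPI, and the packed layout (three consecutive doubles per tuple) is an assumption made for the example. A managed caller would then cross the boundary once per batch instead of once per dot product:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical aggregated entry point (not in the article's API):
// computes `count` dot products in a single call. Each input buffer
// is assumed to hold `count` three-tuples packed back to back, and
// `results` must have room for `count` doubles.
void DotProductBatch(const double* tuples1, const double* tuples2,
                     double* results, std::size_t count)
{
    // Null buffers are only acceptable for an empty batch.
    assert(count == 0 || (tuples1 && tuples2 && results));

    for (std::size_t i = 0; i < count; ++i)
    {
        const double* a = tuples1 + 3 * i;
        const double* b = tuples2 + 3 * i;
        results[i] = a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }
}
```

The trade-off is that the caller has to collect its inputs into contiguous buffers first, so this only pays off when many calls can be batched together.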

Further Research

Has anyone found a case where a C++/CLI wrapper really does give a solid performance boost? Is there another way to do this that I have overlooked? Please provide any suggestions and I will update the article as necessary.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Dave Kerr
Software Developer
United Kingdom
Follow my blog at www.dwmkerr.com and find out about my charity at www.childrenshomesnepal.org.

Article Copyright 2011 by Dave Kerr