The Impossibly Fast C++ Delegates

By Sergey Ryazanov, 17 Jul 2005
 

Introduction

This article is a response to "Member Function Pointers and the Fastest Possible C++ Delegates" by Don Clugston. Don proposed an approach to delegates (called FastDelegate below) whose invocation code is, in the simplest case, the same as the code generated for a call through a pointer to member function (he explained why some compilers produce more complex code for polymorphic classes and for classes with virtual inheritance). He also explained why many other popular approaches are inefficient. Unfortunately, his approach is based on 'a horrible hack' (in his own words). It works on many popular compilers, but it is not compatible with the C++ Standard.

It may well be true that FastDelegate is the fastest possible way. But I think such a claim needs proof, because modern optimizing C++ compilers do incredible things. I believe that boost::function and other delegates based on dynamic memory allocation are slow, but who said there are no other good approaches?

I'm going to propose another approach, which:

  1. is fast,
  2. doesn't use dynamically allocated memory,
  3. is completely compatible with the C++ Standard.

Yet another approach to delegates

Let's consider a delegate that takes one argument and returns no value. Using the preferred syntax, it can be declared as follows (like boost::function and FastDelegate, my library supports both the preferred and the compatibility syntax; see the documentation for details):

delegate<void (int)>

I've simplified its code to help you understand how it works. The following code was obtained by removing the lines that are not needed for this explanation and by replacing template parameters with concrete types.

class delegate
{
public:
    delegate()
        : object_ptr(0)
        , stub_ptr(0)
    {}

    template <class T, void (T::*TMethod)(int)>
    static delegate from_method(T* object_ptr)
    {
        delegate d;
        d.object_ptr = object_ptr;
        d.stub_ptr = &method_stub<T, TMethod>; // #1
        return d;
    }

    void operator()(int a1) const
    {
        return (*stub_ptr)(object_ptr, a1);
    }

private:
    typedef void (*stub_type)(void* object_ptr, int);

    void* object_ptr;
    stub_type stub_ptr;

    template <class T, void (T::*TMethod)(int)>
    static void method_stub(void* object_ptr, int a1)
    {
        T* p = static_cast<T*>(object_ptr);
        return (p->*TMethod)(a1); // #2
    }
};

So, a delegate consists of an untyped pointer to data (because a delegate mustn't depend on the type of the receiver) and a pointer to a function. This function receives the data pointer as an extra parameter. It converts the data pointer back to an object pointer (a 'void*', unlike a member pointer, can be safely converted back to an object pointer: [expr.static.cast], item 10) and calls the required member function.

When you create a nonempty delegate, you implicitly instantiate a stub function by taking its address (see line #1 above). This is possible because the C++ Standard allows a pointer to member or a pointer to function to be used as a template parameter ([temp.param], item 4):

SomeObject obj;
delegate d = delegate::from_method<SomeObject, 
              &SomeObject::someMethod>(&obj);

Now 'd' contains a pointer to the stub function that was bound to 'someMethod' at compile time. Although a member pointer is used, the invocation at line #2 is as fast as a direct method call, because the pointer's value is known at compile time.
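
The core trick can be tried in isolation, without the delegate class around it. The following is only a minimal sketch: the names Accumulator, call_member and stub_demo are made up for this illustration and do not come from the library.

```cpp
#include <cassert>

// Hypothetical receiver type, used only for this illustration.
struct Accumulator
{
    int total;
    Accumulator() : total(0) {}
    void add(int x) { total += x; }
    void sub(int x) { total -= x; }
};

// The member pointer is a template parameter, so every distinct
// (T, TMethod) pair produces its own ordinary function.
template <class T, void (T::*TMethod)(int)>
void call_member(void* object_ptr, int a1)
{
    T* p = static_cast<T*>(object_ptr);
    (p->*TMethod)(a1);
}

// A plain function pointer type: neither T nor TMethod appears in it.
typedef void (*stub_type)(void*, int);

int stub_demo()
{
    Accumulator acc;

    // Taking the address implicitly instantiates each specialization.
    stub_type add_stub = &call_member<Accumulator, &Accumulator::add>;
    stub_type sub_stub = &call_member<Accumulator, &Accumulator::sub>;

    add_stub(&acc, 10);   // calls acc.add(10)
    sub_stub(&acc, 3);    // calls acc.sub(3)
    assert(acc.total == 7);
    return acc.total;
}
```

Both stubs have the same type even though they are bound to different methods, which is what lets a delegate store them in a single field.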

As usual, the delegate is invoked through an inline function call operator, which redirects the call to the target method through the stub function:

d(10); // invokes SomeObject::someMethod
       // on obj, passing 10 as the argument

Of course, this implies an additional function call, but the overhead depends essentially on the optimizer. In fact, there may be no overhead at all.
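
The simplified class shown earlier fixes the signature to void (int). As a rough sketch of how the same idea extends to an arbitrary signature, here is a version written with C++11 variadic templates. This is an assumption made for illustration: the article targets compilers of 2005, so the library itself cannot be implemented this way, and Scaler and variadic_demo are hypothetical names.

```cpp
#include <cassert>
#include <utility>

template <class Signature> class delegate; // primary template, never defined

template <class R, class... Args>
class delegate<R (Args...)>
{
public:
    delegate() : object_ptr(0), stub_ptr(0) {}

    template <class T, R (T::*TMethod)(Args...)>
    static delegate from_method(T* object_ptr)
    {
        delegate d;
        d.object_ptr = object_ptr;
        d.stub_ptr = &method_stub<T, TMethod>; // instantiates the stub
        return d;
    }

    R operator()(Args... args) const
    {
        return (*stub_ptr)(object_ptr, std::forward<Args>(args)...);
    }

private:
    typedef R (*stub_type)(void*, Args...);

    void* object_ptr;
    stub_type stub_ptr;

    template <class T, R (T::*TMethod)(Args...)>
    static R method_stub(void* object_ptr, Args... args)
    {
        T* p = static_cast<T*>(object_ptr);
        return (p->*TMethod)(std::forward<Args>(args)...);
    }
};

// Hypothetical receiver for the usage example.
struct Scaler
{
    int factor;
    int scale(int x, int y) { return factor * (x + y); }
};

int variadic_demo()
{
    typedef delegate<int (int, int)> binary_op;

    Scaler s = { 3 };
    binary_op d = binary_op::from_method<Scaler, &Scaler::scale>(&s);
    return d(2, 5); // 3 * (2 + 5)
}
```

The stored data is still one object pointer and one function pointer, whatever the signature.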

Performance measurement

I've measured the performance of delegate invocation with various combinations of virtual and non-virtual methods, various numbers of arguments, and various kinds of inheritance. I've also measured the performance of delegates bound to a free function and to a static method. I compared FastDelegate with my approach using the MS Visual C++ 7.1 and Intel C++ 8.0 compilers on a P4 Celeron processor.

In complicated cases, the stub function can cause a noticeable overhead (a slowdown of up to 5.5 times on MSVC and up to 2.4 times on Intel). But sometimes The Fastest Possible Delegates are the slower ones (by up to 15% on Intel and by a little on MSVC). They are always slower for static members and free functions. How can that be?

While analyzing the disassembled code, I found an interesting fact. In the worst case, the compiler copies all the parameters of the stub function and passes them on to the target method. In some cases (if the target method has no arguments or the conversion is trivial), the optimizer reduces the stub function to a single jump instruction. And when the target method can be inlined, the optimizer puts its code into the stub function. In that case there is no overhead at all.
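
For free functions and static methods, which are part of the measurements above, the stub does not need the object pointer at all. The sketch below shows one way such a binding could look, following the same pattern as the simplified class. The names from_function and function_stub are chosen for this illustration and should be read as assumptions, as are record, Logger and function_demo.

```cpp
#include <cassert>

class delegate
{
public:
    delegate() : object_ptr(0), stub_ptr(0) {}

    // Binds a free function or a static method at compile time.
    template <void (*TFunction)(int)>
    static delegate from_function()
    {
        delegate d;
        d.object_ptr = 0;                        // no receiver object
        d.stub_ptr = &function_stub<TFunction>;
        return d;
    }

    void operator()(int a1) const
    {
        (*stub_ptr)(object_ptr, a1);
    }

private:
    typedef void (*stub_type)(void* object_ptr, int);

    void* object_ptr;
    stub_type stub_ptr;

    // The object pointer is ignored; the target is known at compile
    // time, so the optimizer is free to inline it into the stub.
    template <void (*TFunction)(int)>
    static void function_stub(void*, int a1)
    {
        (*TFunction)(a1);
    }
};

// Hypothetical targets for the usage example.
int g_last = 0;
void record(int value) { g_last = value; }

struct Logger
{
    static void log(int value) { g_last = -value; }
};

int function_demo()
{
    delegate d1 = delegate::from_function<&record>();
    d1(42);
    int first = g_last;                          // 42

    delegate d2 = delegate::from_function<&Logger::log>();
    d2(5);                                       // g_last is now -5
    return first + g_last;
}
```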

The Fastest Possible Delegates are forced to use the 'thiscall' calling convention. My delegates are free to use any calling convention except 'thiscall', including '__fastcall'. That allows up to two int-sized arguments to be passed in registers ('thiscall' passes only the 'this' pointer in a register, ECX).

In fact, there is a simple way to make your delegate-based code extremely fast (if you really need it):

  • don't use complex objects as argument types and return values (use references instead),
  • don't use a virtual method as a target for delegates (because it usually cannot be inlined),
  • put a target method implementation and delegate creation code into the same compilation unit.

You can use my benchmark code to measure performance on your platform with your compiler.

Copying and comparing delegates

Copy-constructor performance is not an issue for either type of delegate (in contrast to delegates based on dynamic memory allocation, such as boost::function). Nevertheless, my delegates can be copied a bit faster because they tend to occupy less space.

My delegates cannot be compared. Comparison operators are not defined because a delegate doesn't contain a pointer to the method, and the pointer to a stub function can be different in different compilation units. In fact, this missing feature was the main reason why Don Clugston was not satisfied with my approach.

However, I think that the ability to compare pointers to methods is dangerous. It may work well until the day you make some class inlinable.
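
To make the discussion concrete, here is what a comparison could look like if it were added to the simplified class. This is a hypothetical addition and not something the library provides. It compares the stub addresses, so it is only as reliable as the assumption that one (T, TMethod) pair always yields the same stub address, which is exactly what is in doubt across compilation units. Lamp and the two test functions are made-up names.

```cpp
#include <cassert>

class delegate
{
public:
    delegate() : object_ptr(0), stub_ptr(0) {}

    template <class T, void (T::*TMethod)(int)>
    static delegate from_method(T* object_ptr)
    {
        delegate d;
        d.object_ptr = object_ptr;
        d.stub_ptr = &method_stub<T, TMethod>;
        return d;
    }

    void operator()(int a1) const { (*stub_ptr)(object_ptr, a1); }

    // Hypothetical addition: equal when both the receiver and the
    // stub address match. Not defined by the article's class.
    friend bool operator==(delegate const& a, delegate const& b)
    {
        return a.object_ptr == b.object_ptr && a.stub_ptr == b.stub_ptr;
    }

    friend bool operator!=(delegate const& a, delegate const& b)
    {
        return !(a == b);
    }

private:
    typedef void (*stub_type)(void* object_ptr, int);

    void* object_ptr;
    stub_type stub_ptr;

    template <class T, void (T::*TMethod)(int)>
    static void method_stub(void* object_ptr, int a1)
    {
        T* p = static_cast<T*>(object_ptr);
        (p->*TMethod)(a1);
    }
};

// Hypothetical receiver for the usage example.
struct Lamp
{
    int level;
    Lamp() : level(0) {}
    void brighten(int by) { level += by; }
};

bool same_target_compares_equal()
{
    Lamp lamp;
    delegate a = delegate::from_method<Lamp, &Lamp::brighten>(&lamp);
    delegate b = delegate::from_method<Lamp, &Lamp::brighten>(&lamp);
    return a == b; // same object, same stub, same translation unit
}

bool different_object_compares_unequal()
{
    Lamp first, second;
    delegate a = delegate::from_method<Lamp, &Lamp::brighten>(&first);
    delegate b = delegate::from_method<Lamp, &Lamp::brighten>(&second);
    return a != b; // the receivers differ
}
```

Inside a single translation unit both checks behave as expected. The sketch says nothing about delegates created in different compilation units.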

I know of only one reason why you might need to compare delegates: an event syntax such as that of C#. It looks nice, but it can't be implemented without dynamic memory allocation. Moreover, in C++ it may not work well in some cases. I would like to suggest another event propagation mechanism, which in my opinion is more suitable for C++.

Portability

Although this approach is compatible with the C++ Standard, unfortunately it doesn't work on some compilers. I haven't managed to compile the test code with Borland C++. The preferred syntax doesn't work on MSVC 7.1, although that compiler successfully compiles boost::function with the same syntax.

I think this is because the approach relies on rarely used language features.

Event library

I'm proposing an event library to demonstrate that delegates don't really need comparison operations. In fact, this event library isn't tied to my delegates. It can work with many kinds of delegates, including boost::function. It can also work with callback interfaces (like those in Java).

My event library provides a fast way to subscribe to and unsubscribe from an event producer (even while an event is being emitted), and it doesn't use dynamically allocated memory either (which should matter to you if you are interested in fast delegates).

The library provides two entities: event_source (a simplified analogue of boost::signal) and event_binder (an analogue of boost::signals::scoped_connection). Usually an event producer holds an event_source and an event consumer holds an event_binder. A connection between a producer and a consumer exists as long as both the event_source and the event_binder exist.
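
One way to get this behaviour without dynamic allocation is an intrusive linked list in which each binder is itself a list node. The sketch below is a hypothetical re-implementation written for this explanation, not the library's code. It borrows only the names event_source, event_binder, bind and emit; everything else (handler, call_with, on_event, event_demo) is made up. Unlike the real library, it only tolerates a handler that unbinds itself during emit, not one that unbinds a different binder.

```cpp
#include <cassert>

template <class Sink> class event_binder;

template <class Sink>
class event_source
{
public:
    event_source() : head(0) {}
    ~event_source() { while (head) head->unbind(); } // drop all connections

    template <class Invoker>
    void emit(Invoker invoke)
    {
        event_binder<Sink>* b = head;
        while (b)
        {
            event_binder<Sink>* next = b->next; // saved in case b unbinds itself
            invoke(b->sink);
            b = next;
        }
    }

private:
    friend class event_binder<Sink>;
    event_source(event_source const&);            // noncopyable
    event_source& operator=(event_source const&);
    event_binder<Sink>* head;
};

template <class Sink>
class event_binder
{
public:
    event_binder() : source(0), prev(0), next(0), sink() {}
    ~event_binder() { unbind(); }

    void bind(event_source<Sink>& s, Sink const& k)
    {
        unbind();
        sink = k;
        source = &s;
        next = s.head;              // push this node at the front of the list
        if (next) next->prev = this;
        s.head = this;
    }

    void unbind()
    {
        if (!source) return;
        if (prev) prev->next = next; else source->head = next;
        if (next) next->prev = prev;
        source = 0; prev = 0; next = 0;
    }

private:
    friend class event_source<Sink>;
    event_binder(event_binder const&);            // noncopyable
    event_binder& operator=(event_binder const&);
    event_source<Sink>* source;
    event_binder* prev;
    event_binder* next;
    Sink sink;
};

// Hypothetical sink and invoker types for the usage example.
typedef void (*handler)(int);
struct call_with
{
    int value;
    void operator()(handler h) const { h(value); }
};

int g_sum = 0;
void on_event(int v) { g_sum += v; }

int event_demo()
{
    event_source<handler> source;
    call_with inv = { 5 };
    source.emit(inv);               // no subscribers: nothing happens
    {
        event_binder<handler> binder;
        binder.bind(source, &on_event);
        source.emit(inv);           // on_event(5) is called
    }                               // binder destroyed: connection removed
    source.emit(inv);               // nothing happens again
    return g_sum;
}
```

The connection ends when either side is destroyed, so there is no unsubscribe call to forget.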

You can't use an anonymous connection. In Boost, you can use one in two cases:

  1. you are absolutely sure that the event consumer outlives the event producer, or
  2. you use boost::signals::trackable as a base class of the event consumer (an analogue could be implemented in my library, but I'm not sure it is a good idea).

You could also use anonymous connections with C#-style multicast delegates, but there is another problem: you must maintain matching pairs of actions (subscription and unsubscription), and their correctness can't be checked at compile time.

For more details, see the documentation.

Conclusion

Maybe some details of the C++ design are not ideal, but I don't see any reason to break the C++ Standard. Moreover, hacks sometimes prevent optimizers from showing all they can do.

License

This article, along with any associated source code and files, is licensed under The MIT License.

About the Author

Sergey Ryazanov
Web Developer
Russian Federation
I hold an MS in Math (with honors). I've been working as a software engineer in the Vessel Traffic System department since 2002.
 
I like C++, solving riddles, travelling and sports. My daughter was born while I was participating in a rock climbing competition (4 Jul 2004). I also took up kayaking and skydiving, but unfortunately I have no time for them now.
 
I have about 4 years of experience in the ACM ICPC. My best achievement is 18th place in the World Finals (2002).

Comments and Discussions

 
General: My vote of 5 (om, 7 Jun '12 - 9:27)
Sergey, this is a genius article! The simplicity of your method is unbelievable. Thanks for sharing your brilliant idea!!!
Question: The article is fab : thanks (revram, 30 Mar '12 - 22:50)
I had some trouble getting started with events. I made a small example to get it working. In the hope that it could be useful to someone, I am posting the example code.
 

 
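#include <iostream>
// The srutil delegate and event headers from the article's download
// must also be included before this point.
using namespace std;
using namespace srutil;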
 
struct HouseOnFire
{
	HouseOnFire(int nHouseNum):nHouseNum_(nHouseNum){}
 
	void Fire()
	{
		delegate_invoker<void (int)> a(nHouseNum_);
		eonFire.emit(a);
	}	
 
	typedef delegate<void (int)> FireDelegate;
	typedef event_source<FireDelegate> EmitOnFire;
 
	EmitOnFire eonFire;
 
private:
 
	int nHouseNum_;
};
 
struct FireDepartment
{
 
	void Enroll(HouseOnFire& house)
	{
		binder.bind(house.eonFire, Sink::from_method<FireDepartment, &FireDepartment::OnFire>(this));
	}
 
	void OnFire(int nHouseNum)
	{
		ActOnFire(nHouseNum);
	}
 
	void ActOnFire(int nHouseNum)
	{
		cout<< "The house number :" << nHouseNum << " is on fire" << endl;
		cout<< "Take action now. "<< endl;
	}
 
private:
 
	typedef delegate<void (int)> Sink;
	typedef event_binder<Sink> Binder;
 
	Binder binder;
};
 

 
void testEvents()
{
	FireDepartment fd;
	HouseOnFire h(42);
 
	h.Fire();
 
	fd.Enroll(h);
	h.Fire();
}
 
~ Ram

Question: What is the purpose of delegate_invoker? (aphazel 死神, 10 Jun '10 - 1:53)
Inside delegate template there is only
typedef delegate_invoker<signature_type> invoker_type;
 
but invoker_type is not used anywhere?
 
What is the purpose and how to use the delegate_invoker class?
Answer: Re: What is the purpose of delegate_invoker? (Member 8323086, 14 Aug '12 - 13:03)
It is a class template that copies all the provided arguments and later applies them to a provided functor. This functionality is used in the event library, where the stored arguments are applied to a provided delegate.
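
A rough sketch of that idea follows. The class name invoker2 and its shape are assumptions made for this illustration and do not reproduce the library's delegate_invoker; Summer and invoker_demo are likewise hypothetical.

```cpp
#include <cassert>

// Hypothetical two-argument invoker: it stores copies of the arguments
// at construction time and applies them to any functor later on.
template <class A1, class A2>
class invoker2
{
public:
    invoker2(A1 a1, A2 a2) : a1_(a1), a2_(a2) {}

    template <class Functor>
    void operator()(Functor const& f) const
    {
        f(a1_, a2_);
    }

private:
    A1 a1_;
    A2 a2_;
};

// Hypothetical functor standing in for a delegate.
struct Summer
{
    int* out;
    void operator()(int x, int y) const { *out += x + y; }
};

int invoker_demo()
{
    int total = 0;
    Summer first = { &total };
    Summer second = { &total };

    invoker2<int, int> inv(2, 3);  // arguments are captured once
    inv(first);                    // first(2, 3)
    inv(second);                   // second(2, 3)
    return total;
}
```

An event source can hand one such invoker to every subscribed delegate in turn, so the arguments are stored once however many subscribers there are.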
General: Preferred syntax on G++ 4.3.2 [modified] (mindbound, 10 Nov '09 - 15:36)
Compiling delegate_demo.cpp on said compiler ("g++ ./delegate_demo.cpp -I. -Ipath_to_SRDelegates_dir/include -o delegate_demo") with SRUTIL_DELEGATE_PREFERRED_SYNTAX enabled produces the following error: delegate_demo.cpp:34: error: variable 'srutil::delegate_invoker<void(*)(int, int)> inv' has initializer but incomplete type. The error suggests that the compiler has no access to the full body of the given class, although knowing this helps me little in understanding this particular case.
 
modified on Tuesday, November 10, 2009 9:54 PM

Answer: Comparing Delegates (me65535, 16 Sep '09 - 6:53)
"Pointer to a stub function can be different in various compilation units."
 
AFAIK, this is not true. Compilers are required to re-use template functions generated in different compilation units (this I am sure of - but I think Borland once violated this rule). I think it is because classes (ones not in 'nameless' namespaces) use external linkage and the way you use the stub functions will always prevent them from being inlined (although this shouldn't be an issue either as taking the address of the function will force a non-inline version to be generated and 'external linkage' performed by the linker will eliminate all but one similarly named function (they are assumed and required to be identical by the standard))...
 
If you define a template function in one translation unit (cpp file) and then define the same function differently in another translation unit, only one of the two versions will make it into the final executable. (This actually violates the "One Definition Rule", but works on GCC, at least... not sure about MSVC.) The point is: the address [of the stub] will be the same in different units.
 
I would urge you to update the article (including comparison capability) if you find this to be true for MSVC - if MSVC is standards conformant, in this regard.
 
You have some very nice code here, and I'm beating myself over the head for not thinking of it before. (I have seen Don Clugston's code, and this is very elegant in comparison.)
General: Comparison of your delegates... (Oekobratze, 15 Sep '09 - 1:08)
I was searching for a way of implementing a fast dataflow framework (the boost.dataflow proposal is nice but has too much overhead for my needs) when I stumbled over this article.
The idea is nice, but I was missing the possibility to compare two delegates (to remove connections between processing blocks).
Wouldn't it be possible to store an additional binary image of the encapsulated function?
 
Something like this:
template<....>
class delegate
{
public:
    //...
    template<class T, return_type (T::*TMethod)(P1)>
    static delegate from_method(T* object_ptr)
    {
        return from_stub(object_ptr, &method_stub<T, TMethod>, to_image_ptr(TMethod), sizeof(TMethod));
    }
    //...
    bool operator==(delegate const & other) const
    {
        //compare the images of the original fun_ptrs instead of the stub functions
        return 
            (image_size == other.image_size) &&
            (object_ptr == other.object_ptr) &&
            (std::memcmp(fun_ptr_image, other.fun_ptr_image, image_size) == 0);
    }
 
private:
    //...
    void* object_ptr;
    stub_type stub_ptr;
    unsigned char fun_ptr_image[largest_fun_ptr_size];
    size_t image_size;
 
    //...

    static delegate from_stub(void* object_ptr, stub_type stub_ptr, unsigned char const* image_ptr, size_t image_size)
    {
        delegate d;
        d.object_ptr = object_ptr;
        d.stub_ptr = stub_ptr;
        d.image_size = image_size;
 
        //the image
        std::memcpy(d.fun_ptr_image, image_ptr, image_size);
 
        return d;
    }
    //...

    //workaround for uncastable lvalue
    template<typename T>
    static unsigned char const* to_image_ptr(T const& t)
    {
        return reinterpret_cast<unsigned char const*>(&t);
    }
 
};
I've tried it on different compilers and it worked on all of them. But I'm still not sure whether it's really a good idea.
General: Very good delegate class, thanks for sharing [modified] (PizzaGolem2000, 14 Mar '08 - 13:41)
I included your headers directly in my source code and tried using it. It compiled perfectly (with the alternate, non-preferred syntax) under CodeWarrior with no modifications required. In my opinion, the code is tight and simple and easy to use. Thanks for sharing! 5 stars.
 
modified on Friday, March 14, 2008 7:48 PM

General: Preferred syntax (yalmar, 14 Oct '06 - 10:53)
In reference to the following paragraph:
Portability
 
Although this approach is compatible with the C++ Standard, unfortunately it doesn't work 
on some compilers. I haven't managed to compile a test code on Borland C++. 
The preferred syntax doesn't work on MSVC 7.1 although it successfully compiles 
boost::function in the same syntax.
 
I compiled your delegates implementation in VS.NET C++ 2003 (7.1) with the SRUTIL_DELEGATE_PREFERRED_SYNTAX option, and it really doesn't compile. The same happens with the latest MinGW 3.4.x version. But just changing the declaration from:
...
typedef srutil::delegate_invoker0 < void > TheInvoker;
...

 
to
 
...
typedef srutil::delegate_invoker < void (void) > TheInvoker;
...

 
it works fine. But, as you said, MSVC 7.1 successfully compiles boost::function. I believe that Boost uses the common compatible syntax

template < typename P1, typename P2, ..., typename PN, typename RType>

rather than a template like

template < typename (typename P1, typename P2, ..., typename PN) >

 
best regards!
 

 
http://www.lcg.ufrj.br/Members/yalmar
Computer graphics

General: Thank you (David O'Neil, 10 Aug '05 - 16:38)
I've incorporated a minimal version of your delegates into DWinLib, and it is working well. Thank you for sharing this method.
 
David

Last Updated 18 Jul 2005
Article Copyright 2005 by Sergey Ryazanov
Everything else Copyright © CodeProject, 1999-2013