Posted 13 Apr 2008



Generic Thunk with 5 combinations of Calling Conventions

A simple, generic way to turn a member function into a callback function by means of a thunk.


This article presents a generic way of making a member function become a callback function with the help of thunk technology.

It concentrates on the theory. An implementation and sample programs are also available.


Many libraries require a plain function as a callback, which makes object-oriented design awkward.
The reason is that non-static member functions need a “this” pointer, which plain C functions do not have.

A thunk is a fast, but platform-dependent, way to solve this problem.

I have recently read many articles on thunks, and I think most of the solutions they present address one SPECIFIC problem.
I have designed a set of thunk classes that provide a GENERIC solution.


Development environment: IA-32, Windows XP SP2, Visual Studio 2005.


There are 5 classes (4 actually, because one is a typedef of another), all in namespace Thunk.
Each object of these classes has two attributes: Obj and Method (a non-static member function).
An object dynamically creates a few machine instructions.
Executing those instructions is logically the same as calling Obj.Method(...);

For example, a class that subclasses a window could use a thunk in 5 steps, as follows:

class CSubClassing {
    Thunk::ThisToStd m_thunk;
        //ONE. choose the correct Thunk class.
        /* ThisToStd makes a __thiscall method ( LRESULT SubProc(…) )
            usable as a __stdcall callback function (Win32 WNDPROC). */
        //TWO. instantiate a thunk object.
public:
    CSubClassing() {
        //THREE. attach the object
        m_thunk.Attach(this);
        //FOUR. attach the method
        m_thunk.Attach(Thunk::Helper::PointerToInt32(&CSubClassing::SubProc));
        // to do
    }
    void Attach(HWND hWnd) {
        //FIVE. convert the thunk to a callback function
        /* SetWindowLong takes the WNDPROC as a LONG value */
        m_oldProc = (WNDPROC)SetWindowLong(hWnd,GWL_WNDPROC
            ,reinterpret_cast<LONG>(&m_thunk));
        // to do
    }
private:
    // this non-static member function is now called back by Windows
    LRESULT SubProc(HWND hWnd,UINT msg,WPARAM wParam,LPARAM lParam) {
        if (msg!=WE_NEEDED)
            return CallWindowProc(m_oldProc,hWnd,msg,wParam,lParam);
        // to do
        return 0;
    }
    WNDPROC m_oldProc;
};

All 5 classes have the same interface and usage.
Once you have chosen a Thunk class according to the calling conventions of both the member function and the callback function, you can use it for tasks such as WNDPROC, THREADPROC, hooking, etc.

See Thunk.h and the sample project for more details.

The sample project contains the source code of 5 programs.
No executable files are provided, because they are too large.
The project compiles with Microsoft Visual Studio 2005, provided the directory structure is left unchanged.

The 5 programs share the same test code: TestClass.h, TestClass.cpp and main.cpp.
They differ only in their preprocessor definitions,
which select ThisToStd, ThisToCdecl, StdToStd, StdToCdecl and CdeclToCdecl respectively.

The samples also show which files must be included and added to a project in order to use one particular Thunk class.
(Including Thunk.h and adding Thunk.cpp to the project works, but it is not the best way.)


For the theory, the most important thing to understand is the calling convention: the contract between the caller and the callee.

A plain C function typically uses one of 3 calling conventions:
"__cdecl", "__stdcall" or "__fastcall".
A member function typically uses:
"__thiscall", "__stdcall" or "__cdecl".

We must focus on the following 3 points:
1. How does the caller (which assumes a plain C function) prepare the parameters and the return address?
2. What parameters and return address does the callee (the member function) expect, and where does it look for them?
3. Whose responsibility is it to balance the stack?

What the caller prepares is never exactly what the callee expects, because the callee also needs a "this" pointer. The way the stack is balanced may differ too.

Our job is to put the "this" pointer where the callee expects it, and to make up for any difference in stack balancing.

To keep it simple, let's take " void func(int); void C::func(int); " as examples.

First, let's see what happens when func is called using the __stdcall convention.

func(1212);    the compiler prepares the argument and return address like this:
PUSH 1212 ;    the stack grows by 4
CALL func ;    the stack grows by 4 again (because the return address is pushed)
0x50000 :... ; the callee returns here; we suppose this address is 0x50000

The caller EXPECTS the callee to use RET 4 (which shrinks the stack by 8: 4 for the return address 0x50000 and 4 for the argument 1212) to balance the stack, so the caller emits no extra clean-up code.

So, right after the CALL, the stack looks like this:

1212
0x50000 <- ESP

Second, let's see which parameters and return address the callee expects if it uses __thiscall.
When a real member function is called:

C obj;
obj.func(1212);

the compiler prepares the arguments like this:

PUSH 1212;
MOV ECX,obj;  ECX = &obj
CALL C::func

So, right after the CALL, the stack looks like this:

1212
0x50000 <-ESP

and ECX holds the this pointer.
This is exactly what the callee ( void __thiscall C::func(int); ) expects.

Third, let's see how the callee returns.

It returns to 0x50000 using RET 4, which also removes the argument.

So all the thunk has to do is load the "this" pointer and then jump to the member function.
(No other work is needed: the parameters and return address are already in the right place, and the stack will be balanced correctly.)

Design ThisToStd

Three more pieces of information are needed before we design the first and simplest class, ThisToStd.

1. We need a way to get a function's address.

A data pointer can be cast to an int value:

void *p = &someValue;
int address = reinterpret_cast<int>(p);
/* this triggers a warning when 64-bit portability checks are enabled;
it can be ignored because this thunk is only used on 32-bit machines ^_^ */

A function pointer cannot, because the language places more restrictions on it:

void __stdcall fun(int) { ... }
void C::fun(int) {}

//int address = (int)fun;     // not allowed!
//int address = (int)&C::fun; // error, too

There are two ways to write a more powerful cast:

// reinterprets the bytes of src as a dst_type by going through void*
template<typename dst_type,typename src_type>
dst_type pointer_cast(src_type src) {
    return *static_cast<dst_type*>( static_cast<void*>(&src) );
}

// same effect through a union: write one member, read the other
template<typename dst_type,typename src_type>
dst_type union_cast(src_type src) {
    union {
        src_type src;
        dst_type dst;
    } u = {src};
    return u.dst;
}

so we can implement a helper:

template<typename Pointer>
int PointerToInt32(Pointer pointer) {
    return pointer_cast<int>(pointer); // or union_cast<int>(pointer);
}

int address1 = PointerToInt32(&fun);    // works!
int address2 = PointerToInt32(&C::fun); // works, too!

See ThunkBase.h for more details.
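
As a hedged aside, the same bit-for-bit conversion can also be written with std::memcpy, which avoids reading through a mismatched pointer or an inactive union member. The names bit_copy_cast, IntFn and twice below are hypothetical and are not part of ThunkBase.h. The sketch assumes that source and destination have exactly the same size, which holds for plain function pointers and std::uintptr_t on common platforms but not necessarily for pointers to member functions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from ThunkBase.h): copies the object representation
// of src into a value of type Dst. The sizes must match exactly.
template <typename Dst, typename Src>
Dst bit_copy_cast(Src src) {
    static_assert(sizeof(Dst) == sizeof(Src), "size mismatch");
    Dst dst;
    std::memcpy(&dst, &src, sizeof(dst));
    return dst;
}

// A plain function used only to demonstrate the round trip.
int twice(int x) { return 2 * x; }

typedef int (*IntFn)(int);

// Function pointer -> integer -> function pointer; the bits are preserved.
inline IntFn round_trip(IntFn f) {
    std::uintptr_t address = bit_copy_cast<std::uintptr_t>(f);
    return bit_copy_cast<IntFn>(address);
}
```

The static_assert turns a size mismatch into a compile-time error instead of a silent truncation, which is the main reason to prefer this form when the code might ever be built for another target.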

2. The destination of a transfer instruction

The destination of many transfer instructions is specified as an OFFSET relative to the instruction itself.

For example, suppose the CPU executes the following instruction at 0xFF000000:

0xFF000000 : 0xE9 0x33 0x55 0x77 0x99
0xFF000005 : ...;

0xE9 is a JMP instruction, and the following 4 bytes are interpreted as the OFFSET:
offset = 0x99775533 (on Intel x86, the lowest byte is stored at the lowest address) = -1720232653
source (src) = 0xFF000000 (the address of the JMP instruction) = 4278190080
destination (dst) = src + offset + 5 (the instruction is 5 bytes long: 1 byte for JMP, 4 bytes for the offset) = 4278190080 - 1720232653 + 5 = 2557957432 = 0x98775538

So after the instruction " JMP -1720232653 ", the next instruction to be executed is at:

0x98775538 : ...;
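
The arithmetic above can be checked with a small, portable model. This is only a sketch: addresses are modelled as 32-bit unsigned integers instead of real pointers, and the function names rel32_destination and rel32_offset are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Decodes the destination of a 5-byte "JMP rel32" / "CALL rel32" instruction.
// insn points at the opcode byte; src is the modelled 32-bit address of that byte.
inline std::uint32_t rel32_destination(std::uint32_t src, const unsigned char* insn) {
    // Little-endian: the lowest byte of the offset is stored first.
    std::uint32_t offset = static_cast<std::uint32_t>(insn[1])
                         | static_cast<std::uint32_t>(insn[2]) << 8
                         | static_cast<std::uint32_t>(insn[3]) << 16
                         | static_cast<std::uint32_t>(insn[4]) << 24;
    // Unsigned arithmetic wraps modulo 2^32, like 32-bit address arithmetic.
    return src + offset + 5u;
}

// Inverse operation: the offset to store so that the instruction at src lands on dst.
inline std::uint32_t rel32_offset(std::uint32_t src, std::uint32_t dst) {
    return dst - src - 5u;
}
```

Working in unsigned 32-bit integers sidesteps the signed interpretation of the offset: adding 0x99775533 modulo 2^32 gives the same result as subtracting 1720232653.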

we can implement 2 functions based on this:

void SetTransferDST(
    void *src /* the address of the transfer instruction */
    ,int dst  /* the destination */ )
{
    unsigned char *op = static_cast<unsigned char *>(src);
    switch (*op++) {
    case 0xE8: // CALL offset (dword)
    case 0xE9: // JMP  offset (dword)
        {
            int *offset = reinterpret_cast<int*>(op);
            // offset = dst - src - 1 (opcode) - 4 (offset field)
            *offset = dst - reinterpret_cast<int>(src) - sizeof(*op) - sizeof(int);
        }
        break;
    case 0xEB: // JMP offset (byte)
    case ...:
    default :
        assert(!"not complete!");
    }
}

int GetTransferDST(const void *src) {
    const unsigned char *op = static_cast< const unsigned char *>(src);
    switch (*op++) {
    case 0xE8: //CALL offset (dword)
    case 0xE9: //JMP  offset (dword)
        {
            const int *offset = reinterpret_cast<const int*>(op);
            return *offset + PointerToInt32(src) + sizeof(*op) + sizeof(int);
        }
    case 0xEB: //JMP offset(byte)
    case ...:
    default :
        assert(!"not complete!");
    }
    return 0;
}

See ThunkBase.cpp for more details.

3. The growth direction of the stack

On Win32, the stack grows down towards lower addresses.
That is, when the stack grows by N, ESP decreases by N, and vice versa.

With this, we can design the class:

class ThisToStd
{
public:
    ThisToStd(const void *Obj = 0,int memFunc = 0);
    const void *Attach(const void *newObj);
    int Attach(int newMemFunc);

private:
#pragma pack( push , 1) /* force the compiler to align the following members on 1-byte boundaries */
    unsigned char MOV_ECX;
    const void *m_this;
    unsigned char JMP;
    int m_memFunc;      // offset operand of JMP, written by SetTransferDST
#pragma pack( pop )     // restore the alignment
};

ThisToStd::ThisToStd(const void *Obj,int memFunc)
: MOV_ECX(0xB9),JMP(0xE9) {
    Attach(Obj);       // set this pointer
    Attach(memFunc);   // set member function address (by offset)
}

const void* ThisToStd::Attach(const void *newObj) {
    const void *oldObj = m_this;
    m_this = newObj;
    return oldObj;
}

int ThisToStd::Attach(int newMemFunc) {
    int oldMemFunc = GetTransferDST(&JMP);
    SetTransferDST(&JMP,newMemFunc);
    return oldMemFunc;
}
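
Why the 1-byte packing matters can be seen in a small layout check. This is a sketch with hypothetical struct names; the fields are modelled with fixed 32-bit integers rather than real pointers, so the numbers are the same on 32-bit and 64-bit builds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Without packing, the compiler usually inserts padding after each 1-byte field.
struct UnpackedModel {
    std::uint8_t  mov_ecx;
    std::uint32_t this_ptr;
    std::uint8_t  jmp;
    std::uint32_t rel32;
};

#pragma pack(push, 1)
// With 1-byte packing, the fields are contiguous, as machine code must be.
struct PackedModel {
    std::uint8_t  mov_ecx;   // 0xB9, opcode of MOV ECX,imm32
    std::uint32_t this_ptr;  // imm32 operand of MOV ECX
    std::uint8_t  jmp;       // 0xE9, opcode of JMP rel32
    std::uint32_t rel32;     // offset operand of JMP
};
#pragma pack(pop)
```

If padding were present, the CPU would decode the padding bytes as part of the instruction stream, so the two 5-byte instructions would no longer sit at offsets 0 and 5.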

We use it like this:

typedef void ( __stdcall * fun1)(int);
class C { public : void __thiscall fun1(int){} };

C obj; 
ThisToStd thunk;

thunk.Attach(&obj);                        // suppose &obj = OBJ_ADD
int memFunc = PointerToInt32(&C::fun1);    // suppose memFunc = MF_ADD
thunk.Attach(memFunc);                     /* thunk.m_memFunc is set to MF_ADD - (&thunk.JMP) - 5 */

fun1 fun = reinterpret_cast<fun1>(&thunk); // suppose &thunk = T_ADD
fun(1212); // the same as obj.fun1(1212);

How it works

When the CPU executes fun(1212), the machine code is:

PUSH 1212;
CALL DWORD PTR [fun];
0x50000 : ... ; suppose RET_ADD = 0x50000
// CALL DWORD PTR [fun] is different from CALL (0xE8) offset (dword);
// the only thing we need to know is that it pushes RET_ADD and jumps to T_ADD

After these 2 instructions, the stack is:

1212
RET_ADD <- ESP

and the next instruction to be executed is at the address of the thunk (T_ADD).

The first byte of the thunk is MOV_ECX, initialized with 0xB9.
The following 4 bytes are m_this; after thunk.Attach(&obj), m_this = OBJ_ADD.
These 5 bytes constitute a legal instruction:

T_ADD : MOV ECX,OBJ_ADD

The 6th byte of the thunk is JMP, initialized with 0xE9.
The following 4 bytes are m_memFunc, set by thunk.Attach(memFunc).
These 5 bytes constitute another legal instruction:

T_ADD+5 : JMP offset

where offset = MF_ADD - &thunk.JMP - 5 (set by thunk.Attach through SetTransferDST).

So after this instruction, the next instruction to be executed is here:
MF_ADD : ...;

Now the this pointer is ready (and so are the argument and return address prepared by fun(1212)), and C::fun1 will return to RET_ADD using RET 4, balancing the stack correctly.
So it works!
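
To make the 10 bytes concrete, here is a sketch that assembles them into a byte vector for made-up addresses. Nothing is executed; the helper names put32 and make_this_to_std and all three address values are hypothetical and serve only as illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends a 32-bit value in little-endian byte order.
inline void put32(std::vector<unsigned char>& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<unsigned char>(v >> (8 * i)));
}

// Builds the bytes of "MOV ECX,objAddr ; JMP memFuncAddr" for a thunk that
// would be located at thunkAddr (T_ADD in the text above).
inline std::vector<unsigned char> make_this_to_std(std::uint32_t thunkAddr,
                                                   std::uint32_t objAddr,
                                                   std::uint32_t memFuncAddr) {
    std::vector<unsigned char> code;
    code.push_back(0xB9);                     // MOV ECX,imm32
    put32(code, objAddr);
    code.push_back(0xE9);                     // JMP rel32
    std::uint32_t jmpAddr = thunkAddr + 5u;   // address of the JMP opcode
    put32(code, memFuncAddr - jmpAddr - 5u);  // relative to the next instruction
    return code;
}
```

For T_ADD = 0x00400000, OBJ_ADD = 0x0012FF60 and MF_ADD = 0x00401000, the offset is 0x00401000 - 0x00400005 - 5 = 0xFF6, so the bytes are B9 60 FF 12 00 E9 F6 0F 00 00.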

Design StdToStd

Let's analyze it in the following 3 steps:

1. How does the caller prepare the parameters and return address?
Generally speaking, the caller of a plain C function with __stdcall pushes the arguments from right to left, growing the stack by N bytes. N is not always equal to the number of parameters × 4!
The CALL instruction then pushes the return address and grows the stack by 4 more.

Arg m            <- ESP + 4 + N
Arg m-1
Arg 1            <- ESP + 4
Return Address   <- ESP

The caller leaves the work of balancing the stack to the callee (using RET N).

2. How does the callee get the parameters and return address? (What does it expect?)
A member function with __stdcall (and the same parameter list) expects the arguments, return address and this pointer to be laid out like this:

Arg m             <- ESP + 8 + N
Arg m-1
Arg 1             <- ESP + 8
this              <- ESP + 4
Return Address    <- ESP

3. How does the callee return?
It returns using RET N+4.

So our job is to insert the this pointer between Arg 1 and the return address, and then jump to the member function.
(Inserting the this pointer grows the stack by 4, so the callee's RET N+4 is correct.)
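
The insertion amounts to three stack operations: pop the return address, push this, push the return address again. The sketch below simulates them on a vector that stands in for the stack; the type Stack and the function insert_this are hypothetical names used only for this model.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// The stack is modelled as a vector whose back() is the top of stack ([ESP]).
typedef std::vector<std::uint32_t> Stack;

// Simulates "POP EAX ; PUSH this ; PUSH EAX" on the modelled stack.
inline void insert_this(Stack& stack, std::uint32_t thisPtr) {
    std::uint32_t eax = stack.back();  // POP EAX  (EAX = return address)
    stack.pop_back();
    stack.push_back(thisPtr);          // PUSH this
    stack.push_back(eax);              // PUSH EAX (return address back on top)
}
```

After the call, the return address is on top again, this sits directly below it, and the original arguments are untouched, which is the layout a __stdcall member function expects.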

Before designing StdToStd, let's define some useful macros.
Believe me, they make the source code easier to read and to improve.


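// MachineCodeMacro.h
#undef CONST
#undef CODE
#undef CODE_FIRST

#ifndef THUNK_MACHINE_CODE_IMPLEMENT
// in a header: each entry declares a member
#define CONST const
#define CODE(type,name,value) type name;
#define CODE_FIRST(type,name,value) type name;
#else
// in a .cpp file: each entry becomes part of an initializer list
#define CONST
#define CODE(type,name,value) ,name(value)
#define CODE_FIRST(type,name,value) :name(value)
#endif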


#include <MachineCodeMacro.h>
namespace Thunk {
    typedef unsigned char  byte;
    typedef unsigned short word;
    typedef int            dword;
    typedef const void*    dword_ptr;
}


// StdToStd.h
#include <ThunkBase.h>
#define STD_TO_STD_CODES()          \
/* POP EAX */                       \
CONST CODE_FIRST(byte,POP_EAX,0x58) \
/* PUSH m_this */                   \
CONST CODE(byte,PUSH,0x68)          \
      CODE(dword_ptr,m_this,0)      \
/* PUSH EAX */                      \
CONST CODE(byte,PUSH_EAX,0x50)      \
/* JMP m_memFunc(offset) */         \
CONST CODE(byte,JMP,0xE9)           \
CONST CODE(dword,m_memFunc,0)

namespace Thunk {
    class StdToStd {
    public:
        StdToStd(const void *Obj = 0,int memFunc = 0);
        StdToStd(const StdToStd &src);
        const void* Attach(const void *newObj);
        int Attach(int newMemFunc);
    private:
#pragma pack( push ,1 )
        STD_TO_STD_CODES()
#pragma pack( pop )
    };
}


// StdToStd.cpp
#include <StdToStd.h>
#define THUNK_MACHINE_CODE_IMPLEMENT
#include <MachineCodeMacro.h>

namespace Thunk {
    StdToStd::StdToStd(dword_ptr Obj,dword memFunc)
    STD_TO_STD_CODES()   // expands to the initializer list
    {
        Attach(Obj);
        Attach(memFunc);
    }
    StdToStd::StdToStd(const StdToStd &src)
    STD_TO_STD_CODES()
    {
        Attach(src.m_this);
        Attach( GetTransferDST(&src.JMP) );
    }
    dword_ptr StdToStd::Attach(dword_ptr newObj) {
        dword_ptr oldObj = m_this;
        m_this = newObj;
        return oldObj;
    }

    dword StdToStd::Attach(dword newMemFunc) {
        dword oldMemFunc = GetTransferDST(&JMP);
        SetTransferDST(const_cast<byte*>(&JMP),newMemFunc);
        return oldMemFunc;
    }
}

The macro CONST CODE_FIRST(byte,POP_EAX,0x58)
is replaced in StdToStd.h by: “const byte POP_EAX;”
(THUNK_MACHINE_CODE_IMPLEMENT is not defined there)
and in StdToStd.cpp by: “:POP_EAX(0x58)”

CODE_FIRST and CODE differ only in StdToStd.cpp:
CODE is replaced by “, something” rather than “: something”, so the initializer list is legal.
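
The two-expansion trick can be tried in a single file. The sketch below is a reduced, hypothetical variant: the list DEMO_CODES and the struct Demo are made up for illustration, and the macros are redefined inline instead of through a re-included header.

```cpp
#include <cassert>

// Hypothetical two-entry code list, in the spirit of STD_TO_STD_CODES().
#define DEMO_CODES()                         \
    CODE_FIRST(unsigned char, POP_EAX, 0x58) \
    CODE(unsigned char, PUSH_EAX, 0x50)

struct Demo {
    Demo();
    // First expansion: every entry becomes a const member declaration.
#define CODE_FIRST(type, name, value) const type name;
#define CODE(type, name, value)       const type name;
    DEMO_CODES()
#undef CODE_FIRST
#undef CODE
};

// Second expansion: the same entries become the constructor's initializer list,
// ": POP_EAX(0x58) , PUSH_EAX(0x50)".
#define CODE_FIRST(type, name, value) : name(value)
#define CODE(type, name, value)       , name(value)
Demo::Demo() DEMO_CODES() {}
#undef CODE_FIRST
#undef CODE
```

Because both expansions come from one list, the member order and the opcode values cannot drift apart, which is the property the thunk classes rely on.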

The comments in STD_TO_STD_CODES() explain how the thunk works.

See StdToStd.h and StdToStd.cpp for more details.

Design ThisToCdecl

Let's analyze it in the following 3 steps:

1. When a plain C function (with __cdecl) is called:
the compiler pushes the arguments from right to left, growing the stack by N.
The CALL instruction pushes the return address and grows the stack by 4 once more.
The stack then looks like this:

Arg m          <- ESP + 4 + N
Arg m-1
Arg 1          <- ESP + 4
Return Address <- ESP

The caller balances the stack itself using ADD ESP,N

2. When a member function (with __thiscall and the same parameter list) is about to be called:
it expects the arguments to have been pushed from right to left, and ECX to hold the this pointer.

Arg m          <- ESP + 4 + N
Arg m-1
Arg 1          <- ESP + 4
Return Address <- ESP
ECX : this

3. When the callee returns:
it uses RET N!

So our job is:
1. to store the this pointer in ECX before the member function is called;
2. to set ESP to the right value after the member function returns;
3. to return to the caller. The right value is the one that makes ESP, after the caller has executed ADD ESP,N, equal to the ESP value from before the caller started the call.

dword oldESP = ESP;
... ; prepare arguments
CALL ThunkAddress
ADD ESP,N ; afterwards ESP must equal oldESP again

The number of parameters × 4 is not always equal to N, so we cannot simply use SUB ESP,N to set the ESP value.
(The parameter list may contain a double, for instance.)

We cannot modify the return address so that it skips the “ADD ESP,N” instruction either, because that instruction does not always directly follow the CALL.
(The return type may be double, for instance.)

A possible implementation is to save ESP somewhere, and MOV it back to ESP after the callee returns.
Let's see the first implementation.

ThisToCdecl 36.h

#define __THIS_TO__CDECL_CODES()               \
/* MOV DWORD PTR [old_esp],ESP */              \
CONST CODE_FIRST(word,MOV_ESP_TO,0x2589)       \
CONST CODE(dword_ptr,pold_esp,&old_esp)        \
/* POP ECX */                                  \
CONST CODE(byte,POP_ECX,0x59)                  \
/* MOV DWORD PTR [old_return],ECX */           \
CONST CODE(word,MOV_POLD_R,0x0D89)             \
CONST CODE(dword_ptr,p_old_return,&old_return) \
/* MOV ECX,this */                             \
CONST CODE(byte,MOV_ECX,0xB9)                  \
      CODE(dword_ptr,m_this,0)                 \
/* CALL memFunc */                             \
CONST CODE(byte,CALL,0xE8)                     \
      CODE(dword,m_memFunc,0)                  \
/* MOV ESP,old_esp */                          \
CONST CODE(byte,MOV_ESP,0xBC)                  \
CONST CODE(dword,old_esp,0)                    \
/* MOV DWORD PTR [ESP],old_return */           \
CONST CODE(word,MOV_P,0x04C7)                  \
CONST CODE(byte,_ESP,0x24)                     \
CONST CODE(dword,old_return,0)                 \
/* RET */                                      \
CONST CODE(byte,RET,0xC3)

First, we save ESP in the variable old_esp.
Second, we pop the return address (the one that leads back to the caller) and save it in the variable old_return.
Third, we load the this pointer into ECX.
4th, we call the member function. (We popped the caller's return address, and CALL pushes a new one, so the stack is laid out as the callee expects. The callee returns to the rest of the thunk code.)
5th, we restore ESP and the return address, and return to the caller.
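
The effect of restoring ESP can be checked with a model that tracks nothing but the stack pointer. This is a simplified sketch: the function esp_after_call and its parameters are hypothetical, and the thunk's own pop and CALL are folded into a single saved value.

```cpp
#include <cassert>
#include <cstdint>

// Models only the stack pointer. n is the total size of the arguments in bytes.
// Returns ESP after the __cdecl caller has finished its own clean-up (ADD ESP,n).
inline std::uint32_t esp_after_call(std::uint32_t esp, std::uint32_t n,
                                    bool thunkRestoresEsp) {
    esp -= n;                       // caller pushes the arguments
    esp -= 4;                       // CALL pushes the return address
    std::uint32_t entryEsp = esp;   // value the thunk saves as old_esp
    esp += 4 + n;                   // __thiscall callee: RET n pops return address and arguments
    if (thunkRestoresEsp) {
        esp = entryEsp;             // MOV ESP,old_esp (points at the return address slot again)
        esp += 4;                   // RET back to the caller pops the return address
    }
    esp += n;                       // caller: ADD ESP,n
    return esp;
}
```

With the restore step the caller ends up exactly where it started; without it, the arguments are removed twice and ESP is off by n.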


sizeof(ThisToCdecl)==36, which I think is unacceptable.

If we use PUSH old_return instead of MOV DWORD PTR [ESP],old_return, 2 bytes are saved (we must then POP before saving old_esp), at the cost of one more stack operation.
(See ThisToCdecl 34.h)

In this case, I prefer space optimization to time optimization. So a third implementation is:
we use a function named Hook to prepare the this pointer, save old_esp and old_return, set the callee's return address and jump to the callee.
This way the thunk object contains fewer instructions and becomes smaller (23 bytes).


#define THIS_TO_CDECL_CODES()       \
/* CALL Hook */                     \
CONST CODE_FIRST(byte,CALL,0xE8)    \
CONST CODE(dword,HOOK,0)            \
/* this and member function */      \
      CODE(dword,m_memFunc,0)       \
      CODE(dword_ptr,m_this,0)      \
/* member function return here! */  \
/* MOV ESP,oldESP */                \
CONST CODE(byte,MOV_ESP,0xBC)       \
CONST CODE(dword,oldESP,0)          \
/* JMP oldRet */                    \
CONST CODE(byte,JMP,0xE9)           \
CONST CODE(dword,oldRet,0)

This machine code first calls a function “Hook”, which does the following work:
1. saves oldESP and oldRet
2. sets the callee's return address to “member function return here!”
3. loads this into ECX
4. JMPs to the member function

After the callee returns, the rest of the thunk code restores ESP and returns to the caller.

The Hook function is implemented like this:

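void __declspec( naked ) ThisToCdecl::Hook() {
 _asm {
// one possible instruction sequence for the steps below
  POP EAX
// p=&m_memFunc; &m_this=p+4; &oldESP=p+9; &oldRet=p+14
// Save ESP
  LEA ECX,[ESP+4]
  MOV DWORD PTR [EAX+9],ECX

// Save CallerReturn(by offset)
  MOV ECX,DWORD PTR [ESP]
  SUB ECX,EAX
  SUB ECX,18
  MOV DWORD PTR [EAX+14],ECX

// Set CalleeReturn
  LEA ECX,[EAX+8]
  MOV DWORD PTR [ESP],ECX

// Set m_this
  MOV ECX,DWORD PTR [EAX+4]

// Jump to m_memFunc
  JMP DWORD PTR [EAX]
 }
}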

We use CALL offset (dword) to transfer to Hook, and this instruction pushes a return address.
So the stack after CALL Hook looks like this:

Arg m 
Arg m-1
caller   Return Address
Hook     Return Address    <- ESP
;// Hook's return address is the address right after the instruction "CALL Hook", i.e. &m_memFunc

Hook uses __declspec( naked ) to prevent the compiler from generating extra instructions (prologue and epilogue).
(Compatibility: supported by VC8; not sure about VC6 and VC7; not supported by g++.)

The first instruction, POP EAX, shrinks the stack by 4 and yields an address inside the thunk object
(the address of the member “m_memFunc”):

caller Return Address    <- ESP
EAX : p //p=&m_memFunc; &m_this=p+4; &oldESP=p+9; &oldRet=p+14

Now three more things should be noted:
1. The thunk object uses CALL (0xE8) to reach Hook. This is a relative transfer,
so the offset is calculated by SetTransferDST(&CALL,&Hook).
2. The thunk object uses JMP offset to jump back to the caller, and that offset is calculated by Hook.
3. Hook uses JMP DWORD PTR [EAX], which is an absolute transfer,
so m_memFunc must not be set with SetTransferDST; m_memFunc = PointerToInt32(&C::Fun); is correct.

See ThisToCdecl.h and ThisToCdecl.cpp for more details.

Design CdeclToCdecl

1. We have already discussed a plain C function with __cdecl.

2. A member function with __cdecl expects the stack to look like this:

Arg m             <- ESP + 8 + N
Arg m-1
Arg 1             <- ESP + 8
this              <- ESP + 4
Return Address    <- ESP

3. A member function with __cdecl returns using a plain RET.

The CdeclToCdecl class is almost the same as the ThisToCdecl class:
the thunk object calls a Hook function to prepare the this pointer and save old_esp and old_return, then jumps to the callee.
After the callee returns, the thunk object restores ESP and jumps to the caller.
The difference is in the Hook function:
it inserts the this pointer between Arg 1 and the return address instead of moving it to ECX.

See CdeclToCdecl.h and CdeclToCdecl.cpp for more details.

Design StdToCdecl

Let's compare it with CdeclToCdecl.
The only difference is that the member function returns with RET N+4 instead of RET.
After the callee returns to the thunk object, whether by RET N+4 or by RET, the thunk restores ESP anyway.
Therefore, CdeclToCdecl can do the job of StdToCdecl as well.
So StdToCdecl is merely a typedef: “typedef CdeclToCdecl StdToCdecl;” ^_^

Design CdeclToStd

A caller using __stdcall leaves the work of balancing the stack to the callee.
A callee using __cdecl returns to the caller with a plain RET.
The information about how far ESP must be adjusted is lost!
Unfortunately, I have no idea how to design a generic thunk class for this case. -_-

About __fastcall and future work

The __fastcall calling convention passes the first two parameters of dword size or smaller in ECX and EDX.
So designing a generic thunk class seems impossible, because the thunk would depend on the parameter list.
Special-purpose solutions do exist, though.
I think the theory behind thunks is more important than the implementation.

I would be glad if this article helps when you intend to solve a special problem (for __fastcall or CdeclToStd), port the classes to another platform, or optimize these implementations.

By the way, the source code may be used for whatever you want, and is provided “as is”, with no warranty.

About FlushInstructionCache

These classes are typically used like this:

class CNeedCallback {
    CThunk m_thunk;   // CThunk stands for one of the Thunk classes
public:
    CNeedCallback()
    :m_thunk(this,Thunk::Helper::PointerToInt32(&CNeedCallback::Callback)) {}
private:
    returnType Callback(...) {}
};

So the Obj and Method attributes of a thunk object do not change after it is constructed.
In this case, I don't know whether FlushInstructionCache is necessary.
If you think it IS,
please #define THUNK_FLUSHINSTRUCTIONCACHE in ThunkBase.cpp, or just uncomment line 4 of that file.
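
For readers who want to add the flush themselves, a minimal sketch of such a call follows. It is not the code behind THUNK_FLUSHINSTRUCTIONCACHE; the helper name flush_thunk is hypothetical. On Windows it uses the Win32 function FlushInstructionCache; the other branches are assumptions added only so that the sketch also compiles elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: asks the platform to discard stale instructions for the
// given range after thunk bytes have been written. Returns true on success.
inline bool flush_thunk(const void* code, std::size_t size) {
#if defined(_WIN32)
    return FlushInstructionCache(GetCurrentProcess(), code, size) != 0;
#elif defined(__GNUC__)
    // Assumption: GCC/Clang builtin used as a stand-in on non-Windows targets.
    char* begin = static_cast<char*>(const_cast<void*>(code));
    __builtin___clear_cache(begin, begin + size);
    return true;
#else
    (void)code; (void)size;
    return true;  // no-op where no primitive is known
#endif
}
```

A natural place for such a call would be right after Attach has modified the thunk bytes, for example flush_thunk(&m_thunk, sizeof(m_thunk)).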

Special thanks

Special thanks to Illidan_Ne and Sean Ewington ^_^.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Written By
Software Developer (Junior)
China