Introduction
This article presents a generic way of turning a member function into a callback function with the help of thunk technology. It mostly discusses the theory; an implementation and a sample are available too.

Background
Many libraries require us to provide a plain function as a callback, which makes object-oriented design awkward. Thunk technology is a fast (but platform-dependent) way to solve this problem. I have recently read many articles on thunk technology, and most of the solutions seem to target SPECIFIC problems.

Environment
Development : IA32, Windows XP SP2, Visual Studio 2005.

Usage
There are 5 (actually 4) classes, all in namespace Thunk. For example, if we want to design a class that does subclassing work, we can use a thunk in 5 steps as follows:
class CSubClassing {
private:
Thunk::ThisToStd m_thunk;
//ONE.choose a correct Thunk class.
/* this ThisToStd class makes a __thiscall method (LRESULT SubProc(…) )
become a __stdcall callback function. (Win32 WNDPROC) */
//TWO.Instantiate a thunk object.
public:
CSubClassing() {
m_thunk.Attach(this);
//THREE.attach the object
m_thunk.AttachMethod(&CSubClassing::SubProc);
//FOUR.attach the method
// to do
}
void Attach(HWND hWnd) {
m_oldProc = (WNDPROC)SetWindowLong(hWnd,GWL_WNDPROC
,m_thunk.MakeCallback<LONG>());
//FIVE.convert it to callback function
/* the SetWindowLong function takes the WNDPROC as a LONG value */
// to do
}
private:
// this non-static member function will now be called back by Windows
LRESULT SubProc(HWND hWnd,UINT msg,WPARAM wParam,LPARAM lParam) {
if (msg!=WE_NEEDED)
return CallWindowProc(m_oldProc,hWnd,msg,wParam,lParam);
// to do
}
WNDPROC m_oldProc;
};
All 5 classes have the same interface and usage. See Thunk.h and the sample project for more details. The sample project contains the source code of 5 programs, which all share one set of test code: TestClass.h, TestClass.cpp and main.cpp. The sample also shows which files must be included and added to a project in order to use a Thunk class.

Theory
The most important thing to know about the theory is the calling convention: the contract between the caller and the callee. A plain C function typically uses one of 3 calling conventions : We must focus on the following 3 points : The parameters and return address prepared by the caller are NEVER exactly what the callee expects, because the callee also needs the "this" pointer. The way the stack is balanced may differ too. Our job is to put the "this" pointer in the place where the callee expects it, and to make up for any difference in stack balancing. To keep it simple, let's take " void func(int); void C::func(int); " as samples. First, for the call func(1212); the compiler prepares the argument and return address like this:
PUSH 1212 ; the stack grows by 4
CALL func ; the stack grows by 4 again (because the return address is pushed)
0x50000 :... ; the callee returns here; we suppose this address is 0x50000
The caller EXPECTS the callee to use RET 4 (which shrinks the stack by 8 : 4 for the argument 1212 and another 4 for the return address 0x50000) to balance the stack, so the caller emits no extra machine code for that. After the CALL, the stack looks like this : ....
1212
0x50000 <- ESP
Second, let's see which parameters and return address the callee expects if it uses __thiscall.
C obj;
obj.func(1212);
PUSH 1212;
MOV ECX,obj;
CALL C::func
So, after this, the stack looks like this : ....
1212
0x50000 <-ESP
Third, let's see how the callee returns. It returns to 0x50000 by using RET 4. So our only job is to prepare the "this" pointer and then jump to the member function.

Design ThisToStd
Three more pieces of information are necessary before we design the first and simplest class, ThisToStd. 1. We need a way to get a function's address. A data pointer can be cast to an int value,
void *p = &someValue;
int address = reinterpret_cast<int>(p);
/* this gives a warning when 64-bit portability checks are enabled;
it can be ignored because this thunk is only used on 32-bit machines ^_^ */
but a function pointer cannot, because it is subject to more restrictions.
void __stdcall fun(int) { ... }
void C::fun(int) {}
//int address = (int)fun; // not allowed!
//int address = (int)&C::fun; // error, too
There are two ways to do a more powerful cast:
template<typename dst_type,typename src_type>
dst_type pointer_cast(src_type src) {
return *static_cast<dst_type*>( static_cast<void*>(&src) );
}
template<typename dst_type,typename src_type>
dst_type union_cast(src_type src) {
union {
src_type src;
dst_type dst;
} u = {src};
return u.dst;
}
so we can implement a method : template<typename Pointer>
int PointerToInt32(Pointer pointer)
{
return pointer_cast<int>(pointer); // or union_cast<int>(pointer);
}
int address = PointerToInt32(&fun); // works!
int address2 = PointerToInt32(&C::fun); // works,too!
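As a portable illustration of the same idea, here is a minimal sketch that copies the bits of a function pointer into an integer with memcpy and back again. The names BitCopyCast, Twice and CallThroughInteger are hypothetical and not part of the article's Thunk library. Unlike pointer_cast, the sketch refuses to compile when the two types differ in size, which matters because a pointer to member function is not guaranteed to be the size of a plain pointer on every compiler.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies the object representation of src into a Dst.
template <typename Dst, typename Src>
Dst BitCopyCast(Src src) {
    static_assert(sizeof(Dst) == sizeof(Src), "sizes must match for a bit copy");
    Dst dst;
    std::memcpy(&dst, &src, sizeof(dst));
    return dst;
}

int Twice(int x) { return 2 * x; }

// Round trip: function pointer -> integer -> function pointer -> call.
int CallThroughInteger(int x) {
    typedef int (*Fn)(int);
    std::uintptr_t address = BitCopyCast<std::uintptr_t>(&Twice);
    assert(address != 0);
    Fn fn = BitCopyCast<Fn>(address);
    return fn(x);
}
```

Calling CallThroughInteger(21) goes through the integer form of the address and still reaches Twice.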
See ThunkBase.h for more details. 2. Destination of a transfer instruction. The destinations of many transfer instructions are specified as an OFFSET from the source, measured from the end of the instruction. For example :
0xFF000000 : 0xE9 0x33 0x55 0x77 0x99
0xFF000005 : ...;
So after the instruction "JMP -1720232653" the next instruction to be executed will be at :
0x98775538 : ...;
Based on this, we can implement 2 functions:
void SetTransferDST(
void *src /* the address of transfer instruction*/
,int dst /* the destination*/ )
{
unsigned char *op = static_cast<unsigned char *>(src);
switch (*op++) {
case 0xE8: // CALL offset (dword)
case 0xE9: // JMP offset (dword)
{
int *offset = reinterpret_cast<int*>(op);
*offset = dst - reinterpret_cast<int>(src) - sizeof(*op)*1 - sizeof(int);
}
break;
case 0xEB: // JMP offset (byte)
...
break;
case ...:
...
break;
default :
assert(!"not complete!");
}
}
int GetTransferDST(const void *src) {
const unsigned char *op = static_cast< const unsigned char *>(src);
switch (*op++) {
case 0xE8: //CALL offset (dword)
case 0xE9: //JMP offset (dword)
{
const int *offset = reinterpret_cast<const int*>(op);
return *offset + PointerToInt32(src) + sizeof(*op) +sizeof(int);
}
break;
case 0xEB: //JMP offset(byte)
...
break;
case ...:
...
break;
default:
assert(!"not complete!");
break;
}
return 0;
}
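The offset arithmetic used by these two functions can be checked against the example bytes above. The following is a minimal sketch that models IA32 addresses as 32-bit numbers, so it runs on any platform; EncodeRel32, DecodeRel32Target and kRel32InstrSize are hypothetical names, not functions from ThunkBase.cpp.

```cpp
#include <cassert>
#include <cstdint>

// An E8/E9 instruction is 1 opcode byte followed by a 4-byte displacement.
const std::uint32_t kRel32InstrSize = 5;

// Displacement to store after an E8/E9 opcode located at instrAddr.
std::uint32_t EncodeRel32(std::uint32_t instrAddr, std::uint32_t dst) {
    return dst - (instrAddr + kRel32InstrSize); // wraps modulo 2^32
}

// Destination of the E8/E9 instruction whose 5 bytes are given.
std::uint32_t DecodeRel32Target(std::uint32_t instrAddr,
                                const unsigned char* bytes) {
    assert(bytes[0] == 0xE8 || bytes[0] == 0xE9);
    std::uint32_t disp = 0;
    for (int i = 4; i >= 1; --i) {
        disp = (disp << 8) | bytes[i]; // the displacement is little-endian
    }
    return instrAddr + kRel32InstrSize + disp;
}
```

With the bytes 0xE9 0x33 0x55 0x77 0x99 placed at 0xFF000000, the decoded destination is 0x98775538, matching the example in the text.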
See ThunkBase.cpp for more details. 3. Growth of the stack.
Now we design the class:
class ThisToStd
{
public:
ThisToStd(const void *Obj = 0,int memFunc = 0);
const void *Attach(const void *newObj);
int Attach(int newMemFunc);
private:
#pragma pack( push , 1) /* force the compiler to pack the following members on 1-byte boundaries */
unsigned char MOV_ECX;
const void *m_this;
unsigned char JMP;
int m_memFunc; /* not const: it is rewritten by SetTransferDST */
#pragma pack( pop ) // restore the previous alignment
};
ThisToStd:: ThisToStd(const void *Obj,int memFunc)
: MOV_ECX(0xB9),JMP(0xE9) {
Attach(Obj); // set this pointer
Attach(memFunc); // set member function address (by offset)
}
const void* ThisToStd::Attach(const void *newObj) {
const void *oldObj = m_this;
m_this = newObj;
return oldObj;
}
int ThisToStd::Attach(int newMemFunc) {
int oldMemFunc = GetTransferDST(&JMP);
SetTransferDST(&JMP,newMemFunc);
return oldMemFunc;
}
We use it like this: typedef void ( __stdcall * fun1)(int);
class C { public : void __thiscall fun1(int){} };
C obj;
ThisToStd thunk;
thunk.Attach(&obj); // suppose &obj = OBJ_ADD
int memFunc = PointerToInt32(&C::fun1); // suppose memFunc = MF_ADD
thunk.Attach(memFunc); /* thunk.m_memFunc will be set to MF_ADD - (int)&thunk.JMP - 5 */
fun1 fun = reinterpret_cast<fun1>(&thunk); // suppose &thunk = T_ADD
fun(1212); // the same as obj.fun1(1212);
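The byte layout that the thunk object relies on can be inspected without executing any generated code. The sketch below is an assumption-laden model rather than the article's class: ThisToStdImage and MakeThisToStdImage are hypothetical names, and addresses are plain 32-bit numbers so the example also compiles on 64-bit machines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
struct ThisToStdImage {
    std::uint8_t  movEcx;  // 0xB9 : MOV ECX, imm32
    std::uint32_t thisPtr; // imm32 operand: the object address
    std::uint8_t  jmp;     // 0xE9 : JMP rel32
    std::uint32_t rel32;   // displacement to the member function
};
#pragma pack(pop)

static_assert(sizeof(ThisToStdImage) == 10, "the thunk must be exactly 10 bytes");
static_assert(offsetof(ThisToStdImage, jmp) == 5, "JMP must directly follow MOV ECX");

// Builds the 10 bytes a thunk located at thunkAddr would contain.
ThisToStdImage MakeThisToStdImage(std::uint32_t thunkAddr,
                                  std::uint32_t objAddr,
                                  std::uint32_t memFuncAddr) {
    ThisToStdImage image;
    image.movEcx = 0xB9;
    image.thisPtr = objAddr;
    image.jmp = 0xE9;
    // The JMP sits at thunkAddr + 5 and is itself 5 bytes long.
    image.rel32 = memFuncAddr - (thunkAddr + 5) - 5;
    return image;
}
```

The two static_assert lines show why the 1-byte packing matters: with default alignment the compiler would insert padding after the opcode bytes and the CPU would decode garbage.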
How it works: when the CPU executes fun(1212), the machine code is
PUSH 1212;
CALL DWORD PTR [fun];
0x50000 : ... ; suppose RET_ADD = 0x50000
// CALL DWORD PTR [fun] is different from CALL (0xE8) offset (dword);
// the only thing we need to know is that it pushes RET_ADD and jumps to T_ADD
After these 2 instructions, the stack is : ....
1212
RET_ADD <- ESP
and the next instruction to be executed is at the address of the thunk (T_ADD). The first byte of the thunk is "unsigned char MOV_ECX", initialized with 0xB9.
T_ADD : MOV ECX,OBJ_ADD
The 6th byte of the thunk is "unsigned char JMP", initialized with 0xE9.
T_ADD+5 : JMP offset
MF_ADD : ...;
Now the this pointer is ready (and so are the argument and return address prepared by fun(1212)), and C::fun1 will return to RET_ADD using RET 4 and balance the stack correctly.

Design StdToStd
Let's analyze it in the following 3 steps : 1. How does the caller prepare the parameters and return address?
Arg m <- ESP + 4 + N
Arg m-1
…
Arg 1 <- ESP + 4
Return Address <- ESP
The caller leaves the work of balancing the stack to the callee (using RET N). 2. How does the callee get the parameters and return address? (What does it expect?)
Arg m <- ESP + 8 + N
Arg m-1
…
Arg 1 <- ESP + 8
this <- ESP + 4
Return Address <- ESP
3. How does the callee return? So our job is to insert the this pointer between Arg 1 and the return address, and then jump to the member function. Before designing StdToStd, let's define some useful macros.
MachineCodeMacro.h
#undef CONST
#undef CODE
#undef CODE_FIRST
#ifndef THUNK_MACHINE_CODE_IMPLEMENT
#define CONST const
#define CODE(type,name,value) type name;
#define CODE_FIRST(type,name,value) type name;
#else
#define CONST
#define CODE(type,name,value) ,name(value)
#define CODE_FIRST(type,name,value) :name(value)
#endif
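The trick in MachineCodeMacro.h is to write the list of machine-code fields once and expand it twice: once as member declarations and once as a constructor initializer list. Here is a minimal single-file sketch of that idea. It passes the expansion macros as parameters instead of re-including a header, and all names (DEMO_CODES, DemoCodes and so on) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// The field list is written once; FIRST/NEXT decide how each entry expands.
#define DEMO_CODES(FIRST, NEXT)           \
    FIRST(std::uint8_t,  popEax,  0x58)   \
    NEXT (std::uint8_t,  pushImm, 0x68)   \
    NEXT (std::uint32_t, imm,     0)      \
    NEXT (std::uint8_t,  pushEax, 0x50)

#define DEMO_DECLARE(type, name, value)    type name;
#define DEMO_INIT_FIRST(type, name, value) : name(value)
#define DEMO_INIT_NEXT(type, name, value)  , name(value)

#pragma pack(push, 1)
struct DemoCodes {
    DemoCodes();
    // Expands to: std::uint8_t popEax; std::uint8_t pushImm; ...
    DEMO_CODES(DEMO_DECLARE, DEMO_DECLARE)
};
#pragma pack(pop)

// Expands to: : popEax(0x58) , pushImm(0x68) , imm(0) , pushEax(0x50)
DemoCodes::DemoCodes() DEMO_CODES(DEMO_INIT_FIRST, DEMO_INIT_NEXT) {}
```

Because declaration order and initialization order come from the same list, an opcode value can never drift away from the member it belongs to.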
ThunkBase.h
#include <MachineCodeMacro.h>
namespace Thunk {
typedef unsigned char byte;
typedef unsigned short word;
typedef int dword;
typedef const void* dword_ptr;
}
StdToStd.h
#include <ThunkBase.h>
#define STD_TO_STD_CODES() \
/* POP EAX */ \
CONST CODE_FIRST(byte,POP_EAX,0x58) \
\
/* PUSH m_this */ \
CONST CODE(byte,PUSH,0x68) \
CODE(dword_ptr,m_this,0) \
\
/* PUSH EAX */ \
CONST CODE(byte,PUSH_EAX,0x50) \
\
/* JMP m_memFunc(offset) */ \
CONST CODE(byte,JMP,0xE9) \
CONST CODE(dword,m_memFunc,0)
namespace Thunk {
class StdToStd {
public:
StdToStd(const void *Obj = 0,int memFunc = 0);
StdToStd(const StdToStd &src);
const void* Attach(const void *newObj);
int Attach(int newMemFunc);
private:
#pragma pack( push ,1 )
STD_TO_STD_CODES()
#pragma pack( pop )
};
} // namespace Thunk
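The effect of the three stack instructions in STD_TO_STD_CODES() (POP EAX, PUSH m_this, PUSH EAX) can be modelled with an ordinary container. This is only a simulation under the assumption that every stack slot is one dword; Stack and InsertThis are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Model of a downward-growing stack: back() is the slot ESP points at.
typedef std::vector<std::uint32_t> Stack;

// Applies POP EAX / PUSH m_this / PUSH EAX to the modelled stack.
void InsertThis(Stack& stack, std::uint32_t thisPtr) {
    assert(!stack.empty());            // the return address must be on top
    std::uint32_t eax = stack.back();  // POP EAX (return address)
    stack.pop_back();
    stack.push_back(thisPtr);          // PUSH m_this
    stack.push_back(eax);              // PUSH EAX
}
```

Starting from a stack holding the argument 1212 and the return address 0x50000, the call leaves the this pointer between the two, which is the layout a __stdcall member function expects.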
StdToStd.cpp
#include <StdToStd.h>
#define THUNK_MACHINE_CODE_IMPLEMENT
#include <MachineCodeMacro.h>
namespace Thunk {
StdToStd::StdToStd(dword_ptr Obj,dword memFunc)
STD_TO_STD_CODES()
{
Attach(Obj);
Attach(memFunc);
}
StdToStd::StdToStd(const StdToStd &src)
STD_TO_STD_CODES()
{
Attach(src.m_this);
Attach( GetTransferDST(&src.JMP) );
}
dword_ptr StdToStd::Attach(dword_ptr newObj) {
dword_ptr oldObj = m_this;
m_this = newObj;
return oldObj;
}
dword StdToStd::Attach(dword newMemFunc) {
dword oldMemFunc = GetTransferDST(&JMP);
SetTransferDST(&JMP,newMemFunc);
return oldMemFunc;
}
}
In StdToStd.h the macro CONST CODE_FIRST(byte,POP_EAX,0x58) expands to the declaration const byte POP_EAX; in StdToStd.cpp, where THUNK_MACHINE_CODE_IMPLEMENT is defined, it expands to the initializer :POP_EAX(0x58). CODE_FIRST and CODE differ only in StdToStd.cpp: CODE_FIRST starts the initializer list with ':' while CODE continues it with ','. The comments in STD_TO_STD_CODES() explain how the thunk works. See StdToStd.h and StdToStd.cpp for more details.

Design ThisToCdecl
Let's analyze it in the following 3 steps : 1. When a plain C function (with __cdecl) is called. …
Arg m <- ESP + 4 + N
Arg m-1
…
Arg 1 <- ESP + 4
Return Address <- ESP
The caller balances the stack using ADD ESP,N. 2. When a member function (with the same parameter list, using __thiscall) is about to be called.
Arg m <- ESP + 4 + N
Arg m-1
…
Arg 1 <- ESP + 4
Return Address <- ESP
ECX : this
3. When the callee returns. Then, our job is
/*
dword oldESP = ESP;
... ; prepare arguments
CALL ThunkAddress
...;
ADD ESP,N
assert(ESP==oldESP);
*/
The number of parameters × 4 does not always equal N, so we cannot use SUB ESP,N to set the ESP value. We cannot modify the return address to skip the instruction "ADD ESP,N" either, because this instruction does not always immediately follow the CALL in the caller. A possible implementation is to save ESP in some place, and MOV it back to ESP after the callee returns.
ThisToCdecl 36.h
#define __THIS_TO__CDECL_CODES() \
/* MOV DWORD PTR [old_esp],ESP */ \
CONST CODE_FIRST(word,MOV_ESP_TO,0x2589) \
CONST CODE(dword_ptr,pold_esp,&old_esp) \
\
/* POP ECX */ \
CONST CODE(byte,POP_ECX,0x59) \
\
/* MOV DWORD PTR [old_return],ECX */ \
CONST CODE(word,MOV_POLD_R,0x0D89) \
CONST CODE(dword_ptr,p_old_return,&old_return) \
\
/* MOV ECX,this */ \
CONST CODE(byte,MOV_ECX,0xB9) \
CODE(dword_ptr,m_this,0) \
\
/* CALL memFunc */ \
CONST CODE(byte,CALL,0xE8) \
CODE(dword,m_memFunc,0) \
\
/* MOV ESP,old_esp */ \
CONST CODE(byte,MOV_ESP,0xBC) \
CONST CODE(dword,old_esp,0) \
/* MOV DWORD PTR [ESP],old_return */ \
CONST CODE(word,MOV_P,0x04C7) \
CONST CODE(byte,_ESP,0x24) \
CONST CODE(dword,old_return,0) \
/* RET */ \
CONST CODE(byte,RET,0xC3)
First, we save ESP to a variable, old_esp.
Optimization
sizeof(ThisToCdecl)==36, which I think is unacceptable. If we use PUSH old_return instead of MOV DWORD PTR [ESP],old_return, 2 bytes are saved (so we must POP before saving old_esp), at the cost of one more stack operation. In this case, I prefer space optimization to time optimization. A third implementation is then :
ThisToCdecl.h
#define THIS_TO_CDECL_CODES() \
/* CALL Hook */ \
CONST CODE_FIRST(byte,CALL,0xE8) \
CONST CODE(dword,HOOK,0) \
\
/* this and member function */ \
CODE(dword,m_memFunc,0) \
CODE(dword_ptr,m_this,0) \
\
/* member function return here! */ \
/* MOV ESP,oldESP */ \
CONST CODE(byte,MOV_ESP,0xBC) \
CONST CODE(dword,oldESP,0) \
\
/* JMP oldRet */ \
CONST CODE(byte,JMP,0xE9) \
CONST CODE(dword,oldRet,0)
These machine codes first call a function "Hook". The Hook function does the following work : After the callee returns, the rest of the thunk code restores ESP and returns to the caller. The Hook function is implemented like this :
void __declspec( naked ) ThisToCdecl::Hook() {
_asm {
POP EAX
// p=&m_memFunc; &m_this=p+4; &oldESP=p+9; &oldRet=p+14
// Save ESP
MOV DWORD PTR [EAX+9],ESP
ADD DWORD PTR [EAX+9],4
// Save CallerReturn(by offset)
//src=&JMP=p+13,dst=CallerReturn,offset=CallerReturn-p-13-5
MOV ECX,DWORD PTR [ESP]
SUB ECX,EAX
SUB ECX,18
MOV DWORD PTR [EAX+14],ECX
// Set CalleeReturn
MOV DWORD PTR [ESP],EAX
ADD DWORD PTR [ESP],8
// Set m_this
MOV ECX,DWORD PTR [EAX+4]
// Jump to m_memFunc
JMP DWORD PTR[EAX]
}
}
We use CALL offset(dword) to transfer to Hook, and this instruction pushes the return address. …
Arg m
Arg m-1
…
Arg1
caller Return Address
Hook Return Address <- ESP
;//The Hook return address is the address right after the instruction "CALL HOOK", that is, &m_memFunc
Hook uses __declspec( naked ) so that the compiler generates no extra instructions. The first instruction, POP EAX, shrinks the stack by 4 and gets the thunk object’s …
Arg1
caller Return Address <- ESP
EAX : p //p=&m_memFunc; &m_this=p+4; &oldESP=p+9; &oldRet=p+14
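The constants 4, 9, 13, 14 and 18 that Hook adds to p follow directly from the packed field order of THIS_TO_CDECL_CODES(). The sketch below re-derives them with offsetof, under the assumption that every dword and dword_ptr field is 4 bytes as on IA32; ThisToCdeclImage, kP and OldRetDisplacement are hypothetical names used only for this check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
struct ThisToCdeclImage {
    std::uint8_t  call;    // 0xE8 : CALL Hook
    std::uint32_t hook;    // displacement to Hook
    std::uint32_t memFunc; // p      (the Hook return address points here)
    std::uint32_t thisPtr; // p + 4
    std::uint8_t  movEsp;  // p + 8  : 0xBC, the member function returns here
    std::uint32_t oldEsp;  // p + 9
    std::uint8_t  jmp;     // p + 13 : 0xE9
    std::uint32_t oldRet;  // p + 14
};
#pragma pack(pop)

const std::size_t kP = offsetof(ThisToCdeclImage, memFunc);
static_assert(offsetof(ThisToCdeclImage, thisPtr) == kP + 4, "m_this is at p+4");
static_assert(offsetof(ThisToCdeclImage, movEsp) == kP + 8, "MOV ESP is at p+8");
static_assert(offsetof(ThisToCdeclImage, oldEsp) == kP + 9, "oldESP is at p+9");
static_assert(offsetof(ThisToCdeclImage, jmp) == kP + 13, "JMP is at p+13");
static_assert(offsetof(ThisToCdeclImage, oldRet) == kP + 14, "oldRet is at p+14");

// Displacement Hook stores in oldRet: the JMP at p+13 is 5 bytes long,
// so the offset is measured from p + 18.
std::uint32_t OldRetDisplacement(std::uint32_t p, std::uint32_t callerReturn) {
    return callerReturn - p - 18;
}
```

This layout is 23 bytes long, compared with the 36 bytes of the earlier variant.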
Now three more things should be noticed : See ThisToCdecl.h and ThisToCdecl.cpp for more details.

Design CdeclToCdecl
1. We have already discussed a plain C function with __cdecl. 2. A member function with __cdecl expects the stack to look like : …
Arg m <- ESP + 8 + N
Arg m-1
…
Arg 1 <- ESP + 8
this <- ESP + 4
Return Address <- ESP
3. A member function with __cdecl returns by using RET. The CdeclToCdecl class is almost like the ThisToCdecl class : See CdeclToCdecl.h and CdeclToCdecl.cpp for more details.

Design StdToCdecl
Let’s compare it with CdeclToCdecl.

Design CdeclToStd
The caller with __stdcall leaves the work of balancing the stack to the callee.

About __fastcall and future work
The __fastcall calling convention passes the first two dword-or-smaller parameters in ECX and EDX. I would be glad if this article helps when you intend to solve a special problem (for __fastcall or CdeclToStd), implement thunks on another platform, or optimize these implementations. By the way, this source code can be used for whatever you want, and is provided "as is", with no warranty.

About FlushInstructionCache
These classes are typically used like this :
class CNeedCallback {
private:
CThunk m_thunk;
public:
CNeedCallback()
:m_thunk(this,Thunk::Helper::PointerToInt32(&CNeedCallback::Callback)) {}
private:
returnType Callback(...) {}
};
So, the Obj and Method attributes of a thunk object don’t change after it is constructed.

Special thanks
Special thanks to Illidan_Ne and Sean Ewington ^_^.