License: The Code Project Open License (CPOL)
Grafting Compiled Code: Unlimited Code Reuse
By Jeffrey Walton
Add Functionality to a Project Using Existing Compiled Machine Code
VC6, VC7, VC7.1, VC8.0, Windows, VS.NET2003, VS2005, Dev
This article will demonstrate techniques to incorporate compiled machine code into an existing project using ASM source files. The assembly source file will be created from the compiled machine code. In addition, the article will remove the single thread safety limitation of Imagehlp.dll, and show techniques for converting a compiled STDCALL procedure to a C-CALL assembly language routine.
Example one will present a standard console application. The program adds two numbers and then returns the result. The example will introduce a few of the obstacles which will be encountered. Sample two will incorporate the graft using compiled machine code of Add() from the first.
Finally, example three will use foreign code from imagehlp.dll to supplement the PEChecksum program. This will remove the requirement of the imagehlp library dependency from a project and fully demonstrate the techniques. The PEChecksum program was presented in An Analysis of the PE Checksum Algorithm. Those who are familiar with x86 assembly, WinDbg, and IDA Pro should begin at example three.
The samples will use standard C as the language. For those interested in reusing compiled C++, please see Paul Vincent Sabanal's and Mark Vincent Yason's BlackHat 2007 presentation, Reversing C++.
To keep the first two examples academic, optimizations will be disabled. In general, release code with optimizations lacked the structure desired for the samples presented in this article. Release samples will be built with Optimizations disabled (/Od). Refer to Figure 1.
Figure 1: Disable Optimizations
With optimizations enabled, some functions were inlined. For example, Figure 2 shows no call corresponding to a function that is present in the source file. wmain is entered at 0x00401000. The first function call encountered is at 0x0040101D, which is a call to cout. In this case, Add() was optimized away.
Figure 2: Missing Function Call Due to Optimization
Another optimization which was not desired was Frame Pointer Omission (FPO). FPO was not used, to keep the examples easier to follow. The Frame Pointer is generally created with the instruction sequence push EBP followed by mov EBP, ESP. The lack of a Frame Pointer adds a small wrinkle to an otherwise academic exercise. Frame Pointers are addressed in the 'Stack, ESP, and EBP' section.
Variables which are global to the process are placed in either the .data section, or (if present) the .bss section. The sections correspond to the initialized and uninitialized data sections respectively. Usually, a program refers to a global variable by address rather than relative based addressing (such as EBP used in local variables). For example, notice the code generated for access to the global scratch variable in Figure 3.
Figure 3: Global Variable Storage Access
The address used is 0x00417000. When examining the program in PE Browse, the variable is listed under the .data section at address 0x00417000. Refer to Figure 4.
Figure 4: Allocation of Global Variable
The disassembly and PE header of an uninitialized global variable are shown in Figure 5. The four byte values at address 0x00403374 are garbage: CR, LF, [SPACE], and [SPACE].
Figure 5: Allocation of Uninitialized Global Variable
The meaning of the push ECX will be examined in Local Variables.
Local variables are stored on the thread's stack. Figure 5 shows a lone push of ECX, even though ECX is not used in main() per se. This is a technique used by the compiler to create the local storage for the variable i. Rather than issuing a sub ESP, 4 (a three byte instruction, 0x83 0xEC 0x04), the compiler uses push ECX (a one byte instruction, 0x51).
Figure 6: Local Allocation of a Single Variable
When a greater number of variables require storage, a sub ESP, n is used (where n is the number of bytes required). For example, Figure 7 shows the allocation of five DWORD variables. Rather than issuing a series of five push ECX instructions, the compiler issues one sub ESP, 0x14.
Figure 7: Local Allocation of Multiple Variables
The stack is an area in memory which a thread uses as a 'scratch pad' during program execution. Each thread in a process has its own stack. One typically envisions memory for the stack as a contiguous region starting at a low address and moving sequentially to a large address. This is similar to addressing in a heap or an array - A[0] resides at a lower address than A[63]. However, unlike most memory access operations, stacks grow down.
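ESP is the stack pointer and is maintained by the processor. EBP (if used) is maintained by the thread. When the processor encounters a push n instruction, two actions occur in the order listed below:
1. ESP is decremented by the size of a machine word (4 bytes).
2. The value n is written to the memory location addressed by ESP.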
This implies ESP always points to the last value placed on the stack.
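A push can be modelled in C++ as a decrement of ESP followed by a store. The sketch below is an illustration only: Stack, push, top, and push_arguments are hypothetical names, the stack is reduced to 16 DWORD slots, and ESP is kept as a byte offset rather than a real address.

```cpp
#include <cstdint>

// Hypothetical model of a 32-bit thread stack that grows toward lower addresses.
struct Stack {
    std::uint32_t memory[16] = {}; // 16 DWORD slots of stack memory
    std::uint32_t esp = 64;        // byte offset, one past the highest slot (16 * 4)
};

// push n: first decrement ESP by one machine word, then store n at [ESP].
constexpr void push(Stack& s, std::uint32_t n) {
    s.esp -= 4;
    s.memory[s.esp / 4] = n;
}

// ESP always addresses the last value pushed.
constexpr std::uint32_t top(const Stack& s) {
    return s.memory[s.esp / 4];
}

// Push the two arguments of Add() right to left, as main() does.
constexpr Stack push_arguments() {
    Stack s{};
    push(s, 0x3C09); // Addend
    push(s, 0x7EB5); // Augend
    return s;
}
```

After the two pushes, ESP has moved down by eight bytes and addresses the Augend, the value pushed last.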
Figure 8: Argument Size and Call Stack
Because the push of a value is always of machine word size, pushing five consecutive bytes consumes 20 bytes (0x14) on the stack, even though the local allocation could use only 2 DWORDs (8 bytes). Finally, there is no zero extension for the truncated push of a byte. Whatever occupied the upper three bytes of the register surfaces as part of the function's parameters, even though only the low order byte is of interest. This is embodied in the instruction sequence mov al, byte ptr [a]; push eax. Refer to Figure 8.
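The effect of mov al, byte ptr [a]; push eax can be sketched in C++. The function names below are hypothetical and the stale register value 0xDEADBEEF is an arbitrary choice for illustration; the sketch only models the register arithmetic, not the processor.

```cpp
#include <cstdint>

// mov al, byte ptr [a]: only the low byte of EAX is replaced; the upper
// three bytes keep whatever the register held before.
constexpr std::uint32_t mov_al(std::uint32_t eax, std::uint8_t a) {
    return (eax & 0xFFFFFF00u) | a;
}

// The callee is only interested in the low-order byte of the pushed DWORD.
constexpr std::uint8_t low_byte(std::uint32_t pushed) {
    return static_cast<std::uint8_t>(pushed & 0xFFu);
}

// Stack bytes consumed by pushing `count` byte-sized arguments on a 32-bit
// target: every push is one machine word wide.
constexpr std::uint32_t bytes_pushed(std::uint32_t count) {
    return count * 4;
}
```

With EAX previously holding 0xDEADBEEF and a equal to 0x41, the DWORD placed on the stack is 0xDEADBE41; the callee recovers 0x41 by reading only the low byte.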
The final comment to make with regard to the code generation is the compiler's awareness of multiple pipelines in the processor. Rather than reusing eax by issuing:
mov al, byte ptr [a]
push eax
...
mov al, byte ptr [d]
push eax
...
The compiler will rotate register usage (eax, ecx, edx) so that the execution pipes remain full. This optimization is critical to performance since there are no branches which might otherwise stall execution due to a branch prediction miss. So we could expect to see the following:
mov al, byte ptr [a]
push eax
...
mov dl, byte ptr [d]
push edx
...
When creating a stack frame for based address referencing, the compiler will issue a standard pair of instructions. The frame creates a well defined "function context". The typical prologue for stack based operations is the sequence:
push EBP
mov EBP, ESP
The above sequence is emitted in each function invoked so that a thread (the function) does not accidentally destroy the stack pointer (ESP). Conversely, when a function exits, one typically encounters a stack and EBP restoration. This is required since the calling function has a different reference from which it is working:
pop EBP
ret
One should conclude that EBP can be relative to the function (if used), while ESP is relative to the thread. This implies when one sees EBP-0xn, the function is referring to local storage in the function. EBP+0xn signals the thread is accessing a local variable which was created by the calling function, or a function in the call chain.
When viewing a disassembly, it is not uncommon to encounter xor eax, eax and or eax, 0xFFFFFFFF. The first instruction is equivalent to mov eax, 0, while the second is equivalent to mov eax, -1. They are optimized versions of generated code. Usually the instruction sequences use less space than their equivalent cousins.
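The two idioms can be checked with ordinary integer arithmetic. The helper names below are hypothetical; the byte counts in the comments are the standard 32-bit encodings of each instruction.

```cpp
#include <cstdint>

// xor eax, eax (2 bytes) versus mov eax, 0 (5 bytes):
// any value XORed with itself is zero.
constexpr std::uint32_t xor_self(std::uint32_t eax) {
    return eax ^ eax;
}

// or eax, 0FFFFFFFFh (3 bytes, sign-extended immediate) versus
// mov eax, -1 (5 bytes): ORing with all ones sets every bit.
constexpr std::uint32_t or_all_ones(std::uint32_t eax) {
    return eax | 0xFFFFFFFFu;
}
```

Both results are independent of the previous register contents, which is why the compiler can substitute them for the longer mov forms.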
Code Graft 1
Code Graft 1 is the baseline upon which the remaining examples will expand. Due to the compiler and linker's behavior, the first example is somewhat more complex than desired. The following will detail the issues encountered, and outline the general workarounds used in this article.
The source code to the first sample is listed below. main() calls Add(), which adds two numbers. The result is then displayed on standard output.
DWORD Add( DWORD Augend, DWORD Addend );  // forward declaration

int main( )
{
    DWORD Augend = 32437; // 0x7EB5
    DWORD Addend = 15369; // 0x3C09
    DWORD Sum = 0;        // 0xBABE after the call to Add()
    Sum = Add ( Augend, Addend );
    cout << _T("Augend: ") << Augend << endl;
    cout << _T("Addend: ") << Addend << endl;
    cout << _T(" Sum: ") << Sum << endl;
    return 0;
}

DWORD Add( DWORD Augend, DWORD Addend )
{
    DWORD result = Augend + Addend;
    return result;
}
The first issue encountered is that storage layout does not honor source code declarations. Refer to Figure 9.
Figure 9: Storage Layout versus Source Code Declaration
The variables are declared and initialized in the following order: Augend, Addend, Sum.
However, the layout in memory (from low address to high, per Table 1) is: Sum, Augend, Addend.
The second issue encountered is the Add() function's use of variables. Add() creates a scratch variable (result) and adds the two values. The result is then returned to main(). Since Add() accepts two arguments, Augend and Addend, one would expect the function to operate on EBP+0x04 and EBP+0x08. EBP+0x04 and EBP+0x08 are the expected relative base addresses since it is presumed they have been pushed on the stack. For the temporary result, it is expected the value would be returned in one of two ways:
However, with execution halted in Add(), a different scenario is observed. Refer to Figure 10.
Figure 10: Storage Layout versus Source Code Declaration
Before executing the instructions of Add() (but after entering the function), the stack appears as below. Refer to figure 11.
Figure 11: Stack Layout After Calling Add()
Once the function has been executed (but before the execution of the return), the stack layout is as shown in Figure 12.
Figure 12: Stack Layout
Table 1 explains the values with respect to their address (RVA).
Item | Address | Value | Comment
1 | 0x12FF5C | BABE | Add::result (created by Add)
2 | 0x12FF60 | 0012FF7C | EBP of main
3 | 0x12FF64 | 00401028 | Return Address
4 | 0x12FF68 | 7EB5 | Add::Augend (pushed by main)
5 | 0x12FF6C | 3C09 | Add::Addend (pushed by main)
6 | 0x12FF70 | 0 | main::Result
7 | 0x12FF74 | 7EB5 | main::Augend
8 | 0x12FF78 | 3C09 | main::Addend
Table 1: Stack Layout
Example two is Code Graft 1 less the function for Add() in the C++ source file. Code Graft 1's generated code for Add() under WinDbg is shown below.
00401cc0 55 push ebp
00401cc1 8bec mov ebp, esp
00401cc3 51 push ecx
00401cc4 8b4508 mov eax, dword ptr [ebp+8]
00401cc7 03450c add eax, dword ptr [ebp+0Ch]
00401cca 8945fc mov dword ptr [ebp-4], eax
00401ccd 8b45fc mov eax, dword ptr [ebp-4]
00401cd0 8be5 mov esp, ebp
00401cd2 5d pop ebp
00401cd3 c3 ret
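The listing can be replayed against the addresses of Table 1. The following C++ sketch is a hypothetical model, not generated code: Frame, load, store, and replay_add are illustrative names, and EBP is assumed to be 0x12FF60 (the slot where main's EBP was saved).

```cpp
#include <cstdint>

constexpr std::uint32_t kStackBase = 0x12FF5C; // lowest address listed in Table 1

// Eight DWORD slots covering 0x12FF5C..0x12FF78, as they appear on entry to Add().
struct Frame {
    std::uint32_t slot[8] = {
        0,          // 0x12FF5C  Add::result (not yet written)
        0x0012FF7C, // 0x12FF60  saved EBP of main
        0x00401028, // 0x12FF64  return address
        0x7EB5,     // 0x12FF68  Add::Augend
        0x3C09,     // 0x12FF6C  Add::Addend
        0,          // 0x12FF70  main::Result
        0x7EB5,     // 0x12FF74  main::Augend
        0x3C09      // 0x12FF78  main::Addend
    };
};

constexpr std::uint32_t load(const Frame& f, std::uint32_t address) {
    return f.slot[(address - kStackBase) / 4];
}

constexpr void store(Frame& f, std::uint32_t address, std::uint32_t value) {
    f.slot[(address - kStackBase) / 4] = value;
}

// Replays the four data-moving instructions of Add().
constexpr std::uint32_t replay_add() {
    Frame f{};
    const std::uint32_t ebp = 0x12FF60;
    std::uint32_t eax = load(f, ebp + 0x08); // mov eax, dword ptr [ebp+8]
    eax += load(f, ebp + 0x0C);              // add eax, dword ptr [ebp+0Ch]
    store(f, ebp - 0x04, eax);               // mov dword ptr [ebp-4], eax
    eax = load(f, ebp - 0x04);               // mov eax, dword ptr [ebp-4]
    return eax;
}
```

The arguments sit at EBP+8 and EBP+0Ch because the return address and the saved EBP occupy the two slots between EBP and the first parameter.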
At this point, the donor (Code Graft 1) is providing 20 bytes of code for the recipient (Code Graft 2). The easiest way to incorporate the functionality is through an assembly file (with a custom build step) added to the project. This method has the added benefit of allowing incorporation of both x86 and x64 routines since inline assembly is not being used.
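The 20 byte figure can be confirmed by transcribing the opcode column of the WinDbg listing into an array. The names kAddCode, kAddCodeSize, and kAddEnd are hypothetical; the bytes and the load address 0x00401CC0 are taken from the listing.

```cpp
#include <cstddef>

// Machine code of Add() transcribed from the WinDbg listing of Code Graft 1.
constexpr unsigned char kAddCode[] = {
    0x55,             // push ebp
    0x8B, 0xEC,       // mov  ebp, esp
    0x51,             // push ecx
    0x8B, 0x45, 0x08, // mov  eax, dword ptr [ebp+8]
    0x03, 0x45, 0x0C, // add  eax, dword ptr [ebp+0Ch]
    0x89, 0x45, 0xFC, // mov  dword ptr [ebp-4], eax
    0x8B, 0x45, 0xFC, // mov  eax, dword ptr [ebp-4]
    0x8B, 0xE5,       // mov  esp, ebp
    0x5D,             // pop  ebp
    0xC3              // ret
};

constexpr std::size_t kAddCodeSize = sizeof(kAddCode);

// First address past the routine when it is loaded at 0x00401CC0.
constexpr unsigned long kAddEnd = 0x00401CC0ul + kAddCodeSize;
```

The end address agrees with the listing: the one byte ret sits at 0x00401CD3, so the routine ends at 0x00401CD4.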
There are other methods available to incorporate the graft. The first is to edit the memory directly in WinDbg. This creates a temporary exhibit. Second, inline assembly could be used to emit the instruction sequence. This has the downside that inline assembly is not supported by Visual C++ on x64 platforms. The third alternative is patching. Patching an executable usually falls under the purview of viruses and crackers. Patching is left as an exercise to the reader.
Two changes were required to successfully link the executable:
Add() was changed to Addition() because add is a reserved word in the assembler (MASM).
extern "C" was added due to name mangling and link error LNK2001: unresolved external symbol "unsigned long __cdecl Addition(unsigned long,unsigned long)" (?Addition@@YAKKK@Z).
The nearly unchanged C++ file is shown below.
extern "C" DWORD Addition( DWORD, DWORD );

int main( int argc, char* argv[] )
{
    DWORD Augend = 32437; // 0x7EB5
    DWORD Addend = 15369; // 0x3C09
    DWORD Sum = 0;
    Sum = Addition ( Augend, Addend );
    cout << _T("Augend: ") << Augend << endl;
    cout << _T("Addend: ") << Addend << endl;
    cout << _T(" Sum: ") << Sum << endl;
    return 0;
}
To begin, create a file named Addition.asm in the project directory. Next add the file to the project. Refer to Figure 13.
Figure 13: Adding ASM File to Project
In later versions of Visual Studio, the environment will ask if it should use the masm.rules Custom Build Rules. Select OK. Refer to figure 14.
Figure 14: MASM Custom Build Rule
If the Custom Build Rules are not available, add the following as a Custom Build Step:
After adding the ASM file to the project, the project will appear as in Figure 15.
Figure 15: Addition of
Next add the following to Addition.asm. Below demonstrates the minimum requirements for an assembly procedure.
PUBLIC Addition
.486
.MODEL FLAT, C
.CODE
Addition PROC
push ebp ; Save Caller's EBP
mov ebp, esp ; Grab our Frame Reference
push ecx ; Storage for local 'result'
mov eax, dword ptr [ebp+8] ; Augend
add eax, dword ptr [ebp+0Ch] ; Addend
mov dword ptr [ebp-4], eax ; Temporary 'result'
mov eax, dword ptr [ebp-4] ; ??? Already in EAX
mov esp, ebp ; Clean 'push ECX' from stack
pop ebp ; Restore Caller's EBP
ret
Addition ENDP
END ; End of .CODE
PUBLIC Addition informs the linker that the procedure Addition is available for any module to use. .486 is a Processor Directive. .MODEL is a Simplified Segment Directive which directs MASM to generate code for a particular memory model. The language ("C") informs the assembler of the calling convention.
.CODE is another Simplified Segment Directive. .CODE begins the code section, while END marks the end of the code section. Other sections exist, such as .DATA. PROC and ENDP are Procedure Directives which book-end the Addition function. Additional procedures would be book-ended in a similar manner with a different label. Residing in the Addition procedure is the copy and paste code from Code Graft 1.
After compiling and linking, the first thing noticed is that the second executable is 0x200 bytes (512 bytes, the default PE file alignment) smaller than the first program. Refer to Figure 14.
Figure 14: Comparison of Executable File Sizes
However, the generated code has remained unchanged as far as execution of main() and Addition(). Refer to Figure 15.
Figure 15: Source Code Analysis in WinDbg
Code Graft 3 will demonstrate code grafting from a foreign executable. Specifically, it will reuse CheckSumMappedFile() from Imagehlp.dll. Imagehlp.dll is a single threaded library, so this is an opportunity to improve the function. A nearly complete treatment of the PE Checksum algorithm was presented in An Analysis of the Windows PE Checksum Algorithm.
To begin, download the PE Checksum Source code. Open StdAfx.h, comment out the references to imagehlp.dll, and add a prototype for CheckSumMemMapFile(). The name change was incorporated due to linking with Imagehlp.lib (even though it was not specified).
extern "C" {
PIMAGE_NT_HEADERS /*WINAPI*/ CheckSumMemMapFile(
    PVOID BaseAddress,
    DWORD FileLength,
    PDWORD ExistingCheckSum,
    PDWORD CalculatedCheckSum );
}
The extern "C" is required due to name mangling. Notice also that WINAPI (a macro for __stdcall) is missing. This is due to link errors when attempting to link the object files. I suspect this might be a packing issue, but I have not investigated further.
LNK1190: invalid fixup found, type 0x0002
Since the model is no longer STDCALL, the routines used from Imagehlp.dll will require conversion. The three most prevalent issues are:
Add an assembly file to the project named "CheckSum.asm". Create three procedures in CheckSum.asm: CheckSumMemMapFile, _ChkSum, and _ImageNtHeader. Acquire the assembly code for CheckSumMappedFile() and ChkSum() from imagehlp.dll and place it in CheckSum.asm under their respective procedures. Leave _ImageNtHeader empty at this point. _ImageNtHeader will be a hand coded replacement used in lieu of Imagehlp.dll's call to RtlpImageNtHeader() of NTDLL.DLL.
The only procedure which requires the PUBLIC attribute is CheckSumMemMapFile. This leaves _ChkSum and _ImageNtHeader 'private' procedures for use by CheckSumMemMapFile.
Alternately, use the listing files for the functions provided in this article. The files are CheckSummMappedFile.listing and ChkSum.listing. The listing files were created from a Copy and Paste operation in WinDbg while examining the original CodeGraft.exe. Refer to Figure 16.
Figure 16: WinDbg Copy and Paste
At this point the listing includes memory addresses, opcodes, and mnemonics. Create labels for any jumps encountered (changing CheckSumMappedFile to CheckSumMemMapFile). For example, at 0x76c96f3b is the following instruction:
76c96f3b eb1d jmp imagehlp!CheckSumMappedFile+0x4f (76c96f5a)
The jump target is 0x76C96F5A. At that location, create a label. Note that the label name is based on the location provided by the disassembly (the '+' has been changed to '_'):
76c96f3b eb1d jmp imagehlp!CheckSumMappedFile+0x4f (76c96f5a)
...
76c96f57 8b7de4 mov edi,dword ptr [ebp-1Ch]
CheckSumMemMapFile_0x4f:
76c96f5a 85c0 test eax,eax
Finally, clean the original instruction to coincide with the jump to the label:
76c96f3b eb1d jmp CheckSumMemMapFile_0x4f
There are areas of the code which appear to be artifacts. Examine 0x76c96fc8 for instance. Since there is no assembly mnemonic to generate the opcode, create the code using the DB directive. Note that when using hex notation in MASM, a number which begins with a letter must be prefixed with a '0' (and the number carries an 'h' suffix). DUP is an operator which creates a data byte the requested number of times.
;; 76c96fc8 ff ???
;; 76c96fc9 ff ???
;; 76c96fca ff ???
DB 3 DUP(0FFh)
;; 76c96fcb ff426f inc dword ptr [edx+6Fh]
DB 0FFh, 042h, 06Fh
;; 76c96fce c9 leave
DB 0C9h
;; 76c96fcf 764b jbe imagehlp!MapFileAndCheckSumA+0x43 (76c9701c)
DB 076h, 04Bh
;; 76c96fd1 6f outs dx,dword ptr [esi]
DB 06Fh
;; 76c96fd2 c9 leave
DB 0C9h
;; 76c96fd3 7690 jbe imagehlp!CheckSumMappedFile+0x5a (76c96f65)
DB 076h, 090h
;; 76c96fd5 90 nop
;; 76c96fd6 90 nop
;; 76c96fd7 90 nop
;; 76c96fd8 90 nop
DB 4 DUP(090h)
Finally, one can remove the unneeded material in the listing. To remove an item in the listing, simply comment it:
;; push 10h
;; push offset `string'+0x3c (76c96fc8)
;; call _SEH_prolog (76c934b9)
mov esi,dword ptr [ebp+10h]
and dword ptr [esi],0
mov eax,dword ptr [ebp+0Ch]
shr eax,1
push eax
push dword ptr [ebp+8]
push 0
; 76c96f1e e856d6ffff call ChkSum (76c94579)
call _ChkSum
The original code installed a Structured Exception Handler upon entry. CodeGraft.exe code is wrapping the code in a handler, so the installation can be skipped. The removing of the handler is realized by commenting out the call above. This creates a stack imbalance that will be addressed in the STDCALL to C CALL conversion.
This step requires the most analysis. This is due to the fact that Frame Pointers are missing. So each procedure will receive the customary
push ebp
mov ebp, esp
Once the additional push is introduced, careful attention must be paid to code/stack dependencies. The CheckSumMemMapFile procedure is shown below. Instructions in CAPITAL letters were added for stack management. Commented lines were removed. Finally, STDCALL performs a ret n, where n is an adjustment to ESP. C-CALL uses a vanilla ret, with the caller performing the stack adjustment. The result of the cleanup is available as CodeGraft4.zip.
CheckSumMemMapFile PROC
;;push 10h
;;push offset `string'+0x3c (76c96fc8)
; Inspecting 0x76c96fc8 shows this is '-1'...
; push 0FFFFFFFFh
;; 76c96f08 e8acc5ffff call _SEH_prolog (76c934b9)
PUSH EBP ; Reference
MOV EBP, ESP
SUB ESP, 10h ; Space for 4 Temporary Variables
; T1: EBP-10h use in place of ebp-18h
; T2: EBP-0Ch use in place of ebp-1Ch
; T3: EBP-08h use in place of ebp-20h
; T4: EBP-04h use in place of ebp-04h
mov esi,dword ptr [ebp+10h] ; Header CheckSum Variable (Read From PE Header)
and dword ptr [esi],0 ; Header CheckSum = 0
mov eax,dword ptr [ebp+0Ch] ; File Size
shr eax,1 ; File Size = File Size / 2
push eax ; Parameter 3: File Size
push dword ptr [ebp+8] ; Parameter 2: Source (pBaseAddress)
push 0 ; Parameter 1: Partial Sum
; 76c96f1e e856d6ffff call _ChkSum@4(76c94579)
call _ChkSum
;; No Longer STDCALL
;; Clean the parameters from the Stack
ADD ESP, 0Ch
mov edi,eax ; EDI = Return from _ChkSum
mov dword ptr [EBP-0Ch],edi ; Sum
and dword ptr [EBP-04h],0 ; File Size = 0???
;; push dword ptr [ebp+8]
;; 76c96f2f e81ed2ffff call RtlpImageNtHeader (76c94152)
push [ebp+8] ; Source (pBaseAddress)
call _ImageNTHeader
ADD ESP, 4 ; Stack Maintenance - No longer STDCALL
mov dword ptr [EBP-08h],eax
or dword ptr [EBP-04h],0FFFFFFFFh ; EBP-04h = -1
jmp _CheckSum_0x4f
;; Retain the Noise Bytes
DB 5 DUP (090h)
xor eax,eax
inc eax
ret
;; Retain the Noise Bytes
DB 5 DUP (090h)
mov esp,dword ptr [EBP-10h] ; Local Temporary Storage
xor eax,eax
or dword ptr [EBP-04h],0FFFFFFFFh ; Local Temporary Storage
mov esi,dword ptr [ebp+10h] ; Local Temporary Storage
mov edi,dword ptr [EBP-0Ch] ; Local Temporary Storage
_CheckSum_0x4f:
test eax,eax
je _CheckSum_0x90
cmp eax,dword ptr [ebp+8]
je _CheckSum_0x90
mov cx,word ptr [eax+18h]
cmp cx,10Bh
je _CheckSum_0x6a
cmp cx,20Bh
jne _CheckSum_0xb5
_CheckSum_0x6a:
lea ecx,[eax+58h] ; Existing (Header) Checksum
mov edx,dword ptr [ecx] ; This routine removes the existing
mov dword ptr [esi],edx ; Checksum from the calculated value
xor edx,edx ;
mov dx,word ptr [ecx] ; Notice the use of Subtract with Borrow (sbb)
cmp di,dx ;
sbb esi,esi ; This is consistent with the Documentation stating
neg esi ; 'Calculate the checksum of the file
add esi,edx ; with the existing taken as 0.'
sub edi,esi
movzx ecx,word ptr [ecx+2]
cmp di,cx
sbb edx,edx
neg edx
add edx,ecx
sub edi,edx
_CheckSum_0x90:
mov ecx,dword ptr [ebp+0Ch]
test cl,1
je _CheckSumMemMapFile_0xa3
mov edx,dword ptr [ebp+8]
movzx dx,byte ptr [edx+ecx-1]
add edi,edx
_CheckSumMemMapFile_0xa3:
movzx edx,di
add edx,ecx
mov ecx,dword ptr [ebp+14h]
mov dword ptr [ecx],edx
;; 76c96fb8 e83cc5ffff call _SEH_epilog (76c934f9)
ADD ESP, 10h
POP EBP
;; No Longer STDCALL
;; ret 10h
ret
CheckSumMemMapFile ENDP
The first change encountered was removing the SEH. The program is wrapping the operation in a handler, so adding the SEH mechanism at this level was abandoned. The next addition is that of a reference by push ebp and mov ebp, esp.
SUB ESP, 10h ; Space for 4 Temporary Variables
; T1: EBP-10h use in place of ebp-18h
; T2: EBP-0Ch use in place of ebp-1Ch
; T3: EBP-08h use in place of ebp-20h
; T4: EBP-04h use in place of ebp-04h
The original code would access EBP-0x1C, without reserving stack space. Analysis revealed the stack needed to accommodate four DWORDs. The above accomplishes the task. Below, a manual stack adjustment completes the procedure and restoration of EBP.
ADD ESP, 10h
POP EBP
;; No Longer STDCALL
;; ret 10h
ret
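The difference between ret 10h and a plain ret is purely stack pointer bookkeeping, which can be modelled in a few lines of C++. The function names are hypothetical and only ESP is tracked; no memory is read or written in this sketch.

```cpp
#include <cstdint>

// STDCALL: the callee removes the parameters with `ret n`.
constexpr std::uint32_t esp_after_stdcall(std::uint32_t esp, std::uint32_t arg_bytes) {
    esp -= arg_bytes; // caller: push parameters
    esp -= 4;         // call: push return address
    esp += 4;         // callee: ret n pops the return address ...
    esp += arg_bytes; // ... and then discards n bytes of parameters
    return esp;
}

// C-CALL: the callee issues a plain `ret`; the caller issues `add esp, n`.
constexpr std::uint32_t esp_after_ccall(std::uint32_t esp, std::uint32_t arg_bytes) {
    esp -= arg_bytes; // caller: push parameters
    esp -= 4;         // call: push return address
    esp += 4;         // callee: ret pops the return address only
    esp += arg_bytes; // caller: add esp, n
    return esp;
}

// A `ret n` downgraded to `ret` without the matching `add esp, n` in the caller.
constexpr std::uint32_t esp_after_unbalanced_call(std::uint32_t esp, std::uint32_t arg_bytes) {
    esp -= arg_bytes; // caller: push parameters
    esp -= 4;         // call: push return address
    esp += 4;         // callee: ret
    return esp;       // the parameters are still on the stack
}
```

Both conventions leave ESP where it started; the third function shows the imbalance that the ADD ESP, 0Ch and ADD ESP, 4 instructions in the converted listing exist to prevent.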
As Joe Partridge pointed out, the original port missed the use of ESI above. Since the register was used, it must be saved and restored. EBX, ESI, EDI, and EBP must be preserved during function invocation. EAX, ECX, and EDX are scratch registers.
This procedure is basically unchanged. Since the procedure is moving values placed on the stack (parameters) into registers, a local frame reference was not created. The noticeable effect of conversion is the changing of ret 0Ch to ret since the caller is now cleaning the stack. _ChkSum can be examined in detail in An Analysis of the Windows PE Checksum Algorithm.
_ChkSum PROC
push esi
mov ecx,[esp+10h] ; File Size / 2
mov esi,[esp+0Ch] ; Source (pBaseAddress)
mov eax,[esp+8] ; Partial Sum
shl ecx,1 ; File Size = File Size * 2
je _ChkSum_0x16e
test esi,2
je _ChkSum_0x2d
sub edx,edx
mov dx,[esi]
add eax,edx
adc eax,0
add esi,2
sub ecx,2
...
_ChkSum_0x16e:
mov edx,eax ;; Fold 32 bits in 16
shr edx,10h
and eax,0FFFFh
add eax,edx
mov edx,eax
shr edx,10h
add eax,edx
and eax,0FFFFh
pop esi
;; No longer STDCALL
;; ret 0Ch
ret
_ChkSum ENDP
_ChkSum is not using a local stack frame - it is accessing the parameters using ESP:
push esi
mov ecx,[esp+10h] ; File Size / 2
mov esi,[esp+0Ch] ; Source (pBaseAddress)
mov eax,[esp+8] ; Partial Sum
This could be converted to use a local frame reference as follows (with the appropriate epilogue):
PUSH EBP
MOV EBP, ESP
push esi
;; Stack Based
;; mov ecx,[esp+10h] ; File Size / 2
;; mov esi,[esp+0Ch] ; Source (pBaseAddress)
;; mov eax,[esp+8] ; Partial Sum
;; Frame Based
mov ecx,[EBP+10h] ; File Size / 2
mov esi,[EBP+0Ch] ; Source (pBaseAddress)
mov eax,[EBP+08h] ; Partial Sum
In the above conversion, the offsets used to reference values through EBP and ESP were the same. In this example, it was simply coincidence. This may not always be the case.
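Why the offsets coincide here, and when they would not, can be worked out with a small C++ sketch. The two helper functions are hypothetical; they assume a 32-bit target on which the return address, the saved EBP, and each saved register occupy four bytes.

```cpp
#include <cstdint>

// Offset of parameter `index` (0-based) from ESP when the callee has pushed
// `saved_registers` DWORDs and has not set up a frame pointer.
constexpr std::uint32_t esp_offset(std::uint32_t index, std::uint32_t saved_registers) {
    return 4 * saved_registers // registers pushed by the callee
         + 4                   // return address
         + 4 * index;
}

// Offset of the same parameter from EBP after `push ebp` / `mov ebp, esp`.
// Pushes made after the frame is established do not move EBP.
constexpr std::uint32_t ebp_offset(std::uint32_t index) {
    return 4 // saved EBP
         + 4 // return address
         + 4 * index;
}
```

_ChkSum pushes exactly one register (ESI) before reading its parameters, so the ESP based offsets 8, 0Ch, and 10h match the EBP based ones. Had the routine pushed two registers first, the first parameter would be at ESP+0Ch but still at EBP+8.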
_ImageNtHeader is a hand coded replacement for the original call to RtlpImageNtHeader(). The procedure takes the pointer to the memory mapped file, and adds to it the value of e_lfanew of the IMAGE_DOS_HEADER. The function returns the sum on success (a pointer to the IMAGE_NT_HEADER), or NULL on failure.
_ImageNtHeader PROC
push ebp
mov ebp, esp
push esi
;; ESI = pBaseAddress
mov eax, dword ptr[ ebp+08h ]
mov esi, eax
;; pBaseAddress == NULL?
cmp esi, 0
je NULLRETURN
;; pBaseAddress == 0xFFFFFFFF?
cmp esi, 0FFFFFFFFh
je NULLRETURN
;; MZ Signature
cmp byte ptr [ESI], 'M'
jne NULLRETURN
cmp byte ptr [ESI+01h], 'Z'
jne NULLRETURN
;; ESI is a pointer to IMAGE_DOS_HEADER
;; Grab the e_lfanew DWORD
;
; IMAGE_DOS_HEADER
; is 64 bytes (0x40) long
;
; e_lfanew occupies bytes
; IMAGE_DOS_HEADER[60-63]
;
; ESI+060 is _not_ Hex!!!
;
mov eax, esi
add eax, dword ptr[ ESI+060 ] ; value at e_lfanew
mov esi, eax
;; PE Signature
cmp byte ptr [ESI], 'P'
jne NULLRETURN
cmp byte ptr [ESI+01h], 'E'
jne NULLRETURN
cmp byte ptr [ESI+02h], 0
jne NULLRETURN
cmp byte ptr [ESI+03h], 0
jne NULLRETURN
;;
;; EAX = IMAGE_NT_HEADER pointer
;;
jmp CLEANSTACK
NULLRETURN:
mov eax, 0
CLEANSTACK:
pop esi
pop ebp
ret
_ImageNtHeader ENDP
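For comparison, the same header walk can be sketched in C++. This is not the author's routine: nt_header_offset, kNotFound, TinyImage, and make_tiny_image are hypothetical names, the function returns an offset instead of a pointer, and the size checks are an addition of the sketch (the assembly procedure trusts the mapped file and also tests for a 0xFFFFFFFF base, which is omitted here).

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kNotFound = static_cast<std::size_t>(-1);
constexpr std::size_t kLfanewOffset = 60; // decimal; 0x3C

// Returns the byte offset of the "PE\0\0" signature inside `image`, or kNotFound.
constexpr std::size_t nt_header_offset(const unsigned char* image, std::size_t size) {
    if (image == nullptr || size < kLfanewOffset + 4) return kNotFound;
    if (image[0] != 'M' || image[1] != 'Z') return kNotFound;

    // e_lfanew is a little-endian DWORD occupying bytes 60..63.
    const std::uint32_t e_lfanew =
          static_cast<std::uint32_t>(image[kLfanewOffset])
        | static_cast<std::uint32_t>(image[kLfanewOffset + 1]) << 8
        | static_cast<std::uint32_t>(image[kLfanewOffset + 2]) << 16
        | static_cast<std::uint32_t>(image[kLfanewOffset + 3]) << 24;

    if (e_lfanew > size || size - e_lfanew < 4) return kNotFound;
    const unsigned char* nt = image + e_lfanew;
    if (nt[0] != 'P' || nt[1] != 'E' || nt[2] != 0 || nt[3] != 0) return kNotFound;
    return e_lfanew;
}

// Hypothetical 72-byte test image: a DOS header followed directly by "PE\0\0".
struct TinyImage {
    unsigned char bytes[72] = {};
};

constexpr TinyImage make_tiny_image() {
    TinyImage img{};
    img.bytes[0] = 'M';
    img.bytes[1] = 'Z';
    img.bytes[kLfanewOffset] = 64; // e_lfanew = 0x40
    img.bytes[64] = 'P';
    img.bytes[65] = 'E';
    return img;
}

constexpr TinyImage kTiny = make_tiny_image();
```

Adding the returned offset to the base address gives the pointer that the assembly version leaves in EAX.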
The fifth sample incorporates the previous examples, with the addition of optimizations applied to CheckSumMemMapFile and _ChkSum.
CheckSumMemMapFile can be further cleaned by observing that the local variables serve no purpose in the code. In addition, the artifacts can be removed if the execution path is sent to the 'Abort' jump after cmp cx,20Bh (IMAGE_NT_OPTIONAL_HDR64_MAGIC). The cleaned routine is available in example four.
CheckSumMemMapFile PROC
PUSH EBP ; Create Local Stack Frame
MOV EBP, ESP
PUSH ESI
mov esi,dword ptr [ebp+10h] ; Header CheckSum Variable (Read from PE Header)
and dword ptr [esi],0 ; Header CheckSum = 0
mov eax,dword ptr [ebp+0Ch] ; File Size
shr eax,1 ; File Size = File Size / 2
push eax ; Parameter 3: File Size
push dword ptr [ebp+8] ; Parameter 2: Source (pBaseAddress)
push 0 ; Parameter 1: Partial Sum
call _ChkSum
ADD ESP, 0Ch ; C-CALL, adjust stack
mov edi,eax ; EDI = Return from _ChkSum
push [ebp+8] ; Source (pBaseAddress)
call _ImageNTHeader
ADD ESP, 4 ; C-CALL, adjust stack
test eax,eax ; Return from _ImageNTHeader. Is it NULL?
je _CheckSum_0x90 ; Abort
cmp eax,dword ptr [ebp+8] ; pBaseAddress == _ImageNTHeader
je _CheckSum_0x90 ; Abort
mov cx,word ptr [eax+18h] ; IMAGE_OPTIONAL_HEADER.Magic
cmp cx,10Bh ; IMAGE_NT_OPTIONAL_HDR32_MAGIC
je _CheckSum_0x6a
cmp cx,20Bh ; IMAGE_NT_OPTIONAL_HDR64_MAGIC
jne _CheckSum_0x90 ; Abort
_CheckSum_0x6a:
lea ecx,[eax+58h] ; ADDRESSOF(IMAGE_OPTIONAL_HEADER.Checksum)
mov edx,dword ptr [ecx] ; IMAGE_OPTIONAL_HEADER.Checksum (dereference)
mov dword ptr [esi],edx ; Save To Callee parameter dwHeaderCheckSum
xor edx,edx ; EDX = 0
mov dx,word ptr [ecx] ; 2 bytes at IMAGE_OPTIONAL_HEADER.Checksum
cmp di,dx ; DI = result of _ChkSum
sbb esi,esi
neg esi
add esi,edx
sub edi,esi
movzx ecx,word ptr [ecx+2]
cmp di,cx
sbb edx,edx
neg edx
add edx,ecx
sub edi,edx
_CheckSum_0x90:
mov ecx,dword ptr [ebp+0Ch] ; File Size
test cl,1
je _CheckSum_0xa3
mov edx,dword ptr [ebp+8]
movzx dx,byte ptr [edx+ecx-1]
add edi,edx
_CheckSum_0xa3:
movzx edx,di
add edx,ecx
mov ecx,dword ptr [ebp+14h]
mov dword ptr [ecx],edx
POP ESI
POP EBP
ret
CheckSumMemMapFile ENDP
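The cmp / sbb / neg / add / sub run at _CheckSum_0x6a removes the existing header checksum from the partial sum one 16 bit word at a time, and the tail of the routine adds the file size. A C++ reading of that arithmetic is sketched below; the function names are hypothetical and the sample values in the usage note are arbitrary.

```cpp
#include <cstdint>

// cmp di,dx / sbb esi,esi / neg esi / add esi,edx / sub edi,esi:
// subtract one 16-bit word, plus a borrow of 1 when DI is below the word.
constexpr std::uint32_t subtract_word(std::uint32_t edi, std::uint16_t word) {
    const std::uint32_t borrow =
        (static_cast<std::uint16_t>(edi) < word) ? 1u : 0u;
    return edi - (word + borrow);
}

// Remove both halves of the existing header checksum, low word first.
constexpr std::uint32_t remove_header_checksum(std::uint32_t partial, std::uint32_t header) {
    partial = subtract_word(partial, static_cast<std::uint16_t>(header & 0xFFFFu));
    partial = subtract_word(partial, static_cast<std::uint16_t>(header >> 16));
    return partial;
}

// movzx edx,di / add edx,ecx: keep 16 bits of the sum and add the file size.
constexpr std::uint32_t final_checksum(std::uint32_t partial, std::uint32_t file_size) {
    return (partial & 0xFFFFu) + file_size;
}
```

For example, removing a header value of 0x00010002 from a partial sum of 0xBABE leaves 0xBABB, and a subtraction that borrows (0x0010 minus 0x0020) leaves 0xFFEF in the low word.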
A final peephole optimization can be enjoyed in the main summation loop of _ChkSum. This supplement will take advantage of the processor's ability to schedule simultaneous instructions. The lesser summations (0x40 DWORDs, 0x20 DWORDs, 0x10 DWORDs, etc.) will be skipped since they are encountered at most once during the routine's execution.
Because most time in this routine is spent executing the loop below (consuming 0x80 DWORDS), a further optimization would include performing the push ebx and push edx once. Once summation is complete, perform the respective pops before exiting at jne _ChkSum_0xe8.
_ChkSum_0xe8:
PUSH EBX
PUSH EDX
XOR EBX, EBX
XOR EDX, EDX
add eax,dword ptr [esi]
adc EBX,dword ptr [esi+4]
adc EDX,dword ptr [esi+8]
adc eax,dword ptr [esi+0Ch]
adc EBX,dword ptr [esi+10h]
adc EDX,dword ptr [esi+14h]
adc eax,dword ptr [esi+18h]
adc EBX,dword ptr [esi+1Ch]
adc EDX,dword ptr [esi+20h]
adc eax,dword ptr [esi+24h]
adc EBX,dword ptr [esi+28h]
adc EDX,dword ptr [esi+2Ch]
adc eax,dword ptr [esi+30h]
adc EBX,dword ptr [esi+34h]
adc EDX,dword ptr [esi+38h]
adc eax,dword ptr [esi+3Ch]
adc EBX,dword ptr [esi+40h]
adc EDX,dword ptr [esi+44h]
adc eax,dword ptr [esi+48h]
adc EBX,dword ptr [esi+4Ch]
adc EDX,dword ptr [esi+50h]
adc eax,dword ptr [esi+54h]
adc EBX,dword ptr [esi+58h]
adc EDX,dword ptr [esi+5Ch]
adc eax,dword ptr [esi+60h]
adc EBX,dword ptr [esi+64h]
adc EDX,dword ptr [esi+68h]
adc eax,dword ptr [esi+6Ch]
adc EBX,dword ptr [esi+70h]
adc EDX,dword ptr [esi+74h]
adc eax,dword ptr [esi+78h]
adc EBX,dword ptr [esi+7Ch]
ADC EAX, EBX
ADC EAX, EDX
adc eax,0
POP EDX
POP EBX
add esi,80h
sub ecx,80h
jne _ChkSum_0xe8
...
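The arithmetic performed by the summation loop and by the fold at _ChkSum_0x16e can be modelled in C++. This is a simplified sketch under stated assumptions: the carry is folded back in after every addition rather than being passed along an adc chain across three accumulators, and add_with_carry, sum_dwords, fold16, and kSample are hypothetical names with arbitrary sample data.

```cpp
#include <cstddef>
#include <cstdint>

// add / adc pair: 32-bit addition in which a carry out of bit 31 is added
// back into the sum (end-around carry).
constexpr std::uint32_t add_with_carry(std::uint32_t sum, std::uint32_t value) {
    const std::uint64_t wide = static_cast<std::uint64_t>(sum) + value;
    return static_cast<std::uint32_t>(wide) + static_cast<std::uint32_t>(wide >> 32);
}

// Sum `count` DWORDs starting from `partial`.
constexpr std::uint32_t sum_dwords(const std::uint32_t* data, std::size_t count,
                                   std::uint32_t partial) {
    for (std::size_t i = 0; i < count; ++i) {
        partial = add_with_carry(partial, data[i]);
    }
    return partial;
}

// _ChkSum_0x16e: fold 32 bits into 16.
constexpr std::uint32_t fold16(std::uint32_t eax) {
    eax = (eax & 0xFFFFu) + (eax >> 16); // and eax,0FFFFh / add eax,edx
    eax += eax >> 16;                    // second shr / add pair
    return eax & 0xFFFFu;                // and eax,0FFFFh
}

constexpr std::uint32_t kSample[] = {0xFFFFFFFFu, 0x00000002u, 0x00010001u};
```

For the three sample DWORDs the running sum is 0xFFFFFFFF, then 2 (the carry wraps around), then 0x00010003, which folds to 4.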
Last Updated: 7 Mar 2008
Copyright 2007 by Jeffrey Walton. Everything else Copyright © CodeProject, 1999-2009