Grafting Compiled Code: Unlimited Code Reuse






Add functionality to a project using existing compiled machine code.
- Download sample 1 source code - 3.43 KB
- Download sample 2 source code - 3.91 KB
- Download sample 3 source code - 24.9 KB
- Download sample 4 source code - 24.4 KB
- Download sample 5 source code - 42.3 KB
- Download assembler source code - 2.02 KB
- Download listing - checksum mapped file - 1.27 KB
- Download listing - checksum - 1.28 KB
Introduction
This article will demonstrate techniques to incorporate compiled machine code into an existing project using ASM source files. The assembly source file will be created from the compiled machine code. In addition, the article will remove the single-threaded limitation of Imagehlp.dll and show techniques for converting a compiled STDCALL procedure to a C-CALL assembly language routine.
Example one will present a standard console application. The program adds two numbers and then returns the result. The example will introduce a few of the obstacles that will be encountered. Sample two will incorporate the graft using the compiled machine code of Add() from the first.
Example three will use foreign code from imagehlp.dll to supplement the PEChecksum program. This will remove the imagehlp library dependency from a project and fully demonstrate the techniques. The PEChecksum program was presented in An Analysis of the PE Checksum Algorithm. Those who are familiar with x86 assembly, WinDbg, and IDA Pro should begin at example three.
The samples will use standard C as the language. For those interested in reusing compiled C++, please see Paul Vincent Sabanal's and Mark Vincent Yason's BlackHat 2007 presentation, Reversing C++.
Optimizations
To keep the first two examples academic, optimizations will be disabled. In general, release code built with optimizations lacks the structure desired for the samples presented in this article, so release samples will be built with optimizations disabled (/Od). Refer to Figure 1.
Figure 1: Disable Optimizations
With optimizations enabled, some functions were inlined. For example, Figure 2 shows no call corresponding to a function that is present in the source file. wmain is entered at 0x00401000, and the first function call encountered is at 0x40101D. 0x40101D is a call to cout. In this case, Add() was optimized away.
Figure 2: Missing Function Call Due to Optimization
Another optimization that was not desired is Frame Pointer Omission (FPO). FPO was disabled to keep the examples easier to follow. The frame pointer is generally created with the instruction sequence push EBP followed by mov EBP, ESP. The lack of a frame pointer adds a small wrinkle to an otherwise academic exercise. Frame pointers are addressed in the 'Stack, ESP, and EBP' section.
Global Variables
Variables that are global to the process are placed in either the .data section or (if present) the .bss section. The sections correspond to the initialized and uninitialized data sections respectively. Usually, a program refers to a global variable by its absolute address rather than by relative addressing (such as the EBP-relative addressing used for local variables). For example, notice the code generated for access to the global scratch variable in Figure 3.
Figure 3: Global Variable Storage Access
The address used is 0x00417000. When examining the program in PE Browse, the variable is listed under the .data section at address 0x00417000. Refer to Figure 4.
Figure 4: Allocation of a Global Variable
The disassembly and PE header of an uninitialized global variable are shown in Figure 5. The four byte values at address 0x00403374 are garbage: CR, LF, [SPACE], and [SPACE].
Figure 5: Allocation of an Uninitialized Global Variable
The meaning of push ECX will be examined in the section Local Variables.
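The same distinction can be seen from the C side. The following is a minimal sketch with hypothetical names (g_scratch, g_counter, bump_counter): the global with an initializer is the kind of variable a compiler typically emits into .data, the global without one typically goes to .bss, and the C language guarantees that the latter starts at zero when the program runs. The exact section placement is up to the compiler and linker.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical globals with static storage duration (fixed addresses). */
uint32_t g_scratch = 0xBABE;   /* has an initializer: typically placed in .data */
uint32_t g_counter;            /* no initializer: typically .bss; starts at 0 */

/* Accesses to g_scratch and g_counter are typically compiled to
   absolute-address operands; `local` lives in the stack frame and is
   typically addressed relative to EBP (or ESP) instead. */
uint32_t bump_counter(uint32_t step)
{
    uint32_t local;
    assert(step < 0x80000000u);   /* keep step * 2 from wrapping */
    local = step * 2;
    g_counter += local;
    return g_counter + g_scratch;
}
```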
Local Variables
Local variables are stored on the thread's stack. Figure 5 shows a lone push of ECX, even though ECX is not used in main() per se. This is a technique the compiler uses to create local storage for the variable i. Rather than issuing sub ESP, 4 (a three-byte instruction: 0x83 0xEC 0x04), it issues push ECX (a one-byte instruction: 0x51).
Figure 6: Local Allocation of a Single Variable
When a greater number of variables require storage, sub ESP, n is used (where n is the number of bytes required). For example, Figure 7 shows the allocation of five DWORD variables. Rather than issuing a series of five push ECX instructions, the compiler issues one sub ESP, 0x14.
Figure 7: Local Allocation of Multiple Variables
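The size trade-off can be made concrete with a small encoder. This is a hypothetical helper written for illustration, not the compiler's actual selection logic: it emits push ecx (0x51) for a single DWORD, sub esp, imm8 (0x83 0xEC ib, 3 bytes) for allocations up to 0x7F bytes, and the imm32 form (0x81 0xEC id, 6 bytes) otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the instruction that reserves `bytes` of
   local storage into out[] and returns its length in bytes. It shows the
   size trade-off only; it is not MSVC's actual selection logic. */
size_t encode_local_alloc(uint32_t bytes, uint8_t out[6])
{
    assert(bytes != 0 && out != NULL);
    if (bytes == 4) {                      /* push ecx */
        out[0] = 0x51;
        return 1;
    }
    if (bytes <= 0x7F) {                   /* sub esp, imm8 (sign-extended) */
        out[0] = 0x83; out[1] = 0xEC; out[2] = (uint8_t)bytes;
        return 3;
    }
    out[0] = 0x81; out[1] = 0xEC;          /* sub esp, imm32, little-endian */
    out[2] = (uint8_t)bytes;
    out[3] = (uint8_t)(bytes >> 8);
    out[4] = (uint8_t)(bytes >> 16);
    out[5] = (uint8_t)(bytes >> 24);
    return 6;
}
```

For the five DWORDs of Figure 7, encode_local_alloc(0x14, buf) produces 83 EC 14: three bytes in place of five one-byte pushes.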
Stack, ESP, and EBP
The stack is an area in memory that a thread uses as a 'scratch pad' during program execution. Each thread in a process has its own stack. One typically envisions memory as a contiguous region starting at a low address and moving sequentially to higher addresses. This is how a heap block or an array is addressed: A[0] resides at a lower address than A[63]. The stack, however, grows down: each push moves the top of the stack toward lower addresses.
ESP is the stack pointer, and it is maintained by the processor. EBP (if used) is maintained by the program itself, in each function's prologue and epilogue. When the processor encounters a push n instruction, two actions occur in the order listed below:
- the processor decrements ESP by the machine's word size
- the value n is placed on the stack at ESP
This implies ESP always points to the last value placed on the stack.
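The two-step behavior of push can be modeled in a few lines of C. The sketch below is a hypothetical toy model with the stack held in an array: push decrements ESP before storing, so after any push ESP holds the address of the value just written, and pop performs the reverse.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Toy 32-bit stack: mem[] is indexed by (address - base) / 4. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t base;   /* lowest address of the region */
    uint32_t esp;    /* starts one past the top, since the stack grows down */
} Stack;

void stack_init(Stack *s, uint32_t base)
{
    s->base = base;
    s->esp = base + 4u * STACK_WORDS;
}

/* push: first decrement ESP by the word size, then store at [ESP]. */
void push(Stack *s, uint32_t value)
{
    assert(s->esp > s->base);                       /* toy overflow check */
    s->esp -= 4;
    s->mem[(s->esp - s->base) / 4] = value;
}

/* pop: read [ESP], then increment ESP by the word size. */
uint32_t pop(Stack *s)
{
    uint32_t value;
    assert(s->esp < s->base + 4u * STACK_WORDS);    /* toy underflow check */
    value = s->mem[(s->esp - s->base) / 4];
    s->esp += 4;
    return value;
}
```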
Figure 8: Argument Size and Call Stack
Because every push stores a full machine word, passing five byte-sized arguments consumes 20 bytes (0x14) of stack, even though the same five bytes would fit in two DWORDs (8 bytes) of local storage. In addition, a byte is not zero-extended when it is pushed: whatever occupied the upper three bytes of the register surfaces as part of the function's parameters, even though only the low-order byte is of interest. This is embodied in the instruction sequence mov al, byte ptr [a] followed by push eax. Refer to Figure 8.
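The missing zero extension can be modeled with simple masking. In this sketch (hypothetical helper names), pushed_word_for_byte mirrors mov al, byte ptr [a] followed by push eax: only the low byte of EAX is replaced, so the pushed DWORD carries the stale upper bytes, while a callee that declared the parameter as a byte looks only at the low byte.

```c
#include <assert.h>
#include <stdint.h>

/* Models `mov al, byte ptr [a]` followed by `push eax`: only AL changes,
   so bits 8..31 of the pushed DWORD are whatever EAX held before. */
uint32_t pushed_word_for_byte(uint32_t eax_before, uint8_t a)
{
    return (eax_before & 0xFFFFFF00u) | a;
}

/* A callee whose parameter is byte-sized reads only the low byte. */
uint8_t callee_sees(uint32_t pushed)
{
    return (uint8_t)(pushed & 0xFFu);
}
```

For example, if EAX still held 0x00401028 from earlier work, pushing the byte 0x41 this way pushes 0x00401041.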
The final comment to make with regard to code generation is the compiler's awareness of multiple pipelines in the processor. Rather than reusing eax by issuing:
mov al, byte ptr [a]
push eax
...
mov al, byte ptr [d]
push eax
...
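the compiler will rotate register usage (eax, ecx, edx) so that the execution pipeline remains full. Since this straight-line code contains no branches, a branch prediction miss cannot stall it; what remains is the dependency between instructions that reuse the same register (push eax reads all of eax, including the upper bytes left by the previous value), and rotating registers removes that dependency. So we could expect to see the following: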
mov al, byte ptr [a]
push eax
...
mov dl, byte ptr [d]
push edx
...
When creating a stack frame for base-relative addressing, the compiler issues a standard pair of instructions. The frame creates a well-defined "function context". The typical prologue is the sequence:
push EBP
mov EBP, ESP
The sequence saves the caller's EBP and then uses EBP as a fixed reference point for the current function, so that the function can address its parameters and locals at constant offsets even as ESP moves. Conversely, when a function exits, one typically encounters the restoration of ESP and EBP. This is required since the calling function has a different reference from which it is working:
mov ESP, EBP
pop EBP
ret
One should conclude that EBP is relative to the function (if used), while ESP is relative to the thread. When one sees EBP-0xn, the function is referring to its own local storage. EBP+0xn signals access to data placed on the stack by the caller: the return address is at EBP+4, and the arguments begin at EBP+8.
OR and XOR
When viewing a disassembly, it is not uncommon to encounter xor eax, eax and or eax, 0xFFFFFFFF. The first instruction is equivalent to mov eax, 0, while the second is equivalent to mov eax, -1 (apart from their effect on the flags, which mov leaves untouched). They are size optimizations in the generated code: xor eax, eax encodes in 2 bytes and or eax, 0xFFFFFFFF in 3 bytes (using a sign-extended 8-bit immediate), while either mov form takes 5 bytes.
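The equivalence of results can be checked in C, with the encoded lengths recorded alongside. The sketch below is illustrative: the helper names are hypothetical, and the lengths are the standard 32-bit encodings listed in the comments (xor eax, eax may also be encoded as 31 C0, with the same length).

```c
#include <assert.h>
#include <stdint.h>

/* Encoded sizes, in bytes, of the 32-bit instruction forms discussed above. */
enum {
    XOR_EAX_EAX_LEN = 2,   /* 33 C0           xor eax, eax (31 C0 is also valid) */
    MOV_EAX_0_LEN   = 5,   /* B8 00 00 00 00  mov eax, 0 */
    OR_EAX_M1_LEN   = 3,   /* 83 C8 FF        or eax, -1 (sign-extended imm8) */
    MOV_EAX_M1_LEN  = 5    /* B8 FF FF FF FF  mov eax, -1 */
};

/* Both idioms produce a result that does not depend on EAX's old value. */
uint32_t xor_self(uint32_t eax)     { return eax ^ eax; }          /* always 0 */
uint32_t or_minus_one(uint32_t eax) { return eax | 0xFFFFFFFFu; }  /* always 0xFFFFFFFF */
```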
Code Graft 1
Code Graft 1 is the baseline on which the remaining examples build. Due to the compiler's and linker's behavior, the first example is somewhat more complex than desired. The following will detail the issues encountered and outline the general workarounds used in this article.
The source code to the first sample is listed below. main() calls Add(), which adds two numbers. The result is then displayed on standard output.
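#include <windows.h>   // DWORD
#include <tchar.h>     // _T()
#include <iostream>
using namespace std;

DWORD Add( DWORD Augend, DWORD Addend );   // declared before its use in main()

int main( )
{
    DWORD Augend = 32437;   // 0x7EB5
    DWORD Addend = 15369;   // 0x3C09
    DWORD Sum = 0;          // expected result: 0xBABE
    Sum = Add ( Augend, Addend );
    cout << _T("Augend: ") << Augend << endl;
    cout << _T("Addend: ") << Addend << endl;
    cout << _T(" Sum: ") << Sum << endl;
    return 0;
}

DWORD Add( DWORD Augend, DWORD Addend )
{
    DWORD result = Augend + Addend;
    return result;
}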
The first issue encountered is that the storage layout does not follow the order of the source code declarations. Refer to Figure 9.
Figure 9: Storage Layout vs. Source Code Declaration
The variables are declared and initialized in the following order:
- Augend
- Addend
- Sum
However, the layout in memory is:
- Addend (EBP-0x04) - high memory
- Augend (EBP-0x08)
- Sum (EBP-0x0C) - low memory
The second issue encountered is the Add() function's use of variables. Add() creates a scratch variable (result) and adds the two values. The result is then returned to main(). Since Add() accepts two arguments, Augend and Addend, one would expect the function to operate on EBP+0x04 and EBP+0x08, presuming those are the locations where the arguments were pushed on the stack. For the temporary result, one would expect the value to be returned in one of two ways:
- through the use of the Sum variable at EBP+0x00
- through the use of EAX
However, with execution halted in Add(), a different scenario is observed. Refer to Figure 10.
Figure 10: Storage Layout vs. Source Code Declaration
Before executing the instructions of Add() (but after entering the function), the stack appears as shown in Figure 11.
Figure 11: Stack Layout After Calling Add()
Once the function has been executed (but before the execution of the return), the stack layout is as shown in Figure 12.
Figure 12: Stack Layout
Table 1 explains the values with respect to their stack addresses.
Item | Address | Value | Comment
1 | 0x12FF5C | BABE | Add::result (created by Add)
2 | 0x12FF60 | 0012FF7C | EBP of main (saved by Add; Add's EBP points here)
3 | 0x12FF64 | 00401028 | Return address
4 | 0x12FF68 | 7EB5 | Argument Augend passed to Add ([EBP+8])
5 | 0x12FF6C | 3C09 | Argument Addend passed to Add ([EBP+0Ch])
6 | 0x12FF70 | 0 | main::Sum (not yet assigned)
7 | 0x12FF74 | 7EB5 | main::Augend
8 | 0x12FF78 | 3C09 | main::Addend
Table 1: Stack Layout
Code Graft 2
Example two is Code Graft 1 without the definition of Add() in the C++ source file. Code Graft 1's generated code for Add(), as displayed by WinDbg, is shown below.
00401cc0 55 push ebp
00401cc1 8bec mov ebp, esp
00401cc3 51 push ecx
00401cc4 8b4508 mov eax, dword ptr [ebp+8]
00401cc7 03450c add eax, dword ptr [ebp+0Ch]
00401cca 8945fc mov dword ptr [ebp-4], eax
00401ccd 8b45fc mov eax, dword ptr [ebp-4]
00401cd0 8be5 mov esp, ebp
00401cd2 5d pop ebp
00401cd3 c3 ret
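To see why the arguments live at EBP+8 and EBP+0Ch rather than EBP+4 and EBP+8, the listing above can be stepped through on a toy stack. The sketch below is a hypothetical C model, not generated code: each statement in add_listing mirrors one instruction of the listing, and the starting ESP, the caller's EBP, and the return address are taken from Table 1, so the addresses it touches match the table.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit machine state: a small stack region addressed like memory. */
#define WORDS 16
static uint32_t mem[WORDS];
static const uint32_t BASE = 0x0012FF40u;      /* covers 0x12FF40..0x12FF7F */
static uint32_t esp, ebp, eax, ecx;

static uint32_t *at(uint32_t addr)
{
    assert(addr >= BASE && addr < BASE + 4u * WORDS && addr % 4 == 0);
    return &mem[(addr - BASE) / 4];
}
static void push(uint32_t v) { esp -= 4; *at(esp) = v; }
static uint32_t pop(void) { uint32_t v = *at(esp); esp += 4; return v; }

/* One statement per instruction of the Add() listing. */
static uint32_t add_listing(void)
{
    push(ebp);                  /* push ebp */
    ebp = esp;                  /* mov ebp, esp */
    push(ecx);                  /* push ecx             ; reserves [ebp-4] */
    eax = *at(ebp + 8);         /* mov eax, [ebp+8]     ; Augend */
    eax += *at(ebp + 0xC);      /* add eax, [ebp+0Ch]   ; Addend */
    *at(ebp - 4) = eax;         /* mov [ebp-4], eax     ; result */
    eax = *at(ebp - 4);         /* mov eax, [ebp-4] */
    esp = ebp;                  /* mov esp, ebp */
    ebp = pop();                /* pop ebp */
    (void)pop();                /* ret: pops the return address */
    return eax;
}

/* Caller side (C calling convention): push the arguments right to left,
   "call" (push the return address), then remove the arguments. */
uint32_t call_add(uint32_t augend, uint32_t addend)
{
    uint32_t result;
    esp = 0x0012FF70u;          /* main's ESP, just below main::Sum */
    ebp = 0x0012FF7Cu;          /* main's EBP, from Table 1 */
    push(addend);
    push(augend);
    push(0x00401028u);          /* return address, from Table 1 */
    result = add_listing();
    esp += 8;                   /* add esp, 8: the caller cleans up */
    return result;
}
```

After call_add(0x7EB5, 0x3C09) returns 0xBABE, the model's ESP is back at 0x12FF70, and the stale result is still sitting at 0x12FF5C, as item 1 of Table 1 shows.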
The Graft
At this point, the donor (Code Graft 1) provides 20 bytes of code for the recipient (Code Graft 2). The easiest way to incorporate the functionality is through an assembly file (with a custom build step) added to the project. This method has the added benefit of allowing incorporation of both x86 and x64 routines, since inline assembly is not being used.
There are other methods available to incorporate the graft. The first is to edit the memory directly in WinDbg, which creates only a temporary change. Second, inline assembly could be used to emit the instruction sequence. This has the downside that the Microsoft compiler does not support inline assembly on x64 platforms. The third alternative is patching. Patching an executable usually falls under the purview of viruses and crackers, and is left as an exercise to the reader. Two changes were required to successfully link the executable:
- Add() was renamed to Addition()
- The function prototype of Addition() was declared extern "C"
Add() was changed to Addition() because add is a reserved word in the assembler (MASM), the ADD instruction mnemonic. extern "C" was added to suppress C++ name mangling, which otherwise caused link error LNK2001: unresolved external symbol "unsigned long __cdecl Addition(unsigned long,unsigned long)" (?Addition@@YAKKK@Z). The nearly unchanged C++ file is shown below:
#include <windows.h>   // DWORD
#include <tchar.h>     // _T()
#include <iostream>
using namespace std;

extern "C" DWORD Addition( DWORD, DWORD );   // implemented in Addition.asm

int main( int argc, char* argv[] )
{
    DWORD Augend = 32437; // 0x7EB5
    DWORD Addend = 15369; // 0x3C09
    DWORD Sum = 0;
    Sum = Addition ( Augend, Addend );
    cout << _T("Augend: ") << Augend << endl;
    cout << _T("Addend: ") << Addend << endl;
    cout << _T(" Sum: ") << Sum << endl;
    return 0;
}
To begin, create a file named Addition.asm in the project directory. Next, add the file to the project. Refer to Figure 13.
Figure 13: Adding an ASM File to the Project
In later versions of Visual Studio, the environment will ask whether it should use the masm.rules Custom Build Rules. Select OK. Refer to Figure 14.
Figure 14: MASM Custom Build Rule
If Custom Build Rules are not available, add the following as a Custom Build Step:
- Debug Command Line: ml -c -Zi "-Fl$(IntDir)\$(InputName).lst" "-Fo$(IntDir)\$(InputName).obj" "$(InputPath)"
- Release Command Line: ml -c "-Fl$(IntDir)\$(InputName).lst" "-Fo$(IntDir)\$(InputName).obj" "$(InputPath)"
- Outputs: $(IntDir)\$(InputName).obj
After adding the ASM file to the project, the project will appear as in Figure 15.
Figure 15: Addition of ASM File
Next, add the following to Addition.asm. The code below demonstrates the minimum requirements for an assembly procedure.
PUBLIC Addition
.486
.MODEL FLAT, C
.CODE
Addition PROC
push ebp ; Save Caller's EBP
mov ebp, esp ; Grab our Frame Reference
push ecx ; Storage for local 'result'
mov eax, dword ptr [ebp+8] ; Augend
add eax, dword ptr [ebp+0Ch] ; Addend
mov dword ptr [ebp-4], eax ; Temporary 'result'
mov eax, dword ptr [ebp-4] ; ??? Already in EAX
mov esp, ebp ; Clean 'push ECX' from stack
pop ebp ; Restore Caller's EBP
ret
Addition ENDP
END ; End of module
PUBLIC Addition informs the linker that the procedure Addition is available for any module to use. .486 is a Processor Directive. .MODEL is a Simplified Segment Directive which directs MASM to generate code for a particular memory model. The language type (C) informs the assembler of the calling and naming conventions. .CODE is another Simplified Segment Directive, which begins the code section; END marks the end of the module. Other sections exist, such as .DATA. PROC and ENDP are Procedure Directives which book-end the Addition function. Additional procedures would be book-ended in a similar manner with a different label. The body of the Addition procedure is the code copied and pasted from Code Graft 1.
After compiling and linking, the first thing that is noticed is that the second executable is 0x200 bytes (512 bytes, one unit of the default PE file alignment) smaller than the first program. Refer to Figure 14.
Figure 14: Comparison of Executable File Sizes
However, the code executed for main() and Addition() remains unchanged. Refer to Figure 15.
Figure 15: Source Code Analysis in WinDbg
Code Graft 3
Code Graft 3 will demonstrate code grafting from a foreign executable. Specifically, it will reuse CheckSumMappedFile() from Imagehlp.dll. Imagehlp.dll is a single-threaded library, so this is an opportunity to improve the function. A nearly complete treatment of the PE Checksum algorithm was presented in An Analysis of the Windows PE Checksum Algorithm.
To begin, download the PE Checksum source code. Open StdAfx.h, comment out the references to imagehlp.dll, and add a prototype for CheckSumMemMapFile(). The function was renamed because the project still linked with Imagehlp.lib (even though it was not specified).
extern "C" {
PIMAGE_NT_HEADERS /*WINAPI*/ CheckSumMemMapFile(
PVOID BaseAddress,
DWORD FileLength,
PDWORD ExistingCheckSum,
PDWORD CalculatedCheckSum
);
}
extern "C" is required due to name mangling. Notice also that WINAPI (a macro for __stdcall) is missing. With WINAPI present, linking the object files failed with the following error:
LNK1190: invalid fixup found, type 0x0002
I suspect this might be a packing issue, but I have not investigated further.
Since the model is no longer STDCALL, the routines used from Imagehlp.dll will require conversion. The three most prevalent issues are:
- STDCALL to C Call Conversion (stack cleanup)
- Addition of Local Frame References (EBP)
- Artifact Cleanup
Add an assembly file named "CheckSum.asm" to the project. Create three procedures in CheckSum.asm: CheckSumMemMapFile, _ChkSum, and _ImageNtHeader. Acquire the assembly code for CheckSumMappedFile() and ChkSum() from imagehlp.dll and place it in CheckSum.asm under their respective procedures. Leave _ImageNtHeader empty at this point. _ImageNtHeader will be a hand-coded replacement used in lieu of Imagehlp.dll's call to RtlpImageNtHeader() of NTDLL.DLL.
The only procedure which requires the PUBLIC attribute is CheckSumMemMapFile. This leaves _ChkSum and _ImageNtHeader as 'private' procedures for use by CheckSumMemMapFile.
Alternatively, use the listing files for the functions provided with this article. The files are CheckSummMappedFile.listing and ChkSum.listing. The listing files were created with a copy and paste operation in WinDbg while examining the original CodeGraft.exe. Refer to Figure 16.
Figure 16: WinDbg Copy and Paste
Labels
At this point, the listing includes memory addresses, opcodes, and mnemonics. Create labels for any jump targets encountered (changing CheckSumMappedFile to CheckSumMemMapFile). For example, at 0x76c96f3b is the following instruction:
76c96f3b eb1d jmp imagehlp!CheckSumMappedFile+0x4f (76c96f5a)
The jump target is 0x76C96F5A. At that location, create a label. Note that the label name is based on the location provided by the disassembly (the '+' has been changed to '_'):
76c96f3b eb1d jmp imagehlp!CheckSumMappedFile+0x4f (76c96f5a)
...
76c96f57 8b7de4 mov edi,dword ptr [ebp-1Ch]
CheckSumMemMapFile_0x4f:
76c96f5a 85c0 test eax,eax
Finally, rewrite the original instruction so that it jumps to the label:
76c96f3b eb1d jmp CheckSumMemMapFile_0x4f
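Both steps, finding the jump target and naming the label, are mechanical. In the sketch below (hypothetical helper names), short_jump_target applies the rule for two-byte short jumps: the signed 8-bit displacement is added to the address of the next instruction, so eb1d at 0x76c96f3b lands at 0x76c96f3b + 2 + 0x1d = 0x76c96f5a. label_from_symbol rewrites a WinDbg symbol into the label form used above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Target of a two-byte short jump (EB rel8, or a 7x rel8 conditional jump).
   The displacement is a signed byte relative to the NEXT instruction. */
uint32_t short_jump_target(uint32_t insn_addr, uint8_t rel8)
{
    int32_t disp = rel8 < 0x80 ? (int32_t)rel8 : (int32_t)rel8 - 256;
    return insn_addr + 2u + (uint32_t)disp;   /* unsigned wraparound handles disp < 0 */
}

/* Turns a WinDbg symbol such as "imagehlp!CheckSumMappedFile+0x4f" into a
   MASM label such as "CheckSumMemMapFile_0x4f": new_name replaces the module
   and function name, and the '+' becomes '_'. Returns 1 on success, 0 if
   the symbol has no '+offset' part or out[] is too small. */
int label_from_symbol(const char *symbol, const char *new_name,
                      char *out, size_t out_size)
{
    const char *plus;
    int n;
    assert(symbol != NULL && new_name != NULL && out != NULL);
    plus = strchr(symbol, '+');
    if (plus == NULL)
        return 0;
    n = snprintf(out, out_size, "%s_%s", new_name, plus + 1);
    return n > 0 && (size_t)n < out_size;
}
```

The jbe at 0x76c96fd3 in the artifact listing follows the same rule with a negative displacement: 0x90 is -0x70, so short_jump_target(0x76c96fd3, 0x90) gives 0x76c96f65, matching WinDbg.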
Artifacts
There are areas of the code which appear to be artifacts. Examine 0x76c96fc8, for instance. Since there is no assembly mnemonic for these bytes (WinDbg shows '???'), create them using the DB directive. Note that MASM hexadecimal constants take an 'h' suffix and must begin with a digit, so prefix such numbers with a '0' (0FFh rather than FFh). DUP is an operator which repeats a data byte the requested number of times.
;; 76c96fc8 ff ???
;; 76c96fc9 ff ???
;; 76c96fca ff ???
DB 3 DUP(0FFh)
;; 76c96fcb ff426f inc dword ptr [edx+6Fh]
DB 0FFh, 042h, 06Fh
;; 76c96fce c9 leave
DB 0C9h
;; 76c96fcf 764b jbe imagehlp!MapFileAndCheckSumA+0x43 (76c9701c)
DB 076h, 04Bh
;; 76c96fd1 6f outs dx,dword ptr [esi]
DB 06Fh
;; 76c96fd2 c9 leave
DB 0C9h
;; 76c96fd3 7690 jbe imagehlp!CheckSumMappedFile+0x5a (76c96f65)
DB 076h, 090h
;; 76c96fd5 90 nop
;; 76c96fd6 90 nop
;; 76c96fd7 90 nop
;; 76c96fd8 90 nop
DB 4 DUP(090h)
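Converting raw bytes into DB lines is also mechanical. The hypothetical helper below formats a byte array in the style used above, always emitting the leading '0' and the 'h' suffix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Formats raw bytes as one MASM DB line, for example {0xFF, 0x42, 0x6F}
   becomes "DB 0FFh, 042h, 06Fh". Every value gets a leading '0' so that
   constants beginning with A-F are not read as identifiers.
   Returns the length of the line, or -1 if out[] is too small. */
int format_masm_db(const unsigned char *bytes, size_t count,
                   char *out, size_t out_size)
{
    size_t pos = 0;
    assert(bytes != NULL && count > 0 && out != NULL && out_size > 0);
    out[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        int n = snprintf(out + pos, out_size - pos, "%s0%02Xh",
                         i == 0 ? "DB " : ", ", (unsigned)bytes[i]);
        if (n < 0 || (size_t)n >= out_size - pos)
            return -1;
        pos += (size_t)n;
    }
    return (int)strlen(out);
}
```

For example, the four nop bytes at 0x76c96fd5 through 0x76c96fd8 would come out as DB 090h, 090h, 090h, 090h, which the listing above writes more compactly with DUP.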
Finally, we can remove the unneeded material in the listing. To remove an item from the listing, simply comment it out:
;; push 10h
;; push offset `string'+0x3c (76c96fc8)
;; call _SEH_prolog (76c934b9)
mov esi,dword ptr [ebp+10h]
and dword ptr [esi],0
mov eax,dword ptr [ebp+0Ch]
shr eax,1
push eax
push dword ptr [ebp+8]
push 0
; 76c96f1e e856d6ffff call ChkSum (76c94579)
call _ChkSum
Additional Fixups
The original code installed a Structured Exception Handler upon entry. CodeGraft.exe already wraps the call in its own handler, so the installation can be skipped. The handler is removed by commenting out the call above, along with the two pushes that precede it. This creates a stack imbalance that will be addressed in the STDCALL to C CALL conversion.
STDCALL to C CALL Conversion
This step requires the most analysis, because the frame pointers are missing. Each procedure that addresses its parameters through EBP therefore receives the customary:
push ebp
mov ebp, esp
Once the additional push is in place, careful attention must be paid to code/stack dependencies. The converted CheckSumMemMapFile is shown below. Instructions in capital letters were added for stack management. Commented lines were removed. Finally, STDCALL performs a ret n, where n is an adjustment to ESP. C-CALL uses a plain ret, with the caller performing the stack adjustment. The result of the cleanup is available as CodeGraft4.zip.
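The difference between the two conventions comes down to who removes the arguments. The sketch below is a hypothetical toy model that tracks only ESP through one call: with STDCALL the callee's ret n removes the arguments, and with C-CALL the caller's add esp, n does. Mixing the two, for example keeping ret 10h after adding ADD ESP at the call site, leaves ESP off by the size of the arguments.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { CONV_STDCALL, CONV_CDECL } Convention;

/* Hypothetical toy model that tracks only ESP through one call with
   `nargs` DWORD arguments. The caller pushes the arguments and `call`
   pushes the return address; the callee's ret pops the return address.
   STDCALL: the callee ends with `ret 4*nargs`, removing the arguments.
   C-CALL:  the callee ends with a plain `ret`; the caller must follow
            the call with `add esp, 4*nargs` (caller_adjusts != 0). */
uint32_t esp_after_call(uint32_t esp, unsigned nargs, Convention conv,
                        int caller_adjusts)
{
    assert(nargs < 256);                 /* keep the toy model small */
    esp -= 4u * nargs;                   /* push the arguments */
    esp -= 4;                            /* call: push the return address */
    esp += 4;                            /* ret: pop the return address */
    if (conv == CONV_STDCALL)
        esp += 4u * nargs;               /* ret n: callee removes the arguments */
    if (caller_adjusts)
        esp += 4u * nargs;               /* add esp, n: caller removes them */
    return esp;
}
```

CheckSumMemMapFile takes four DWORD arguments, so the STDCALL original ended with ret 10h; with the C-CALL prototype in StdAfx.h (assuming the compiler's default calling convention), the C++ compiler emits the matching stack adjustment after each call instead.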
CheckSumMemMapFile
CheckSumMemMapFile PROC
;;push 10h
;;push offset `string'+0x3c (76c96fc8)
; Inspecting 0x76c96fc8 shows this is '-1'...
; push 0FFFFFFFFh
;; 76c96f08 e8acc5ffff call _SEH_prolog (76c934b9)
PUSH EBP ; Reference
MOV EBP, ESP
SUB ESP, 10h ; Space for 4 Temporary Variables
; T1: EBP-10h use in place of ebp-18h
; T2: EBP-0Ch use in place of ebp-1Ch
; T3: EBP-08h use in place of ebp-20h
; T4: EBP-04h use in place of ebp-04h
mov esi,dword ptr [ebp+10h] ; Header CheckSum Variable (Read From PE Header)
and dword ptr [esi],0 ; Header CheckSum = 0
mov eax,dword ptr [ebp+0Ch] ; File Size
shr eax,1 ; File Size = File Size / 2
push eax ; Parameter 3: File Size
push dword ptr [ebp+8] ; Parameter 2: Source (pBaseAddress)
push 0 ; Parameter 1: Partial Sum
; 76c96f1e e856d6ffff call _ChkSum@4(76c94579)
call _ChkSum
;; No Longer STDCALL
;; Clean the parameters from the Stack
ADD ESP, 0Ch
mov edi,eax ; EDI = Return from _ChkSum
mov dword ptr [EBP-0Ch],edi ; Sum
and dword ptr [EBP-04h],0 ; File Size = 0???
;; push dword ptr [ebp+8]
;; 76c96f2f e81ed2ffff call RtlpImageNtHeader (76c94152)
push [ebp+8] ; Source (pBaseAddress)
call _ImageNTHeader
ADD ESP, 4 ; Stack Maintenance - No longer STDCALL
mov dword ptr [EBP-08h],eax
or dword ptr [EBP-04h],0FFFFFFFFh ; EBP-04h = -1
jmp _CheckSumMemMapFile_0x4f
;; Retain the Noise Bytes
DB 5 DUP (090h)
xor eax,eax
inc eax
ret
;; Retain the Noise Bytes
DB 5 DUP (090h)
mov esp,dword ptr [EBP-10h] ; Local Temporary Storage
xor eax,eax
or dword ptr [EBP-04h],0FFFFFFFFh ; Local Temporary Storage
mov esi,dword ptr [ebp+10h] ; Local Temporary Storage
mov edi,dword ptr [EBP-0Ch] ; Local Temporary Storage
_CheckSumMemMapFile_0x4f:
test eax,eax
je _CheckSumMemMapFile_0x90
cmp eax,dword ptr [ebp+8]
je _CheckSumMemMapFile_0x90
mov cx,word ptr [eax+18h]
cmp cx,10Bh
je _CheckSumMemMapFile_0x6a
cmp cx,20Bh
jne _CheckSum_0xb5
_CheckSumMemMapFile_0x6a:
lea ecx,[eax+58h] ; Existing (Header) Checksum
mov edx,dword ptr [ecx] ; This routine removes the existing
mov dword ptr [esi],edx ; Checksum from the calculated value
xor edx,edx ;
mov dx,word ptr [ecx] ; Notice the use of Subtract with Borrow (sbb)
cmp di,dx ;
sbb esi,esi ; This is consistent with the Documentation stating
neg esi ; 'Calculate the checksum of the file
add esi,edx ; with the existing taken as 0.'
sub edi,esi
movzx ecx,word ptr [ecx+2]
cmp di,cx
sbb edx,edx
neg edx
add edx,ecx
sub edi,edx
_CheckSumMemMapFile_0x90:
mov ecx,dword ptr [ebp+0Ch]
test cl,1
je _CheckSumMemMapFile_0xa3
mov edx,dword ptr [ebp+8]
movzx dx,byte ptr [edx+ecx-1]
add edi,edx
_CheckSumMemMapFile_0xa3:
movzx edx,di
add edx,ecx
mov ecx,dword ptr [ebp+14h]
mov dword ptr [ecx],edx
;; 76c96fb8 e83cc5ffff call _SEH_epilog (76c934f9)
ADD ESP, 10h
POP EBP
;; No Longer STDCALL
;; ret 10h
ret
CheckSumMemMapFile ENDP
The first change encountered was removing the SEH. The program wraps the operation in a handler, so adding the SEH mechanism at this level was abandoned. The next addition is a frame reference, established by push ebp and mov ebp, esp.
SUB ESP, 10h ; Space for 4 Temporary Variables
; T1: EBP-10h use in place of ebp-18h
; T2: EBP-0Ch use in place of ebp-1Ch
; T3: EBP-08h use in place of ebp-20h
; T4: EBP-04h use in place of ebp-04h
The original code accessed EBP-0x1C without reserving stack space itself; that frame was presumably set up by the removed _SEH_prolog call. Analysis revealed the stack needed to accommodate four DWORDs, which the above accomplishes. Below, a manual stack adjustment releases that space and restores EBP before returning.
ADD ESP, 10h
POP EBP
;; No Longer STDCALL
;; ret 10h
ret
As Joe Partridge pointed out, the original port did not preserve ESI, which the code above modifies. Since the register is used, it must be saved and restored: EBX, ESI, EDI, and EBP must be preserved across a function call, while EAX, ECX, and EDX are scratch registers. The same rule applies to EDI, which CheckSumMemMapFile also modifies.
ChkSum
This procedure is basically unchanged. Since the procedure only moves values placed on the stack (parameters) into registers, a local frame reference was not created. The noticeable effect of the conversion is the change of ret 0Ch to ret, since the caller now cleans the stack. _ChkSum can be examined in detail in An Analysis of the Windows PE Checksum Algorithm.
_ChkSum PROC
push esi
mov ecx,[esp+10h] ; File Size / 2
mov esi,[esp+0Ch] ; Source (pBaseAddress)
mov eax,[esp+8] ; Partial Sum
shl ecx,1 ; File Size = File Size * 2
je _ChkSum_0x16e
test esi,2
je _ChkSum_0x2d
sub edx,edx
mov dx,[esi]
add eax,edx
adc eax,0
add esi,2
sub ecx,2
...
_ChkSum_0x16e:
mov edx,eax ;; Fold 32 bits in 16
shr edx,10h
and eax,0FFFFh
add eax,edx
mov edx,eax
shr edx,10h
add eax,edx
and eax,0FFFFh
pop esi
;; No longer STDCALL
;; ret 0Ch
ret
_ChkSum ENDP
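The fold at _ChkSum_0x16e, and the end-around carry that the add/adc ... adc eax,0 sequences implement, can be written in portable C. The helpers below are a sketch with hypothetical names: fold32to16 follows the instruction sequence line by line, and sum_dwords adds each carry out of bit 31 back into the sum, which is what the closing adc eax,0 accomplishes in the assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors _ChkSum_0x16e: fold a 32-bit sum into 16 bits. The high half is
   added into the low half twice, so that a carry produced by the first
   addition is folded back in as well. */
uint16_t fold32to16(uint32_t sum)
{
    uint32_t t = (sum & 0xFFFFu) + (sum >> 16);   /* mov/shr/and/add */
    t += t >> 16;                                  /* mov/shr/add */
    return (uint16_t)(t & 0xFFFFu);                /* and eax,0FFFFh */
}

/* Sums DWORDs with end-around carry: each carry out of bit 31 is added
   back into the sum, as the add/adc chain plus adc eax,0 does. */
uint32_t sum_dwords(const uint32_t *p, size_t count)
{
    uint32_t sum = 0;
    assert(p != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        uint32_t prev = sum;
        sum += p[i];
        if (sum < prev)       /* carry out of bit 31 ... */
            sum += 1;         /* ... is added back in (cannot carry again) */
    }
    return sum;
}
```

Applied to the DWORDs 0xFFFFFFFF and 0x00000002, sum_dwords wraps once and adds the carry back, giving 2.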
_ChkSum does not use a local stack frame; it accesses the parameters through ESP:
push esi
mov ecx,[esp+10h] ; File Size / 2
mov esi,[esp+0Ch] ; Source (pBaseAddress)
mov eax,[esp+8] ; Partial Sum
This could be converted to use a local frame reference as follows (with the appropriate epilogue):
PUSH EBP
MOV EBP, ESP
push esi
;; Stack Based
;; mov ecx,[esp+10h] ; File Size / 2
;; mov esi,[esp+0Ch] ; Source (pBaseAddress)
;; mov eax,[esp+8] ; Partial Sum
;; Frame Based
mov ecx,[EBP+10h] ; File Size / 2
mov esi,[EBP+0Ch] ; Source (pBaseAddress)
mov eax,[EBP+08h] ; Partial Sum
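In the above conversion, the offsets used to reference values through EBP and ESP are the same. This is not a general rule: the offsets match here because exactly one DWORD sits between the return address and the reference point in both versions (ESI in the original, EBP in the conversion). Had the original pushed a second register before reading its parameters, its ESP-based offsets would have been 4 bytes larger.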
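The offset arithmetic can be captured in two small formulas, assuming every argument occupies one DWORD. In the sketch below (hypothetical function names), esp_offset_of_arg counts the DWORDs the callee has pushed, and ebp_offset_of_arg reflects the fixed layout after push ebp / mov ebp, esp. The two agree exactly when the callee has made one push before the ESP-relative access.

```c
#include <assert.h>

/* Offsets of DWORD argument k (0 = first argument, i.e. pushed last by the
   caller) on a 32-bit stack, assuming every argument occupies one DWORD. */

/* ESP-relative, after `pushes` DWORD pushes inside the callee:
   [esp + 4*pushes] holds the return address, arguments follow it. */
int esp_offset_of_arg(int k, int pushes)
{
    assert(k >= 0 && pushes >= 0);
    return 4 * pushes + 4 + 4 * k;
}

/* EBP-relative, after push ebp / mov ebp, esp: [ebp] = saved EBP,
   [ebp+4] = return address, so arguments start at ebp+8. Later pushes
   move ESP but not EBP. */
int ebp_offset_of_arg(int k)
{
    assert(k >= 0);
    return 8 + 4 * k;
}
```

For _ChkSum, esp_offset_of_arg(0..2, 1) and ebp_offset_of_arg(0..2) both give 08h, 0Ch, and 10h, the offsets used in the two versions above.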
_ImageNtHeader
_ImageNtHeader is a hand-coded replacement for the original call to RtlpImageNtHeader(). After checking the MZ signature, the procedure takes the pointer to the memory mapped file and adds to it the value of the e_lfanew field of IMAGE_DOS_HEADER. If the PE signature is found at the result, the function returns the sum (a pointer to IMAGE_NT_HEADERS); otherwise it returns NULL.
_ImageNtHeader PROC
push ebp
mov ebp, esp
push esi
;; ESI = pBaseAddress
mov eax, dword ptr[ ebp+08h ]
mov esi, eax
;; pBaseAddress == NULL?
cmp esi, 0
je NULLRETURN
;; pBaseAddress == 0xFFFFFFFF?
cmp esi, 0FFFFFFFFh
je NULLRETURN
;; MZ Signature
cmp byte ptr [ESI], 'M'
jne NULLRETURN
cmp byte ptr [ESI+01h], 'Z'
jne NULLRETURN
;; ESI is a pointer to IMAGE_DOS_HEADER
;; Grab the e_lfanew DWORD
;
; IMAGE_DOS_HEADER
; is 64 bytes (0x40) long
;
; e_lfanew occupies bytes
; IMAGE_DOS_HEADER[60-63]
;
; ESI+060 is _not_ Hex!!!
;
mov eax, esi
add eax, dword ptr[ ESI+060 ] ; value at e_lfanew
mov esi, eax
;; PE Signature
cmp byte ptr [ESI], 'P'
jne NULLRETURN
cmp byte ptr [ESI+01h], 'E'
jne NULLRETURN
cmp byte ptr [ESI+02h], 0
jne NULLRETURN
cmp byte ptr [ESI+03h], 0
jne NULLRETURN
;;
;; EAX = IMAGE_NT_HEADER pointer
;;
jmp CLEANSTACK
NULLRETURN:
mov eax, 0
CLEANSTACK:
pop esi
pop ebp
ret
_ImageNtHeader ENDP
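For comparison, the same checks can be written in C. This is a sketch, not a drop-in replacement: image_nt_header is a hypothetical name, it takes an extra length argument (absent from the assembly) so that e_lfanew cannot point past the end of the buffer, and it omits the assembly's 0xFFFFFFFF pointer test. Fields are read byte by byte, so the code depends neither on windows.h nor on the host's alignment rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads a little-endian DWORD from a possibly unaligned location. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns a pointer to the "PE\0\0" signature (the start of
   IMAGE_NT_HEADERS) in a mapped image, or NULL. */
const uint8_t *image_nt_header(const uint8_t *base, size_t length)
{
    const uint8_t *nt;
    uint32_t e_lfanew;
    if (base == NULL || length < 0x40)            /* IMAGE_DOS_HEADER is 0x40 bytes */
        return NULL;
    if (base[0] != 'M' || base[1] != 'Z')         /* e_magic */
        return NULL;
    e_lfanew = read_le32(base + 0x3C);            /* bytes 60..63 */
    if (e_lfanew > length - 4)                    /* added bounds check */
        return NULL;
    nt = base + e_lfanew;
    if (nt[0] != 'P' || nt[1] != 'E' || nt[2] != 0 || nt[3] != 0)
        return NULL;
    return nt;
}
```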
Code Graft 5
The fifth sample incorporates the previous examples, with the addition of optimizations applied to CheckSumMemMapFile and _ChkSum.
Optimized CheckSumMemMapFile
CheckSumMemMapFile can be further cleaned by removing the local variables that serve no purpose in the code. In addition, the artifacts can be removed if the execution path is sent to the 'Abort' jump after cmp cx,20Bh (IMAGE_NT_OPTIONAL_HDR64_MAGIC). The cleaned routine is available in example four.
CheckSumMemMapFile PROC
PUSH EBP ; Create Local Stack Frame
MOV EBP, ESP
PUSH ESI
PUSH EDI ; EDI is also callee-saved (holds the running sum)
mov esi,dword ptr [ebp+10h] ; Header CheckSum Variable (Read from PE Header)
and dword ptr [esi],0 ; Header CheckSum = 0
mov eax,dword ptr [ebp+0Ch] ; File Size
shr eax,1 ; File Size = File Size / 2
push eax ; Parameter 3: File Size
push dword ptr [ebp+8] ; Parameter 2: Source (pBaseAddress)
push 0 ; Parameter 1: Partial Sum
call _ChkSum
ADD ESP, 0Ch ; C-CALL, adjust stack
mov edi,eax ; EDI = Return from _ChkSum
push [ebp+8] ; Source (pBaseAddress)
call _ImageNTHeader
ADD ESP, 4 ; C-CALL, adjust stack
test eax,eax ; Return from _ImageNTHeader. Is it NULL?
je _CheckSum_0x90 ; Abort
cmp eax,dword ptr [ebp+8] ; pBaseAddress == _ImageNTHeader
je _CheckSum_0x90 ; Abort
mov cx,word ptr [eax+18h] ; IMAGE_OPTIONAL_HEADER.Magic
cmp cx,10Bh ; IMAGE_NT_OPTIONAL_HDR32_MAGIC
je _CheckSum_0x6a
cmp cx,20Bh ; IMAGE_NT_OPTIONAL_HDR64_MAGIC
jne _CheckSum_0x90 ; Abort
_CheckSum_0x6a:
lea ecx,[eax+58h] ; ADDRESSOF(IMAGE_OPTIONAL_HEADER.Checksum)
mov edx,dword ptr [ecx] ; IMAGE_OPTIONAL_HEADER.Checksum (dereference)
mov dword ptr [esi],edx ; Store to *ExistingCheckSum (parameter 3)
xor edx,edx ; EDX = 0
mov dx,word ptr [ecx] ; 2 bytes at IMAGE_OPTIONAL_HEADER.Checksum
cmp di,dx ; DI = result of _ChkSum
sbb esi,esi
neg esi
add esi,edx
sub edi,esi
movzx ecx,word ptr [ecx+2]
cmp di,cx
sbb edx,edx
neg edx
add edx,ecx
sub edi,edx
_CheckSum_0x90:
mov ecx,dword ptr [ebp+0Ch] ; File Size
test cl,1
je _CheckSum_0xa3
mov edx,dword ptr [ebp+8]
movzx dx,byte ptr [edx+ecx-1]
add edi,edx
_CheckSum_0xa3:
movzx edx,di
add edx,ecx
mov ecx,dword ptr [ebp+14h]
mov dword ptr [ecx],edx
POP EDI
POP ESI
POP EBP
ret
CheckSumMemMapFile ENDP
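Pulling the pieces together, the computation performed by the optimized CheckSumMemMapFile can be sketched in portable C. This is an illustration under stated assumptions, not the imagehlp implementation: the function names are hypothetical, chk_sum processes one 16-bit word at a time instead of the unrolled DWORD loop of _ChkSum (it should yield the same 16-bit result), and the header lookup is an inline stand-in for _ImageNtHeader with an added bounds check. The subtraction of the stored checksum mirrors the cmp/sbb/neg/add/sub sequence: each 16-bit half is subtracted together with a borrow when it exceeds the low word of the running sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t rd16(const uint8_t *p) { return (uint32_t)p[0] | ((uint32_t)p[1] << 8); }
static uint32_t rd32(const uint8_t *p) { return rd16(p) | (rd16(p + 2) << 16); }

/* Word-at-a-time equivalent of _ChkSum: 16-bit sum with end-around carry. */
static uint32_t chk_sum(uint32_t partial, const uint8_t *p, uint32_t words)
{
    while (words--) {
        partial += rd16(p);
        p += 2;
        partial = (partial >> 16) + (partial & 0xFFFFu);
    }
    return ((partial >> 16) + partial) & 0xFFFFu;
}

/* Sketch of CheckSumMemMapFile: returns the calculated checksum and stores
   the checksum found in the optional header (or 0) in *header_sum. */
uint32_t pe_checksum(const uint8_t *base, uint32_t length, uint32_t *header_sum)
{
    uint32_t sum, nt = 0;
    assert(base != NULL && header_sum != NULL);

    sum = chk_sum(0, base, length >> 1);           /* shr eax,1 / call _ChkSum */
    *header_sum = 0;                               /* and dword ptr [esi],0 */

    /* Stand-in for _ImageNtHeader, with a bounds check added. */
    if (length >= 0x5C && base[0] == 'M' && base[1] == 'Z')
        nt = rd32(base + 0x3C);                    /* e_lfanew */
    if (nt != 0 && nt <= length - 0x5C &&          /* nt == 0 means "== base" */
        rd32(base + nt) == 0x00004550u) {          /* "PE\0\0" */
        uint32_t magic = rd16(base + nt + 0x18);   /* OptionalHeader.Magic */
        if (magic == 0x10B || magic == 0x20B) {
            const uint8_t *cs = base + nt + 0x58;  /* OptionalHeader.CheckSum */
            uint32_t lo = rd16(cs), hi = rd16(cs + 2);
            *header_sum = rd32(cs);
            sum -= ((sum & 0xFFFFu) < lo);         /* cmp di,dx / sbb / neg */
            sum -= lo;                             /* add esi,edx / sub edi,esi */
            sum -= ((sum & 0xFFFFu) < hi);         /* same for the high half */
            sum -= hi;
        }
    }
    if (length & 1)                                /* odd trailing byte */
        sum += base[length - 1];
    return (sum & 0xFFFFu) + length;               /* movzx edx,di / add edx,ecx */
}
```

With a synthetic image in which only a few header fields are non-zero, changing nothing but the stored CheckSum field is intended to leave the calculated value unchanged, since the field's two halves are subtracted back out; the stored value is reported separately through header_sum.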
Optimized _ChkSum
A final peephole optimization can be applied to the main summation loop of _ChkSum. This supplement attempts to take advantage of the processor's ability to schedule simultaneous instructions by spreading the summation across three accumulators (EAX, EBX, and EDX). Note that every adc still reads the carry flag produced by the instruction before it, so the chain remains serialized through the flags, and any gain depends on the processor. The lesser summations (0x40 bytes, 0x20 bytes, 0x10 bytes, etc.) will be skipped, since each is encountered at most once during the routine's execution.
Because most of the time in this routine is spent executing the loop below (which consumes 0x80 bytes, or 0x20 DWORDs, per iteration), a further optimization would be to perform push ebx and push edx only once, before the loop. Once summation is complete, perform the respective pops after falling through jne _ChkSum_0xe8.
_ChkSum_0xe8:
PUSH EBX
PUSH EDX
XOR EBX, EBX
XOR EDX, EDX
add eax,dword ptr [esi]
adc EBX,dword ptr [esi+4]
adc EDX,dword ptr [esi+8]
adc eax,dword ptr [esi+0Ch]
adc EBX,dword ptr [esi+10h]
adc EDX,dword ptr [esi+14h]
adc eax,dword ptr [esi+18h]
adc EBX,dword ptr [esi+1Ch]
adc EDX,dword ptr [esi+20h]
adc eax,dword ptr [esi+24h]
adc EBX,dword ptr [esi+28h]
adc EDX,dword ptr [esi+2Ch]
adc eax,dword ptr [esi+30h]
adc EBX,dword ptr [esi+34h]
adc EDX,dword ptr [esi+38h]
adc eax,dword ptr [esi+3Ch]
adc EBX,dword ptr [esi+40h]
adc EDX,dword ptr [esi+44h]
adc eax,dword ptr [esi+48h]
adc EBX,dword ptr [esi+4Ch]
adc EDX,dword ptr [esi+50h]
adc eax,dword ptr [esi+54h]
adc EBX,dword ptr [esi+58h]
adc EDX,dword ptr [esi+5Ch]
adc eax,dword ptr [esi+60h]
adc EBX,dword ptr [esi+64h]
adc EDX,dword ptr [esi+68h]
adc eax,dword ptr [esi+6Ch]
adc EBX,dword ptr [esi+70h]
adc EDX,dword ptr [esi+74h]
adc eax,dword ptr [esi+78h]
adc EBX,dword ptr [esi+7Ch]
ADC EAX, EBX
ADC EAX, EDX
adc eax,0
POP EDX
POP EBX
add esi,80h
sub ecx,80h
jne _ChkSum_0xe8
...
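In C, where there is no carry flag to chain through, the usual way to give each accumulator its own dependency chain is to widen the accumulators and fold the carries once at the end. This is a swapped-in technique rather than a translation of the assembly above: sum_three keeps three independent 64-bit sums (mirroring EAX, EBX, and EDX), and fold64 applies the end-around carry that the closing ADC EAX, EBX / ADC EAX, EDX / adc eax,0 sequence performs. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Folds a 64-bit total into 32 bits with end-around carry, the result an
   add/adc chain over the same DWORDs would leave in EAX. */
static uint32_t fold64(uint64_t total)
{
    while (total >> 32)
        total = (total & 0xFFFFFFFFu) + (total >> 32);
    return (uint32_t)total;
}

/* One accumulator: every addition depends on the previous one. */
uint32_t sum_one(const uint32_t *p, size_t n)
{
    uint64_t a = 0;
    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        a += p[i];
    return fold64(a);
}

/* Three independent accumulators. 64-bit sums absorb the carries, so no
   carry-flag chain links them; they are combined and folded once. */
uint32_t sum_three(const uint32_t *p, size_t n)
{
    uint64_t a = 0, b = 0, c = 0;
    size_t i = 0;
    assert(p != NULL || n == 0);
    for (; i + 3 <= n; i += 3) {
        a += p[i];
        b += p[i + 1];
        c += p[i + 2];
    }
    for (; i < n; i++)          /* leftover DWORDs */
        a += p[i];
    return fold64(a + b + c);
}
```

sum_one and sum_three return the same value, since both fold the same 64-bit total; the second form simply gives the processor independent additions to overlap.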
Checksums
- CodeGraft1.zip: MD5 F8958E18071F9FFDE17286AC4243C514, SHA-1 ECE1C15BA469CCABFF922C453A26C0BD6593CEEF
- CodeGraft2.zip: MD5 B457ED277E848A106F20F94B1CE275F4, SHA-1 241E70C0660A652D4015C7787850DBA0684F62F8
- CodeGraft3.zip: MD5 5DD1A1B16D47385577C8D7FF1DD49041, SHA-1 98C5EFE3F2EA6CF5214C8A739FF99E1D60FD56EA
- CodeGraft4.zip: MD5 AC3800CF5714922D9930D7A2EAFCBD5C, SHA-1 273C14760D4A438518513677424CE9A54E29294E
- CodeGraft5.zip: MD5 8F8B25301DB6C77683FF8918CD679B21, SHA-1 95D52336EE4EEC1A46D00ACD2BBC10C79489D41B
- CheckSumAsm.zip: MD5 35EA1BBC97F1A23E8F0B7D943BA0F9F3, SHA-1 B0BE1D8BF772958191114A55FFB343B5B829E240
- CodeGraft.zip: MD5 C0D4468002F6FF82228323DD226093B5, SHA-1 42BF918481881819FA8A1CC8F519303185964E15
- PEChecksum.zip: MD5 C0D4468002F6FF82228323DD226093B5, SHA-1 42BF918481881819FA8A1CC8F519303185964E15
Revisions
- 03.06.2008: General revisions and article formatting.
- 11.20.2007: Bug Fix - Added ESI preservation to CheckSumMemMapFile.
- 11.05.2007: Initial release.