Introduction
Determining whether a file's disk image has been altered after it was loaded into memory can be a useful operation. Reasons for doing so include hardware fault isolation (whether memory or disk) and detecting the effects of a binary patcher. Academia has been studying the problem of a program image checking its own integrity (and related issues) for some time. In academic literature this is known as self checksumming. Papers of interest include Strengthening Software Self-Checksumming via Self-Modifying Code by Giffin, Christodorescu, and Kruger; Watermarking, Tamper-Proofing, and Obfuscation - Tools for Software Protection by Collberg and Thomborson; Architectural Support for Copy and Tamper-Resistant Software; and finally Glenn Wurster's thesis A Generic Attack on Hashing-Based Software Tamper Resistance.
Microsoft employs a passive and a semi-passive integrity system called Windows File Protection, which monitors Operating System files for inadvertent replacement. The passive system uses System File Checker to scan protected files for inadvertent replacement. In this system, the user must manually launch the tool to initiate the operation (hence "passive"). In the semi-passive system, a protected directory is monitored. If the OS determines a file has been improperly replaced, the file will be restored from the cache, network installation point, or Windows CD.
It is not possible for a developer to request this protection from the Operating System. This may not be a bad situation, since Microsoft performs poorly when protecting its own binaries from tampering. For examples of kernel patching, see Eliminating Explorer's Delay when Deleting an In Use File or ClearType over Remote Desktop in Windows XP by Dan Farino. To this end, this article presents the reader with a framework for performing integrity verification using cryptographic hash functions with Crypto++.
Should the reader be inclined, part two of this article is available: Tamper Aware and Self Healing Code. Post-Build Executable Back Patching is also available, which demonstrates how to automate the process of back patching a value into a compiled executable.
This article does not cover linker and loader behavior in the depth of Matt Pietrek's various Microsoft Systems Journal articles. The reader is encouraged to visit his articles listed in the Resources. Whether using Pietrek's articles or developing tools for examining executables, the reader should find that Microsoft is vague in certain areas of the Portable Executable and Common Object File Format Specification and leaves other areas undocumented.
Crypto++
Image Execution
Depending on the source we use, we are told that the executable section can be found by name (for example, .text, .code, .textbss), or by examining the sections of an executable searching for
Reading a Disk Image
The first step in developing the system is based on the disk image. There is very little difference between an on-disk and in-memory image. Regardless of whether we use a pointer acquired via
Winnt.h defines the structures and constants of interest. The first structure of interest is
The first order of business is mapping the disk file into memory. The task is accomplished as follows.
/////////////////////////////////////////////////
if( 0 == GetModuleFileName( NULL, szFilename, PATH_SIZE ) )
{ return -1; }
/////////////////////////////////////////////////////////////
hFile = CreateFile( szFilename, GENERIC_READ, FILE_SHARE_READ,
NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
if ( hFile == INVALID_HANDLE_VALUE )
{ return -1; }
/////////////////////////////////////////////////////////////
hFileMapping = CreateFileMapping( hFile, NULL,
PAGE_READONLY, 0, 0, NULL );
if ( NULL == hFileMapping )
{ return -1; }
/////////////////////////////////////////////////////////////
pBaseAddress = MapViewOfFile( hFileMapping,
FILE_MAP_READ, 0, 0, 0 );
if ( NULL == pBaseAddress )
{ return -1; }
The code to inspect the
////////////////////////////////////////////////////////////
pDOSHeader = static_cast<PIMAGE_DOS_HEADER>( pBaseAddress );
if( pDOSHeader->e_magic != IMAGE_DOS_SIGNATURE )
{ return -1; }
////////////////////////////////////////////////////////////
pNTHeader = reinterpret_cast<PIMAGE_NT_HEADERS>(
(PBYTE)pBaseAddress + pDOSHeader->e_lfanew );
Once in possession of the
typedef struct _IMAGE_NT_HEADERS {
DWORD Signature;
IMAGE_FILE_HEADER FileHeader;
IMAGE_OPTIONAL_HEADER32 OptionalHeader;
};
At this point, the reader should become familiar with the 32-bit and 64-bit versions of the NT header structures, as well as the
/////////////////////////////////////////////////////////////
PIMAGE_NT_HEADERS pNTHeader = NULL;
pNTHeader = reinterpret_cast<PIMAGE_NT_HEADERS>(
(PBYTE)pBaseAddress + pDOSHeader->e_lfanew );
if(pNTHeader->Signature != IMAGE_NT_SIGNATURE )
{ return -1; }
/////////////////////////////////////////////////////////////
PIMAGE_FILE_HEADER pFileHeader = NULL;
pFileHeader = reinterpret_cast<PIMAGE_FILE_HEADER>(
(PBYTE)&pNTHeader->FileHeader );
/////////////////////////////////////////////////////////////
PIMAGE_OPTIONAL_HEADER pOptionalHeader = NULL;
pOptionalHeader = reinterpret_cast<PIMAGE_OPTIONAL_HEADER>(
(PBYTE)&pNTHeader->OptionalHeader );
/////////////////////////////////////////////////////////////
if( IMAGE_NT_OPTIONAL_HDR32_MAGIC != pNTHeader->OptionalHeader.Magic )
{ return -1; }
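The listings above trust e_lfanew and dereference whatever it points to. When the input may be damaged or tampered with, a bounds-checked walk is safer. The following is a minimal, portable sketch that works on a raw byte buffer rather than the Winnt.h structures. The helper names ReadAt and LocateNtHeaders are hypothetical, the constants mirror IMAGE_DOS_SIGNATURE and IMAGE_NT_SIGNATURE, and the code assumes a little-endian host such as x86.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies a value of type T from 'offset' in the buffer,
// refusing to read past the end. Assumes a little-endian host.
template <typename T>
bool ReadAt(const std::uint8_t* data, std::size_t size,
            std::size_t offset, T& out)
{
    if (offset > size || size - offset < sizeof(T)) { return false; }
    std::memcpy(&out, data + offset, sizeof(T));
    return true;
}

// Hypothetical helper: returns true and stores the file offset of the NT
// headers when the buffer starts with "MZ" and e_lfanew leads to "PE\0\0".
bool LocateNtHeaders(const std::uint8_t* data, std::size_t size,
                     std::size_t& ntOffset)
{
    const std::uint16_t kDosSignature = 0x5A4D;      // "MZ"
    const std::uint32_t kNtSignature  = 0x00004550;  // "PE\0\0"
    const std::size_t   kLfanewOffset = 0x3C;        // position of e_lfanew

    std::uint16_t magic = 0;
    if (!ReadAt(data, size, 0, magic) || magic != kDosSignature)
    { return false; }

    std::int32_t lfanew = 0;  // e_lfanew is a signed LONG
    if (!ReadAt(data, size, kLfanewOffset, lfanew) || lfanew < 0)
    { return false; }

    std::uint32_t signature = 0;
    if (!ReadAt(data, size, static_cast<std::size_t>(lfanew), signature) ||
        signature != kNtSignature)
    { return false; }

    ntOffset = static_cast<std::size_t>(lfanew);
    return true;
}
```

With a mapped view, the call would pass the view's base address and the file size; a false return means the file should not be parsed further.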
The
The PE Browse view of the Optional Header is shown below. The Optional Header is not optional; it is required in executable files, but not COFF object files.
Fields of interest in
In addition, the pImageBase + pNTHeader->OptionalHeader.AddressOfEntryPoint
Stepping over the previous two headers reveals a
typedef struct _IMAGE_SECTION_HEADER {
BYTE Name[IMAGE_SIZEOF_SHORT_NAME];
union {
DWORD PhysicalAddress;
DWORD VirtualSize;
} Misc;
DWORD VirtualAddress;
DWORD SizeOfRawData;
DWORD PointerToRawData;
DWORD PointerToRelocations;
DWORD PointerToLinenumbers;
WORD NumberOfRelocations;
WORD NumberOfLinenumbers;
DWORD Characteristics;
};
The
Since this article is concerned with modified code, the section of interest is
As Jeffrey Richter states, on other architectures
/////////////////////////////////////////////////////////////
DWORD dwEntryPoint = pNTHeader->OptionalHeader.AddressOfEntryPoint;
UINT nSectionCount = pNTHeader->FileHeader.NumberOfSections;
for( UINT i = 0; i < nSectionCount; i++ )
{
// When we find a Section such that
// Section Start <= Entry Point < Section End,
// we have found the .TEXT Section
if( pSectionHeader->VirtualAddress <= dwEntryPoint &&
dwEntryPoint < pSectionHeader->VirtualAddress +
pSectionHeader->Misc.VirtualSize )
{ break; }
pSectionHeader++;
}
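Note that the loop above does not report the case where no section contains the entry point: pSectionHeader is simply left one element past the table. A portable sketch of the same search with an explicit "not found" result follows. SectionInfo is a simplified, hypothetical stand-in for the IMAGE_SECTION_HEADER fields used here, and FindSectionByRva is an illustrative name, not a Windows API.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-in for the IMAGE_SECTION_HEADER fields of interest.
struct SectionInfo
{
    std::uint32_t VirtualAddress;    // RVA of the first byte of the section
    std::uint32_t VirtualSize;       // size of the section once loaded
    std::uint32_t SizeOfRawData;     // size of the section on disk
    std::uint32_t PointerToRawData;  // file offset of the section on disk
};

// Returns the index of the section such that
//   Section Start <= rva < Section End,
// or -1 when no section contains the RVA.
int FindSectionByRva(const SectionInfo* sections, std::size_t count,
                     std::uint32_t rva)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        const std::uint32_t start = sections[i].VirtualAddress;
        const std::uint32_t size  = sections[i].VirtualSize;

        // Comparing (rva - start) against size avoids overflow in start + size.
        if (rva >= start && rva - start < size)
        { return static_cast<int>(i); }
    }
    return -1;
}
```

Passing AddressOfEntryPoint as the RVA reproduces the search in the article; a result of -1 indicates a malformed or unusual image.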
Once the loop completes,
/////////////////////////////////////////////////////////////
pCodeStart = (PVOID)((PBYTE)pBaseAddress +
pSectionHeader->PointerToRawData );
/////////////////////////////////////////////////////////////
dwCodeSize = pSectionHeader->Misc.VirtualSize;
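The listing uses PointerToRawData for the start of the code on disk and VirtualSize for its length. In general, an RVA maps to a file offset by subtracting the section's VirtualAddress and adding its PointerToRawData. SizeOfRawData is rounded to the file alignment and may differ from VirtualSize, so a cautious comparison uses the smaller of the two. The sketch below illustrates both points; SectionInfo, RvaToFileOffset, and ComparableSize are hypothetical names introduced only for this example.

```cpp
#include <algorithm>
#include <cstdint>

// Simplified stand-in for the IMAGE_SECTION_HEADER fields of interest.
struct SectionInfo
{
    std::uint32_t VirtualAddress;    // RVA of the first byte of the section
    std::uint32_t VirtualSize;       // size of the section once loaded
    std::uint32_t SizeOfRawData;     // size of the section on disk
    std::uint32_t PointerToRawData;  // file offset of the section on disk
};

// Converts an RVA inside 'section' to an offset in the disk file.
// Returns false when the RVA has no backing bytes in the file.
bool RvaToFileOffset(const SectionInfo& section, std::uint32_t rva,
                     std::uint32_t& fileOffset)
{
    if (rva < section.VirtualAddress) { return false; }

    const std::uint32_t delta = rva - section.VirtualAddress;
    if (delta >= section.SizeOfRawData) { return false; }

    fileOffset = section.PointerToRawData + delta;
    return true;
}

// Number of bytes present both on disk and in memory for this section.
std::uint32_t ComparableSize(const SectionInfo& section)
{
    return std::min(section.VirtualSize, section.SizeOfRawData);
}
```

For a section at RVA 0x1000 stored at file offset 0x400, RVA 0x1010 corresponds to file offset 0x410.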
Now armed with a foundation, one can use PE Browse to reveal the various structures of the PE executable. For example, the Import Directory found in
Examining this area of the disk file in fact displays the Import Table.
Finally, the code to read an executable's
int _tmain(int argc, _TCHAR* argv[])
{
PVOID pBaseAddress = NULL;
DWORD dwRawData = 0;
PVOID pEntryPoint = NULL;
PVOID pCodeStart = NULL;
PVOID pCodeEnd = NULL;
SIZE_T dwCodeSize = 0;
GatherDiskImageInformation( pBaseAddress, dwRawData,
pEntryPoint, pCodeStart, dwCodeSize, pCodeEnd );
DumpDiskImageInformation( pBaseAddress, dwRawData,
pEntryPoint, pCodeStart, dwCodeSize, pCodeEnd );
HexDump( pCodeStart, pCodeStart, DUMP_SIZE );
return 0;
}
The difference between Base Address (0x350000) above and PE Browse base address (0x434000) below is superficial: 0x350000 was returned from
Reading a Memory Image
The code to read a memory image is nearly the same as that of a disk image. The differences are:
int _tmain(int argc, _TCHAR* argv[])
{
HMODULE hModule = NULL;
PVOID pVirtualAddress = NULL;
PVOID pCodeStart = NULL;
PVOID pCodeEnd = NULL;
SIZE_T dwCodeSize = 0;
...
/////////////////////////////////////////////////////////////
pDOSHeader = static_cast<PIMAGE_DOS_HEADER>( (PVOID)hModule );
if(pDOSHeader->e_magic != IMAGE_DOS_SIGNATURE )
{ return -1; }
...
/////////////////////////////////////////////////////////////
DWORD dwEntryPoint = pNTHeader->OptionalHeader.AddressOfEntryPoint;
UINT nSectionCount = pNTHeader->FileHeader.NumberOfSections;
for( UINT i = 0; i < nSectionCount; i++ )
{
// When we find a Section such that
// Section Start <= Entry Point < Section End,
// we have found the .TEXT Section
if( pSectionHeader->VirtualAddress <= dwEntryPoint &&
dwEntryPoint < pSectionHeader->VirtualAddress +
pSectionHeader->Misc.VirtualSize )
{ break; }
pSectionHeader++;
}
...
}
Verifying Integrity
To determine if the executable's
In production, we would use a hash satisfying the requirements of an MDC, or Modification Detection Code. MDCs are also known as Manipulation Detection Codes or, less commonly, Message Integrity Codes. MDCs satisfy two properties: One Way Hash Function (OWHF) and Collision Resistant Hash Function (CRHF). Hashes such as Whirlpool, RIPEMD, or the SHA-2 family (SHA-224, SHA-256, etc.) comply with the requirements. These hash functions are preferred in part due to their digest length: each produces a signature of at least 160 bits.
Because we now calculate both the on-disk and in-memory information, we have adjusted our variables accordingly. We also introduce the hash variables and objects:
/////////////////////////////////////////////////
// On-Disk Variables
PVOID pBaseAddress = NULL;
DWORD dwRawData = 0;
PVOID pDiskEntryPoint = NULL;
PVOID pDiskCodeStart = NULL;
PVOID pDiskCodeEnd = NULL;
SIZE_T dwDiskCodeSize = 0;
/////////////////////////////////////////////////
// In-Memory Variables
HMODULE hModule = NULL;
PVOID pVirtualAddress = NULL;
PVOID pMemoryEntryPoint = NULL;
PVOID pMemoryCodeStart = NULL;
PVOID pMemoryCodeEnd = NULL;
SIZE_T dwMemoryCodeSize = 0;
/////////////////////////////////////////////////
// Hash Specific Variables
MD5 hash;
BYTE cbDiskHash[ MD5::DIGESTSIZE ];
BYTE cbMemoryHash[ MD5::DIGESTSIZE ];
Two functions
BOOL CalculateHash( HashTransformation& hash,
PVOID pMessage, SIZE_T nMessageSize,
PBYTE pcbHashBuffer, SIZE_T nHashBufferSize )
{
if( nHashBufferSize != hash.DigestSize() )
{
ZeroMemory( pcbHashBuffer, nHashBufferSize );
return FALSE;
}
hash.Update( (const BYTE*)pMessage, nMessageSize );
hash.Final( pcbHashBuffer );
return TRUE;
}
In the listing above, we reuse the MD5 hash object: the object hashes both the on-disk and in-memory images. Calling Final() restarts the hash object, so it is ready for the next message.
Debug Builds
In Figure 8 the signatures are not consistent. This is because the capture was performed under the Visual Studio debugger. The debugger inserts software breakpoints, which affect the in-memory hash value. Although not very useful, a Debug build can be verified by running the program from the command prompt. The result is shown in Figure 9, which produces the expected results. If we compare Figures 8 and 9, we see that the on-disk signature is consistent regardless of whether the program is being hosted by the debugger.
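A hash mismatch only says that something changed. To confirm that the change comes from software breakpoints, the two code regions can be compared byte by byte, looking for positions where the in-memory byte is 0xCC (the int 3 opcode). The following is a minimal sketch under that assumption; ByteDifference, CompareImages, and LooksLikeSoftwareBreakpoint are hypothetical names, and the buffers are assumed to be the on-disk and in-memory copies of the same code region.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One position where the two copies of the code disagree.
struct ByteDifference
{
    std::size_t  offset;    // offset from the start of the code region
    std::uint8_t onDisk;    // byte found in the disk image
    std::uint8_t inMemory;  // byte found in the memory image
};

// Collects every offset at which the disk and memory copies differ.
std::vector<ByteDifference> CompareImages(const std::uint8_t* disk,
                                          const std::uint8_t* memory,
                                          std::size_t size)
{
    std::vector<ByteDifference> differences;
    for (std::size_t i = 0; i < size; ++i)
    {
        if (disk[i] != memory[i])
        {
            const ByteDifference d = { i, disk[i], memory[i] };
            differences.push_back(d);
        }
    }
    return differences;
}

// True when the memory byte is int 3 (0xCC) and the disk byte is not,
// which is the pattern a software breakpoint leaves behind.
bool LooksLikeSoftwareBreakpoint(const ByteDifference& d)
{
    return d.inMemory == 0xCC && d.onDisk != 0xCC;
}
```

If every reported difference matches the breakpoint pattern, the debugger is the likely cause; any other difference deserves a closer look.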
To overcome the effects of debuggers using software breakpoints, we can use WinDbg and set a breakpoint using a Debug Register. Quoting Ken Johnson: "...in WinDbg, if you use the 'ba' command then the code bytes in question will not be modified (i.e. substituted with an 0xcc/int 3). You are limited to 4 simultaneously active 'ba' breakpoints as they use the hardware supplied debug registers, which only support four target addresses."
Release Builds
Figure 10 demonstrates running a release build of sample three. The noticeable change (besides different signatures between debug and release builds) is that the release code size is nearly 10 times smaller than that of a debug build.
Binding and Rebasing
Binding an executable refers to writing the addresses of imported functions into the IAT of an executable. More precisely, when an image is bound, the IMAGE_THUNK_DATA structures in the IAT are overwritten with the actual addresses of the imported functions. This is done so the loader does not have to determine the address of an imported function and write the address at load time, thereby speeding up the load process. Since we do not use the IAT in our calculation of the digest, it does not affect our results. We will examine the effects of binding a DLL in the Dynamic Link Library section.
Rebasing a DLL is common to avoid load address collisions. It is a procedure the loader will perform on DLLs when a conflict arises. Executable files are not rebased per se. Changing the base address of an executable is uncommon, but not unheard of. For example, the command interpreter (cmd.exe) on Windows XP SP2 has a preferred base address of 0x4AD00000.
JMP and CALL instructions emitted by the compiler use offsets relative to the instruction, rather than absolute addresses in the 32-bit flat segment. If the image needs to be loaded somewhere other than its specified base address, these instructions do not need to change since they use relative addressing.
In Figure 11, sample three was compiled with a base address of 0x00500000. As we can see, we have a different signature than expected (F0:A7:3B:5E...40:CF:E9:39 at the customary base address). When we rebase the executable, we find that more changes have occurred than simply changing
In Sample 4, all code has been removed except the
StringSource(
(const BYTE*)pCodeStart, dwCodeSize,
true, new FileSink( "textdump.bin", true )
);
Finally, we run sample four using base addresses of both 0x400000 and 0x500000. When we examine the binary files using a difference program, we see that there are over 2800 differences. Investigating further, we find that the differences tend to be small, usually consisting of one byte. The first change occurs at byte 8 (0x43 vs 0x53). The next 15 changes are the same at varying offsets. At difference 16, the byte change is 0x44 to 0x54.
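The one-byte pattern is consistent with absolute 32-bit addresses embedded in the code. Moving the base from 0x400000 to 0x500000 adds 0x100000 to each such address, which alters only the third byte of the little-endian value (0x43 becomes 0x53). The sketch below illustrates this with the effect of a HIGHLOW base relocation on a single instruction. ApplyHighLowFixup and CountDifferences are hypothetical helper names, and the sample bytes are a hand-made `push 0x00433F6E`, not bytes taken from Sample 4.

```cpp
#include <cstddef>
#include <cstdint>

// Adds 'delta' to the 32-bit little-endian value stored at 'offset'.
// This mirrors what an IMAGE_REL_BASED_HIGHLOW fixup does to the image.
void ApplyHighLowFixup(std::uint8_t* code, std::size_t offset,
                       std::uint32_t delta)
{
    std::uint32_t value =
        static_cast<std::uint32_t>(code[offset]) |
        (static_cast<std::uint32_t>(code[offset + 1]) << 8) |
        (static_cast<std::uint32_t>(code[offset + 2]) << 16) |
        (static_cast<std::uint32_t>(code[offset + 3]) << 24);

    value += delta;

    // Store the adjusted address back, least significant byte first.
    for (int i = 0; i < 4; ++i)
    {
        code[offset + i] = static_cast<std::uint8_t>(value >> (8 * i));
    }
}

// Counts the positions at which two equally sized buffers differ.
std::size_t CountDifferences(const std::uint8_t* a, const std::uint8_t* b,
                             std::size_t size)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < size; ++i)
    {
        if (a[i] != b[i]) { ++count; }
    }
    return count;
}
```

Applying a delta of 0x00100000 to the operand of `68 6E 3F 43 00` yields `68 6E 3F 53 00`, a single differing byte.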
The binary dumps (textdump.bin) of the .text sections are available in Sample 4. Next, we use PE Browse to examine the compiled code. We know that the .text section will start at either 0x401000 or 0x501000, depending on the image base of the executable we are examining. The disassembly is shown below in Figure 13.
According to the disassembly, the disparity has three causes. First, the exception handler parameter (0x533F6E) is being placed on the stack; it is a function address (a constant) in the .text section. The second cause is the function address of objects being called through their vtable entry (for example, see 0x5013A8). In this case, the function address is loaded based on the object's layout, which is located in the .rdata section. The dereference into the .rdata section is not relative; it is absolute. The final reason for the signature difference is Microsoft's Buffer Security Check, introduced in Visual Studio 2003. Buffer Security Check is similar to GCC's StackGuard (which uses canary values) and IBM's ProPolice, which offers enhancements to StackGuard (see also Secure Programmer: Countering Buffer Overflows). The security cookie is the return address of a function call XOR'd with a random value, which is then placed next to the return address in memory. If an attacker attempts to overwrite the return address with a buffer overflow (stack smash), the check of the security cookie will usually catch the exploit.
In all cases above, the differences arise because of memory addresses that are absolute in the 32-bit flat address space. If we were motivated, we could parse the .reloc section of the executable and fix the image offline (if the image was built without the /FIXED linker switch). For a discussion of the IMAGE_BASE_RELOCATION directory, see Pietrek's Peering Inside the PE: A Tour of the Win32 Portable Executable File Format.
Dynamic Link Libraries
Up to this point, we have examined executable files, touching on DLLs only when a DLL needed consideration. We will now examine DLLs in detail. To begin, we will add a second project to the solution - a Win32 DLL - to create Sample 5. Select a Win32 Console Application as shown in Figure 14. When prompted by the wizard, select an Application Type of 'DLL'.
There is no need to check 'Export Symbols'.
Add a DEF file and EXPORT GatherImageInformation:
Next, move the function
// Project 1: VerifyIntegrity.exe
__declspec(dllimport)
VOID GatherImageInformation( HMODULE& hModule,
PVOID& pVirtualAddress, PVOID& pEntryPoint,
PVOID& pCodeStart, SIZE_T& dwCodeSize, PVOID& pCodeEnd );
And
// Project 2: VerifyIntegrityDll.dll
__declspec(dllexport)
VOID GatherImageInformation( HMODULE& hModule,
PVOID& pVirtualAddress, PVOID& pEntryPoint,
PVOID& pCodeStart, SIZE_T& dwCodeSize, PVOID& pCodeEnd )
{
...
}
When we run the program, we observe the results of Figure 16. Since the dynamic library calls
Finally, Sample 5 adds code to perform both executable and library in-memory interrogations by the DLL. In essence, we have moved
Examining the disassembly of the DLL confirms the entry point. Note that the entry point is the runtime's entry point, and not
While the majority of the code is equivalent for the .text section of both an executable and a DLL, there is one critical point: how do we retrieve the image base of the dynamic library (even if it is loaded at an address other than the preferred one)? For this, we will use the pseudo-variable
If you don't have access to
EXTERN_C IMAGE_DOS_HEADER __ImageBase;
// __ImageBase is the DOS header itself, so take its address
PIMAGE_DOS_HEADER pDOSHeader =
static_cast<PIMAGE_DOS_HEADER>( &__ImageBase );
Sample 6 moves
TCHAR szFilename[ MAX_PATH ] = { 0 };
////////////////////////////////////////////////
if( 0 == GetModuleFileName( (HMODULE)&__ImageBase,
szFilename, MAX_PATH ) ) { return -1; }
/////////////////////////////////////////////////////////////
hFile = CreateFile( szFilename, GENERIC_READ, FILE_SHARE_READ,
NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
When executed outside of the debugger, we observe the expected results. We can also observe these results under the debugger if software breakpoints are set in the executable, and not the DLL.
Rebasing
If we specify a different base address for the DLL during compilation and linking, we see that the new base address affects the hash of the .text section. However, the hash of the on-disk image is consistent with the in-memory image. In Figure 20 below, a base address of 0x08000000 was specified.
Next, we use rebase.exe on the DLL to set the preferred base address back to 0x10000000. The results are exactly the same as specifying the base address during compile time (shown in Figure 21).
Next, we will rebase the DLL to 0xA0000000. After the rebase, we examine the executable using Dependency Walker (depends.exe) to verify the base address.
When we run the executable, we observe two issues. First, the DLL loads at address 0x00330000. Second, the hashes are inconsistent. It is apparent the runtime loader performed fixups when the operating system relocated the DLL from its preferred base address.
For the last exercise in rebasing, we will build the DLL as usual (base address 0x10000000), and then rebase the DLL to 0x00330000 as the Operating System did in Figure 23.
As we can see, the results are as if the linker had used a base address of 0x00330000 after compilation.
Binding
The final exercise will run bind.exe over the DLL and examine the results. As discussed earlier, bind hard codes function addresses in the IAT. We expect that binding the image should not affect its .text section. When we view the output of sample six after binding, we find that this is the case.
GetAddressOfMain()
Determining the location of a function can be useful as a "flag in the sand" or a basic sanity check. An interesting aspect of debugging is that the debugger should not influence the program. To this end, Visual Studio does a very good job. However, there is some influence from the environment. Consider the following:
PVOID pfnMain = (PVOID)&_tmain;
In Debug builds,
PVOID pfnMain = (PVOID)&_tmain;
PBYTE pPossibleJump = static_cast<PBYTE>(pfnMain);
BYTE opcode = *pPossibleJump;
if( 0xE9 /* Jump */ != opcode )
{
cout << "main() is not a jump opcode... no fixup applied" << endl;
}
else
{
// The rel32 displacement is signed and relative to the next instruction
LONG lJump = *( reinterpret_cast<PLONG>(pPossibleJump+1) );
pfnMain = pPossibleJump + sizeof(opcode) + sizeof(lJump) + lJump;
cout << "main() is a jump opcode... fixup applied" << endl;
}
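The fixup can be isolated into a small function that is easy to exercise without a debugger. An E9 jump is five bytes long (one opcode byte plus a signed 32-bit displacement), and its target is the address of the next instruction plus the displacement. The following portable sketch assumes the thunk is a single near jump, as in the listing above; ResolveJumpThunk is a hypothetical name, and the code assumes a little-endian host.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// If 'address' points at a near jump (opcode 0xE9 followed by a signed
// 32-bit displacement), returns the jump target. Otherwise returns
// 'address' unchanged.
const std::uint8_t* ResolveJumpThunk(const std::uint8_t* address)
{
    const std::uint8_t kJmpRel32 = 0xE9;
    if (*address != kJmpRel32) { return address; }

    // Copy the displacement out to avoid an unaligned read.
    std::int32_t displacement = 0;
    std::memcpy(&displacement, address + 1, sizeof(displacement));

    // The displacement is relative to the instruction after the jump.
    const std::size_t kInstructionLength = 1 + sizeof(displacement);
    return address + kInstructionLength + displacement;
}
```

Treating the displacement as signed matters when the target lies below the thunk, which an unsigned DWORD only handles correctly when pointers are 32 bits wide.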
A picture being worth 1000 words, the figure below displays the result of the previous code.
Address Space Layout Randomization
ASLR is Address Space Layout Randomization. It is meant to thwart certain types of attacks, such as stack smashing, to which some binaries could fall victim. Since it is a runtime rebasing policy, ASLR does affect integrity checks. Fortunately, we can remove ASLR by not specifying the /dynamicbase linker option. Additional resources include Inside the Windows Vista Kernel: Part 3 by Russinovich and Windows Vista ISV Security. Also of interest is On the Effectiveness of Address-Space Randomization, an analysis of ASLR. The paper was authored by researchers at Stanford University.
Miscellaneous and Other Errata
For the purposes of this article, the named section of interest when using Microsoft tools is .text.
Matt Pietrek's articles are generally considered the standard when examining and manipulating executable headers. However, please keep in mind that some of Pietrek's articles cited by others are over 14 years old. The author finds the syndrome similar to that described by Donald Knuth in The Art of Computer Programming, Volume 2: Seminumerical Algorithms, Section 3.1: "Many random number generators in use ... were not very good. People have tended to avoid learning about [the systems]; ... and [the systems] have been passed down blindly from one programmer to another, until the users have no understanding of the original limitations."
Surely Microsoft's implementations have changed as the environment has become more hostile in the years since Pietrek's articles were released. For example, consider Data Execution Prevention. DEP is implemented in one of two ways within the confines of a Windows XP SP2 system on a PC. Another example is Address Space Layout Randomization (ASLR), which is a feature of Windows Vista. It should be readily apparent that DEP and ASLR were not available when Pietrek's articles were originally written. By no means should a reader construe the author's differing opinion as an assertion of incorrectness.
It is simply felt that it is now time to re-examine Pietrek's works, especially in the context of malicious software environments.
Resources