Introduction
This is an introduction to Windows Vista and the x64 architecture. Writing an article like this is never easy, because there's plenty to talk about, but on the other hand it's an article, not a book. I tried to focus on some important aspects, but it goes without saying that I had to cut out a lot (e.g. the User-Mode Driver Framework, and I'm very sorry for that). This is just a general overview of certain topics; if you want to learn more, you should really consider turning to specific guides. Also, I won't talk about some obvious aspects of the x64 architecture, like the fact that applications can now access a larger memory range. This article should be considered a quick upgrade for x86/XP developers. At the time of writing, I've been using Windows Vista for a month, and its official release is scheduled for January 30th (so, in another month). I moved to x64 with XP some months ago, and at the time I was surprised to find drivers for all my devices. But, as we know, Windows Vista requires drivers to be certified, and in order to get the certification companies have to supply an x64 version of the driver: no certification will be released for x86-only drivers. However, at the moment I write, a lot of applications, like virtual drive and encryption software, don't provide drivers for Vista (since their x64 versions haven't got a certificate). If you didn't know about the certification, don't worry: I'll talk about it later, and you'll see that it's still possible to run drivers without it. I just wanted to say that hardware compatibility is no longer the issue it was one year ago, and by switching to Windows Vista x64 you're not taking too many chances. I organized this article in two sections, one about the changes brought by x64 and one about those brought by Vista.
I tried as hard as possible to keep these two things separate, because x64 technology already existed under Windows XP, so it was important to me to give the reader a clear distinction between the things that affect only Vista and those that affect both systems.
x64 Section
x64 Assembly
In this paragraph I'll try to explain the basics of x64 assembly. I assume the reader is already familiar with x86 assembly; otherwise he won't be able to make heads or tails of this paragraph. Moreover, since this is just a very (but very) brief guide, you'll have to look into the AMD64 documentation for more advanced stuff. Some things I won't even mention; you'll see for yourself that some instructions are no longer in use: for instance, the lea instruction has completely taken the place of mov offset. What you're going to notice at once is that there are more registers in x64:
Of course, all general-purpose registers are 64 bits wide. The old ones we already knew are easy to recognize in their 64-bit form: rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp (and rip if we want to count the instruction pointer). These old registers can still be accessed in their smaller bit ranges, for instance: rax, eax, ax, ah, al. The new registers go from r8 to r15, and can be accessed in their various bit ranges like this: r8 (qword), r8d (dword), r8w (word), r8b (low byte). Here's a figure taken from the AMD docs:
Applications can still use segment registers as a base for addressing, but 64-bit mode only recognizes three of the old ones (CS, FS and GS), and only two of them (FS and GS) can be used for base address calculations. Here's another figure:
And now, the most important things: the calling convention and the stack. x64 code uses a fastcall-style calling convention, meaning that the first 4 parameters are passed in registers (and the rest on the stack): the first parameter goes in rcx, the second in rdx, the third in r8 and the fourth in r9. The stack frame is made of: the stack parameters, the space for the register parameters, the return address (which, I remind you, is a qword) and the local variables. Since the space for the parameter registers is part of the stack frame, any function that calls a child function has to reserve room for all four of them, even if the child function takes fewer than four parameters. The stack pointer is adjusted only in the prologue of a function; the allocation has to be large enough to hold all the arguments passed to child functions, and it is always the caller's duty to clean the stack. Now, the most important thing to understand about how this space is sized is that the stack has to be 16-byte aligned whenever a call is made. Since the call instruction pushes an 8-byte return address, every function starts with rsp 8 bytes off a 16-byte boundary, so the space it allocates will always be something like 16n + 8, where n depends on the number of parameters (and locals). Here's a small figure of a stack frame:
Don't worry if you haven't completely figured out how it works yet: now we will see a few code samples, which, in my opinion, always make the theory a lot easier to understand. Take, for instance, a hello-world application like this:
#include <windows.h>
#include <tchar.h>
int WINAPI _tWinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
LPTSTR szCmdLine, int iCmdShow)
{
MessageBox(NULL, _T("Hello World!"), _T("My First x64 Application"), 0);
return 0;
}
This code, disassembled, would look like this:
.text:0000000000401220 sub_401220 proc near ; CODE XREF: start+10E p
.text:0000000000401220
.text:0000000000401220 arg_0= qword ptr 8
.text:0000000000401220 arg_8= qword ptr 10h
.text:0000000000401220 arg_10= qword ptr 18h
.text:0000000000401220 arg_18= dword ptr 20h
.text:0000000000401220
.text:0000000000401220 mov [rsp+arg_18], r9d
.text:0000000000401225 mov [rsp+arg_10], r8
.text:000000000040122A mov [rsp+arg_8], rdx
.text:000000000040122F mov [rsp+arg_0], rcx
.text:0000000000401234 sub rsp, 28h
.text:0000000000401238 xor r9d, r9d ; uType
.text:000000000040123B lea r8, Caption ; "My First x64 Application"
.text:0000000000401242 lea rdx, Text ; "Hello World!"
.text:0000000000401249 xor ecx, ecx ; hWnd
.text:000000000040124B call cs:MessageBoxA
.text:0000000000401251 xor eax, eax
.text:0000000000401253 add rsp, 28h
.text:0000000000401257 retn
.text:0000000000401257 sub_401220 endp
The stack pointer initialization follows from what I said earlier. Since we are calling a child function with parameters, we need space for all four parameter registers (0x20, a value that is already a multiple of 16) plus 8 more bytes (0x08) which, together with our own 8-byte return address already on the stack, keep rsp 16-byte aligned at the call. Thus, we get 0x28. Remember that if the allocated space is too small or leaves the stack misaligned, your code can crash. Also, don't wonder why there's no [...]
The following disassembly shows a function that calls a child function, TakeSevenParameters, which takes seven parameters:
.text:0000000000401180 sub_401180 proc near ; CODE XREF: sub_4011F0+4 p
.text:0000000000401180 ; sub_4011F0+11 p
.text:0000000000401180
.text:0000000000401180 var_28= qword ptr -28h
.text:0000000000401180 var_20= qword ptr -20h
.text:0000000000401180 var_18= qword ptr -18h
.text:0000000000401180
.text:0000000000401180 sub rsp, 48h
.text:0000000000401184 lea rax, unk_402040
.text:000000000040118B mov [rsp+48h+var_18], rax
.text:0000000000401190 lea rax, unk_402044
.text:0000000000401197 mov [rsp+48h+var_20], rax
.text:000000000040119C lea rax, unk_402048
.text:00000000004011A3 mov [rsp+48h+var_28], rax
.text:00000000004011A8 lea r9, qword_40204C ; __int64
.text:00000000004011AF lea r8, qword_40204C+4 ; __int64
.text:00000000004011B6 lea rdx, unk_402054 ; __int64
.text:00000000004011BD lea rcx, aAa ; "ptr"
.text:00000000004011C4 call TakeSevenParameters
.text:00000000004011C9 xor r9d, r9d ; uType
.text:00000000004011CC lea r8, Caption ; "My First x64 Application"
.text:00000000004011D3 lea rdx, Text ; "Hello World!"
.text:00000000004011DA xor ecx, ecx ; hWnd
.text:00000000004011DC call cs:MessageBoxA
.text:00000000004011E2 add rsp, 48h
.text:00000000004011E6 retn
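.text:00000000004011E6 sub_401180 endp
As said, the child function takes 7 parameters, so, besides the 0x20 bytes for the four register parameters, space for 3 extra parameters has to be provided on the stack (they are stored at rsp+20h, rsp+28h and rsp+30h before the call). So, 7 * 8 = 0x38, which rounded up to a multiple of 16 is 0x40; adding 8 bytes for the return address makes it 0x48, our value indeed. Strictly speaking, 0x38 would have kept the stack aligned as well, since it is already of the form 16n + 8, so this rule of thumb may reserve 16 bytes more than necessary, but it never produces a misaligned frame. I think you have understood the stack-frame logic by now: it's actually quite easy, but it takes a moment to switch from the old x86/stdcall logic to this one. But enough of this: now that we've seen how x64 code works, we'll try assembling a source ourselves.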
Before we start, I have to make something clear. There are some assemblers on the internet that make the job easier, mainly because they set up the stack by themselves or generate code that is easy to convert from/to x86. But I think that is not the point of this article. In fact, I'm going to use the Microsoft assembler (ml64.exe), which requires you to write everything down, just like in the disassembly. Another option could be assembling the code with another assembler and then linking it with ml64. I think the reader should really make these decisions on his own. As far as I am concerned, I believe that little code should be written in assembly, and that it should be avoided whenever possible. This new x64 technology is a good opportunity to rethink these matters. In recent years I always wrote 64-bit compatible code in C/C++ (unmanaged, of course), and when I had to recompile a project of 70,000 lines of code for x64, I didn't have to change a single line (I'll talk about C/C++ programming later). Despite all the macros an assembler offers, I seriously doubt that people who wrote their whole code in assembly will be able to switch so easily to x64 (remember that one day even the IA64 syntax could be adopted). I think in most cases the obvious choice will be not to convert to the new technology and to stick to x86, but this isn't always possible; it depends on the software category. The Microsoft assembler is contained in the SDK and in the DDK (WDK for Vista). Right now, I'm using Vista's WDK, which I downloaded for free from MSDN. The first sample I'm going to show you is a simple Hello World message box application.
extrn MessageBoxA : proc
extrn ExitProcess : proc
.data
body db 'Hello World!', 0
capt db 'My First x64 Application', 0
.code
Main proc
sub rsp, 28h
xor r9d, r9d ; uType = 0
lea r8, capt ; lpCaption
lea rdx, body ; lpText
xor rcx, rcx ; hWnd = NULL
call MessageBoxA
xor ecx, ecx ; exit code = 0
call ExitProcess
Main endp
end
As you can see, I didn't bother restoring the stack pointer, since I call ExitProcess. The syntax is very similar to the old MASM one, although there are a few differences. The ml64 console output should be something like this:
The command line to compile is:
ml64 C:\...\test.asm /link /subsystem:windows
/defaultlib:C:\WinDDK\6000\lib\wnet\amd64\kernel32.lib
/defaultlib:C:\WinDDK\6000\lib\wnet\amd64\user32.lib /entry:Main
If the libs are not in the same directory as ml64.exe, you'll have to provide their path as I did. The /entry option has to be provided; otherwise the entry point would have to be named WinMainCRTStartup, the linker's default for the windows subsystem. The next sample displays a window by calling CreateWindowEx. What you're going to learn through this code is structure alignment and how to integrate resources in your projects. As I said earlier, I don't want to encourage you to write your windows in assembly, but I believe this sort of code is good for learning. First the code, then the explanation.
extrn GetModuleHandleA : proc
extrn MessageBoxA : proc
extrn RegisterClassExA : proc
extrn CreateWindowExA : proc
extrn DefWindowProcA : proc
extrn ShowWindow : proc
extrn GetMessageA : proc
extrn TranslateMessage : proc
extrn DispatchMessageA : proc
extrn PostQuitMessage : proc
extrn DestroyWindow : proc
extrn ExitProcess : proc
WNDCLASSEX struct
cbSize dd ?
style dd ?
lpfnWndProc dq ?
cbClsExtra dd ?
cbWndExtra dd ?
hInstance dq ?
hIcon dq ?
hCursor dq ?
hbrBackground dq ?
lpszMenuName dq ?
lpszClassName dq ?
hIconSm dq ?
WNDCLASSEX ends
POINT struct
x dd ?
y dd ?
POINT ends
MSG struct
hwnd dq ?
message dd ?
padding1 dd ? ; padding
wParam dq ?
lParam dq ?
time dd ?
pt POINT <>
padding2 dd ? ; padding
MSG ends
.const
NULL equ 0
CS_VREDRAW equ 1
CS_HREDRAW equ 2
COLOR_WINDOW equ 5
; WS_OVERLAPPEDWINDOW = (WS_OVERLAPPED | WS_CAPTION | WS_SYSMENU |
; WS_THICKFRAME | WS_MINIMIZEBOX | WS_MAXIMIZEBOX)
WS_OVERLAPPEDWINDOW equ 0CF0000h
CW_USEDEFAULT equ 80000000h
SW_SHOW equ 5
WM_DESTROY equ 2
WM_COMMAND equ 111h
IDC_MENU equ 109
IDM_ABOUT equ 104
IDM_EXIT equ 105
.data
szWindowClass db 'FirstApp', 0
szTitle db 'My First x64 Windows', 0
szHelpTitle db 'Help', 0
szHelpText db 'This will be a big help...', 0
.data?
hInstance qword ?
hWnd qword ?
wndclass WNDCLASSEX <>
wmsg MSG <>
.code
WndProc: ; proc hWnd : qword, uMsg : dword, wParam : qword, lParam : qword
mov [rsp+8], rcx ; hWnd (save parameters as locals)
mov [rsp+10h], edx ; Msg
mov [rsp+18h], r8 ; wParam
mov [rsp+20h], r9 ; lParam
sub rsp, 38h
cmp edx, WM_DESTROY
jnz @next1
xor ecx, ecx ; exit code
call PostQuitMessage
xor rax, rax
add rsp, 38h ; restore rsp before returning
ret
@next1:
cmp edx, WM_COMMAND
jnz @default
mov r11, rsp ; r11 is volatile (rbx is nonvolatile and would have to be preserved)
add r11, 38h
mov r10, [r11+18h] ; wParam
cmp r10w, IDM_ABOUT
jz @about
cmp r10w, IDM_EXIT
jz @exit
jmp @default
@about:
xor r9d, r9d
lea r8, szHelpTitle
lea rdx, szHelpText
xor ecx, ecx
call MessageBoxA
jmp @default
@exit:
mov r11, rsp
add r11, 38h
mov rcx, [r11+8h] ; hWnd
call DestroyWindow
@default:
mov r11, rsp
add r11, 38h
mov r9, [r11+20h] ; lParam
mov r8, [r11+18h] ; wParam
mov edx, [r11+10h] ; Msg
mov rcx, [r11+8] ; hWnd
call DefWindowProcA
add rsp, 38h
ret
MyRegisterClass: ; proc hInst : qword
sub rsp, 28h
mov wndclass.cbSize, sizeof WNDCLASSEX
mov eax, CS_VREDRAW
or eax, CS_HREDRAW
mov wndclass.style, eax
lea rax, WndProc
mov wndclass.lpfnWndProc, rax
mov wndclass.cbClsExtra, 0
mov wndclass.cbWndExtra, 0
mov wndclass.hInstance, rcx
mov wndclass.hIcon, NULL
mov wndclass.hCursor, NULL
mov wndclass.hbrBackground, COLOR_WINDOW + 1 ; system color brushes are passed as index + 1
mov wndclass.lpszMenuName, IDC_MENU
lea rax, szWindowClass
mov wndclass.lpszClassName, rax
mov wndclass.hIconSm, NULL
lea rcx, wndclass
call RegisterClassExA
add rsp, 28h
ret
InitInstance: ; proc hInst : qword
sub rsp, 78h
mov rax, CW_USEDEFAULT
xor r10, r10 ; r10 is volatile (rbx is nonvolatile and would need saving)
mov [rsp+58h], r10 ; lpParam
mov [rsp+50h], rcx ; hInstance
mov [rsp+48h], r10 ; hMenu = NULL
mov [rsp+40h], r10 ; hWndParent = NULL
mov [rsp+38h], r10 ; Height
mov [rsp+30h], rax ; Width
mov [rsp+28h], r10 ; Y
mov [rsp+20h], rax ; X
mov r9d, WS_OVERLAPPEDWINDOW ; dwStyle
lea r8, szTitle ; lpWindowName
lea rdx, szWindowClass ; lpClassName
xor ecx, ecx ; dwExStyle
call CreateWindowExA
mov hWnd, rax
mov edx, SW_SHOW
mov rcx, hWnd
call ShowWindow
mov rax, hWnd ; set return value
add rsp,78h
ret
Main proc
sub rsp, 28h
xor rcx, rcx
call GetModuleHandleA
mov hInstance, rax
mov rcx, rax
call MyRegisterClass
test ax, ax ; RegisterClassEx returns a 16-bit ATOM
jz @close ; if the RegisterClassEx fails, exit
mov rcx, hInstance
call InitInstance
test rax, rax
jz @close ; if the InitInstance fails, exit
@handlemsgs: ; message processing routine
xor r9d, r9d
xor r8d, r8d
xor edx, edx
lea rcx, wmsg
call GetMessageA
test eax, eax
jz @close
lea rcx, wmsg
call TranslateMessage
lea rcx, wmsg
call DispatchMessageA
jmp @handlemsgs
@close:
xor ecx, ecx
call ExitProcess
Main endp
end
As you can see, I tried to stay as low level as I could. The reason I didn't use the proc macro for functions other than Main is that ml64 would add a prologue and an epilogue by itself, which I didn't want. Avoiding the macro made it possible to define my own stack frames without any interference from the assembler. The first thing to notice while scrolling through this code is the structure:
MSG struct
hwnd dq ?
message dd ?
padding1 dd ? ; padding
wParam dq ?
lParam dq ?
time dd ?
pt POINT <>
padding2 dd ? ; padding
MSG ends
It requires two paddings that the x86 declaration of the same structure didn't need. The reason, in a few words, is that qword members must be aligned to qword boundaries: message is a dword at offset 8, so without the first padding wParam would start at offset 12. The additional padding at the end of the structure follows the rule that every structure is aligned to the alignment of its largest member. Since its largest member is a qword, the size of the structure must be a multiple of 8: the members end at offset 44, and the second padding brings the size to 48 bytes. To compile this sample, the command line is:
ml64 c:\myapp\test.asm /link /subsystem:windows
/defaultlib:C:\WinDDK\6000\lib\wnet\amd64\kernel32.lib
/defaultlib:C:\WinDDK\6000\lib\wnet\amd64\user32.lib
/entry:Main c:\myapp\test.res
test.res is a file I took from a VC++ wizard project; I was too lazy to make one myself. Anyway, making a resource file is very easy with VC++, but no one forbids you to use Notepad; it just takes more time. To compile the resource file, all you need is the command line "rc test.rc". I think the rest of the code is pretty easy to understand. I didn't cover everything in this paragraph, but now you should have quite a good insight into x64 assembly. Let's move on.
C/C++ Programming
Writing x64-compatible code in C/C++ is very easy: all it takes is following some basic rules. The most common mistake, the one that makes the vast majority of old 32bit sources incompatible, is wrong casting. For instance:
ptr1 = (DWORD) (sizeof (x) + ptr2); // <-- WRONG!
This line of code assumes that pointers are 32bit long, but on x64 pointers are 64bit long, and the line above truncates the pointer, making it invalid. So, always cast like this:
ptr1 = (ULONG_PTR) (sizeof (x) + ptr2); // <-- RIGHT!
It doesn't matter if you use ULONG_PTR, LONG_PTR, DWORD_PTR or whatever: the important thing is that you use one of these types (or directly a pointer type: (void *)). Keep in mind that all handles and handle derivatives are qwords: HANDLE, HKEY, HICON, HBITMAP, HINSTANCE, HMODULE, HWND etc. These are all 64bit long, even though they're not all the same kind of handle (HINSTANCE, for example, is just a pointer, not a real handle). Even WPARAM and LPARAM are now 64bit long. There's no rule to follow here; just don't assume these types are 32 or 64bit long, and write code that is compatible with both:
HWND *hWndArray = (HWND *) malloc(sizeof (DWORD) * n); // <-- WRONG!
Instead write:
HWND *hWndArray = (HWND *) malloc(sizeof (HWND) * n); // <-- RIGHT!
As you can see, this isn't a rule, just good sense. The defines to use for writing architecture-dependent code are:
If you want to write, for example, a piece of code for x86 only, you could write:
#ifdef _M_IX86
// x86 only code
#endif
Now that you know all the rules, you just have to compile your project for x64. Keep in mind that every project in VC++ (nowadays) starts with an x86 configuration: it's your job to add an x64 configuration to the project, but don't worry, it's very easy. All you have to do is open the Configuration Manager (Build -> Configuration Manager) and then, under "Active solution platform", click New, just like this:
A dialog box will pop up where you can choose the new platform for which to create a project configuration. There's nothing more to do, except to build.
Inline Assembly
Bad news! Microsoft completely removed support for inline assembly from its x64 and Itanium C/C++ compilers, both for user and kernel mode. If you try to compile a code sample like this for x64/Itanium:
#include "stdafx.h"
#include <Windows.h>
int _tmain(int argc, _TCHAR* argv[])
{
__asm int 3;
return 0;
}
It will give you more than one error. Since the __asm keyword is no longer supported, __declspec(naked) was removed as well (it doesn't make sense without inline assembly). Now, prepare for the good news. Before you start thinking about external asm files or stuff like that, you should know that VC++ offers some very powerful intrinsics. The header to include to use these intrinsics is "intrin.h". Let's take as an example the intrinsics _ReturnAddress() and _AddressOfReturnAddress(). The first one gives us the return address of the current function, and the second one the address of the return address itself. Let's analyze this little code sample, which I took from MSDN:
#include <intrin.h>
#include <stdio.h>
#include <tchar.h>
int _tmain(int argc, _TCHAR* argv[])
{
void* pvAddressOfReturnAddress = _AddressOfReturnAddress();
printf_s("%p\n", pvAddressOfReturnAddress);
printf_s("%p\n", *((void**) pvAddressOfReturnAddress));
printf_s("%p\n", _ReturnAddress());
return 0;
}
The second and third printf_s calls will show the same output, since both display the return address of the current function. These intrinsics are very powerful, and nothing stops us from doing some of the old tricks we did with inline assembly. For instance, having the address of the return address gives me the possibility of changing it, making the function return somewhere else. Let's try that:
ULONG_PTR OldAddress = 0;
void f1()
{
printf_s("Hello there!\n");
ULONG_PTR *pAddressOfReturnAddress = (ULONG_PTR *)
_AddressOfReturnAddress();
if (OldAddress == 0)
{
OldAddress = *pAddressOfReturnAddress;
*pAddressOfReturnAddress = (ULONG_PTR) &f1;
}
else
{
*pAddressOfReturnAddress = OldAddress;
}
}
The output of this function is:
Hello there!
Hello there!
That's because, as you can see from the code, I changed the return address of the current function, making it execute again. I put a condition to make it execute again just once; otherwise it would loop endlessly. An important thing to know is that this sample works in Release mode only if you disable code optimization; otherwise VC++ removes the line of code that sets the new return address. I'm sure there are ways to trick VC++ into not doing this, but the problem is that if the function has just one caller, like this one, VC++ will inline it into the caller, so setting a new return address under these conditions is a bit risky. Disabling optimization is, I believe, the safest way to act. Even then, the trick is fragile: the second run of f1 is entered through a ret instead of a call, so it starts with rsp 8 bytes higher than usual (and therefore misaligned), and when it returns, the caller's rsp is 8 bytes off. Enough of this trivia. Here's a list of the intrinsics for x64 taken from MSDN (many of them are supported on x86 as well):
There are also some 3D intrinsics (3DNow!) that will be useful for game/3D coders. I left them out of the list, since they are too many and you'd need to include another header file to use them: "mm3dnow.h". If these intrinsics are not enough, you might need to use an external asm file. On the other hand, if you're really lazy and just need something on the fly, there's a quick way to embed machine code in your C/C++ files:
#include "stdafx.h"
#include <Windows.h>
unsigned char BitSwapAsm[7] =
{
0x48, 0x8B, 0xC1, // mov rax, rcx
0x48, 0x0F, 0xC8, // bswap rax
0xC3 // retn
};
__int64 (*BitSwap)(__int64 Value) = (__int64 (*)(__int64))
(ULONG_PTR) BitSwapAsm;
int _tmain(int argc, _TCHAR* argv[])
{
//
// I have to change the page protection, otherwise the code would crash
//
DWORD dwOldProtect;
VirtualProtect(BitSwapAsm, sizeof (BitSwapAsm), PAGE_EXECUTE_READWRITE,
&dwOldProtect);
printf_s("%I64X\n", BitSwap(0xDDCCBBAA)); // prints AABBCCDD00000000
getchar();
return 0;
}
This code relies on function pointers, and I had to change the page protection flags in order to make it execute. It's really a crude method, but in some cases it can save time.
Windows On Windows
Of course, compatibility with 32bit applications has to be provided on x64 (and Itanium as well), and this is what WOW64 (Windows on Windows 64) is all about. When we look at the modules loaded by a 32bit application with a 32bit version of Task Explorer, we see this:
It looks pretty regular, except, of course, for the system files path, which is SysWOW64 instead of the usual System32. It's easy to understand why: the System32 folder is now reserved for the 64bit environment, so the 32bit files had to be placed somewhere else. But look what happens when I open the same process with an x64 version of Task Explorer:
Suddenly, all the 32bit modules are gone, and what remains are the WOW64 emulation modules. Here's the description MSDN gives of these modules:
By default, a 32bit application has at most 2GB of address space (4GB if it explicitly asks for it), and the rest of the space is handled by the system. This doesn't change much, of course, since on x86 user-mode applications had 2GB of virtual memory out of 4GB (the other 2GB were reserved for kernel mode). On x64, however, those other 2GB can now be accessed by 32bit applications. In order to achieve this, the
I've seen this done by 3D game players in order to increase performance. Of course, it's only useful for applications that consume a lot of memory. A very useful function to determine whether a process is running under WOW64 is:
BOOL IsWow64Process(
HANDLE hProcess, // [in] Handle to a process.
PBOOL Wow64Process // [out] Pointer to a value that is set to TRUE if the
// process is running under WOW64. Otherwise, the value
// is set to FALSE.
);
Wow64Cpu.dll has very little work to do on x64, because x64 processors execute x86 code natively. I was first tempted to look at how the calling sequence works, in order to build one myself and provide a way to use x86 components from x64 code in the same address space, but, on second thought, even if it could be implemented, it wouldn't work on Itanium. And this brings us to one of the next paragraphs, because under normal conditions a 32bit application cannot load a 64bit DLL, and a 64bit application cannot load a 32bit DLL. So, interprocess communication becomes an important aspect on 64bit systems. Before that, though, I have to talk about file system and registry redirection, since they are strictly related to WOW64 but deserve a paragraph of their own because of their importance.
File System And Registry Redirection
Since the System32 directory is reserved for 64bit files, any time a 32bit application tries to access it, it is redirected to SysWOW64. However, some subdirectories of System32 are shared between 32bit and 64bit applications, so no redirection is needed for them. These subdirectories are:
Also, there are some functions related to the WOW64 file system redirection:
I think it's easy to understand how to use these functions. However, here is a little code sample (you can find almost the same one on MSDN):
int _tmain(int argc, _TCHAR* argv[])
{
BOOL bIsWow64;
if (IsWow64Process(GetCurrentProcess(), &bIsWow64))
{
if (bIsWow64 == TRUE) // we run under WOW64
{
PVOID pOldValue;
DWORD FileSize;
// Redirected: this actually opens SysWOW64\notepad.exe (the 32bit one)
HANDLE hFile = CreateFile(_T("c:\\windows\\system32\\notepad.exe"),
GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
if (hFile == INVALID_HANDLE_VALUE) return 1;
FileSize = GetFileSize(hFile, NULL);
CloseHandle(hFile);
_tprintf(_T("File Size: %lu Bytes\n"), FileSize);
if (Wow64DisableWow64FsRedirection(&pOldValue)) // disable redirection
{
// Not redirected: this opens the real (64bit) System32\notepad.exe
hFile = CreateFile(_T("c:\\windows\\system32\\notepad.exe"),
GENERIC_READ, FILE_SHARE_READ, NULL,
OPEN_EXISTING, 0, NULL);
Wow64RevertWow64FsRedirection(pOldValue); // restore redirection
if (hFile == INVALID_HANDLE_VALUE) return 1;
FileSize = GetFileSize(hFile, NULL);
CloseHandle(hFile);
_tprintf(_T("File Size: %lu Bytes\n"), FileSize);
}
getchar();
}
}
return 0;
}
The output of this program is:
File Size: 151040 Bytes
File Size: 169472 Bytes
The file size changes because the first time the program opens the 32bit notepad and the second time the 64bit one. Of course, when you use these functions, always resolve them through GetProcAddress; otherwise your code won't work on older systems that don't provide them. Let's move on to the registry. Like the file system, the registry is redirected as well, or rather some keys of it. These keys are:
You can find each of these keys duplicated for 32bit applications in their WOW node: each of these keys has a subkey called Wow6432Node, which contains the 32bit duplicate of the parent key. For instance:
Some of these WOW64-redirected keys have subkeys that are reflected. Reflection in this case means that when a reflected key is changed in the 32bit node, the change is reflected in the 64bit key as well, and vice versa. This is necessary because some keys need to remain in sync. This is quite different from just sharing the keys between 64bit and 32bit mode, because reflection can be filtered and also disabled. These are the reflected keys:
The functions to handle reflection are:
They work just like the WOW64 file system functions, so I don't think a code sample is necessary. There are also some shared keys between 64bit and 32bit applications:
As said, these keys are shared, so any change made to them affects both 32bit and 64bit applications, and, unlike with reflected keys, there's no way to avoid this. But what if a 32bit application wants to access the 64bit registry or vice versa? Don't worry! As I discovered when I was dealing with this very problem, Microsoft provides a very simple way to do the job: the flags KEY_WOW64_64KEY and KEY_WOW64_32KEY can be used with these functions: RegCreateKeyEx, RegDeleteKeyEx and RegOpenKeyEx.
What I needed to do was to access the subkeys of a 64bit key from a 32bit application, which in code is just:
RegOpenKeyEx(HKEY_LOCAL_MACHINE, MyKey, 0, KEY_READ | KEY_WOW64_64KEY, &hKey);
Easy, isn't it? All in all, the documentation Microsoft provides on file system and registry redirection is very good, and I just reported what I first found on MSDN. I don't think these redirections are going to be much of a problem for programmers.
Interprocess Communication
As mentioned in the Windows On Windows paragraph, interprocess communication becomes an important aspect on x64, since a 64bit application might need to use a 32bit component and vice versa. MSDN suggests these ways for processes to communicate with each other:
Using CreateProcess or ShellExecute means that you could communicate through arguments and by reading the output. If you need something more sophisticated (and professional), you have no choice but to use RPC (Remote Procedure Calls) or COM objects. For RPC you need to learn a bit about MIDL (Microsoft Interface Definition Language), but none of the code samples I tried worked on Vista x64, so I gave up on RPC. I would suggest using COM; writing COM objects with MFC is very easy (compared to writing them without MFC, I mean). There's a very good series of articles on CodeProject about writing ActiveX controls. Actually, the series is about how to write ActiveX controls in plain C (I had to reduce the size of my ActiveX control, so I couldn't use MFC), but the theory is the same, and these articles are well written and could save you the effort of reading a book. If you have never written COM objects before, you will soon discover that it can be annoying. Shared memory is not really an option. If you are looking for a solution between CreateProcess and COM objects, you may use pipes or similar mechanisms. Actually, you could implement your own pipes through shared memory and mutexes. This is what I have done in some projects:
The " *32" next to the process name is how Task Manager marks 32bit processes. As you can see, the Server is a 64bit process and the Client a 32bit one. The two processes communicate with each other without problems. However, don't get too excited: there are some problems, and I'll explain later what they are about. For now, let's see a code sample. Here's the Client code:
#include <Windows.h>
#include <tchar.h>
#define BUF_SIZE (256 * sizeof (TCHAR))
TCHAR MyEvent[] = _T("Global\\SharedMemoryEvent");
TCHAR szName[]= _T("Global\\MyFileMappingObject");
int WINAPI _tWinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
LPTSTR szCmdLine, int iCmdShow)
{
//
// Create the event to communicate between server and client
//
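HANDLE hEvent = CreateEvent(NULL, FALSE, FALSE, MyEvent);
if (hEvent == NULL) return 1;
//
// Start server process
//
PROCESS_INFORMATION pi = { 0 };
STARTUPINFO si = { sizeof (si) }; // cb must contain the structure size
if (!CreateProcess(_T("Server.exe"), NULL, NULL, NULL, FALSE, 0, NULL,
NULL, &si, &pi))
return 1;
CloseHandle(pi.hThread); // the client doesn't need these handles
CloseHandle(pi.hProcess);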
//
// Wait for the server to complete the job
//
WaitForSingleObject(hEvent, INFINITE);
//
// Access shared memory object
//
HANDLE hMapFile = OpenFileMapping(
FILE_MAP_ALL_ACCESS, // read/write access
FALSE, // do not inherit the name
szName); // name of mapping object
if (hMapFile == NULL) return 1;
LPCTSTR pBuf = (LPTSTR) MapViewOfFile(
hMapFile, // handle to map object
FILE_MAP_ALL_ACCESS, // read/write permission
0,
0,
BUF_SIZE);
if (pBuf == NULL) return 1;
//
// Shows Server Output
//
MessageBox(NULL, pBuf, _T("Server Output"), MB_OK);
UnmapViewOfFile(pBuf);
CloseHandle(hMapFile);
//
// Tell the server that the object isn't used any longer
//
SetEvent(hEvent);
return 0;
}
And here's the Server code: #include <Windows.h> #include <tchar.h> #define BUF_SIZE 256 * sizeof (TCHAR) TCHAR MyEvent[] = _T("Global\\SharedMemoryEvent"); TCHAR szName[] = _T("Global\\MyFileMappingObject"); TCHAR szMsg[] = _T("Message from server process"); int WINAPI _tWinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR szCmdLine, int iCmdShow) { // // Create the memory shared object // HANDLE hMapFile; LPCTSTR pBuf; hMapFile = CreateFileMapping( INVALID_HANDLE_VALUE, // use paging file NULL, // default security PAGE_READWRITE, // read/write access 0, // max. object size BUF_SIZE, // buffer size szName); // name of mapping object if (hMapFile == NULL) return 1; pBuf = (LPTSTR) MapViewOfFile(hMapFile, // handle to map object FILE_MAP_ALL_ACCESS, // read/write permission 0, 0, BUF_SIZE); if (pBuf == NULL) return 1; CopyMemory((PVOID) pBuf, szMsg, (_tcslen(szMsg) + 1) * sizeof (TCHAR)); // // Wait for event before closing file object // HANDLE hEvent = OpenEvent(EVENT_ALL_ACCESS, FALSE, MyEvent); SetEvent(hEvent); WaitForSingleObject(hEvent, INFINITE); UnmapViewOfFile(pBuf); CloseHandle(hMapFile); return 0; } What these applications do is:
I believe the code itself is easier to understand than this list. The problem I mentioned earlier is that, in order to share a memory object (or an event) between processes, I have to create it in the "Global\" namespace. On Vista, creating a file mapping object in that namespace with CreateFileMapping requires the SeCreateGlobalPrivilege privilege, which standard users (and administrators running without elevation) don't have. Events and mutexes are not affected. Since applications on Vista usually run with standard user privileges, you have to explicitly tell Vista to run the Client application with admin privileges, which is not very professional. One workaround could be to share the data through a temporary file, or even through the registry (for small amounts of data). Note that the "Global\" prefix is only required when the processes run in different sessions. Two processes in the same session can use the default, session-local namespace, which has no such restriction.
Portable Executable
If your software has anything to do with Portable Executables, it won't be too hard to move to x64 (if you haven't done it already). Basically, what changes in PE64 is the size of virtual addresses (VAs), which are now 64 bits wide. Keep in mind that not all the fields described as virtual addresses really are VAs. Most of the time they're relative virtual addresses (RVAs), which, as in PE32, are 32 bits wide. In short, the following change. The Optional Header has some 64-bit fields, like ImageBase, and no longer has a BaseOfData field. The Import Directory thunks change too: the two thunk arrays (OriginalFirstThunks and FirstThunks, OFTs and FTs) are now 64 bits wide, since thunks were built to hold virtual addresses among other things. The Load Config Directory and the TLS Directory change as well. Let's take, for instance, the old PE32 Optional Header:
typedef struct _IMAGE_OPTIONAL_HEADER {
    //
    // Standard fields.
    //
    WORD    Magic;
    BYTE    MajorLinkerVersion;
    BYTE    MinorLinkerVersion;
    DWORD   SizeOfCode;
    DWORD   SizeOfInitializedData;
    DWORD   SizeOfUninitializedData;
    DWORD   AddressOfEntryPoint;
    DWORD   BaseOfCode;
    DWORD   BaseOfData;
    //
    // NT additional fields.
    //
    DWORD   ImageBase;
    DWORD   SectionAlignment;
    DWORD   FileAlignment;
    WORD    MajorOperatingSystemVersion;
    WORD    MinorOperatingSystemVersion;
    WORD    MajorImageVersion;
    WORD    MinorImageVersion;
    WORD    MajorSubsystemVersion;
    WORD    MinorSubsystemVersion;
    DWORD   Win32VersionValue;
    DWORD   SizeOfImage;
    DWORD   SizeOfHeaders;
    DWORD   CheckSum;
    WORD    Subsystem;
    WORD    DllCharacteristics;
    DWORD   SizeOfStackReserve;
    DWORD   SizeOfStackCommit;
    DWORD   SizeOfHeapReserve;
    DWORD   SizeOfHeapCommit;
    DWORD   LoaderFlags;
    DWORD   NumberOfRvaAndSizes;
    IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32;
And the PE64 one:
typedef struct _IMAGE_OPTIONAL_HEADER64 {
    WORD        Magic;
    BYTE        MajorLinkerVersion;
    BYTE        MinorLinkerVersion;
    DWORD       SizeOfCode;
    DWORD       SizeOfInitializedData;
    DWORD       SizeOfUninitializedData;
    DWORD       AddressOfEntryPoint;
    DWORD       BaseOfCode;
    ULONGLONG   ImageBase;
    DWORD       SectionAlignment;
    DWORD       FileAlignment;
    WORD        MajorOperatingSystemVersion;
    WORD        MinorOperatingSystemVersion;
    WORD        MajorImageVersion;
    WORD        MinorImageVersion;
    WORD        MajorSubsystemVersion;
    WORD        MinorSubsystemVersion;
    DWORD       Win32VersionValue;
    DWORD       SizeOfImage;
    DWORD       SizeOfHeaders;
    DWORD       CheckSum;
    WORD        Subsystem;
    WORD        DllCharacteristics;
    ULONGLONG   SizeOfStackReserve;
    ULONGLONG   SizeOfStackCommit;
    ULONGLONG   SizeOfHeapReserve;
    ULONGLONG   SizeOfHeapCommit;
    DWORD       LoaderFlags;
    DWORD       NumberOfRvaAndSizes;
    IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER64, *PIMAGE_OPTIONAL_HEADER64;
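Here is what the differences mean for code that parses headers by hand. This sketch re-declares both layouts with <stdint.h> types so it builds without <Windows.h>; OptHdr32, OptHdr64 and DataDir are hypothetical names, not the winnt.h ones. It checks a few offsets at compile time. It assumes the compiler adds no padding, which holds on the usual x86/x64 ABIs because every field here is naturally aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of IMAGE_DATA_DIRECTORY */
typedef struct { uint32_t VirtualAddress, Size; } DataDir;

/* Hypothetical mirror of IMAGE_OPTIONAL_HEADER32 (same field order) */
typedef struct {
    uint16_t Magic;
    uint8_t  MajorLinkerVersion, MinorLinkerVersion;
    uint32_t SizeOfCode, SizeOfInitializedData, SizeOfUninitializedData;
    uint32_t AddressOfEntryPoint, BaseOfCode, BaseOfData;
    uint32_t ImageBase, SectionAlignment, FileAlignment;
    uint16_t MajorOperatingSystemVersion, MinorOperatingSystemVersion;
    uint16_t MajorImageVersion, MinorImageVersion;
    uint16_t MajorSubsystemVersion, MinorSubsystemVersion;
    uint32_t Win32VersionValue, SizeOfImage, SizeOfHeaders, CheckSum;
    uint16_t Subsystem, DllCharacteristics;
    uint32_t SizeOfStackReserve, SizeOfStackCommit;
    uint32_t SizeOfHeapReserve, SizeOfHeapCommit;
    uint32_t LoaderFlags, NumberOfRvaAndSizes;
    DataDir  DataDirectory[16];
} OptHdr32;

/* Hypothetical mirror of IMAGE_OPTIONAL_HEADER64: no BaseOfData,
   64-bit ImageBase and 64-bit stack/heap sizes */
typedef struct {
    uint16_t Magic;
    uint8_t  MajorLinkerVersion, MinorLinkerVersion;
    uint32_t SizeOfCode, SizeOfInitializedData, SizeOfUninitializedData;
    uint32_t AddressOfEntryPoint, BaseOfCode;
    uint64_t ImageBase;
    uint32_t SectionAlignment, FileAlignment;
    uint16_t MajorOperatingSystemVersion, MinorOperatingSystemVersion;
    uint16_t MajorImageVersion, MinorImageVersion;
    uint16_t MajorSubsystemVersion, MinorSubsystemVersion;
    uint32_t Win32VersionValue, SizeOfImage, SizeOfHeaders, CheckSum;
    uint16_t Subsystem, DllCharacteristics;
    uint64_t SizeOfStackReserve, SizeOfStackCommit;
    uint64_t SizeOfHeapReserve, SizeOfHeapCommit;
    uint32_t LoaderFlags, NumberOfRvaAndSizes;
    DataDir  DataDirectory[16];
} OptHdr64;

/* Magic has the same offset in both, so it can be read first */
static_assert(offsetof(OptHdr32, Magic) == 0, "PE32 Magic");
static_assert(offsetof(OptHdr64, Magic) == 0, "PE64 Magic");
/* Dropping BaseOfData moves ImageBase back by 4 bytes */
static_assert(offsetof(OptHdr32, ImageBase) == 28, "PE32 ImageBase");
static_assert(offsetof(OptHdr64, ImageBase) == 24, "PE64 ImageBase");
/* Four 64-bit stack/heap fields push DataDirectory forward by 16 */
static_assert(offsetof(OptHdr32, DataDirectory) == 96, "PE32 DataDirectory");
static_assert(offsetof(OptHdr64, DataDirectory) == 112, "PE64 DataDirectory");
/* 0xE0 and 0xF0: the usual SizeOfOptionalHeader values */
static_assert(sizeof(OptHdr32) == 0xE0, "PE32 size");
static_assert(sizeof(OptHdr64) == 0xF0, "PE64 size");
```

Every field from SectionAlignment to DllCharacteristics has the same offset in both layouts. Everything after the stack/heap sizes does not, so one struct cannot serve both formats.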
Of course, to distinguish between PE32 and PE64 you should check the Magic field of the Optional Header, which can have one of these values:
#define IMAGE_NT_OPTIONAL_HDR32_MAGIC 0x10b
#define IMAGE_NT_OPTIONAL_HDR64_MAGIC 0x20b
#define IMAGE_ROM_OPTIONAL_HDR_MAGIC 0x107
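Magic is the first field of both Optional Headers, so it can be read before you know which structure applies. The following is a minimal sketch working on a file image already loaded into memory. Only these parts come from the format: the "MZ" signature, e_lfanew at offset 0x3C, the "PE\0\0" signature, the 20-byte IMAGE_FILE_HEADER and the Magic values. The names pe_detect, pe_kind, rd16 and rd32 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IMAGE_NT_OPTIONAL_HDR32_MAGIC 0x10b
#define IMAGE_NT_OPTIONAL_HDR64_MAGIC 0x20b

/* Hypothetical result type for this sketch */
typedef enum { PE_INVALID, PE_FORMAT32, PE_FORMAT64 } pe_kind;

/* PE fields are little-endian; read them byte by byte so the sketch
   depends on neither host byte order nor alignment. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Classifies an in-memory file image as PE32, PE64 (PE32+) or neither. */
static pe_kind pe_detect(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    /* IMAGE_DOS_HEADER: "MZ" at offset 0, e_lfanew at offset 0x3C */
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return PE_INVALID;
    uint32_t nt = rd32(buf + 0x3C);
    /* Need "PE\0\0" (4) + IMAGE_FILE_HEADER (20) + Magic (2) bytes */
    if (nt > len || len - nt < 4 + 20 + 2)
        return PE_INVALID;
    if (memcmp(buf + nt, "PE\0\0", 4) != 0)
        return PE_INVALID;
    /* Magic is the first field of the Optional Header in both formats */
    switch (rd16(buf + nt + 4 + 20)) {
    case IMAGE_NT_OPTIONAL_HDR32_MAGIC: return PE_FORMAT32;
    case IMAGE_NT_OPTIONAL_HDR64_MAGIC: return PE_FORMAT64;
    default:                            return PE_INVALID;
    }
}
```

A wrapper class could call something like this once and then dispatch to the right set of structures.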
You can either duplicate the code that handles PE32 and PE64 every time, or write a class that handles both formats transparently.
Exception Handling
Remember the old days when you set up SEH frames in your code? Well, with x64/Itanium they're gone. Exception handlers are no longer registered on the stack at run time. Instead, they are described by a table stored in the PE64 Exception Directory. The basic structure is this:
typedef struct _RUNTIME_FUNCTION {
    DWORD BeginAddress;
    DWORD EndAddress;
    DWORD UnwindData;
} RUNTIME_FUNCTION, *PRUNTIME_FUNCTION;
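All three fields are RVAs (otherwise they wouldn't be DWORDs). BeginAddress and EndAddress delimit the function's code, and UnwindData locates its unwind information.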
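Because the table is sorted by BeginAddress, finding the entry that covers a given address is a binary search. The sketch below mirrors RUNTIME_FUNCTION with <stdint.h> types; RuntimeFunction and lookup_function are hypothetical names. It assumes that EndAddress points just past the function's last byte, and that the table is sorted, which is how the linker emits it. On Windows itself, RtlLookupFunctionEntry performs this lookup for you.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of RUNTIME_FUNCTION */
typedef struct {
    uint32_t BeginAddress;   /* RVA of the function's first byte */
    uint32_t EndAddress;     /* RVA just past its last byte (assumed) */
    uint32_t UnwindData;     /* RVA of its UNWIND_INFO */
} RuntimeFunction;

/* Binary search over entries sorted by BeginAddress. Returns NULL when no
   entry covers the RVA, e.g. for leaf functions, which need no entry. */
static const RuntimeFunction *lookup_function(const RuntimeFunction *table,
                                              size_t count, uint32_t rva)
{
    assert(table != NULL || count == 0);
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rva < table[mid].BeginAddress)
            hi = mid;                       /* look in the lower half */
        else if (rva >= table[mid].EndAddress)
            lo = mid + 1;                   /* look in the upper half */
        else
            return &table[mid];             /* Begin <= rva < End */
    }
    return NULL;
}
```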
The UnwindData field points to an UNWIND_INFO structure, which in turn contains an array of UNWIND_CODE entries:
typedef union _UNWIND_CODE {
    struct {
        UBYTE CodeOffset;
        UBYTE UnwindOp : 4;
        UBYTE OpInfo   : 4;
    };
    USHORT FrameOffset;
} UNWIND_CODE, *PUNWIND_CODE;
typedef struct _UNWIND_INFO {
    UBYTE Version       : 3;
    UBYTE Flags         : 5;
    UBYTE SizeOfProlog;
    UBYTE CountOfCodes;
    UBYTE FrameRegister : 4;
    UBYTE FrameOffset   : 4;
    UNWIND_CODE UnwindCode[1];
/*  UNWIND_CODE MoreUnwindCode[((CountOfCodes + 1) & ~1) - 1];
 *  union {
 *      OPTIONAL ULONG ExceptionHandler;
 *      OPTIONAL ULONG FunctionEntry;
 *  };
 *  OPTIONAL ULONG ExceptionData[]; */
} UNWIND_INFO, *PUNWIND_INFO;
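Here's a description of the UNWIND_INFO fields:
Version: version number of the unwind data (1 for the format shown here).
Flags: a combination of the UNW_FLAG_* values listed below. EHANDLER means the function has an exception handler to call while searching for one. UHANDLER means it has a termination handler to call while unwinding. CHAININFO means this entry continues a previous RUNTIME_FUNCTION entry.
SizeOfProlog: length of the function prolog, in bytes.
CountOfCodes: number of slots used in the unwind code array (some operations take more than one slot).
FrameRegister: if nonzero, the function uses a frame pointer, and this is the number of the nonvolatile register used as such.
FrameOffset: if FrameRegister is nonzero, the offset from RSP applied to that register when it is established, in units of 16 bytes.
UnwindCode: the array describing what the prolog does to RSP and to the nonvolatile registers, sorted in descending order of prolog offset.
The unwind code array is padded to an even number of slots. After it comes one of two things. If UNW_FLAG_EHANDLER or UNW_FLAG_UHANDLER is set, it is the RVA of the handler followed by handler-specific data. If UNW_FLAG_CHAININFO is set, it is a chained RUNTIME_FUNCTION.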
The possible values of the Flags field are:
#define UNW_FLAG_NHANDLER 0x00
#define UNW_FLAG_EHANDLER 0x01
#define UNW_FLAG_UHANDLER 0x02
#define UNW_FLAG_CHAININFO 0x04
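Bit-field layout is up to the compiler, so code that reads UNWIND_INFO directly from a file is safer decoding the header bytes by hand. This is a minimal sketch that assumes the layout described above. Version sits in the low 3 bits of the first byte and Flags in the high 5 bits. The handler RVA, when present, follows the unwind code array padded to an even number of slots. UnwindHeader and decode_unwind_info are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNW_FLAG_NHANDLER  0x00
#define UNW_FLAG_EHANDLER  0x01
#define UNW_FLAG_UHANDLER  0x02
#define UNW_FLAG_CHAININFO 0x04

/* Hypothetical decoded view of the 4-byte UNWIND_INFO header */
typedef struct {
    unsigned version, flags, size_of_prolog, count_of_codes;
    unsigned frame_register, frame_offset;   /* frame_offset is in 16-byte units */
    size_t   trailer_offset;  /* where the handler RVA or chained entry starts */
} UnwindHeader;

static UnwindHeader decode_unwind_info(const uint8_t *p)
{
    assert(p != NULL);
    UnwindHeader h;
    h.version        = p[0] & 0x07;   /* low 3 bits */
    h.flags          = p[0] >> 3;     /* high 5 bits */
    h.size_of_prolog = p[1];
    h.count_of_codes = p[2];
    h.frame_register = p[3] & 0x0F;   /* low nibble */
    h.frame_offset   = p[3] >> 4;     /* high nibble */
    /* 4-byte header + 2-byte UNWIND_CODE slots, count rounded up to even */
    h.trailer_offset = 4 + 2 * ((h.count_of_codes + 1u) & ~1u);
    return h;
}
```

For example, the header bytes 09 04 01 00 decode to version 1 with UNW_FLAG_EHANDLER set, a 4-byte prolog and one unwind code. The handler RVA then starts 8 bytes into the structure.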
Let's take, for instance, this code:
#include <Windows.h>
#include <tchar.h>
#include <intrin.h>
int APIENTRY _tWinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
                       LPTSTR lpCmdLine, int nCmdShow)
{
    __try
    {
        __debugbreak();   // raises a breakpoint exception (int 3)
    }
    __except (EXCEPTION_EXECUTE_HANDLER)
    {
        MessageBox(0, _T("Hello!"), _T("SEH"), 0);
    }
    return 0;
}
The disassembly would be:
.text:0000000000401000 wWinMain proc near ; CODE XREF: __tmainCRTStartup+18C p
.text:0000000000401000 sub rsp, 28h ; BeginAddress
.text:0000000000401004 int 3 ;<