
The 64-bit OS Architecture

Sachin R Sangoi


17 Apr 2007

Are you considering a move to 64-bit Windows? Read on...

Introduction

Truly, "necessity is the mother of invention." The need for more memory, more address space and more speed gave rise to 64-bit computing. But it is necessity, or a lack of programming discipline, that gave rise to memory requirements of more than 2 GB, or 3 GB with the /LARGEADDRESSAWARE switch enabled.

Remember the good old days when a few KB or MB of memory was sufficient, and gigabytes of memory seemed practically unlimited? With a 64-bit processor and terabytes of address space, which looks infinite, one wonders whether so much memory will ever be required. But as Einstein is reputed to have said, "Only two things are infinite, the universe and human stupidity, and I'm not sure about the former." So with a poorly designed enterprise application, I see no reason why we can't hit the terabyte limit and give rise to 128-bit computing. I feel 64-bit computing genuinely belongs in areas such as game programming, CAD, and image processing.

I guess I am on the wrong track. By now you might be wondering whether this article sheds any light on 64-bit porting, or whether it is simply anti-64-bit. So let's say something really nice about 64-bit application porting.

32-bit operating systems, such as Windows NT and Windows 2000, give each process a 4-gigabyte (GB) virtual address space. Over the years, 32-bit Windows applications have not only grown in size, but their need to manipulate large amounts of data has increased dramatically. Today, millions of users around the world need to access terabytes of data in real time, so the demand for an advanced, scalable architecture that can support a large amount of memory is understandable. The obvious question is: should you be considering a move to 64-bit Windows? In this article, we will explore the answer to this question. We will discuss the advantages of 64-bit Windows over 32-bit, talk about a few concepts and give you some tips on how you can prepare for the next wave of Windows. We will also examine some limitations of 64-bit Windows, including the reasons that 64-bit processes cannot load 32-bit DLLs, and address the fate of 32-bit applications in a 64-bit environment. Finally, we will talk about some general guidelines for porting 32-bit applications to 64-bit.

64-bit is the natural progression of computing technology and Microsoft, as usual, is bringing to market a seamless migration strategy and powerful 64-bit platform. Just as computing migrated from 16-bit to 32-bit once the more powerful platform became available, it is almost inevitable that virtually all computers, over the next few years, will be 64-bit. However, the migration to 64-bit has some key differences:

the porting will be simpler and will require far fewer resources
the migration is driven by the desire to take advantage of the technology.

Prerequisite

Before discussing the 64-bit OS and porting issues, we need to look at the technology behind 64-bit processors.

First, here's a look at the Intel 64-bit Itanium2 processor of the Itanium Processor Family (IPF):

The processor's EPIC architecture provides dramatic performance gains over older 32-bit chipsets.
EPIC stands for Explicitly Parallel Instruction Computing; it is an instruction set architecture designed for a high level of parallelism, and it allows up to nine instructions to be executed in parallel.

The alternative to IPF is the x64 processor. There are actually two groups of x64 processors: the AMD64 processors (Opteron and Athlon 64) from Advanced Micro Devices, and the Intel Xeon processors with EM64T (Extended Memory 64 Technology). Because the AMD and Intel processors run the same binaries (an extension of the x86 instruction set), they are collectively referred to as x64. One advantage of x64 is that it can run both 64-bit and 32-bit applications natively, so a 32-bit app pays no emulation penalty on a 64-bit system. This is not the case with IPF.

The next section digs into processor internals. If you are interested, read on; if not, skip it.

In computers, the front side bus (FSB) is a term for the physical bi-directional data bus that carries all electronic signal information between the CPU and other devices within the system such as RAM, the system BIOS, AGP video cards, PCI expansion slots, hard disks etc.

To fully realize the performance gains provided by multiple processor cores, chip companies need a way to deliver enough data from main memory to keep the cores as productive as possible. Intel's current front-side bus design should be able to keep as many as four cores satisfied, depending on their frequency, but as technology advances, the FSB may become a bottleneck for the quad-core processors due for release in 2007. Intel could instead adopt on-chip memory controllers to connect the processor directly to memory. AMD64 processors already do this, eliminating the front-side bus architecture that dominates x86-based systems today.

By integrating the memory controller, and using industry-standard HyperTransport technology for chip-to-chip communication, AMD64 processors reduce the bottlenecks and latencies commonly found in other x86-based systems. The drawback of an integrated memory controller is that it can only work with the memory standard for which it was designed. With memory standards changing every 18 months or so, chip makers have to rework their processors to enable transitions such as the switch from DDR (double data rate) memory to some other type.

Why dig into so much processor detail? Because it explains why, in many cases, 32-bit apps run faster on the 64-bit AMD64 architecture than they would on an equivalent 32-bit x86 processor: AMD64 processors eliminate the front-side bus bottleneck that dominates x86-based systems today.

About the 64-bit Operating System

Microsoft Corp. is currently offering two operating systems compatible with the AMD64 architecture that provide WOW64 capability: Windows XP for 64-bit Extended Systems and Windows Server 2003 for 64-bit Extended Systems.

WOW64 (Windows 32-bit on Windows 64-bit)

WOW64 (Win32 emulation on 64-bit Windows) refers to the software that permits 32-bit x86 applications to execute on 64-bit Windows. It is implemented as a set of user-mode DLLs, three in all: Wow64.dll, the core interface to the NT kernel, which translates between 32-bit and 64-bit calls, including pointer and stack manipulations; Wow64win.dll, which provides the appropriate entry points for 32-bit apps; and Wow64cpu.dll, which takes care of switching the processor from 32-bit to 64-bit mode.

Despite its outwardly similar appearance on all versions of 64-bit Windows, WOW64's implementation varies with the target processor architecture. For example, on the version of 64-bit Windows developed for the Intel Itanium 2 processor, Wow64cpu.dll sets up the emulation of x86 instructions on top of the Itanium 2's unique instruction set. That is a far more computationally expensive task than its counterpart on the AMD64 architecture, which simply switches the processor hardware from 64-bit mode to 32-bit mode when it is time to execute a 32-bit thread, and then handles the switch back to 64-bit mode. No emulation is required!

The WOW64 subsystem also handles other key aspects of running 32-bit applications. For example, it's involved in managing the interaction of 32-bit apps with the Windows registry, which is somewhat different in 64-bit versions of the OS, and in providing an interface to the storage subsystem. WOW64 also ensures that 32-bit apps, utilities, dynamic link libraries (DLLs) and other files are stored in the appropriate directories.

Two of the more interesting functions of the WOW64 infrastructure are its Registry Redirector and File System Redirector.

Registry Redirector: 64-bit Windows actually maintains two separate HKEY_LOCAL_MACHINE\Software trees: one for native 64-bit apps, and one for 32-bit apps (stored under HKEY_LOCAL_MACHINE\Software\Wow6432Node). This lets a 32-bit application see a system image and resources that make it believe it is running on a standard 32-bit version of Windows; a 32-bit app obviously wouldn't understand the differences in a 64-bit system.

File System Redirector: The file system redirector ensures that 32-bit and 64-bit applications don't collide over the same system files. That is necessary because 64-bit Windows still uses the C:\Windows\System32 directory for native applications. Oddly enough, this misnamed directory is for 64-bit binaries only! So 32-bit apps trying to read or write C:\Windows\System32 are transparently redirected to C:\Windows\SysWOW64 instead. Similarly, C:\Program Files is reserved for 64-bit apps, while 32-bit apps are installed by default into C:\Program Files (x86). You'll need to pay attention to this if you're executing command-shell scripts that call a 32-bit application, because they may be looking in the wrong directory. By default, a command shell launched on 64-bit Windows is a 64-bit command shell. You can still launch 32-bit apps from such a shell by using the correct directory. Or you can simply start a 32-bit command shell, C:\Windows\SysWOW64\cmd.exe, and let it handle the directory translations for you.

WOW64 executes and operates differently on x64 and IPF chips. Because x64 is designed to run 32-bit code natively, there is no emulation penalty for 32-bit applications. The Itanium 2 requires an execution layer because the x86 binary has to be translated on the fly into EPIC instructions. As a result, in many cases 32-bit applications will run slower on the Itanium 2 than on a 32-bit processor. An application's particular requirements will dictate which processor is best suited to its needs.

Why 64-bit Apps Cannot Load 32-bit DLLs, and Vice Versa

I mentioned earlier that 32-bit processes can't load 64-bit DLLs and 64-bit processes can't load 32-bit DLLs. You might be wondering why. By default, 64-bit applications can use 8 TB of user-mode address space. You have the option to specify that all memory below 2 GB be allocated to the application. Because a 32-bit DLL can't address memory above 2 GB, a thunk layer would have to copy all data passed to the DLL into the low 2 GB of the 64-bit application's address space. Obviously, this won't work if the 64-bit application tries to pass a pointer to data that is larger than 2 GB.

In addition, 32-bit DLLs use x86-style exception handling and 4 KB pages. On an IA-64 processor, the native page size is 8 KB, and the WOW64 emulator is responsible for simulating 4 KB pages. Also, because exceptions on an x86 machine do not "unwind" from user mode to kernel mode and back, WOW64 implements x86-style exception handling without switching from x86 code to IA-64 code and back.

Finally, another reason why 64-bit and 32-bit processes can't load each other's DLLs is that system DLLs (kernel32.dll, user32.dll, and gdi32.dll) expect only one instance per process, either 32-bit or 64-bit. If a process contained more than one instance of, say, user32.dll, Win32k.sys would not be able to distinguish between them and wouldn't know which one to call.

So every DLL that a 64-bit application loads must itself be 64-bit.

64-bit Porting Issues

This section explains an effective strategy for migrating a 32-bit C++ solution. In the majority of cases the port to 64-bit is straightforward. For unmanaged code (C/C++), the move to 64-bit centers on 64-bit pointers and how to handle them in the code base. The memory model on 32-bit Windows is ILP32; on 64-bit Windows it is IL32P64 (also called LLP64). The most significant difference between this memory model and the Unix 64-bit memory model (I32LP64, or LP64) is that a long is still 32 bits on Windows, whereas on Unix it is 64 bits.
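A quick way to check which data model a compiler uses is to look at the sizes of int, long and pointers. The sketch below is portable C, and the function name data_model is hypothetical. Built as a 64-bit Windows program it should report LLP64 (the IL32P64 model above), on 64-bit Unix or Linux LP64, and for 32-bit targets ILP32.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: name the data model from the sizes of int, long
   and pointers, as seen by the compiler building this file. */
const char *data_model(void)
{
    size_t int_bits  = sizeof(int) * CHAR_BIT;
    size_t long_bits = sizeof(long) * CHAR_BIT;
    size_t ptr_bits  = sizeof(void *) * CHAR_BIT;

    if (int_bits == 32 && long_bits == 32 && ptr_bits == 32)
        return "ILP32";   /* 32-bit Windows and 32-bit Unix */
    if (int_bits == 32 && long_bits == 32 && ptr_bits == 64)
        return "LLP64";   /* 64-bit Windows, a.k.a. IL32P64 */
    if (int_bits == 32 && long_bits == 64 && ptr_bits == 64)
        return "LP64";    /* 64-bit Unix, a.k.a. I32LP64 */
    return "other";
}
```

Because the answer is computed from sizeof at compile time, the same source reports different models depending on the target it is built for, which is exactly the point of the porting discussion that follows.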

The majority of coding issues in migrating C/C++ applications to 64-bit fall into three groups: pointer casting, pointer arithmetic, and alignment. Additional challenges may arise when dealing with inline assembler, with the five modified API calls, and when communicating across a 32/64-bit boundary. Because the int and long data types remain the same size (32 bits), the amount of code that needs to be modified should be very small; typically, fewer than one percent of the lines in the code base are touched. This is different from Unix, where long moves to 64 bits. Developers must be very careful with the alignment of variables, because the penalty for misalignment can be severe. On x64 there is a performance hit, but on Itanium systems the problem is graver: the exception propagates to the application level and causes the application to crash. Developers can use the /Wp64 compiler switch to have the compiler report possible portability issues, which brings the vast majority of porting issues to their attention. This switch is also available when compiling for 32-bit.

I was working on porting an enterprise application written in C++ from 32-bit to 64-bit. I was very happy with this challenging assignment and started working on it enthusiastically. Microsoft makes life so simple: I just opened all the project (.dsp) files for the DLLs in VS.NET 2005. It asked whether I wanted to convert the projects to the VS.NET 2005 format, and I clicked "Yes to All." Then I changed some compiler settings, rebuilt the solution, and the 64-bit DLLs were ready. I really felt blessed working with Microsoft technologies: the cool GUI, the Next, Next, Next, Finish approach, and the /Wp64 compiler switch, which gives you all the probable warnings and, best of all, points you toward the fix for each 64-bit porting issue.

The porting assignment was completed in a few days, and I was proud of what felt like one of the biggest achievements of my life. But God had different plans for me. I think this time God decided to annihilate me and chose this assignment as a weapon against me. The project went into testing, and it started crashing every now and then. I started having sleepless nights, with crashes, crash dumps and GPFs following me into my dreams (whenever I actually got some sleep). I started cursing the creator of the 64-bit processor. By now you may have guessed why I sounded anti-64-bit at the start of this article. But believe me, friends, if God gets you into it, He will get you through it. Here are a few tips that might save you from the same nightmares.

Porting Guidelines

64-bit clean: Before you start using any other tools or apply any 64-bit porting guidelines, get your code 64-bit clean using the /Wp64 switch in the VS.NET 2005 IDE. It points out many porting issues.

Pointer Casting: When moving from 32-bit to 64-bit, the main types that grow are pointers and pointer-derived types such as handles. On 64-bit Windows, pointers and derived types are 64 bits long. Some other types also grow: WPARAM, LPARAM, LRESULT and SIZE_T are all pointer-sized, partly because they are used as parameters that often carry pointers. All of the types derived from int and long remain 32 bits in size; these include DWORD, UINT, and ULONG. Types smaller than 32 bits keep their current sizes. An example of this is the short data type, which remains a 16-bit signed integer.

Look for code where you have cast a pointer or pointer-derived data type to long or DWORD. That was perfectly acceptable in 32-bit code but is an unforgivable sin in 64-bit code, because it truncates the pointer. Use polymorphic data types such as INT_PTR and DWORD_PTR for such casts; they are defined along these lines:

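/* Simplified from the SDK header basetsd.h; include <windows.h> rather than defining these yourself */
#if defined(_WIN64)
    typedef __int64     INT_PTR;    /* 64-bit Windows: 64 bits */
#else
    typedef int         INT_PTR;    /* 32-bit Windows: 32 bits */
#endif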
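Outside the Windows headers, the standard C99 counterparts of INT_PTR and UINT_PTR are intptr_t and uintptr_t from <stdint.h>. The following sketch uses them to show why the pointer-sized type is safe while a DWORD-style cast is not. The three function names are hypothetical illustrations, not Windows APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Safe: uintptr_t (like UINT_PTR / DWORD_PTR) is wide enough to hold
   a pointer, so the round trip gives back the same pointer. */
void *roundtrip_ptr(void *p)
{
    uintptr_t bits = (uintptr_t)p;
    return (void *)bits;
}

/* Unsafe on 64-bit: this is what a cast to DWORD (32 bits) does.
   Any address bits above bit 31 are silently discarded. */
uint32_t truncate_to_dword(uintptr_t addr)
{
    return (uint32_t)addr;
}

/* Returns 1 if an address would survive a 32-bit cast unchanged. */
int fits_in_dword(uintptr_t addr)
{
    return addr == (uintptr_t)truncate_to_dword(addr);
}
```

On a 64-bit build, any address at or above 4 GB fails fits_in_dword, which is exactly the case in which a (DWORD) cast corrupts the pointer.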

Pointer Arithmetic: Look for code that performs pointer subtraction or other pointer arithmetic. Storing the difference of two pointers in a long is acceptable in 32-bit code, but on 64-bit it can truncate the value and lead to an access at the wrong address. Use ptrdiff_t to hold the results of pointer arithmetic instead of long or DWORD. Likewise, never cast a pointer to int to do arithmetic on it:

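char lcTestArray[16], *char_ptr;
char_ptr = (char*)((int)lcTestArray + 1);      /* WRONG on 64-bit: (int) truncates the pointer */
*char_ptr = 'a';

Use polymorphic types instead:

char lcTestArray[16], *char_ptr;
char_ptr = (char*)((size_t)lcTestArray + 1);   /* size_t is pointer-sized on 32- and 64-bit Windows */
*char_ptr = 'a';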

This example illustrates the improper use of the int type in pointer arithmetic. The lcTestArray pointer is cast to an int in order to calculate an offset into the array, and the result is cast back to a pointer. Because int stays 32 bits while the pointer is 64 bits, the upper half of the address is lost, and memory faults are inevitable.
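Usually the cleanest fix is to avoid integer casts altogether: the example above can simply assign lcTestArray + 1 to char_ptr, and differences between pointers can be kept in ptrdiff_t. A minimal sketch, with a hypothetical function name:

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first occurrence of c in buf[0..len),
   or -1 if it is absent. The difference of two pointers has type
   ptrdiff_t, which is 64 bits on 64-bit Windows, whereas long stays
   32 bits. */
ptrdiff_t index_of(const char *buf, size_t len, char c)
{
    const char *end = buf + len;
    for (const char *p = buf; p != end; ++p) {
        if (*p == c)
            return p - buf;   /* pointer subtraction, no casts needed */
    }
    return -1;
}
```

Letting the compiler do the arithmetic on real pointers means the same line is correct on both 32-bit and 64-bit builds, with no polymorphic type needed at all.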

Polymorphic Data Types: Be careful with types whose size can differ between builds, such as INT_PTR, DWORD_PTR, size_t, time_t and many more. For example, MFC's CTime::GetTime returns a time value that is 64 bits wide in VS.NET 2005 (where time_t itself is 64 bits by default); look for places where you have stored such values in a long int. In my code, and also in my database, I saved timestamps in long format. This will not pose any problem until the year 2038, but it will hurt a lot of programmers, as it is against our ethical code to leave something in the code that we know will stop working at some point. So better change it if you are storing such values in a long int.
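A sketch of the fix, assuming timestamps are stored in an explicitly 64-bit integer (int64_t from C99's <stdint.h> here; __int64 plays the same role in VC 6.0). The typedef and function names are hypothetical. The check shows exactly where a 32-bit long runs out: 0x7FFFFFFF seconds after 1 January 1970 is 19 January 2038, 03:14:07 UTC.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Store timestamps in a type that is 64 bits on every platform,
   instead of long (32 bits on both 32- and 64-bit Windows). */
typedef int64_t timestamp64;

timestamp64 to_timestamp64(time_t t)
{
    return (timestamp64)t;
}

/* Returns 1 if the timestamp would still fit in a 32-bit signed long,
   i.e. it is no later than 2038-01-19 03:14:07 UTC. */
int fits_in_32bit_long(timestamp64 ts)
{
    return ts >= INT32_MIN && ts <= INT32_MAX;
}
```

The same 64-bit column type is what the database side needs too; otherwise the value is truncated on the way in, whatever type the C++ code uses.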

API Updates: The Win32 API remains the same. The only changes are five replacement functions; four of them are polymorphic versions of existing functions, and one is used for flat scroll bars:

GetClassLongPtr()
GetWindowLongPtr()
SetClassLongPtr()
SetWindowLongPtr()

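Each of these replaces the function of the same name without the Ptr suffix (GetClassLong, GetWindowLong, SetClassLong and SetWindowLong). They use polymorphic data types (such as LONG_PTR and ULONG_PTR) and updated index constants, for example GWLP_USERDATA instead of GWL_USERDATA.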

Structure Alignment: Another common source of porting problems is data structure alignment. Data types tend to be aligned on boundaries related to their own size. For instance, chars align on one-byte boundaries, whereas ints align on four-byte boundaries. The structure below illustrates this. The a field (a char) aligns at the head of the structure. The b and c integer fields, however, align at the next available 4-byte boundary. This forces 3 bytes of padding between a and b in order to meet the 4-byte alignment the integers require.

struct ExampleStruct
{
    char  a;      // 1-byte element
                  // 3 bytes of padding
    int   b, c;   // 2 * 4-byte elements
                  // 4 bytes of padding (64-bit only)
    void* d;      // 8-byte element on 64-bit (4 bytes on 32-bit)
};

Similarly, d (a pointer) aligns at the next available eight-byte boundary on 64-bit Windows. Because a, b and c, plus the padding between a and b, total 12 bytes, 4 additional padding bytes are needed after c so that d is properly aligned on an 8-byte boundary. On 32-bit Windows, where pointers are 4 bytes, no such padding is needed and d sits at offset 12. These differences can cause problems if incorrect assumptions are made about how the structure is laid out in memory at run time. For instance, assuming that d is 12 bytes from the head of the structure breaks direct offset-based assignments or accesses on 64-bit. The offsetof macro (or FIELD_OFFSET in the Windows headers) allows offset-based access in a safe and platform-neutral manner. Developers should understand each architecture's padding rules and ideally align all structure members on natural boundaries. Because padding differs between the platforms, any transfer of raw structures across the 32/64-bit boundary may cause problems, and structure alignment issues may give rise to invalid offset arithmetic.
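Rather than hard-coding an offset such as 12, the standard offsetof macro from <stddef.h> (FIELD_OFFSET in the Windows headers) computes the correct offset for whichever platform the code is compiled for. A sketch reusing ExampleStruct; the read_d helper is hypothetical, and the offsets in the comment are what common 32-bit and 64-bit compilers produce rather than a guarantee of the C standard.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct ExampleStruct
{
    char  a;
    int   b, c;
    void *d;
};

/* Read d from a raw byte image of the structure without assuming where
   it lives: offsetof typically yields 16 on 64-bit targets and 12 on
   32-bit ones. memcpy avoids a misaligned pointer dereference. */
void *read_d(const unsigned char *raw)
{
    void *d;
    memcpy(&d, raw + offsetof(struct ExampleStruct, d), sizeof d);
    return d;
}
```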

The results of violating alignment vary from platform to platform:

x86 - The hardware handles the misaligned access transparently, at a performance cost.
IPF - An alignment fault is raised and, unless the application has asked the operating system to fix such faults up (SetErrorMode with SEM_NOALIGNMENTFAULTEXCEPT), it propagates to the application.
x64 - As on x86, the hardware does not raise an exception; the access is handled at the hardware level, at a performance cost.

There are several ways to deal with misalignment. One of them is the __unaligned keyword, which allows access to misaligned data. However, the application pays a performance penalty even when the data happens to be properly aligned, because the compiler inserts code to correct misalignment on the fly, which also increases the overall size of the executable. This approach is not recommended except where no other option is practical. You can also indicate that data should be aligned on a specific boundary using __declspec(align()). Furthermore, _aligned_malloc() lets the developer allocate memory in a pre-aligned fashion. This is the recommended best practice for ensuring that all data is aligned on natural boundaries. Because most application vendors are moving to the 64-bit platform for performance reasons, it is essential that developers pay attention to alignment to avoid degrading their applications.
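For code that must also build with non-Microsoft compilers, C11 offers rough counterparts: _Alignas for __declspec(align()), aligned_alloc for _aligned_malloc, and memcpy as a portable way to read data that may be misaligned (the job __unaligned does). This is a sketch assuming a C11 compiler and library; these features are not available in VC 6.0 or VS.NET 2005, and the Vec2 type and helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Compile-time alignment, comparable to __declspec(align(16)). */
typedef struct {
    _Alignas(16) double values[2];
} Vec2;

/* Read a 32-bit value from a possibly misaligned address. memcpy lets
   the compiler emit whatever access is safe on the target CPU. */
uint32_t load_u32_unaligned(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Heap memory aligned to `alignment` bytes, comparable to _aligned_malloc.
   C11 requires size to be a multiple of alignment, so round it up. */
void *alloc_aligned(size_t alignment, size_t size)
{
    size_t rounded = (size + alignment - 1) / alignment * alignment;
    return aligned_alloc(alignment, rounded);
}
```

Memory from aligned_alloc is released with free, whereas memory from _aligned_malloc must be released with _aligned_free.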

Messaging Architecture: As in many enterprise client-server applications, communication takes place by means of requests and responses: the client sends a request and the server sends back a response. The requests and responses are nothing but messages, that is, predefined structures. Be careful not to use polymorphic data types like size_t, time_t, etc. in these messages. If the client is a 32-bit application, these types may be 32 bits wide there, while on the server, a 64-bit application running on a 64-bit OS, they will be 64 bits wide, so a request will be misinterpreted. For messages, stick to basic data types whose size is the same on both platforms.
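A minimal sketch of a message header built only from fixed-size fields, assuming C99's <stdint.h> types (__int32 and __int64 fill the same role in VC 6.0). The RequestHeader type and the encode/decode functions are hypothetical. Encoding field by field into a byte buffer with an explicit byte order also sidesteps structure padding differences, because the wire layout no longer depends on the compiler's structure layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request header: every field has the same size on
   32-bit and 64-bit builds (no size_t, time_t, long, or pointers). */
typedef struct {
    uint32_t msg_type;
    uint32_t payload_len;
    int64_t  timestamp;    /* seconds since 1970, 64 bits everywhere */
} RequestHeader;

enum { REQUEST_HEADER_WIRE_SIZE = 16 };

/* Write the low n bytes of v in little-endian order. */
static void put_le(unsigned char *out, uint64_t v, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = (unsigned char)(v >> (8 * i));
}

/* Read n little-endian bytes back into an integer. */
static uint64_t get_le(const unsigned char *in, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; ++i)
        v |= (uint64_t)in[i] << (8 * i);
    return v;
}

void encode_header(const RequestHeader *h, unsigned char *out)
{
    put_le(out,     h->msg_type,              4);
    put_le(out + 4, h->payload_len,           4);
    put_le(out + 8, (uint64_t)h->timestamp,   8);
}

void decode_header(const unsigned char *in, RequestHeader *h)
{
    h->msg_type    = (uint32_t)get_le(in, 4);
    h->payload_len = (uint32_t)get_le(in + 4, 4);
    h->timestamp   = (int64_t)get_le(in + 8, 8);
}
```

Both sides agree on exactly REQUEST_HEADER_WIRE_SIZE bytes, regardless of whether the client or the server is the 64-bit build.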

Data File Sharing: File handling becomes very important when the same file is handled by both 64-bit and 32-bit apps. For example, suppose a 64-bit server writes a file and distributes it to 32-bit clients, which read it. If the server uses fwrite to write a size_t value (with sizeof(size_t) as the element size), that value occupies 8 bytes on the 64-bit platform, while a 32-bit application reading the file expects 4 bytes, which can create havoc. I will throw some light on this aspect with an example later, as I don't currently have VC 6.0 and MSDN installed.
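A sketch of the fix: write the length as an explicit 32-bit value instead of a raw size_t, so a 64-bit writer and a 32-bit reader agree on 4 bytes. This assumes records smaller than 4 GB and machines with the same byte order; the function names are hypothetical, and a production file format would also fix the byte order explicitly.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Writer: store the length as exactly 4 bytes, never sizeof(size_t). */
int write_record(FILE *fp, const char *data, size_t len)
{
    uint32_t len32 = (uint32_t)len;
    if ((size_t)len32 != len)            /* record too large for the format */
        return -1;
    if (fwrite(&len32, sizeof len32, 1, fp) != 1)
        return -1;
    return fwrite(data, 1, len, fp) == len ? 0 : -1;
}

/* Reader: reads the same 4 bytes whether built as 32- or 64-bit. */
int read_record(FILE *fp, char *buf, size_t bufsize, size_t *out_len)
{
    uint32_t len32;
    if (fread(&len32, sizeof len32, 1, fp) != 1 || len32 > bufsize)
        return -1;
    if (fread(buf, 1, len32, fp) != len32)
        return -1;
    *out_len = len32;
    return 0;
}
```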

Deprecated Functions: The best practice is to aim for a single code base that compiles for both 32 and 64 bits. This allows developers to protect their investment and expertise in 32-bit code. They should not write code that depends on or assumes the sizes of data types for its calculations and operation; such code will very likely not be portable and could create difficulties in porting. Just as things seemed to be settling down for me, the VS.NET IDE suddenly started complaining about deprecated functions such as strcpy and strncpy, suggesting I use strncpy_s instead. But I needed the same code to build 32-bit apps in VC 6.0 and 64-bit apps in VS.NET 2005, and our old friend #ifdef came to the rescue again. Write your own wrapper function, port_strncpy:

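#include <string.h>

/* Drop-in replacement for strncpy(dest, src, count) that builds with both
   VC 6.0 and VS.NET 2005. It treats count as the size of dest: at most
   count - 1 characters are copied and dest is always null-terminated. */
char* port_strncpy(char* dest, const char* src, size_t count)
{
    if (count == 0)
        return dest;
#if defined(_MSC_VER) && _MSC_VER >= 1400
    /* VS.NET 2005 or later: secure CRT version, truncating if needed */
    strncpy_s(dest, count, src, _TRUNCATE);
#else
    /* VC 6.0 and other compilers: classic strncpy, then terminate */
    strncpy(dest, src, count - 1);
    dest[count - 1] = '\0';
#endif
    return dest;
}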

Replace all occurrences of strncpy with port_strncpy; with Find and Replace across the project, it won't take more than a couple of minutes. Handle the other deprecated functions the same way, each with its own wrapper.

Format Specifiers: Use the proper format specifiers in printf or wsprintf. Use %p to print pointers in hexadecimal. This is the best choice for printing pointers. Refer to MSDN for other format specifiers.
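For integers whose width changes between builds, the C99 <inttypes.h> macros keep printf-style formatting portable: PRIxPTR for uintptr_t and %zu for size_t. This sketch assumes a C99-conforming library; VC 6.0 and VS.NET 2005 do not provide <inttypes.h>, and there the I size prefix (for example %Iu for a SIZE_T) serves the same purpose. The format_sizes name is hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Format a pointer-sized value and a size without truncation on
   either 32- or 64-bit builds. Returns the snprintf result. */
int format_sizes(char *out, size_t outsize, uintptr_t addr, size_t count)
{
    return snprintf(out, outsize, "addr=0x%" PRIxPTR " count=%zu",
                    addr, count);
}
```

For example, format_sizes(buf, sizeof buf, (uintptr_t)0x1234, 42) produces "addr=0x1234 count=42". Passing a 64-bit value to a plain %x or %u would instead read only part of the argument.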

A good indication that developers have written clean code is when they can compile it cleanly with level W4 warnings turned on. This doesn't specifically target 64-bit issues, but many portability issues are identified this way.

The New Data Types and Helper Functions

64-bit Windows introduces new data types that your applications should be aware of: fixed-precision data types, pointer-precision types, and specific-precision pointers. These types were added to the development environment to allow developers to prepare for 64-bit Windows well before its introduction.
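A rough C99 counterpart of the first two families is shown below, with static assertions documenting the size each family promises. The my_ prefixed typedefs are hypothetical (the real INT32, UINT64, INT_PTR, UINT_PTR and SIZE_T come from the Windows SDK header basetsd.h). The specific-precision pointers (POINTER_32, POINTER_64) have no standard C equivalent, so they are left out. _Static_assert requires a C11 compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-precision: same size on 32- and 64-bit builds. */
typedef int32_t   my_INT32;     /* like INT32 */
typedef uint64_t  my_UINT64;    /* like UINT64 */

/* Pointer-precision: grows with the pointer. */
typedef intptr_t  my_INT_PTR;   /* like INT_PTR / LONG_PTR */
typedef uintptr_t my_UINT_PTR;  /* like UINT_PTR / DWORD_PTR */
typedef size_t    my_SIZE_T;    /* like SIZE_T */

/* Compile-time checks: the build fails if an assumption is wrong. */
_Static_assert(sizeof(my_INT32) == 4, "fixed-precision must not change");
_Static_assert(sizeof(my_UINT64) == 8, "fixed-precision must not change");
_Static_assert(sizeof(my_INT_PTR) >= sizeof(void *), "must hold a pointer");
_Static_assert(sizeof(my_UINT_PTR) >= sizeof(void *), "must hold a pointer");
```

Putting checks like these in a shared header turns a silent truncation into a compile error the first time the code is built for a new platform.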

The developer tool kits from Microsoft also include new helper functions that can be useful in managing a code base across 32-bit and 64-bit systems. Some examples of these helper functions are:

unsigned long HandleToUlong( const void *h )
long HandleToLong( const void *h )
void *LongToHandle( const long h )
unsigned long PtrToUlong( const void *p )
unsigned int PtrToUint( const void *p)
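These helpers are thin wrappers around casts. The portable sketch below mimics the behavior of PtrToUlong, which keeps only the low 32 bits, and LongToHandle, which sign-extends through a pointer-sized signed type; that sign extension is why a 32-bit -1 widened this way still matches INVALID_HANDLE_VALUE. The lowercase names are hypothetical stand-ins, not the SDK functions.

```c
#include <assert.h>
#include <stdint.h>

/* Like PtrToUlong: keep only the low 32 bits of a pointer. Safe only
   when the value is known to fit, e.g. a small integer stored in a
   pointer-sized field. */
uint32_t ptr_to_ulong(const void *p)
{
    return (uint32_t)(uintptr_t)p;
}

/* Like LongToHandle: widen a signed 32-bit value to pointer size.
   Going through intptr_t sign-extends, so -1 becomes all ones. */
void *long_to_handle(int32_t h)
{
    return (void *)(intptr_t)h;
}
```

Using named helpers instead of bare casts also makes every intentional truncation easy to find with a text search later.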

The New 64-bit Compiler

The Platform SDK includes a pre-release version of a 64-bit compiler that can be used to identify pointer truncation, improper type casts, and other 64-bit-specific problems. You can run it on a project or on a set of source files, and it is a great place to start. The first time you run the compiler, it will generate many pointer truncation and type mismatch warnings. You can also use the VC 6.0 IDE for test builds: launch msdev with the /useenv switch from the 64-bit compiler command prompt. Note that VC 6.0 does not support the x64 processor family, in particular Xeon (EM64T); I tried to build on 64-bit Windows Server 2003 with dual (hyper-threaded) Xeon processors without success. I guess VC 6.0 supports only Itanium; correct me if I am wrong or if some settings need to be changed.

The New Rules for Using Pointers

Porting your code to compile for both 32- and 64-bit Windows is reasonably straightforward, but you do need to be careful and consistent. You need to follow a few simple rules about casting pointers, and use the new data types in your code.

Some of these rules for pointer manipulation are as follows.

Do not cast pointers to int, long, ULONG, or DWORD.
Use UINT_PTR and INT_PTR where appropriate (and if you are uncertain whether they are required, there is no harm in using them just in case).
If you must truncate a pointer to a 32-bit value, use the PtrToLong or PtrToUlong function.
When setting the cbWndExtra member of the WNDCLASS structure, be sure to reserve enough space for pointers.
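The last rule is easy to get wrong because 32-bit code often reserved exactly 4 extra bytes. The portable sketch below shows the idea: size the extra area with sizeof(void *) and refuse to store a pointer that won't fit. The byte buffer and the set_extra_ptr/get_extra_ptr functions are hypothetical stand-ins for a window's extra memory and SetWindowLongPtr/GetWindowLongPtr, not the Windows API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reserve space for a pointer the portable way: sizeof(void *), not 4. */
enum { EXTRA_BYTES = sizeof(void *) };

/* Store p at offset off in a cb-byte extra area; fails if it won't fit. */
int set_extra_ptr(unsigned char *extra, size_t cb, size_t off, void *p)
{
    if (off > cb || cb - off < sizeof p)
        return -1;
    memcpy(extra + off, &p, sizeof p);
    return 0;
}

/* Load a pointer stored by set_extra_ptr; returns NULL if out of range. */
void *get_extra_ptr(const unsigned char *extra, size_t cb, size_t off)
{
    void *p = NULL;
    if (off <= cb && cb - off >= sizeof p)
        memcpy(&p, extra + off, sizeof p);
    return p;
}
```

On a 64-bit build, an area sized for a 32-bit LONG (4 bytes) is rejected instead of being silently overrun.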

Finally

Thank you for your patience in reading this far. This is not a complete guide to 64-bit operating systems; things are still relatively new, and some of you may have dug up more information, which you are always welcome to contribute. Suggestions for improvement are always welcome, and so is a rating, whatever score you give. I hope this article helps you address any porting issues you may have. Please also get back to me if you run into new porting issues or have porting guidelines of your own, and let me know if any information in this article is misleading and needs correction.


License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
