This article explains why an EXE needs to be bound and how binding is done.
This article assumes a fair knowledge of rebasing DLLs. If rebasing is unfamiliar, see my article Need for Rebasing DLLs. A little knowledge of the copy-on-write mechanism used by Windows is also required. If you don't have it, don't worry: a short explanation follows.

First, memory-mapped files. Memory-mapped files allow you to reserve a region of address space and commit physical storage to the region. The difference from ordinary virtual memory is that the physical storage comes from a file that is already on disk instead of the system's paging file. Once the file has been mapped, you can access it as if the whole file were loaded in memory. The system uses memory-mapped files to load and execute EXE and DLL files, which greatly conserves both paging file space and the time required for an application to begin executing. When you create a new process for an application that is already running, the system simply opens another memory-mapped view of the file-mapping object that identifies the executable file's image, and creates a new process object and a new thread object (for the primary thread). The system also assigns new process and thread IDs to these objects. By using memory-mapped files, multiple running instances of the same application can share the same code and data in RAM.
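To make the idea of "accessing a file as if it were loaded in memory" concrete, here is a minimal sketch using Python's standard mmap module, which wraps the platform's file-mapping facility. The temporary file, its contents, and the helper names are all made up for illustration. It is not how the Windows loader itself is written.

```python
import mmap
import os
import tempfile


def make_demo_file(data):
    """Write `data` to a temporary file and return its path (illustration only)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


def read_through_mapping(path, offset, length):
    """Return `length` bytes at `offset`, read through a memory-mapped view."""
    with open(path, "rb") as f:
        # A length of 0 maps the whole file; ACCESS_READ gives a read-only view.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # The view is sliced like an in-memory buffer; no explicit read() call.
            return view[offset:offset + length]


# A pretend "image file": two signature bytes, padding, then a payload.
demo_path = make_demo_file(b"MZ" + b"\x00" * 62 + b"hello from the mapped file")
signature = read_through_mapping(demo_path, 0, 2)
payload = read_through_mapping(demo_path, 64, 5)
os.remove(demo_path)
```

Here `signature` and `payload` are fetched by indexing the mapped view directly, which is the same style of access the paragraph above describes.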
Now, assume two instances of an application are running. The system simply maps the pages of virtual memory containing the file's code and data into both applications' address spaces. If one instance altered some global variables residing in a data page and nothing intervened, the memory contents would change for all instances of the application. This could have disastrous effects and must not be allowed. The system prevents it by using the copy-on-write feature of the memory management system. Any time an application attempts to write to its memory-mapped image, the system catches the attempt, allocates a new block of memory for the page containing the memory the application is trying to write to, copies the contents of the page, and allows the application to write to this newly allocated block. As a result, no other instances of the same application are affected. For example, when the first instance tries to alter a global variable, the system allocates a new page of virtual memory and copies the contents of the data page into it. The first instance's address space is changed so that the new page is mapped at the same location as the original data page. Now the system can let the process alter the global variable without fear of altering the data for another instance of the same application. That is the copy-on-write mechanism in a nutshell.
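The effect of copy-on-write can be observed in a small sketch. It assumes that two private views of one file, opened in a single Python process, are an acceptable stand-in for two instances of an application sharing one image. The file contents and variable names are invented for the demonstration.

```python
import mmap
import os
import tempfile

# Create a small file that plays the role of a shared data page.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"counter=0")

with open(path, "rb") as f1, open(path, "rb") as f2:
    # ACCESS_COPY requests copy-on-write views: writes change only the
    # writer's private copy of the page, never the underlying file.
    view_a = mmap.mmap(f1.fileno(), 0, access=mmap.ACCESS_COPY)
    view_b = mmap.mmap(f2.fileno(), 0, access=mmap.ACCESS_COPY)

    view_a[8:9] = b"7"          # "instance A" alters its global variable

    seen_by_a = view_a[:]       # A sees its own modification
    seen_by_b = view_b[:]       # B still sees the original contents
    view_a.close()
    view_b.close()

with open(path, "rb") as f:
    on_disk = f.read()          # the file on disk is untouched
os.remove(path)
```

After the write, only `seen_by_a` differs from the original bytes. The second view and the file itself are unchanged, which is the isolation the paragraph above describes.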
Using the code
Before starting, let's clear up a few terms.
What Exporting means
When Microsoft's C/C++ compiler sees the __declspec(dllexport) modifier before a variable, function prototype, or C++ class, it embeds some additional information in the resulting .obj file. The linker parses this information when all of the .obj files for the DLL are linked. When the DLL is linked, the linker detects this embedded information about the exported variable, function, or class, and automatically produces a .lib file. This .lib file contains the list of symbols exported by the DLL and is, of course, required to link any executable module that references this DLL's exported symbols. In addition to creating the .lib file, the linker embeds a table of exported symbols in the resulting DLL file. This export section contains the list (in alphabetical order) of exported variables, functions, and class symbols. The linker also records the relative virtual address (RVA) indicating where each symbol can be found in the DLL module. Below is the export section for DLL1.
It was produced with: dumpbin /exports Dll1.dll > exportDll1.txt
ordinal  hint  RVA       name
      1     0  0000100F  ??0CDll1@@QAE@XZ
      2     1  00001014  ??4CDll1@@QAEAAV0@ABV0@@Z
      3     2  0000100A  ?fnDll1@@YAHXZ
      4     3  0002D3F0  ?nDll1@@3HA
Now, have a look at the export section of DLL2 below:
ordinal  hint  RVA       name
      1     0  0000100F  ??0CDll2@@QAE@XZ
      2     1  0000100A  ??4CDll2@@QAEAAV0@ABV0@@Z
      3     2  00001014  ?fnDll2@@YAHXZ
      4     3  0002D3F0  ?nDll2@@3HA
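If you want to work with these listings programmatically, the rows can be turned into records with a few lines of Python. The sketch below assumes only the simple four-column rows shown here (real dumpbin output can carry extra text on a row, which this does not handle). The function name is hypothetical.

```python
def parse_export_rows(text):
    """Parse 'ordinal hint RVA name' rows from a dumpbin /exports listing."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly four fields and begin with a decimal ordinal;
        # this also skips the 'ordinal hint RVA name' header row.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        ordinal, hint, rva, name = fields
        rows.append({
            "ordinal": int(ordinal),
            "hint": int(hint, 16),   # dumpbin prints hints in hexadecimal
            "rva": int(rva, 16),     # RVAs are hexadecimal as well
            "name": name,
        })
    return rows


DLL2_LISTING = """\
ordinal hint RVA name
1 0 0000100F ??0CDll2@@QAE@XZ
2 1 0000100A ??4CDll2@@QAEAAV0@ABV0@@Z
3 2 00001014 ?fnDll2@@YAHXZ
4 3 0002D3F0 ?nDll2@@3HA
"""

dll2_exports = parse_export_rows(DLL2_LISTING)
rva_by_name = {row["name"]: row["rva"] for row in dll2_exports}
```

The resulting `rva_by_name` dictionary maps each decorated symbol name to its RVA, which is the lookup the loader performs against the export section.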
What happens while running the executable module
When an executable file is invoked, the operating system loader creates the virtual address space for the process. Then the loader maps the executable module into the process' address space. The loader examines the executable's import section and attempts to locate and map any required DLLs into the process' address space. After all of the DLL modules have been located and mapped into the process' address space, the loader fixes up all references to imported symbols. To do this, it again looks in each module's import section. For each symbol listed, the loader examines the designated DLL's export section to see if the symbol exists. If the symbol does not exist (which is very rare), the loader displays an error.
Below is part of the import section of TestExe.exe, obtained with dumpbin /imports TestExe.exe > importTestExe.txt.
Section contains the following imports:

    4251C0 Import Address Table
    425050 Import Name Table
         0 time date stamp
         0 Index of first forwarder reference

    4251F0 Import Address Table
    425080 Import Name Table
         0 time date stamp
         0 Index of first forwarder reference
As explained above, the loader first locates DLL1 and DLL2 and maps them into the process address space. It then checks the import section for imported symbols, in this case fnDll2 in the imports of the EXE, and searches for each one in the export section of the DLL that the import names (the exports of DLL1 and DLL2 are shown above). If the symbol cannot be found, the loader reports an error. If the symbol does exist, the loader retrieves its RVA (relative virtual address) from the DLL's export section and adds the virtual address at which the DLL module is loaded. It then saves the resulting virtual address in the executable module's import section. Now, when the code references an imported symbol, it looks in the calling module's import section and gets the address of the imported symbol, and it can thus successfully access the imported variable, function, or C++ class member function. The dynamic link is now complete, the process' primary thread begins executing, and the application is finally running!
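The fix-up arithmetic is just "load address plus RVA". The sketch below replays it for fnDll2 using the RVAs from the DLL2 export listing. The load address 0x10000000 is an assumption chosen for illustration, since the article does not say where Dll2.dll was actually loaded, and the function name is hypothetical.

```python
# RVAs copied from the DLL2 export listing in this article.
DLL2_EXPORT_RVAS = {
    "??0CDll2@@QAE@XZ": 0x0000100F,
    "??4CDll2@@QAEAAV0@ABV0@@Z": 0x0000100A,
    "?fnDll2@@YAHXZ": 0x00001014,
    "?nDll2@@3HA": 0x0002D3F0,
}

# Assumption: an illustrative load address, not taken from the article.
HYPOTHETICAL_DLL2_BASE = 0x10000000


def resolve_import(symbol, export_rvas, load_base):
    """Return the virtual address of `symbol`: load address plus its RVA."""
    if symbol not in export_rvas:
        # Corresponds to the loader reporting a missing export.
        raise LookupError("symbol not exported: " + symbol)
    return load_base + export_rvas[symbol]


# The import section of the EXE, modelled as a name -> address table.
import_address_table = {}
import_address_table["?fnDll2@@YAHXZ"] = resolve_import(
    "?fnDll2@@YAHXZ", DLL2_EXPORT_RVAS, HYPOTHETICAL_DLL2_BASE
)
```

With the assumed base, the stored address is 0x10000000 + 0x1014 = 0x10001014. A different load address would give a different result, which is why rebasing matters for binding.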
Naturally, it takes the loader quite a bit of time to load all these DLL modules and fix up every module's import section with the proper addresses of all the imported symbols. Since all this work is done when the process initializes, there is no run-time performance hit for the application. For many applications, however, a slow initialization is unacceptable. To help improve your application's load time, you should rebase your DLL modules. For details on rebasing, see the article Need for Rebasing DLLs.
So you have rebased all your modules. Why is binding still needed?
At startup, as described above, the loader normally walks through all the imported functions and looks up their addresses, writing each symbol's virtual address into the executable module's import section. This is what lets references to imported symbols reach the correct memory location. But because the loader writes into the .exe module's import section, the pages that back the import section are modified. Since these pages are copy-on-write, the modified copies are backed by the paging file. So we have a problem: portions of the image are swapped to and from the system's paging file instead of being discarded and reread from the file's disk image when necessary. In addition, the loader has to resolve the addresses of all the imported symbols (for all modules), which can be time-consuming. However, if the imported DLLs don't change from run to run, the addresses that the loader gets back don't change either. An easy optimization is to write each target function's address into the importing executable ahead of time, so that the loader doesn't have to go through all the imported functions and repeat the process described above every time the EXE is launched. This is exactly what the BIND program does: it binds the executable to its DLLs. As long as the DLLs have not changed, binding eliminates the need to calculate the virtual addresses on every launch and also avoids the copy-on-write of the import section pages. The loader detects whether a DLL has changed by comparing timestamps.
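The saving can be illustrated with a toy model that counts how many import entries the loader has to write at startup. Everything in it is a simplification invented for this sketch: the class and function names, the load address, the rebuilt DLL's RVA, and the use of None to mean "never bound". It models only the timestamp comparison and is not the real loader's data layout.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Dll:
    name: str
    timestamp: int       # build timestamp stored in the DLL
    base: int            # address the DLL is loaded at (hypothetical)
    export_rvas: dict    # symbol name -> RVA


@dataclass
class ImportDescriptor:
    dll_name: str
    symbols: list
    bound_timestamp: Optional[int] = None   # None models "never bound"
    iat: dict = field(default_factory=dict)


def bind(descriptor, dll):
    """Ahead of time: store resolved addresses and remember the DLL's timestamp."""
    descriptor.iat = {s: dll.base + dll.export_rvas[s] for s in descriptor.symbols}
    descriptor.bound_timestamp = dll.timestamp


def load(descriptor, dll):
    """At startup: return how many import entries had to be written."""
    if descriptor.bound_timestamp == dll.timestamp:
        return 0     # bound addresses still valid: no writes, no copy-on-write
    writes = 0
    for symbol in descriptor.symbols:
        descriptor.iat[symbol] = dll.base + dll.export_rvas[symbol]
        writes += 1
    return writes


FN = "?fnDll2@@YAHXZ"
dll_v1 = Dll("Dll2.dll", 0x41F4A118, 0x10000000, {FN: 0x1014})
exe_imports = ImportDescriptor("Dll2.dll", [FN])

unbound_writes = load(exe_imports, dll_v1)   # not bound: full lookup
bind(exe_imports, dll_v1)
bound_writes = load(exe_imports, dll_v1)     # bound, DLL unchanged

# A rebuilt DLL: new timestamp and (hypothetically) a moved function.
dll_v2 = Dll("Dll2.dll", 0x41F4A118 + 1, 0x10000000, {FN: 0x1020})
stale_writes = load(exe_imports, dll_v2)     # timestamp mismatch: full lookup
```

In this model the bound, unchanged case performs no writes at all, while the unbound and stale cases fall back to resolving every symbol.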
Normally, Win32 executables have two identical copies of the information needed to look up an imported function. One is called the import address table (IAT), while the other is called the import name table. You can see both in the import section of TestExe shown above. However, only one copy (the IAT) is required by the Win32 loader. The BIND program takes advantage of the fact that there are two copies of this information and overwrites the IAT entries with the imported functions' actual addresses. At load time, the loader checks whether everything is still as it was at bind time, and if so, uses the addresses that BIND has stored in your IAT. This eliminates the need to look up each function by its name. What if something is different from when the executable was bound? For instance, perhaps the imported DLL got loaded at a different address. In this case, the loader uses the import name table information to do a normal lookup. To bind the EXE from the command line, use the following command:
bind -u TestExe.exe Dll1.dll Dll2.dll
and you will get the output as:
BIND: binding Dll1.dll
BIND: binding Dll2.Dll
Now run the command dumpbin /imports TestExe.exe > BindTest.txt. BindTest.txt will contain the following lines, which were not present in the earlier import dump of the EXE.
Header contains the following bound import information:
Bound to Dll1.dll [41F4A0C3] Sun Jan 23 23:16:19 2005
Bound to Dll2.dll [41F4A118] Sun Jan 23 23:17:44 2005
Bound to KERNEL32.dll [3AF32050] Fri May 04 14:34:08 2001
Contained forwarders bound to NTDLL.DLL [3AF32050] Fri May 04 14:34:08 2001
The number in square brackets indicates when each DLL module was built. This 32-bit timestamp value is expanded and shown as a human-readable string after the square brackets. At load time, the loader needs to know that the location of each symbol referenced in the DLL's export section has not changed since binding was performed. It verifies this by checking each DLL's timestamp. If a DLL's timestamp has changed, then, as mentioned earlier, the loader takes the information from the import name table and performs the full lookup described in the section on running the executable module.
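The bracketed values can be decoded by hand. The sketch below treats the value as a count of seconds since 1 January 1970 UTC, which is how the PE format defines its time-date stamp, and prints the result in UTC. The function name is hypothetical.

```python
from datetime import datetime, timezone


def decode_pe_timestamp(hex_text):
    """Convert a bracketed hex timestamp from the listing to a UTC datetime."""
    seconds = int(hex_text, 16)   # seconds since 1970-01-01 00:00:00 UTC
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


dll1_stamp = decode_pe_timestamp("41F4A0C3")
dll2_stamp = decode_pe_timestamp("41F4A118")
seconds_between_builds = int((dll2_stamp - dll1_stamp).total_seconds())
```

For Dll1.dll this yields 2005-01-24 07:16:19 UTC. The listing shows 23:16:19 on 23 January, which is consistent with the tool having printed local time on a machine eight hours behind UTC, though that is an inference rather than something the article states. The two DLLs were built 85 seconds apart, matching the 23:16:19 and 23:17:44 strings in the listing.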
OK, so now you know that you should bind all the modules that you ship with your application. But when should you perform the bind? If you bind your modules at your company, you would bind them to the system DLLs that you've installed, which are unlikely to be what the user has installed. Since you don't know if your user is running Windows 98, Windows NT, or Windows 2000, or whether these have service packs installed, you should perform binding as part of your application's setup.
Points of Interest
I have tried my best to make the need for binding clear, and I hope you like it. Also, the Bind utility internally uses the
Acknowledgement and References
I would like to acknowledge the author Jeffrey Richter and his book on the Windows OS, which is one of the best books for learning about Windows operating system internals. Parts of this article are taken from the book, and examples were added to simplify things.