In this article, I discuss why basing a DLL is desirable and what it involves. Then I present a post-link utility called Libase (for library base) that automates the procedure. Libase differs from the Platform SDK utility Rebase in that it derives the new base address for the DLL from a hash of the filename, instead of asking you to provide a base address explicitly.
Base Addresses and Rebasing
Every Win32 application loads in a private memory address space. The operating system makes it appear that each process has a linear address range that starts from zero. When one process reads from memory address 0x12345, it reads from an entirely different physical memory address than another process that reads from the same address. The operating system keeps the various logical address spaces apart with the processor's paging hardware: each process gets its own set of page tables, which map its linear addresses to physical memory, and Windows loads the page tables of the current process whenever it switches between processes. (In Win32, the segment registers CS, DS, and so on all cover the same flat 4 GB range, so they play no part in separating processes.) This is how paged protected mode works on the Intel 386 and later processors.
In a process, the application (.exe) and all loadable components (mostly DLLs, but also in-process ActiveX servers, which are in fact DLLs) share the logical address space. If one DLL reads from address 0x12345, it reads from the same memory location as any other DLL in the same process.
Applications and DLLs have functions and variables. Code in an application calls a function by jumping to the address at which the function starts. The linker determined that starting address when it built the executable. The linker cannot just choose any address; it has to take into account areas of memory that the operating system has reserved. A function cannot start at address 0x00000000, for example, and under Windows 9x it also cannot lie at or above address 0x80000000.
This raises a problem for DLLs. When you link a DLL, the linker cannot know at what linear address the DLL will actually load in any given application (or even in any given run of an application). That's the "Dynamic Link" part of the DLL acronym. It is entirely possible, and even very likely these days, that an application loads two DLLs that the linker has based at the same starting address. (The base address set by the linker is the preferred load address.)
If your application needs to load a DLL whose preferred load address conflicts with memory that's already in use (for example, by a previously loaded DLL with the same preferred load address), the operating system rebases the conflicting DLL: it loads the DLL at a different, non-overlapping address and then adjusts every absolute address inside it. The physical format of a .dll file includes relocation information that points to each absolute address embedded in the image: for example, the operands of indirect CALL and JMP instructions, function pointers, and addresses that reference global/static variables (such as literal strings). All these addresses have to be revised if the operating system cannot load the DLL at its preferred load address.
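The fixup pass that the loader performs can be sketched in C as follows. This is a minimal illustration, not the Windows loader's actual code: it assumes the PE base relocation layout (blocks that start with a 32-bit page RVA and a 32-bit block size, followed by 16-bit entries whose top 4 bits give the relocation type and whose low 12 bits give the offset within that 4 KB page), it handles only the 32-bit HIGHLOW type and the ABSOLUTE padding type, and it assumes a little-endian host such as x86. The names apply_relocations, REL_HIGHLOW, and REL_ABSOLUTE are my own.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Relocation entry types, with the values used in the PE format. */
#define REL_ABSOLUTE 0   /* padding entry, nothing to patch */
#define REL_HIGHLOW  3   /* patch a full 32-bit absolute address */

/* Apply the base relocations in 'reloc' (a copy of the .reloc data,
   'reloc_size' bytes) to the mapped image 'image' ('image_size' bytes).
   'preferred' is the base the linker assumed; 'actual' is where the
   image was loaded. Returns the number of fixups applied, or -1 if the
   relocation data is malformed or uses an unsupported type. */
long apply_relocations(uint8_t *image, size_t image_size,
                       const uint8_t *reloc, size_t reloc_size,
                       uint32_t preferred, uint32_t actual)
{
    uint32_t delta = actual - preferred;   /* wraps modulo 2^32, as intended */
    size_t pos = 0;
    long count = 0;

    while (pos + 8 <= reloc_size) {
        uint32_t page_rva, block_size;
        memcpy(&page_rva, reloc + pos, 4);      /* little-endian host assumed */
        memcpy(&block_size, reloc + pos + 4, 4);
        if (block_size < 8 || pos + block_size > reloc_size)
            return -1;

        size_t entries = (block_size - 8) / 2;
        for (size_t i = 0; i < entries; i++) {
            uint16_t entry;
            memcpy(&entry, reloc + pos + 8 + 2 * i, 2);
            unsigned type = entry >> 12;
            uint32_t rva = page_rva + (entry & 0x0FFF);
            if (type == REL_ABSOLUTE)
                continue;
            if (type != REL_HIGHLOW || (size_t)rva + 4 > image_size)
                return -1;
            uint32_t value;                     /* the stored absolute address */
            memcpy(&value, image + rva, 4);
            value += delta;                     /* shift it to the new base */
            memcpy(image + rva, &value, 4);
            count++;
        }
        pos += block_size;
    }
    return count;
}
```

When the DLL loads at its preferred base, delta is zero and the pass would change nothing, which is why the loader can skip it altogether. Note also that each relocation block describes one 4 KB page, and every page that receives a fixup no longer matches the file on disk.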
This procedure, done at load time, is time-consuming, of course, but it also increases the memory footprint of the DLL. For every loaded module, Windows creates a section object, a memory-mapped file for the DLL. Whenever your application accesses a page of the module that was swapped out, Windows reloads it from the section object, that is, from the DLL file itself. When the module was loaded at a base address other than its preferred one, the relocated pages in memory no longer match the image of the module on disk, so Windows can no longer reload them from the file; instead, those pages are backed by the system pagefile. In summary, if a module loads at its preferred base address, it uses no pagefile space; if a module is rebased, nearly all of the code section and some of the data section of the DLL end up backed by the pagefile.
Basing a DLL means explicitly selecting a preferred load address when you link it, ideally one that keeps it clear of the memory used by the application and by other DLLs. As discussed above, two reasons why this is desirable are to make DLLs load faster and to reduce pagefile usage. A third reason, brought forward in John Robbins' book Debugging Applications, is to be able to determine the module and the source code line (with the help of a .map file) from a crash address. (If the DLL had to be rebased when it was loaded, the addresses in the DLL's .map file no longer reflect reality for that run.)
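The crash-address lookup can be sketched as follows, assuming you know, for each module in the process, the address at which it actually loaded, its image size, and the preferred base that its .map file was generated for. The struct module type and the locate_crash() function are hypothetical names. The function finds the module that contains the crash address and translates that address back to the one the .map file expects; for a module that loaded at its preferred base, the two are identical, which is exactly the situation that basing aims for.

```c
#include <stddef.h>
#include <stdint.h>

/* One loaded module (hypothetical bookkeeping structure). */
struct module {
    const char *name;
    uint32_t base;       /* address the module actually loaded at */
    uint32_t size;       /* size of the loaded image */
    uint32_t preferred;  /* base address the .map file was produced for */
};

/* Find the module that contains 'addr'. On success, store in *map_addr
   the address to look up in that module's .map file and return the
   module; return NULL if no module contains 'addr'. */
const struct module *locate_crash(const struct module *mods, size_t n,
                                  uint32_t addr, uint32_t *map_addr)
{
    for (size_t i = 0; i < n; i++) {
        /* Unsigned subtraction keeps the range test overflow-free. */
        if (addr >= mods[i].base && addr - mods[i].base < mods[i].size) {
            /* Offset within the module, re-applied to the preferred base.
               If the module was not rebased, this equals 'addr'. */
            *map_addr = mods[i].preferred + (addr - mods[i].base);
            return &mods[i];
        }
    }
    return NULL;
}
```

For example, if a DLL linked for 0x10000000 was rebased to 0x00830000 and crashed at 0x00831234, the address to look up in its .map file is 0x10001234.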
By the way, regarding load times, I should mention that when an executable module is unloaded, Windows puts its pages on a standby list, a kind of cache from which the module's pages can be retrieved very efficiently when it is loaded again. So if you load a DLL a second time while its pages are still on the standby list, it loads much more quickly than the first time.
Slow load times are most irritating for applications that you start frequently, such as compilers, and the Windows standby list already deals adequately with this category. Still, some utilities should always start in a snap, even when they run only occasionally. For example, a colleague and I were intently watching a simulation develop when a screen saver kicked in, which disturbed us both. Had it not taken three to five seconds to load (in the heat of the moment, we forgot to time it), I would not have disabled it on the spot. If you produce screen savers, it is worth taking care of such seemingly trivial matters. There are many other examples of applications that you will want to launch quickly from the very first time, from right-click context menu extensions for Windows Explorer to optional macro/script engines in applications.
Base address conflicts are not the only cause of slow DLL load times, or even the most important one. Ruediger Asche (see the bibliography) gives a detailed report of load times for a set of DLLs before and after base address conflicts were resolved. However, resolving base address conflicts is so easy that there is hardly any reason not to do it.
To end this overview, I checked the base addresses of executable modules built by the compilers that I have:
- Applications (.exe files) start at 0x00400000 for all compilers that I tested. These executable images are loaded first in a process, and they will never need to be relocated. (In fact, they sometimes do not even contain a .reloc section, the part of a .exe or .dll file that contains the detailed relocation information needed for rebasing.)
- Microsoft Visual C/C++ places DLLs at address 0x10000000; this is the address that you will encounter most.
- Microsoft Visual Basic places DLLs at address 0x11000000.
- Borland C++, Watcom C/C++, and LCC-Win32 place DLLs at address 0x00400000, thereby guaranteeing a base address conflict with the application.
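If you want to check the preferred base address of a module yourself, it is stored in the ImageBase field of the PE optional header. The following sketch, with the hypothetical function name pe_image_base(), extracts it from a buffer that holds the start of a .exe or .dll file (for example, the first 4 KB read with fread()). It relies on the PE file layout: the file offset of the "PE\0\0" signature is stored at offset 0x3C, the optional header follows the 4-byte signature and the 20-byte COFF file header, and ImageBase sits at offset 28 in a PE32 optional header, or at offset 24 as a 64-bit value in a PE32+ header.

```c
#include <stddef.h>
#include <stdint.h>

/* Read little-endian integers from a byte buffer. */
static uint32_t rd16(const uint8_t *p) { return (uint32_t)p[0] | (uint32_t)p[1] << 8; }
static uint32_t rd32(const uint8_t *p) { return rd16(p) | rd16(p + 2) << 16; }

/* Extract the preferred load address (ImageBase) from the first 'len'
   bytes of a PE file. Returns 0 on success, -1 if the buffer does not
   hold a recognizable PE32 or PE32+ header. */
int pe_image_base(const uint8_t *buf, size_t len, uint64_t *base)
{
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;                                /* no DOS header */
    uint32_t pe = rd32(buf + 0x3C);               /* offset of PE signature */
    if ((uint64_t)pe + 24 + 32 > len)
        return -1;                                /* header not in buffer */
    if (buf[pe] != 'P' || buf[pe + 1] != 'E' || buf[pe + 2] || buf[pe + 3])
        return -1;                                /* no "PE\0\0" signature */

    const uint8_t *opt = buf + pe + 24;           /* skip signature + COFF header */
    uint32_t magic = rd16(opt);
    if (magic == 0x10B)                           /* PE32: 32-bit ImageBase */
        *base = rd32(opt + 28);
    else if (magic == 0x20B)                      /* PE32+: 64-bit ImageBase */
        *base = (uint64_t)rd32(opt + 24) | (uint64_t)rd32(opt + 28) << 32;
    else
        return -1;
    return 0;
}
```

For a DLL built with default settings by Microsoft Visual C/C++, this should report 0x10000000, in line with the list above.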
The range of application address space that no version of Windows reserves runs from 0x00400000 to 0x80000000. The system DLLs for Windows are currently based in memory from 0x70000000 to 0x78000000 on the Intel processors and from 0x68000000 to 0x78000000 on the MIPS processors. Other standard DLLs (for OLE support) are apparently in the range 0x50000000 to 0x5f000000. When selecting base addresses for DLLs, Microsoft suggests that you select them from the top of the allowed address range downwards, in order to avoid conflicts with memory that the application allocates dynamically (which is allocated from the bottom up).
In conclusion, the most suitable address range for DLLs is from 0x60000000 through 0x6f000000. Microsoft, seeking portability where it cannot be achieved, proposes to narrow the range further to 0x60000000 through 0x68000000 in order to accommodate both Intel and MIPS processors. (Note also that Microsoft's upper limit overlaps the reserved range of the MIPS processor.) Microsoft's proposal continues with a first-letter scheme for selecting the base address, which I have summarized in Table 1. In other words, you select a base address for your DLL based on the first letter of the DLL's name and the addresses in Table 1.
After selecting a load address for a DLL, you have to pass it to the linker. Note again that applications (.exe files) do not need a base adjustment; they are the first executable module that the loader loads, and, therefore, they always load at the address that the linker has fixed them at. The linker options are:
- With Watcom C/C++, add OP OFFSET=address to the linker command line (WLINK), where address is the desired base address. You can use decimal or hexadecimal notation for this address. (Hexadecimal is in the same format as C/C++ literals, for example, OP OFFSET=0x62000000.)
- With Borland C++, use the -B:address option (TLINK32); the value is in hexadecimal.
- With Microsoft C/C++, use the option -base:address; the value is in hexadecimal.
- Alternatively, you can use a post-link utility. For the Rebase utility, which comes with the Platform SDK, use the -b address option; the value is in hexadecimal.
Automatic Rebasing: Libase
The drawbacks of the manual rebasing scheme are that the table is difficult to memorize, and that choosing a base address on the first letter alone is too simplistic. When I tried it on several somewhat larger projects that I take part in, conflicts arose so quickly that a "rolling the dice" scheme produced better results than the first-letter proposal. Initially, I extended the scheme to take the first two letters into account (with the added rule that, if many filenames start with the same prefix, the second letter to select is the first letter in the filename after that prefix). This resolved nearly all of the conflicts, but the procedure became even harder to memorize, now requiring two tables instead of one. This called for an automatic solution. And while I was at it, why stop at considering only two letters of the filename?
Libase is a small post-link utility that I wrote; it chooses a base address for a DLL (considering all letters in the filename) and rebases the DLL to that address. You do not need to add linker flags to use it; instead, you run Libase after the linker has finished. Libase is configurable via a .ini file; by default it uses an address range of 0x60000000 to 0x6ff00000 (larger than the one proposed by Microsoft) with a step size of 0x00100000. The chosen range and step size allow for 256 different base addresses (instead of just nine with Microsoft's proposal). The hash is adapted from the well-known hash function published in Compilers: Principles, Techniques and Tools by Aho, Sethi, and Ullman (page 435) as P. J. Weinberger's algorithm for computing hash values. The source code for Libase is in libase.c (Listing 1), and libase.ini (Listing 2) contains a sample .ini file to control it.
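Here is a sketch of how a filename can be turned into a base address along these lines. The hash function is P. J. Weinberger's algorithm as printed by Aho, Sethi, and Ullman, except that it omits their final reduction modulo a prime. Reducing the hash modulo the number of address slots instead, as well as the function names hashpjw() and base_for_name(), are my assumptions for illustration and not necessarily what libase.c does. I also assume that the name passed in is the bare filename, without its directory.

```c
#include <ctype.h>
#include <stdint.h>

/* P. J. Weinberger's hash (Aho, Sethi, Ullman), without the final
   "modulo a prime" step. With ignore_case set, letters are folded to
   lowercase first, so "MYLIB.DLL" and "mylib.dll" hash identically
   (compare Libase's IgnoreCase setting). */
uint32_t hashpjw(const char *s, int ignore_case)
{
    uint32_t h = 0, g;
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (ignore_case)
            c = (unsigned char)tolower(c);
        h = (h << 4) + c;
        if ((g = h & 0xF0000000u) != 0) {   /* fold the top nibble back in */
            h ^= g >> 24;
            h ^= g;
        }
    }
    return h;
}

/* Map a filename to one of the base addresses low, low+step, ..., high.
   With the defaults 0x60000000, 0x6FF00000 and 0x00100000 there are
   (0x6FF00000 - 0x60000000) / 0x00100000 + 1 = 256 slots. */
uint32_t base_for_name(const char *name, int ignore_case,
                       uint32_t low, uint32_t high, uint32_t step)
{
    uint32_t slots = (high - low) / step + 1;
    return low + (hashpjw(name, ignore_case) % slots) * step;
}
```

Because the address depends only on the filename, the same DLL always lands on the same slot, no matter in which order or together with which other DLLs it is processed.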
In its default configuration, Libase disregards the case of characters in a filename. That is, the files mylib.dll and MYLIB.DLL are rebased to the same address. If you set IgnoreCase to 0 in the .ini file, Libase uses the case of the filename as stored on disk.
To use Libase, simply run it with the path to a DLL on the command line. Libase can rebase multiple DLLs in one invocation, but unlike the Platform SDK utility Rebase, it does not assign consecutive, non-overlapping addresses; it chooses the base address for each DLL from a hash of its filename. An added feature of Libase is that it records, in its .ini file, the address to which it has rebased each module it has processed. This allows you to check whether a collision has occurred and which DLLs it involves.
The workhorse function of Libase is ReBaseImage(), which is exported by Microsoft's imagehlp.dll. The rest of Libase's implementation is trivial (apart from the frustration of the SDK documentation for ReBaseImage() not matching the prototype in imagehlp.h and conflicting with a comment in that header file).
Libase does not guarantee that a DLL gets a unique preferred load address; a base address collision can still occur, but it is less likely. The default step size assumes that no DLL is bigger than 1 MB (0x00100000 bytes). Libase warns you about any DLL whose size exceeds the step size, because the report in the .ini file is then no longer accurate.
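Because Libase keeps a record of the addresses it has assigned, you can check a set of DLLs for collisions after the fact. The sketch below assumes you have already collected each DLL's chosen base address and its image size into an array; struct placed_dll and report_collisions() are hypothetical names, and reading the data from Libase's .ini file is left out. It compares address ranges rather than just base addresses, because a DLL larger than the step size spills into the next slot: a 1.5 MB DLL based at 0x62000000 collides with one based at 0x62100000 even though their base addresses differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* A DLL with the base address chosen for it (hypothetical record). */
struct placed_dll {
    const char *name;
    uint32_t base;   /* preferred load address */
    uint32_t size;   /* size of the loaded image */
};

/* Half-open ranges [a, a+as) and [b, b+bs) overlap if each one starts
   before the other ends. Sums are computed in 64 bits to avoid wrap-around. */
static int ranges_overlap(uint32_t a, uint32_t as, uint32_t b, uint32_t bs)
{
    return (uint64_t)a < (uint64_t)b + bs && (uint64_t)b < (uint64_t)a + as;
}

/* Print every pair of DLLs whose address ranges overlap, and warn about
   DLLs larger than the step size. Returns the number of colliding pairs. */
int report_collisions(const struct placed_dll *d, size_t n, uint32_t step)
{
    int collisions = 0;
    for (size_t i = 0; i < n; i++) {
        if (d[i].size > step)
            printf("warning: %s (%lu bytes) exceeds the step size\n",
                   d[i].name, (unsigned long)d[i].size);
        for (size_t j = i + 1; j < n; j++) {
            if (ranges_overlap(d[i].base, d[i].size, d[j].base, d[j].size)) {
                printf("collision: %s and %s\n", d[i].name, d[j].name);
                collisions++;
            }
        }
    }
    return collisions;
}
```

The pairwise comparison is quadratic in the number of DLLs, which is of no concern for the few hundred modules a project typically produces.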
In closing, I would like to mention that to have a DLL load quickly, the first step is to make sure that Windows can locate it quickly. My advice is to keep implicitly loaded DLLs in the same directory as the application that uses them and to use a full path for DLLs that the application loads explicitly.
Bibliography
Ruediger R. Asche. "Rebasing Win32 DLLs: The Whole Story." MSDN Library, September 1995. This article does exhaustive tests on the load-time degradation of DLLs that must be rebased by the operating system at load time.
John Robbins. Debugging Applications. Microsoft Press, 2000. ISBN 0-7356-0886-5.
About the Author
Thiadmer Riemersma writes multimedia software and animation toolkits for his company ITB CompuPhase in The Netherlands. He can be contacted via the company home page at www.compuphase.com.