Alan Brown is director of software engineering, Interoperability Division, at MKS Software. He can be contacted at [email protected]
Remember the good old days when 640K was enough? Remember, too, when it empirically became insufficient for virtually everyone? That was a time when we saw innovations that let us map more memory into an extremely limited address space. First, there was "Expanded Memory" (EMM), which was originally a hardware card to "page" 64K regions into the DOS address space (a single 64K or 128K region). Then, we saw "Extended Memory" (XMS), which made possible protected-mode mappings into more than one 64K region. All this code was written just to give existing programs a stopgap. What was really needed was a larger address space, which is why 32-bit address spaces were a welcome relief.
If 640K was enough for anyone, 2 GB (the Windows practical limit in a 4-GB address space) has proven to be enough for most applications. Nonetheless, there are still classes of applications looking for larger and larger address spaces. Again, we're seeing innovations, such as Address Windowing Extension (AWE) and Physical Address Extension (PAE), that mirror those of the 16-bit days. Then along come 64-bit processors and two new 64-bit versions of Windows. Unlike the 16- to 32-bit migration, however, we do not have a majority of applications exploding out of the confines of a 2-GB address space. So what are we to do with these new processors and why do we care?
To be sure, some of us will continue to write 32-bit applications for the foreseeable future. However, during 2004, something happened that made it possible to put a 64-bit processor on every desktop. AMD's Athlon64 and Opteron processors became available in quantity, and motherboard manufacturers started to release boards with prices comparable to their high-end 32-bit counterparts. Intel, stuck in its Itanium money pit, rushed to follow suit, and by late 2004 was shipping "extended architecture 32-bit" Xeon processors, the IA32E or EM64T. AMD had seen what Intel had either missed or failed to deliver: that backward compatibility, in the form of well-performing binary compatibility, was as important today as in the Win16-to-Win32 days that led to the downfall of OS/2 and the domination of Windows on the desktop. To be sure, Intel's Itanium performs extremely well when running natively compiled 64-bit code, but my experience is that 32-bit applications do not perform as well on it as on the AMD64 and EM64T architectures. Microsoft appears to have learned the OS source-code compatibility lesson, too, as building existing Win32 sources for 64-bit deployment is (for the most part) a fairly easy task.
We're now poised on the release of a new "Extended Architecture 64-bit" version of Windows. In fact, many computer vendors have been shipping EM64T and AMD64 processors with 32-bit Windows, so you may have one of these processors and not even have noticed it. Perhaps the most exciting aspect of this processor is its ability to run memory-hungry applications side-by-side with well-performing 32-bit applications. So you have to wonder: Is 8 Terabytes finally enough memory for everyone? Although, in theory, we ought to see 2^64 bytes or 16 Megaterabytes (16 Exabytes), in practice, we see that the application address space is currently limited to a tiny 7 to 8 Terabytes by the placement of the kernel mapping into each process at around 0x80000000000.
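The arithmetic behind those figures is easy to get wrong by a hex digit, so here is a small compile-time sketch of it. The constant names are invented for illustration, binary units are assumed (1 Terabyte taken as 2^40 bytes), and the 8-Terabyte boundary is the approximate figure quoted above rather than a documented constant.

```cpp
#include <cstdint>

// Sizes in bytes, using binary units (1 Terabyte here means 2^40 bytes).
constexpr std::uint64_t kGigabyte = std::uint64_t{1} << 30;
constexpr std::uint64_t kTerabyte = std::uint64_t{1} << 40;

// A 32-bit pointer can name 2^32 distinct bytes, which is 4 GB.
constexpr std::uint64_t kAddressSpace32 = std::uint64_t{1} << 32;

// Assumed user/kernel boundary of a native x64 process: 8 Terabytes = 2^43.
constexpr std::uint64_t kUserLimitX64 = 8 * kTerabyte;

// 2^64 itself does not fit in a 64-bit integer, so the theoretical address
// space is expressed in Terabytes instead: 2^64 / 2^40 = 2^24.
constexpr std::uint64_t kTheoreticalTerabytes = std::uint64_t{1} << (64 - 40);

static_assert(kAddressSpace32 == 4 * kGigabyte, "32 bits address 4 GB");
static_assert(kUserLimitX64 == 0x80000000000ULL, "8 Terabytes is 2^43");
static_assert(kTheoreticalTerabytes == 16u * 1024u * 1024u,
              "2^64 bytes is 16 Megaterabytes");
```

Seen this way, the "tiny" 8-Terabyte limit is still 2048 times the entire 32-bit address space.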
Of course, there is more to this 64-bit evolution than just process address space. We are no longer doubling processor speed every year or so and, at least for now, we find ourselves with processors running at between 3 GHz and 4 GHz. So processors with twice the hardware bus width (and/or multicore processors) ought to be able to perform a little better (at least in some cases) than their smaller cousins.
Given that 64-bit Windows is inevitable and that it is fairly likely we will see a greater and greater percentage of new 64-bit machines shipping, we are left with a quandary: Do you port to 64 bits or do you continue to deliver 32-bit applications? In this article, I examine the issues surrounding full ports and point out some of the benefits in simply remaining 32 bits. This article grows out of the 12 years I have spent developing a UNIX API layer for Win32 at MKS Software. About three years ago, a customer requested that we port this layer to 64-bit Windows on the Itanium. Despite the portability of this layer to MIPS, PowerPC, and Alpha in the past, there were some unique 64-bit porting issues. I share some of the lessons learned along the way.
Windows on Windows
Those of us who remember the "DOS Penalty Box" and 16-bit Windows applications guest hosted on 32-bit Windows are groaning right now. Haven't we done that already? Didn't we learn anything? Well, on the surface, it appears that maybe this is going to be fine. The major problem with WoW32 was not performance, but compatibility and stability. In the 16-bit Windows world, we had cooperative multitasking and a shared address space, and not all applications were fond of the confinement imposed by running DOS under Windows instead of Windows teetering on top of DOS. This time around, we already have applications that are expecting preemption and know they live in an isolated address space. On 64-bit Windows, 32-bit processes are nothing more than separate 64-bit processes with a special thunking layer that sets up an environment in which these 32-bit applications run. The new layer is called "Wow64," short for "Win32 on Windows 64."
A Windows 32-bit address space is split nicely down the middle with 2 GB for the operating system, and 2 GB for the application. If you include the /3GB boot.ini option (and perhaps /Userva=3030, if your machine runs out of process table entries, as mine did, and failed to allow logins after restart) and the /LARGEADDRESSAWARE linker flag, you can move the kernel shared system space (usually located in the upper 2 GB) up into the top 1 GB and give the application 3 GB of address space. But getting a big contiguous block of memory is not as easy as it seems. I have written a simple test applet to display the largest block of free memory and the highest application address. I then compiled it with 32- and 64-bit compilers, set and reset the /LARGEADDRESSAWARE bit in the executable, and ran them on various platforms; see memwalk.cpp (available at http://www.cuj.com/code/). Table 1 presents the results. (Naturally, your mileage will vary based on the Windows version and how many intrusive DLLs you have loading from HKLM\Software\Microsoft\Windows NT\CurrentVersion\Windows\AppInit_DLLs.) My best guess is that this limit is designed to optimize the page table sizes and that it could be changed at the sacrifice of kernel size (and perhaps performance) if the need arises.
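The heart of an applet like this is finding the largest contiguous run of free address space. What follows is only a sketch of that logic, not the contents of memwalk.cpp: the Region struct and largest_free_block() are hypothetical names, and the assumption is that on Windows the regions would be gathered by calling VirtualQuery() in a loop and recording each region's base address, size, and whether its state is MEM_FREE.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for one entry of an address-space walk.
struct Region {
    std::uint64_t base;  // first address of the region
    std::uint64_t size;  // length of the region in bytes
    bool free;           // true if nothing is reserved or committed here
};

// Returns the size of the largest contiguous free run.
// Regions must be sorted by base address. Adjacent free regions are merged,
// so a run that is reported in several pieces still counts as one block.
std::uint64_t largest_free_block(const std::vector<Region>& regions) {
    std::uint64_t largest = 0;
    std::uint64_t run = 0;      // size of the free run being accumulated
    std::uint64_t run_end = 0;  // address at which that run currently ends
    for (const Region& r : regions) {
        if (!r.free) {
            run = 0;            // a used region ends any free run
            continue;
        }
        // Extend the current run only if this region starts where it ended.
        run = (run != 0 && r.base == run_end) ? run + r.size : r.size;
        run_end = r.base + r.size;
        if (run > largest) {
            largest = run;
        }
    }
    return largest;
}
```

A single DLL loaded in the middle of an otherwise empty range shows up here as a used region that splits one large run into two smaller ones, which is why DLL placement matters so much to the results.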
The single most surprising result is that a LARGEADDRESSAWARE 32-bit Windows application is given virtually the entire 4-GB address space on x64 Windows (although we are still stuck with the unfortunate locations of the system DLLs between 0x70000000 and 0x7FFFFFFF, which you rebase at your own risk, making the largest contiguous blocks virtually the same as on a 32-bit processor). With very little work, you can virtually double the memory available to your application with the simple requirement that it be run on an x64 version of Windows. I claim that this is 2 GB for free because few customers are going to set the /3GB boot.ini flag on a 32-bit OS for you, even if they know how, but you may have a chance of convincing them to install x64 machines.
I did find one unexpected result. On Windows XP SP2, the largest free block was significantly smaller than on any other platform I tested. I adjusted memwalk.cpp to dump out DLL locations by calling GetModuleFileName() on every base address and discovered that UXTheme.DLL seems to have a suggested load address right in the middle of the largest free block (0x5AD70000). Presumably, this is an error that will be corrected in future releases. X64 XP seems not to have that problem with its uxtheme.dll (based at 0x7DF50000). It may be safe to rebase the XPSP2 uxtheme.dll.
Of course, AWE and PAE are also available to 32-bit applications running on an x64, but this is neither new nor particularly surprising. After all, you have XMS in the penalty box on 32-bit Windows, too. The IA64 version of 64-bit Windows supports neither AWE nor PAE for 32-bit applications running in Wow64 emulation.
If you decide to stick with 32-bit binaries for now, you can still make some preparations for 64-bit porting by turning on the 64-bit compatibility warnings in the Microsoft compilers. VS.NET 2003 (and later) has a /Wp64 command-line option that emits various useful warnings.
The Wow64 environment also has a couple of surprises (that in retrospect are very necessary) and a set of new APIs:
- A 32-bit application asking for %WINDIR%\System32 is actually redirected to %WINDIR%\Syswow64 (Wow64 FS Redirection).
- The registry is also virtualized such that HKLM\Software is really HKLM\Software\Wow6432Node. There appears to be no similar HKCU virtualization. So, for example, Internet Explorer settings are shared by both 32- and 64-bit Internet Explorer.
- 32-bit applications are installed into c:\Program Files (x86) by default.
There are new APIs that allow 64-bit-aware 32-bit applications to undo the virtualization and to detect whether they are running in such an environment; see Table 2. These are prototyped in the newest Platform SDK, but caution is needed as these are only implemented in kernel32 from Windows 2003 forward. Given that "link /delayload:kernel32.dll" is not an option, this leaves you with LoadLibrary() and GetProcAddress() to allow a binary to run both on the newer Windows and also on older Windows versions. I was pleasantly surprised to find one GetProcAddress() prototype in winbase.h for GetSystemWow64Directory (both a function pointer typedef and a pair of ANSI and Unicode names), but disappointed to see that it was the only one. I implore Microsoft to take a few minutes out from .NET development to provide the dinosaurs among us with this mind-bogglingly useful enhancement. System32.cpp is a test applet that shows filesystem redirection functions in action.
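As a rough illustration of the GetProcAddress() approach, the sketch below looks up IsWow64Process() at run time instead of linking to it, so the same binary still loads on a Windows version whose kernel32.dll lacks the export. IsWow64Process() is a real kernel32 function, but whether it is among the entries in Table 2 is an assumption here, and running_under_wow64() is a made-up helper name. The non-Windows branch exists only so that the fragment compiles elsewhere.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Returns true only when this is a 32-bit process running under Wow64.
// The export is resolved at run time, so a missing IsWow64Process simply
// means "not Wow64" rather than a failure to load the executable.
bool running_under_wow64() {
#ifdef _WIN32
    typedef BOOL (WINAPI *IsWow64ProcessFn)(HANDLE, PBOOL);

    // kernel32.dll is already mapped into every Win32 process, so a module
    // handle lookup is enough; LoadLibrary() would work equally well.
    HMODULE kernel32 = GetModuleHandleA("kernel32.dll");
    if (kernel32 == NULL) {
        return false;
    }
    IsWow64ProcessFn is_wow64_process = reinterpret_cast<IsWow64ProcessFn>(
        GetProcAddress(kernel32, "IsWow64Process"));
    if (is_wow64_process == NULL) {
        return false;  // export not present on this Windows version
    }
    BOOL is_wow64 = FALSE;
    return is_wow64_process(GetCurrentProcess(), &is_wow64) && is_wow64;
#else
    return false;      // not Windows, so there is no Wow64 layer
#endif
}
```

The same pattern (a function pointer typedef, a lookup by name, and a graceful fallback when the lookup fails) applies to any of the newer kernel32 exports.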
There were also nuggets added to the native NT API. Before examining these, however, I need to discuss the encapsulated 32-bit address space within a Wow64 process.
When Windows detects CreateProcess() on a 32-bit binary, it has to set up a Wow64 process to handle it. This Wow64 process is actually a 64-bit application with a full 64-bit address space. It loads a 64-bit ntdll.dll (as do all 64-bit processes). It also loads several thunking DLLs, whose job appears to be to extract information from 32-bit stacks on the other end of a 32-bit call gate and make the native 64-bit calls to the 64-bit ntdll.dll:
Other than these DLLs, I am told no other 64-bit DLL may load into the Wow64 address space. Once this Wow64 application has loaded, it sets up a region of the 64-bit address space for the 32-bit process to run (presumably the lower 4 GB), loads the 32-bit ntdll.dll, and the normal 32-bit load begins. Certainly, using a 64-bit debugger (such as Windbg) on a 32-bit process is enlightening. While a 32-bit debugger simply stopped tracing at a jmp 33:xxxxxxxx instruction, the 64-bit one jumped through the gate just fine and let me continue observing right up to the protected mode jump in the 64-bit ntdll.dll. After all the years of looking at 32-bit disassembly, RAX (the 64-bit widened EAX) and the R8 through R15 registers were a little hard to get used to.
For the most part, the 32-bit DLLs in %WINDIR%\syswow64 are apparently the same as their 32-bit Windows counterparts. I have read that they are "identical with few performance exceptions." One exception is clearly 32-bit ntdll.dll, which not only appears to be the native interface layer to 32-bit applications, but also sets up thunks to the 64-bit layer, and also exposes a number of Wow64-specific APIs that are not present in the 64-bit version of ntdll.dll. A dumpbin -exports | grep Wow64 | grep Nt shows an interesting list, which piqued my interest enough to do a little hacking to see what they might do for me:
If you are running a 32-bit application in the Wow64 layer and you want to access memory in a 64-bit application, the Win32 API ReadProcessMemory does work, but only to read the first 4 GB of address space. What if what you need is somewhere else? What if you want to read the Process Environment Block (PEB) or a Thread Environment Block (TEB) from address 0x7fffffde000?
NtQueryInformationProcess() does actually return information for many of the information classes. But where the data returned would not fit into a 32-bit address space, the thunking mechanism apparently substitutes a zero. So how do you gather the information? Well, apparently, you create data structures that are carefully expanded to their 64-bit sizes and pass them to the Wow64 versions of the functions. An example of using these APIs, with GetParentProcessId() and GetCurrentDirectoryExW() functions, is included in getcwd.cpp (at http://www.cuj.com/code/). In fact, the parent process ID is available to NtQueryInformationProcess(), so this example also walks into the PEB of a remote 32- or 64-bit process and prints its current working directory. I found it interesting that a Wow64 process appears to have both a 32- and a 64-bit PEB.
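To make "carefully expanded to their 64-bit sizes" concrete, here is a sketch of what such a structure might look like for a counted UTF-16 string descriptor (two 16-bit lengths followed by a pointer). The type and field names are invented for illustration and are not taken from getcwd.cpp or from any SDK header. The point is that the pointer is held in a fixed 64-bit integer and the padding is spelled out, so the layout is identical whether the code is built by a 32- or a 64-bit compiler.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical 64-bit-layout mirror of a counted string descriptor.
struct UnicodeString64 {
    std::uint16_t length;          // bytes in use
    std::uint16_t maximum_length;  // bytes allocated
    std::uint32_t padding;         // filler a 64-bit compiler inserts implicitly
    std::uint64_t buffer;          // remote address, never dereferenced locally
};

// Verify the layout at compile time in every build, 32- or 64-bit.
static_assert(sizeof(UnicodeString64) == 16, "must match the 64-bit layout");
static_assert(offsetof(UnicodeString64, buffer) == 8,
              "pointer field must sit on an 8-byte boundary");

// True when the remote address could also be held in a 32-bit pointer.
inline bool fits_in_32_bits(const UnicodeString64& s) {
    return s.buffer <= 0xFFFFFFFFu;
}
```

Without the explicit padding field, a 32-bit compiler could legitimately place the 64-bit member at offset 4, and the structure would no longer line up with what the 64-bit side wrote.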
Of some interest is the ability to read a 64-bit address space. Not only can you read a PEB from another 64-bit process, but you can also observe the 64-bit portions of the Wow64 process (perhaps even with a modified memwalk.cpp, as NtWow64QueryVirtualMemory64() is present). Conspicuous in their absence are virtual memory allocation and modification functions, NtWow64WriteVirtualMemory64(), and any way to load a 64-bit DLL into the 64-bit portions of the Wow64 process, which is probably just as well given the number of these intrusive AppInit DLLs.
Wow64 applications see a virtualized view of the 64-bit filesystem and registry, but a network access (for example, \\server\admin$ and ConnectRegistry()) from a 32-bit machine sees the 64-bit registry and filesystem unvirtualized. So for any number of reasons, copy foo.dll \\server\admin$\system32 may not be the best thing to do.
Native 64-Bit Porting
Now that I have talked you out of recompiling your code with a 64-bit compiler, let me describe some of the issues and motivations for taking the plunge and actually porting.
There are classes of application that really need more than 4 GB of memory. These applications have probably been struggling with AWE and PAE or simply live on other 64-bit platforms, such as Solaris. Here are some of the reasons you may choose to recompile a native 64-bit binary:
- 32-bit libraries (static and dynamic) may not be linked to 64-bit executables (and vice versa), so if you provide middleware and you have customers planning to deploy on 64-bit Windows, you will need to migrate (which is, in fact, how we initially came to port to IA64, and more recently, to x64). COM-based DLLs are the exception to this. They are "proxied" in a 32-bit host in a similar manner to what would occur with a network link (although one loses the in-process performance and you may find yourself porting your component to 64-bit Windows anyway).
- If you deploy (Internet) Explorer extensions, you will not be able to use the 32-bit versions on 64-bit (Internet) Explorer. 64-bit Windows ships with both 32- and 64-bit (Internet) Explorer, and if each is configured for one process per window, you can in fact have them both running on a 64-bit machine. Hence, you need to deliver both 32- and 64-bit extensions to a 64-bit Windows.
- Of course, if you are exploding at the seams and longing for an 8 Terabyte address space, then Win64 is for you.
- You have a 16-bit Windows application. There is no Windows on Windows on Windows (WoWoW?) and 16-bit applications just plain do not run. (The image file LEX.EXE is valid, but is for a machine type other than the current machine.) It was a surprise to discover that one of the installers in a third-party component in our suite had a small 16-bit component that needed to be rewritten.
- You have a device driver. 32-bit device drivers do not work on 64-bit Windows.
- You are targeting IA64 and you have found that your application does not perform as well (or uses AWE or PAE, which is not available on IA64). It appears as though a native compiled IA64 binary performs exceedingly well while 32-bit binaries often perform less well than on 32-bit native hardware. Our experience is mostly with the first generation Itanium and we understand that Itanium II is better in this regard (perhaps because Windows uses a 32-bit software binary emulator rather than the native hardware emulation).
As I previously alluded with the COM proviso, this 64-bit port is not an all-or-nothing proposition.
Porting to a new platform can be a very big job. Porting in general requires some serious planning and a time commitment. Understanding the scope of the problem, however, requires that you at least make a test compile of some representative parts of your source base. It sounds simple enough on the surface: Just type make (or nmake or devenv /build), right? In practice, there can be a little more to it than that.
Do you use libraries for which you do not have source, or which are not available on your target platform? What platform are you going to use for your porting: the target one or something more familiar like an x86 desktop? Is your source code configuration management (SCM) system even available for the target platform? Where do you get a compiler and linker? Do all of your build tools work for 64-bit development? Is your build environment able to handle multiplatform builds in the same source tree or do you need to make copies? So, in fact, just typing "make" may require some up-front work. But the simplest solution is to copy your source tree, install the Microsoft Platform SDK, use the x86 to 64-bit cross compilers, and leave all of your build environment, tools, SCM, and so on, on 32-bit Windows. In fact, this may be the single most surprising fact: You do not need a 64-bit machine until you have linked your first executable.
Once you have an executable, you have to find a debugger, because it is extraordinarily unlikely that this binary will run the first time and be ready to ship to customers (although for something small and self contained like an Internet Explorer Toolbar, I would not rule that possibility out entirely).
You cannot link 32-bit libraries (static or dynamic) to 64-bit applications. So if you do not have source code for all of your libraries and a 64-bit version of your dependent libraries is not available, this port is not yet feasible. You need to replace these libraries with something that is available on your target 64-bit architecture (often cost prohibitive), persuade the vendor to build them for you, obtain a source license and port it yourself, or just not port at all.
The next most obvious question is of development environment: compilers, linkers, debuggers, CM systems, make, and IDEs.
- Where do I get a compiler? At this writing, GNU C/C++ is not yet available. Intel has commercial compilers for IA64 (native and cross compilers), IA32E (as Intel calls its EM64T x64 platform), and IA32. Microsoft has a pair of 14.0 (VC.NET 2005) x86 to IA64 and x64 cross compilers with the Platform SDK. And in the public preview of Visual Studio.NET 2005, in fact, there are native x64 and IA64 compilers as well as the x86 cross compilers.
- IDEs do not have to be 64 bit. You can either cross compile from a 32-bit system or run your 32-bit IDE on the 64-bit platform. This is assuming that your IDE is adjustable enough to use compilers that were not known to its developers.
- Where do I get a debugger? Windbg is available for all three architectures, but is a bit unsavory for all but the hackers among us. Visual Studio.NET 2005 has a remote debugger server for both 64-bit platforms. The basic studio remains a 32-bit application, again showing that you can have a hybrid implementation. You can develop on your current x86 desktop and remote debug to the 64-bit platforms as perhaps you might to Win9x.
- What about my Configuration Management system? Check with your SCM vendor to see if they support development on your target 64-bit platform. If not, either develop using cross compilation on your 32-bit workstation or copy or network mount the source base to the 64-bit machine.
So now you are prepared to build your source base. All these years of preparation for portability are about to bear fruit. But how portable is your code?
We all know that sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long). But are we prepared for sizeof(long) != sizeof(void *) or sizeof(int) != sizeof(void *)? Most 64-bit UNIX systems define sizeof(long) == sizeof(void *), the so-called LP64 model where long and pointer are 64 bits. Windows chose an LLP64 model, where sizeof(long long) == sizeof(void *) and long stays at 32 bits, for ease of porting 32-bit Windows source. This means that if you have code that assigns pointers to int (which is nonportable everywhere, but it is easy to become lax and just assume that sizeof(int) == sizeof(long) == sizeof(void *)) or to long, you will need to change it before your code will run correctly. Remember that such code may appear to run correctly as long as its locality remains in the lower 4 GB of the 64-bit address space, but malloc() and HeapAlloc() can return values outside this range, as can MapViewOfFile(). It might be reasonable to reserve everything available below 0x100000000 with VirtualAlloc(MEM_RESERVE) as part of initial testing (a modified memwalk.cpp can help you there).
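A short sketch may help show why such code seems to work until an address crosses the 4-GB line. The names below are illustrative only. The first two constants report which data model the compiler is using, round_trip() shows the safe way to carry a pointer in an integer, and through_32_bits() simulates what happens to an address that is squeezed through a 32-bit int or (on LLP64) a long.

```cpp
#include <cstdint>

// Which data model was this translation unit compiled for?
constexpr bool kIsLP64  = sizeof(long) == 8 && sizeof(void*) == 8;  // most 64-bit UNIX
constexpr bool kIsLLP64 = sizeof(long) == 4 && sizeof(void*) == 8;  // 64-bit Windows

// Carries a pointer through an integer without losing bits: uintptr_t is
// sized by the compiler to hold a pointer under either model.
inline void* round_trip(void* p) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<void*>(bits);
}

// Simulates the bug: a 64-bit address is stored in a 32-bit integer and
// then widened again. Anything above the low 4 GB is silently lost.
constexpr std::uint64_t through_32_bits(std::uint64_t address) {
    return static_cast<std::uint32_t>(address);
}

// An address in the low 4 GB survives, which is why the bug hides so well.
static_assert(through_32_bits(0x7ffde000ULL) == 0x7ffde000ULL,
              "low addresses are unaffected");
// An address above 4 GB comes back pointing somewhere else entirely.
static_assert(through_32_bits(0x100001234ULL) == 0x1234ULL,
              "high bits are dropped");
```

Reserving the low 4 GB during testing, as suggested above, forces every allocation into the range where the second case applies, so truncation bugs fail immediately instead of months later.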
For the most part, well-written Win32 code compiles with few, if any, source code changes. You may be surprised at the number of places you have assumed that a DWORD is the same thing as a PVOID, but if you followed the rules and were careful, the source files will compile; nonetheless, set -W3 on the 64-bit build, and pay close attention to the warnings.
I expected to need to create new GUIDs for all of my COM components, but in retrospect, that is simply a naïve view. You can have a 32-bit and a 64-bit component (say an Internet Explorer toolbar) each with the same GUID and each serving a client on the same machine at the same time. In fact, the COM implementation is quite flexible, allowing 32-bit components to be used by 64-bit processes (and vice versa) or both 32- and 64-bit implementations registered simultaneously. In other words, if you choose to port, you keep the same GUID and interface definitions. If you choose not to port (except for the components such as IE toolbars and Explorer extensions that apparently need to be in process servers and really do need to be ported), for the most part, everything just works. Performance may be the driving factor here.
Some of the harder problems will surround sharing of binary data with a 32-bit version of your software, be it through files or perhaps in shared memory, or over the wire. There are numerous technologies to help you here, but Microsoft has put a number of useful tools in their compilers and headers.
- Compiler option /Wp64 can be used today with cl Version 13.0 and higher (VS.NET 2003 and later) and with the 14.0 compilers in the Platform SDK. By simply setting /Wp64 on the build line of every source file, you will see five (that we have discovered so far) useful warnings; see Figure 1. The example, wp64.cpp (available online) illustrates these warnings and some portable solutions.
- Include Basetsd.h and make use of some of the new types, inline functions, and macros, such as DWORD32, DWORD64, UINT_PTR, PtrToUlong(), and Ptr64ToPtr(). HWND is a 64-bit quantity on 64-bit Windows, and if you need to pass one to a 32-bit process, it needs to be truncated; HandleToUlong() will help you there. But use caution: Truncation is seldom desirable. The __ptr64 modifier forces a pointer to occupy 64 bits, even on a 32-bit platform; void * __ptr64 ptr; is useful to force the size to be identical on 32- and 64-bit builds for binary compatibility. There is a corresponding __ptr32, but apparently no forward-thinking __ptr128 for when we discover that 8 Terabytes is not enough.
- Use intptr_t instead of int or long long for pointer assignments to integer values. Most compilers provide a typedef to ensure this is correctly sized for your target platform.
- Remove any inline assembler and either write some "C" code for it, or build external assembler files because __asm produces an error on IA64 and x64 compilers (which is probably just as well).
- Ensure structure alignment where binary compatibility is an issue through the judicious placement of unions. See pointers.cpp.
- The IA64, like many RISC processors, is rather picky about data alignment. I came across some code like this:
    unsigned char *ptr; DWORD dword = (DWORD) (*(unsigned long long *) ptr);
  which seems a lot happier with a memcpy() than with the data misalignment exceptions it was producing.
- There may be times when you really need different code for 32- and 64-bit builds. The Microsoft 64-bit compilers define a preprocessor variable _WIN64 for this. The Microsoft compilers also provide _M_IX86, _M_IA64, and _M_AMD64 for those rare occasions that you need processor-specific code.
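The misaligned read mentioned in the list above can be rewritten with memcpy() along the following lines. This is a sketch with made-up function names, not the code from the original source base. It keeps the behavior of the cast (read eight bytes, keep the low 32 bits) but never dereferences a pointer that may not be 8-byte aligned.

```cpp
#include <cstdint>
#include <cstring>

// Reads 8 bytes from an arbitrarily aligned address. memcpy() has no
// alignment requirement, so the compiler emits an access sequence that is
// safe for the target instead of a single, possibly misaligned, 64-bit load.
inline std::uint64_t read_u64_unaligned(const unsigned char* ptr) {
    std::uint64_t value;
    std::memcpy(&value, ptr, sizeof value);
    return value;
}

// Equivalent of the cast in the text: read 64 bits, keep the low 32.
inline std::uint32_t read_low_dword(const unsigned char* ptr) {
    return static_cast<std::uint32_t>(read_u64_unaligned(ptr));
}
```

On x86 and x64, optimizing compilers typically reduce the memcpy() to a single load, so the portable form usually costs nothing on the platforms that tolerated the original cast.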
Likely, the hardest part of a 64-bit port is going to be the build environment. If you have previously ported to other architectures (Alpha, MIPS, and PowerPC, for instance), it is likely to be easier. The sample source (available electronically) has a makefile for use with CL and NMAKE that is a trivial example of a multiplatform build in the same source tree. The idea is to keep the outputs separate for each platform. Even this makefile has problems with multiple simultaneous builds as I failed to handle the vc?.pdb locations and the compiler TEMP locations need to be separated also.
If you use the Visual Studio IDE, you can fairly easily copy a configuration, specify $(OUTDIR), adjust compiler options as needed, and then open a PSDK command window for your target architecture and run devenv /build /useenv myproject.sln "Win32 - Debug(X64)" to build that project. Presumably, VS.NET 2005, with its multiplatform compiler, will make this even easier by permitting target platforms to be specified on each configuration. There is a wonderful paper available on the AMD web site that describes the step-by-step procedures for using Visual Studio and the Platform SDK to produce 64-bit binaries using existing project files.
There are, of course, any number of installer issues. Delivering 32, 64 (IA64 and x64), or a three-way hybrid install requires careful planning. If you use Windows Installer, you are supposed to deliver using one MSI file for each architecture. Interestingly, you can use 32-bit Custom Action DLLs in a 64-bit install, suggesting a pair of cooperating 32- and 64-bit processes under the hood. But this is a topic for its own paper.
Not everyone will need to create native 64-bit ports of their applications. In fact, some applications may even be lighter weight if left as 32-bit and run on 64-bit processors, except perhaps on Itanium machines that, for the most part, will be relegated to high-end servers (if the chip is not already de facto dead). It does seem clear that Microsoft has done a super job on its 32-bit emulation layer, and that 64-bit processors and motherboards are sufficiently inexpensive to make them very popular. As a result, we have no excuses not to be embracing 64-bit platforms, both for running existing 32-bit applications and for a new breed of games and memory-intensive business applications taking advantage of the larger address space.
Thanks to Eric Youngdale, without whom this article would never have existed.