Embracing 64-Bit Windows

Alan shares some of the lessons learned in porting a Win32 UNIX API layer to 64-bit Windows.


June 01, 2005
URL:http://www.drdobbs.com/embracing-64-bit-windows/184401966

June, 2005: Embracing 64-Bit Windows

Alan Brown is director of software engineering, Interoperability Division, at MKS Software. He can be contacted at [email protected].


Remember the good old days when 640K was enough? Remember, too, when it empirically became insufficient for virtually everyone? That was a time when we saw innovations that let us map more memory into extremely limited address space. First, there was "Expanded Memory" (EMM), which was originally a hardware card to "page" 64K blocks into the DOS address space, in single 64K or 128K blocks. Then, we saw "Extended memory" (XMS) that made possible protected mode mappings into more than one 64K region. All this code was written just to give existing programs a stopgap. What was really needed was a larger address space, which is why 32-bit address spaces were a welcome relief.

If 640K was enough for anyone, 2 GB (the Windows practical limit in a 4-GB address space) has proven to be enough for most applications. Nonetheless, there are still classes of applications looking for larger and larger address spaces. Again, we're seeing innovations, such as Address Windowing Extensions (AWE) and Physical Address Extension (PAE), that mirror those of the 16-bit days. Then along come 64-bit processors and two new 64-bit versions of Windows. Unlike the 16- to 32-bit migration, however, we do not have a majority of applications exploding out of the confines of a 2-GB address space. So what are we to do with these new processors and why do we care?

To be sure, some of us will continue to write 32-bit applications for the foreseeable future. However, during 2004, something happened that made it possible to put a 64-bit processor on every desktop. AMD's Athlon64 and Opteron processors became available in quantity, and motherboard manufacturers started to release boards with prices comparable to their high-end 32-bit counterparts. Intel, stuck in its Itanium money pit, rushed to follow suit, and by late 2004 was shipping "extended architecture 32-bit" Xeon processors, the IA32E or EM64T. AMD had seen what Intel had either missed or failed to deliver: that backward compatibility, in the form of well-performing binary compatibility, was as important today as in the Win16-to-Win32 days that led to the downfall of OS/2 and the domination of Windows on the desktop. To be sure, Intel's Itanium performs extremely well when running natively compiled 64-bit code, but my experience is that 32-bit applications do not perform as well on it as they do on the AMD64 and EM64T architectures. Microsoft appears to have learned the OS source-code compatibility lesson, too, as building existing Win32 sources for 64-bit deployment is (for the most part) a fairly easy task.

We're now poised for the release of a new "Extended Architecture 64-bit" version of Windows. In fact, many computer vendors have been shipping EM64T and AMD64 processors with 32-bit Windows, so you may have one of these processors and not even have noticed it. Perhaps the most exciting aspect of this processor is its ability to run memory-hungry applications side-by-side with well-performing 32-bit applications. So you have to wonder whether 8 Terabytes is finally enough memory for everyone? Although, in theory, we ought to see 2^64 bytes or 16 Megaterabytes (16 Exabytes), in practice, we see that the application address space is currently limited to a tiny 7 to 8 Terabytes by the placement of the kernel mapping into each process at around 0x80000000000 (see Table 1).
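
The 7-to-8-Terabyte figure can be checked against the "Kernel shared region begins" column of Table 1. The following is only a sketch of that arithmetic; the constant and function names are made up for illustration, and the two addresses are the ones reported in Table 1 for native X64 and IA64 binaries.

```cpp
#include <cstdint>

// One tebibyte (2^40 bytes).
constexpr std::uint64_t kTiB = 1ULL << 40;

// "Kernel shared region begins" values copied from Table 1
// (names are hypothetical, chosen for this sketch only).
constexpr std::uint64_t kX64KernelBase  = 0x0000080000000000ULL;
constexpr std::uint64_t kIa64KernelBase = 0x0000070000000000ULL;

// Number of whole tebibytes below a given address.
constexpr std::uint64_t WholeTiB(std::uint64_t bytes) {
    return bytes / kTiB;
}

// The application range ends where the kernel mapping begins.
static_assert(WholeTiB(kX64KernelBase) == 8, "x64: 8 TB of application space");
static_assert(WholeTiB(kIa64KernelBase) == 7, "IA64: 7 TB of application space");

// The theoretical limit: 2^64 bytes is 16 * 2^60 bytes (16 Exabytes).
// 2^64 itself does not fit in 64 bits, so halve it before dividing.
static_assert(((1ULL << 63) / (1ULL << 60)) * 2 == 16, "2^64 bytes = 16 EB");
```

Because everything is a compile-time check, the snippet does nothing at run time; it simply fails to compile if the numbers do not agree.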

Of course, there is more to this 64-bit evolution than just process address space. We are no longer doubling processor speed every year or so and, at least for now, we find ourselves with processors running at between 3 GHz and 4 GHz. So processors with twice the hardware bus width (and/or multicore processors) ought to be able to perform a little better (at least in some cases) than their smaller cousins.

Given that 64-bit Windows is inevitable and that it is fairly likely we will see a greater and greater percentage of new 64-bit machines shipping, we are left with a quandary: Do you port to 64 bits or do you continue to deliver 32-bit applications? In this article, I examine the issues surrounding full ports and point out some of the benefits in simply remaining 32 bits. My perspective comes from the 12 years I've spent developing a UNIX API layer for Win32 at MKS Software. About three years ago, a customer requested that we port this layer to 64-bit Windows on the Itanium. Despite the portability of this layer to MIPS, PowerPC, and Alpha in the past, there were some unique 64-bit porting issues. I share some of the lessons learned along the way.

Windows on Windows

Those of us who remember the "DOS Penalty Box" and 16-bit Windows applications guest hosted on 32-bit Windows are groaning right now. Haven't we done that already? Didn't we learn anything? Well, on the surface, it appears that maybe this is going to be fine. The major problem with WoW32 was not performance, but compatibility and stability. In the 16-bit Windows world, we had cooperative multitasking and a shared address space, and not all applications were fond of the confinements imposed by running DOS under Windows instead of Windows teetering on top of DOS. This time around, we already have applications that are expecting preemption and know they live in an isolated address space. On 64-bit Windows, 32-bit processes are nothing more than separate 64-bit processes with a special thunking layer that sets up an environment in which these 32-bit applications run. The new layer is called "Wow64," short for "Win32 on Windows 64."

A Windows 32-bit address space is split nicely down the middle with 2 GB for the operating system, and 2 GB for the application. If you include the /3GB boot.ini option (and perhaps /Userva=3030, if your machine runs out of process table entries, as mine did, and failed to allow logins after restart) and the /LARGEADDRESSAWARE linker flag, you can move the kernel shared system space (usually located in the upper 2 GB) up into the top 1 GB and give the application 3 GB of address space. But getting a big contiguous block of memory is not as easy as it seems. I have written a simple test applet to display the largest block of free memory and the highest application address. I then compiled it with 32- and 64-bit compilers, set and reset the /LARGEADDRESSAWARE bit in the executable, and ran them on various platforms; see memwalk.cpp (available at http://www.cuj.com/code/). Table 1 presents the results. (Naturally, your mileage will vary based on the Windows version and how many intrusive DLLs you have loading from HKLM\Software\Microsoft\Windows NT\CurrentVersion\Windows\AppInit_DLLs.) My best guess is that this limit is designed to optimize the page table sizes and that it could be changed at the sacrifice of kernel size (and perhaps performance) if the need arises.
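
memwalk.cpp itself is not reproduced here. As a rough idea of how such a walk might be written, the sketch below assumes the Win32 VirtualQuery() call is used to step through the process's own address space, and then summarizes the regions into the three figures reported in Table 1. The names Region, WalkSummary, Summarize, and WalkOwnAddressSpace are hypothetical and are not taken from the author's applet; the Windows-only part is guarded so that the summary logic compiles anywhere.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// One run of pages with the same state, as the walker records it.
struct Region {
    std::uint64_t base;
    std::uint64_t size;
    bool          free;
};

// The three figures of interest (compare the columns of Table 1).
struct WalkSummary {
    std::uint64_t freeBytes;    // total of all free regions
    std::uint64_t largestFree;  // largest single free region
    std::uint64_t highestEnd;   // end of the highest region seen
};

WalkSummary Summarize(const std::vector<Region>& regions) {
    WalkSummary s = {0, 0, 0};
    for (const Region& r : regions) {
        assert(r.size > 0);
        if (r.free) {
            s.freeBytes += r.size;
            if (r.size > s.largestFree) s.largestFree = r.size;
        }
        if (r.base + r.size > s.highestEnd) s.highestEnd = r.base + r.size;
    }
    return s;
}

#ifdef _WIN32
// Walk this process's application address range with VirtualQuery().
// Each call describes one run of pages that share the same state.
std::vector<Region> WalkOwnAddressSpace() {
    std::vector<Region> regions;
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    std::uintptr_t addr =
        reinterpret_cast<std::uintptr_t>(si.lpMinimumApplicationAddress);
    const std::uintptr_t top =
        reinterpret_cast<std::uintptr_t>(si.lpMaximumApplicationAddress);
    MEMORY_BASIC_INFORMATION mbi;
    while (addr < top &&
           VirtualQuery(reinterpret_cast<LPCVOID>(addr), &mbi, sizeof(mbi)) != 0) {
        Region r;
        r.base = reinterpret_cast<std::uintptr_t>(mbi.BaseAddress);
        r.size = mbi.RegionSize;
        r.free = (mbi.State == MEM_FREE);
        regions.push_back(r);
        addr = reinterpret_cast<std::uintptr_t>(mbi.BaseAddress) + mbi.RegionSize;
    }
    return regions;
}
#endif
```

On Windows, Summarize(WalkOwnAddressSpace()) gives the figures for the current process. Building the same source as 32-bit, 32-bit with /LARGEADDRESSAWARE, and 64-bit is what makes the comparison interesting.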

The single most surprising result is that a LARGEADDRESSAWARE 32-bit Windows application is given virtually the entire 4-GB address space on x64 Windows (although we are still stuck with the unfortunate locations of the system DLLs between 0x70000000 and 0x7FFFFFFF, which you rebase at your own risk, making the largest contiguous blocks virtually the same as on a 32-bit processor). With very little work, you can virtually double available memory to your application with the simple requirement that it be run on an x64 version of Windows. I claim that this is 2 GB for free because few customers are going to set the /3GB boot.ini flag on a 32-bit OS for you, even if they know how, but you may have a chance of convincing them to install x64 machines.

I did find one unexpected result. On Windows XP SP2, the largest free block was significantly smaller than any other platform I tested. I adjusted memwalk.cpp to dump out DLL locations by calling GetModuleFileName() on every base address and discovered that UXTheme.DLL seems to have a suggested load address right in the middle of the largest free block (0x5AD70000). You'd think this was an error that will be corrected in future releases. X64 XP seems not to have that problem with its uxtheme.dll (based at 7DF50000). It may be safe to rebase the XPSP2 uxtheme.dll.

Of course, AWE and PAE are also available to 32-bit applications running on an x64, but this is neither new nor particularly surprising. After all, you have XMS in the penalty box on 32-bit Windows, too. The IA64 version of 64-bit Windows does neither AWE nor PAE for 32-bit applications running in Wow64 emulation.

If you decide to stick with 32-bit binaries for now, you can still make some preparations for 64-bit porting by turning on the 64-bit compatibility warnings in the Microsoft compilers. VS.NET 2003 (and later) has a /Wp64 command-line option that emits various useful warnings.
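
To give a feel for what these warnings point at, here is a small sketch of two patterns of the kind listed in Figure 1 (a size_t narrowed to int, and a pointer printed with %x), each written in a form that stays correct in a 64-bit build. The function names are invented for this example, and the sketch makes no claim about exactly which warning number a given compiler version emits for the "before" forms described in the comments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Risky form:  int len = strlen(name);
// strlen() returns size_t, which is 64 bits wide on Win64, so storing
// it in an int can lose data. Keeping the value in size_t avoids that.
std::size_t NameLength(const char* name) {
    assert(name != nullptr);
    return std::strlen(name);
}

// Risky form:  printf("%x", ptr);
// "%x" expects an unsigned int, so the upper half of a 64-bit pointer
// is lost (or worse). "%p" is the conversion meant for pointers.
std::string FormatPointer(const void* p) {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%p", const_cast<void*>(p));
    return std::string(buf);
}
```

The exact text produced by %p is implementation defined, so code that parses the result back should not assume a particular width or prefix.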

The Wow64 environment also has a couple of surprises (that in retrospect are very necessary) and a set of new APIs:

There are new APIs that allow 64-bit aware 32-bit applications to undo the virtualization and to detect whether they are running in such an environment; see Table 2. These are prototyped in the newest Platform SDK, but caution is needed as they are only implemented in kernel32 from Windows 2003 forward. Given that "link /delayload:kernel32.dll" is not an option, this leaves you with LoadLibrary() and GetProcAddress() to allow a binary to run both on the newer Windows and also on older Windows versions. I was pleasantly surprised to find one GetProcAddress() prototype in winbase.h, for GetSystemWow64Directory (both a function pointer typedef and a pair of ANSI and Unicode names), but disappointed to see that it was the only one. I implore Microsoft to take a few minutes out from .NET development to provide the dinosaurs among us with this mind-bogglingly useful enhancement for the other APIs. System32.cpp is a test applet that shows filesystem redirection functions in action.
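
The run-time lookup described above might look something like the following for IsWow64Process. This is a minimal sketch, not code from System32.cpp: the typedef name and the wrapper function RunningUnderWow64 are invented here, and because kernel32.dll is always loaded, the sketch uses GetModuleHandle() rather than LoadLibrary() to get its handle. The non-Windows branch exists only so the example compiles elsewhere.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>

// Function pointer type matching the documented IsWow64Process
// signature. The typedef name is made up for this sketch.
typedef BOOL (WINAPI *IsWow64ProcessFn)(HANDLE, PBOOL);

// Returns true only when the API exists and reports that the current
// process is a 32-bit guest under Wow64.
bool RunningUnderWow64() {
    HMODULE kernel32 = GetModuleHandleA("kernel32.dll");
    if (kernel32 == NULL) return false;

    IsWow64ProcessFn isWow64Process = reinterpret_cast<IsWow64ProcessFn>(
        GetProcAddress(kernel32, "IsWow64Process"));
    if (isWow64Process == NULL) return false;  // older Windows: no Wow64

    BOOL isWow64 = FALSE;
    if (!isWow64Process(GetCurrentProcess(), &isWow64)) return false;
    return isWow64 != FALSE;
}
#else
// Stub so the sketch builds on other platforms.
bool RunningUnderWow64() { return false; }
#endif

// A native 64-bit build is never a Wow64 guest, so this always holds.
void CheckNative64IsNotWow64() {
    assert(sizeof(void*) != 8 || !RunningUnderWow64());
}
```

Resolving the entry point at run time means the same binary still loads on Windows versions whose kernel32 does not export the function, which a static import would prevent.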

There were also nuggets added to the native NT API. Before examining these, however, I need to discuss the encapsulated 32-bit address space within a Wow64 process.

When Windows detects CreateProcess() on a 32-bit binary, it has to set up a Wow64 process to handle it. This Wow64 process is actually a 64-bit application with a full 64-bit address space. It loads a 64-bit ntdll.dll (as do all 64-bit processes). It also loads several thunking DLLs, whose job appears to be to extract information from 32-bit stacks on the other end of a 32-bit call gate and make the native 64-bit calls to the 64-bit ntdll.dll:

Other than these DLLs, I am told no other 64-bit DLL may load into the Wow64 address space. Once this Wow64 application has loaded, it sets up a region of the 64-bit address space for the 32-bit process to run in (you'd think this is the lower 4 GB), loads the 32-bit ntdll.dll, and the normal 32-bit load begins. Certainly, using a 64-bit debugger (such as Windbg) on a 32-bit process is enlightening. While a 32-bit debugger simply stopped tracing at a jmp 33:xxxxxxxx instruction, the 64-bit one jumped through the gate just fine and let me continue observing right up to the protected mode jump in the 64-bit ntdll.dll. After all the years of looking at 32-bit disassembly, RAX (the 64-bit widened EAX) and the R8 through R15 registers were a little hard to get used to.

For the most part, the 32-bit DLLs in %WINDIR%\syswow64 are apparently the same as their 32-bit Windows counterparts. I have read that they are "identical with few performance exceptions." One exception is clearly 32-bit ntdll.dll, which not only appears to be the native interface layer to 32-bit applications, but also sets up thunks to the 64-bit layer, and also exposes a number of Wow64-specific APIs that are not present in the 64-bit version of ntdll.dll. A dumpbin -exports | grep Wow64 | grep Nt shows an interesting list, which piqued my interest enough to do a little hacking to see what they might do for me:

If you are running a 32-bit application in the Wow64 layer and you want to access memory in a 64-bit application, the Win32 API ReadProcessMemory does work, but only to read the first 4 GB of address space. What if what you need is somewhere else? What if you want to read the Process Environment Block (PEB) or a Thread Environment Block (TEB) from address 0x7fffffde000?

NtQueryInformationProcess() does actually return information for many of the information classes. But where data returned would not fit into a 32-bit address space, the thunking mechanism apparently substitutes a zero. So how do you gather the information? Well, apparently, you create data structures that are carefully expanded to their 64-bit sizes and pass them to the Wow64 versions of the functions. getcwd.cpp (at http://www.cuj.com/code/) includes GetParentProcessId() and GetCurrentDirectoryExW() as examples of using these APIs. In fact, the parent process ID is already available through NtQueryInformationProcess(), so this example also walks into the PEB of a remote 32- or 64-bit process and prints its current working directory. I found it interesting that a Wow64 process appears to have both a 32-bit and a 64-bit PEB.

Of some interest is the ability to read a 64-bit address space. Not only can you read a PEB from another 64-bit process, but you can also observe the 64-bit portions of the Wow64 process (perhaps even a modified memwalk.cpp as NtWow64QueryVirtualMemory64() is present). Conspicuous in their absence are virtual memory allocation and modification functions, NtWow64WriteVirtualMemory64(), and any way to load a 64-bit DLL into the 64-bit portions of the Wow64 process—probably just as well given the number of these intrusive AppInit DLLs.

Wow64 applications see a virtualized view of the 64-bit filesystem and registry, but a network access (for example, \\server\admin$ and ConnectRegistry()) from a 32-bit machine sees the 64-bit registry and filesystem unvirtualized. So for any number of reasons, copy foo.dll \\server\admin$\system32 may not be the best thing to do.

Native 64-Bit Porting

Now that I have talked you out of recompiling your code with a 64-bit compiler, let me describe some of the issues and motivations for taking the plunge and actually porting.

There are classes of application that really need more than 4 GB of memory. These applications have probably been struggling with AWE and PAE or simply live on other 64-bit platforms, such as Solaris. Here are some of the reasons you may choose to recompile a native 64-bit binary:

As I previously alluded with the COM proviso, this 64-bit port is not an all-or-nothing proposition.

Porting to a new platform can be a very big job. Porting in general requires some serious planning and a time commitment. Understanding the scope of the problem, however, requires that you at least make a test compile of some representative parts of your source base. It sounds simple enough on the surface: Just type make (or nmake or devenv /build), right? In practice, there can be a little more to it than that.

Do you use libraries for which you do not have source, or that are not available on your target platform? What platform are you going to use for your porting: the target one or something more familiar like an x86 desktop? Is your source code configuration management (SCM) system even available for the target platform? Where do you get a compiler and linker? Do all of your build tools work for 64-bit development? Is your build environment able to handle multiplatform builds in the same source tree or do you need to make copies? So, in fact, just typing "make" may require some up-front work. But the simplest solution is to copy your source tree, install the Microsoft Platform SDK, use the x86 to 64-bit cross compilers, and leave all of your build environment, tools, SCM, and so on, on 32-bit Windows. In fact, this may be the single most surprising fact: You do not need a 64-bit machine until you have linked your first executable.

Once you have an executable, you have to find a debugger, because it is extraordinarily unlikely that this binary will run the first time and be ready to ship to customers (although for something small and self contained like an Internet Explorer Toolbar, I would not rule that possibility out entirely).

You cannot link 32-bit libraries (static or dynamic) to 64-bit applications. So if you do not have source code for all of your libraries and a 64-bit version of your dependent libraries is not available, this port is not yet feasible. You need to replace these libraries with something that is available on your target 64-bit architecture (often cost prohibitive), persuade the vendor to build them for you, obtain a source license and port it yourself, or just not port at all.

The next most obvious question is of development environment—compilers, linkers, debuggers, CM systems, make, and IDEs.

So now you are prepared to build your source base. All these years of preparation for portability are about to bear fruit. But how portable is your code?

We all know that sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long). But are we prepared for sizeof(long) != sizeof(void *) or sizeof(int) != sizeof(void *)? Most 64-bit UNIX systems define sizeof(long) == sizeof(void *), the so-called LP64 model where long and pointer are 64 bits. Windows chose an LLP64 model, where sizeof(long long) == sizeof(void *) and long stays at 32 bits, for ease of porting 32-bit Windows source. This means that if you have code that assigns pointers to integers (which is nonportable everywhere, but it is easy to become lax and just assume that sizeof(int) == sizeof(long) == sizeof(void *)), or assignments of pointer to long, you will need to change these before your code will run correctly. Remember that it may appear to be running correctly as long as its locality remains in the lower 4 GB of the 64-bit address space, but malloc() and HeapAlloc() can return values outside this range, as can MapViewOfFile(). It might be reasonable to reserve everything available below 0x100000000 with VirtualAlloc(MEM_RESERVE) as part of initial testing (a modified memwalk.cpp can help you there).
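
The "appears to run correctly below 4 GB" trap can be shown with plain integers. The sketch below models what a cast to a 32-bit type keeps of a 64-bit address, and shows the portable alternative of holding a pointer in std::uintptr_t (the Windows headers offer pointer-sized types such as ULONG_PTR and DWORD_PTR for the same purpose). All function names are invented for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// What a cast such as (DWORD)ptr or (long)ptr keeps on Win64:
// only the low 32 bits of the address.
std::uint32_t TruncateTo32(std::uint64_t address) {
    return static_cast<std::uint32_t>(address);
}

// True when an address survives a trip through a 32-bit integer,
// which is the case only while it lies below 0x100000000.
bool SurvivesDwordRoundTrip(std::uint64_t address) {
    return TruncateTo32(address) == address;
}

// Portable alternative: an integer type as wide as a pointer.
std::uintptr_t PointerToInteger(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

void* IntegerToPointer(std::uintptr_t value) {
    return reinterpret_cast<void*>(value);
}

static_assert(sizeof(std::uintptr_t) >= sizeof(void*),
              "uintptr_t must be able to hold a pointer");

// Small usage check: low addresses hide the bug, high ones expose it.
void SelfCheck() {
    assert(SurvivesDwordRoundTrip(0x7FFE0000ULL));    // below 4 GB: looks fine
    assert(!SurvivesDwordRoundTrip(0x100001234ULL));  // above 4 GB: corrupted
    int local = 0;
    assert(IntegerToPointer(PointerToInteger(&local)) == &local);
}
```

This is why reserving the low 4 GB during testing is useful: it forces allocations into the range where truncation actually changes the value.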

For the most part, well-written Win32 code compiles with few, if any, source code changes. You may be surprised at the number of places you have assumed that a DWORD is the same thing as a PVOID, but if you followed the rules and were careful, the source files will compile; nonetheless, set -W3 on the 64-bit build, and pay close attention to the warnings.

I expected to need to create new GUIDs for all of my COM components, but in retrospect, that is simply a naïve view. You can have a 32-bit and a 64-bit component (say an Internet Explorer toolbar) each with the same GUID and each serving a client on the same machine at the same time. In fact, the COM implementation is quite flexible, allowing 32-bit components to be used by 64-bit processes (and vice versa) or both 32- and 64-bit implementations registered simultaneously. In other words, if you choose to port, you keep the same GUID and interface definitions. If you choose not to port (except for the components such as IE toolbars and Explorer extensions that apparently need to be in-process servers and really do need to be ported), for the most part, everything just works. Performance may be the driving factor here.

Some of the harder problems will surround sharing of binary data with a 32-bit version of your software, be it through files or perhaps in shared memory, or over the wire. There are numerous technologies to help you here, but Microsoft has put a number of useful tools in their compilers and headers.

unsigned char *ptr;
DWORD dword = (DWORD) (*(unsigned long long *) ptr);
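
For the data-sharing problem described above, one common approach (offered here as an illustration, not as the author's method) is to define shared records with fixed-width types only, store offsets rather than pointers, and copy the bytes out of the shared buffer instead of dereferencing a cast pointer. The record layout and names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical record shared between 32- and 64-bit builds through a
// file, shared memory, or the wire. No pointers, no size_t, no long:
// every field has the same width in both builds.
struct SharedRecord {
    std::uint32_t version;
    std::uint32_t flags;
    std::uint64_t payloadOffset;  // offset from start of region, not a pointer
    std::uint64_t payloadSize;    // 64 bits in both builds, unlike size_t
};

// Fail the build if either side would lay the record out differently.
static_assert(sizeof(SharedRecord) == 24, "unexpected record size");
static_assert(offsetof(SharedRecord, payloadOffset) == 8, "unexpected offset");
static_assert(offsetof(SharedRecord, payloadSize) == 16, "unexpected offset");

// Copy a record out of a byte buffer. memcpy() works at any alignment,
// whereas dereferencing a cast pointer may fault on strict-alignment
// processors if the buffer is not suitably aligned.
SharedRecord ReadRecord(const unsigned char* bytes) {
    assert(bytes != nullptr);
    SharedRecord record;
    std::memcpy(&record, bytes, sizeof(record));
    return record;
}
```

This sketch does not deal with byte order; both builds discussed here run on little-endian hardware, but a format that travels to other machines would need to pin that down as well.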

Likely, the hardest part of a 64-bit port is going to be the build environment. If you have previously ported to other architectures (Alpha, MIPS, and PowerPC, for instance), it is likely to be easier. The sample source (available electronically) has a makefile for use with CL and NMAKE that is a trivial example of a multiplatform build in the same source tree. The idea is to keep the outputs separate for each platform. Even this makefile has problems with multiple simultaneous builds: I failed to handle the vc?.pdb locations, and the compiler TEMP locations also need to be separated.

If you use the Visual Studio IDE, you can fairly easily copy a configuration, specify $(OUTDIR), adjust compiler options as needed, and then open a PSDK command window for your target architecture and run devenv /build /useenv myproject.sln "Win32 - Debug(X64)" to build that project. You'd think that VS.NET 2005, with its multiplatform compiler, will make this even easier by permitting target platforms to be specified on each configuration. There is a wonderful paper available on the AMD web site that describes the step-by-step procedures for using Visual Studio and the Platform SDK to produce 64-bit binaries using existing project files.

There are, of course, any number of installer issues. Delivering 32-bit, 64-bit (IA64 and x64), or a three-way hybrid install requires careful planning. If you use Windows Installer, you are supposed to deliver using one MSI file for each architecture. Interestingly, you can use 32-bit Custom Action DLLs in a 64-bit install, suggesting a pair of cooperating 32- and 64-bit processes under the hood. But this is a topic for its own paper.

Conclusion

Not everyone will need to create native 64-bit ports of their applications. In fact, some may even have lighter weight if left as 32-bit and run on 64-bit processors, except perhaps on Itanium machines that, for the most part, will be relegated to high-end servers (if the chip is not already de facto dead). It does seem clear that Microsoft has done a super job on its 32-bit emulation layer, and that 64-bit processors and motherboards are sufficiently inexpensive to make them very popular. As a result, we have no excuses not to be embracing 64-bit platforms, both for running existing 32-bit applications and for a new breed of games and memory-intensive business applications taking advantage of the larger address space.

Acknowledgments

Thanks to Eric Youngdale, without whom this article would never have existed.

Figure 1: Compiler warnings.

C4244. Conversion from 'type1' to 'type2', possible loss of data.
C4267. Conversion from 'type1' to 'type2', possible loss of data.
C4311. 'type_cast' : pointer truncation from 'type1 *' to 'type2'.
C4312. 'type_cast' : conversion from 'type1' to 'type2' of greater size.
C4313. 'printf' : '%x' in format string conflicts with argument 1 of type 'void *'.

Table 1: Results of running memwalk.cpp on various platforms.

Platform  Binary  /LARGEADDRESSAWARE  /3GB  Free bytes     Largest contiguous free block  Kernel shared region begins
X86       X86     n                   n     0x7fa20000     0x77a30000                     0x80000000
X86       X86     n                   y     0x7a3e0000     0x77a30000                     0xC0000000
X86       X86     y                   y     0xbfa1000      0x77a30000                     0xC0000000
X64       X86     n                   N/A   0x7e4e9000     0x776bd000                     0x80000000
X64       X86     y                   N/A   0xfe4d9000     0x7ffc0000                     0xFFFF0000
X64       X64     N/A                 N/A   0x7fffe90e000  0x7ff7ffc0000                  0x0000080000000000
IA64      IA64    N/A                 N/A   0x6fbfe17a000  0x6faffabe000                  0x0000070000000000

Table 2: APIs that let 64-bit aware 32-bit applications undo virtualization.

API Description
GetSystemWow64Directory A 32- or 64-bit application may request the location of the 32-bit "system32" (SysWow64) directory.
IsWow64Process A 32- or 64-bit process may ask if a process is running in the Wow64 "emulation" layer.
Wow64DisableWow64FsRedirection For a 32-bit application running under Wow64 to indicate that it understands the 64-bit filesystem layout and wishes to discontinue the mappings. We found an issue with the Winsock API gethostbyname() with FS redirection disabled that has yet to be investigated. The caution here is that existing code may simply not run once you disable this.
Wow64EnableWow64FsRedirection For a 32-bit application running under Wow64 to indicate that it understands the 64-bit filesystem layout and wishes to enable the emulation.
Wow64RevertWow64FsRedirection For a 32-bit application running under Wow64 to indicate that it understands the 64-bit filesystem layout and wishes to revert the emulation to its default state.
RegOpenKeyEx Not new, but takes two new flags: KEY_WOW64_32KEY for a 64-bit application to open a registry key as it would appear to a 32-bit application, and KEY_WOW64_64KEY, for a 32-bit application to open the registry as it would appear to a 64-bit application.
RegDisableReflectionKey Disable reflection for a key and so allow a Wow64 process to see full 64-bit registry. This appears to affect HKLM/Software and HKCR.
RegEnableReflectionKey Enable reflection and so disallow a Wow64 process from seeing the full 64-bit registry.
RegQueryReflectionKey Query the reflection state of an open registry key.
