Targeting OS X 64 Bit
A while back, I targeted the D programming language compiler to generate 32-bit code for OS X, and 64-bit code for Linux. For the last several years, all Mac OS X machines have included 64-bit CPUs, so the obvious next step is to target 64-bit code for OS X.
Having a debugged and working 32-bit port to OS X, and a debugged and working 64-bit code generator, this should be straightforward. (Hah!)
The object file format for OS X is Mach-O, which is unique to OS X (the Linux universe uses the ELF format). The first step is to convince my dumpobj utility to recognize and dump the Mach-O 64 format.
Yes, I know there are existing off-the-shelf object file dumpers, but by writing my own I learn how the file format really works. This was a quick and straightforward job, as the Apple documentation on it is good. The next job was to retarget the obj2asm disassembler. Obj2asm already was doing 64-bit instructions, so it just had to learn the Mach-O 64 format. Again, this was simple, and soon I had the tools to examine the output of GCC.
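To make the "recognize the Mach-O 64 format" step concrete, here is a minimal sketch in Python. It is not the dumpobj source, which the article doesn't show. The constants are the ones Apple publishes in <mach-o/loader.h>. The function name parse_mach_header is hypothetical, and the sketch assumes a little-endian file, as on x86.

```python
import struct

# Constants as published in Apple's <mach-o/loader.h>.
MH_MAGIC = 0xFEEDFACE         # 32-bit Mach-O
MH_MAGIC_64 = 0xFEEDFACF      # 64-bit Mach-O
MH_OBJECT = 0x1               # relocatable object file
CPU_TYPE_X86_64 = 0x01000007

FIELD_NAMES = ("magic", "cputype", "cpusubtype", "filetype",
               "ncmds", "sizeofcmds", "flags")


def parse_mach_header(data):
    """Hypothetical helper: describe a little-endian Mach-O header.

    Returns a dict, or None if the bytes are not a recognized Mach-O file.
    """
    if len(data) < 4:
        return None
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic == MH_MAGIC:
        bits, field_count = 32, 7
    elif magic == MH_MAGIC_64:
        # The 64-bit header has one extra trailing 'reserved' word.
        bits, field_count = 64, 8
    else:
        return None
    header_size = 4 * field_count
    if len(data) < header_size:
        return None
    values = struct.unpack_from("<%dI" % field_count, data, 0)
    header = dict(zip(FIELD_NAMES, values))  # 'reserved' is dropped
    header["bits"] = bits
    header["header_size"] = header_size
    return header


# A hand-built 64-bit object file header (3 is CPU_SUBTYPE_X86_64_ALL).
sample = struct.pack("<8I", MH_MAGIC_64, CPU_TYPE_X86_64, 3,
                     MH_OBJECT, 4, 0x200, 0, 0)
hdr = parse_mach_header(sample)
print(hdr["bits"], hdr["header_size"], hdr["filetype"] == MH_OBJECT)
# prints: 64 32 True
```

The load commands follow immediately after the header, so a dumper would continue reading at header_size and walk ncmds entries from there.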
The D compiler can generate library (.a) files directly, so the next job was to figure out that format and adjust the compiler as needed. This turned out to be trivial, as the .a format was the same as for 32-bit object files; it just had to deal with Mach-O 64 files. With the knowledge I gained from dumpobj and obj2asm, this was quick work.
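The reason the library format doesn't care about 32 versus 64 bits can be sketched as follows. This is a rough Python illustration of the common BSD-style ar layout as I understand it (an 8-byte global magic, then 60-byte text headers, with long names stored via a "#1/length" prefix). It is not the D compiler's library writer. The helpers list_archive_members and make_member are hypothetical, and a real OS X archive also carries a symbol table member that this sketch ignores.

```python
import struct

AR_MAGIC = b"!<arch>\n"   # global archive signature
HEADER_SIZE = 60          # name 16, date 12, uid 6, gid 6, mode 8, size 10, end 2


def list_archive_members(data):
    """Hypothetical helper: return (name, payload) pairs from an ar archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = []
    pos = len(AR_MAGIC)
    while pos + HEADER_SIZE <= len(data):
        header = data[pos:pos + HEADER_SIZE]
        if header[58:60] != b"`\n":
            raise ValueError("bad member header at offset %d" % pos)
        name = header[0:16].decode("ascii").rstrip()
        size = int(header[48:58].decode("ascii"))
        start, length = pos + HEADER_SIZE, size
        if name.startswith("#1/"):
            # BSD extended name: the real name occupies the first bytes
            # of the member data and is counted in the size field.
            name_len = int(name[3:])
            name = data[start:start + name_len].rstrip(b"\0").decode("ascii")
            start += name_len
            length -= name_len
        members.append((name, data[start:start + length]))
        # Members are padded to an even offset.
        pos += HEADER_SIZE + size + (size & 1)
    return members


def make_member(name, payload):
    """Hypothetical helper: build one BSD-style member for the demo."""
    name_bytes = name.encode("ascii")
    size = len(name_bytes) + len(payload)
    header = b"%-16s%-12d%-6d%-6d%-8s%-10d`\n" % (
        b"#1/%d" % len(name_bytes), 0, 0, 0, b"100644", size)
    body = name_bytes + payload
    if size & 1:
        body += b"\n"
    return header + body


# The container is the same either way; only the payload's magic differs.
payload64 = struct.pack("<I", 0xFEEDFACF) + b"\0" * 28
archive = AR_MAGIC + make_member("hello.o", payload64)
for name, body in list_archive_members(archive):
    print(name, len(body), body[:4] == struct.pack("<I", 0xFEEDFACF))
# prints: hello.o 32 True
```

Nothing in the member header records the word size of the contents, which is consistent with the compiler needing no container changes for 64-bit objects.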
Tackling the compiler output involves:
- Adjusting the object file generator to output Mach-O 64 format. This was easy, now that I'd learned the format by adjusting dumpobj.
- Conforming to the 64-bit ABI. Fortunately, OS X follows the same C ABI as 64-bit Linux does. This meant that all the agony I went through figuring out how to compile variadic functions worked out of the box for OS X. I didn't have to change a thing. Phew!
- Fixups. When a reference is made in a source file to a symbol, such as printf, the compiler doesn't know what address to use for the symbol. Instead, it outputs a "fixup record" that consists of a symbol, and a location in the object file that must be "fixed up" with the real address of that symbol when it becomes known. These addresses become known to the linker when it combines object files and resolves symbols like printf, and later to the loader, which adjusts those addresses to where the program actually winds up in memory.
Fixup records and schemes are different for every system. They're usually defined by the object file format, which is called Mach-O for OS X.
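The idea of a fixup record can be sketched in a few lines of Python. This is an illustration only, not Mach-O's actual relocation encoding. The Fixup record, the "abs64" and "rel32" kinds, the function apply_fixups, and all the addresses are made up for the example, and real relocation schemes carry more detail (addends, section numbers, and so on).

```python
import struct
from collections import namedtuple

# Hypothetical record: where to patch, which symbol, and how.
Fixup = namedtuple("Fixup", "offset symbol kind")


def apply_fixups(section, section_addr, fixups, symbols):
    """Patch a section's bytes once the symbol addresses are known."""
    out = bytearray(section)
    for f in fixups:
        target = symbols[f.symbol]
        if f.kind == "abs64":
            # Store the symbol's full 64-bit address.
            struct.pack_into("<Q", out, f.offset, target)
        elif f.kind == "rel32":
            # Store the distance from the byte after the 4-byte field,
            # as an x86 call or jump displacement is measured.
            next_addr = section_addr + f.offset + 4
            struct.pack_into("<i", out, f.offset, target - next_addr)
        else:
            raise ValueError("unknown fixup kind: %r" % (f.kind,))
    return bytes(out)


# What the compiler emits: a call opcode (0xE8) with a zero placeholder,
# followed by an 8-byte pointer slot that is also a placeholder.
section = b"\xE8" + b"\0" * 4 + b"\0" * 8
fixups = [Fixup(offset=1, symbol="printf", kind="rel32"),
          Fixup(offset=5, symbol="message", kind="abs64")]

# What the linker later decides (addresses invented for the example).
symbols = {"printf": 0x2000, "message": 0x3000}

patched = apply_fixups(section, 0x1000, fixups, symbols)
print(patched.hex())
# prints: e8fb0f00000030000000000000
```

The compiler's job ends at producing section and fixups; everything in symbols, and the section's own address, arrives later from the linker and loader.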
Fixups used to be simple. Back in the early MS-DOS days, there weren't any fixups at all for COM programs. Just copy the bytes into memory and jump to them. (Any relocation was done by the hardware through the segment registers.) Those glory days didn't last long. Now we've got multiple addressing modes, multiple sections, shared libraries, position-independent code, offsets known and unknown by the compiler, etc.
Just for starters, there's:
- data referencing other data
- data referencing code
- code referencing code
- code referencing data
Throw in the other cases and there's quite a smorgasbord of quirky and obscure detail.