Targeting OS X 64 Bit
Let's check in on the Apple documentation in the document entitled "Mac OS X ABI Mach-O File Format":
For the x86-64 environment, the r_type field may contain any of these values:
X86_64_RELOC_BRANCH CALL/JMP instruction with 32-bit displacement.
X86_64_RELOC_GOT_LOAD MOVQ load of a GOT entry.
X86_64_RELOC_GOT Other GOT references.
X86_64_RELOC_SIGNED Signed 32-bit displacement.
X86_64_RELOC_UNSIGNED Absolute address.
X86_64_RELOC_SUBTRACTOR Must be followed by a X86_64_RELOC_UNSIGNED relocation.
Saying it's a "little sparse" is kind, and it still leaves off some other types found in the header files:
X86_64_RELOC_SIGNED_1
X86_64_RELOC_SIGNED_2
X86_64_RELOC_SIGNED_4
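For reference, the documented and undocumented types can be written out together as one C enum. This is a minimal sketch: the numeric values are transcribed from memory of <mach-o/x86_64/reloc.h> and should be checked against the header on your own system, and reloc_name is a hypothetical helper for dump output, not part of any Apple API.

```c
#include <stddef.h>

/* Values as believed to appear in <mach-o/x86_64/reloc.h>;
   this is an assumption, so verify against your own SDK. */
enum x86_64_reloc {
    X86_64_RELOC_UNSIGNED   = 0, /* absolute address */
    X86_64_RELOC_SIGNED     = 1, /* signed 32-bit displacement */
    X86_64_RELOC_BRANCH     = 2, /* CALL/JMP with 32-bit displacement */
    X86_64_RELOC_GOT_LOAD   = 3, /* MOVQ load of a GOT entry */
    X86_64_RELOC_GOT        = 4, /* other GOT references */
    X86_64_RELOC_SUBTRACTOR = 5, /* must be followed by UNSIGNED */
    X86_64_RELOC_SIGNED_1   = 6, /* the three types missing from the document */
    X86_64_RELOC_SIGNED_2   = 7,
    X86_64_RELOC_SIGNED_4   = 8
};

/* Hypothetical helper: printable name for an r_type value, or NULL if unknown. */
const char *reloc_name(unsigned r_type)
{
    static const char *const names[] = {
        "X86_64_RELOC_UNSIGNED", "X86_64_RELOC_SIGNED",
        "X86_64_RELOC_BRANCH",   "X86_64_RELOC_GOT_LOAD",
        "X86_64_RELOC_GOT",      "X86_64_RELOC_SUBTRACTOR",
        "X86_64_RELOC_SIGNED_1", "X86_64_RELOC_SIGNED_2",
        "X86_64_RELOC_SIGNED_4"
    };
    return r_type < sizeof names / sizeof names[0] ? names[r_type] : NULL;
}
```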
But figuring this stuff out is why I get paid the big bucks (!), so tally-ho. The first step is to write little bits of code in C like:
int x;
int *px = &x;
compile them with GCC, and then dump the output with dumpobj. Trying various combinations is a good starting point. For example, I learned that a fixup within the same object file is one thing, a fixup to an address in another module is entirely different, and if the location to be patched is in code, an extra level of indirection is needed. (This is the job of the GOT, the Global Offset Table: the code gets patched with a reference to a GOT entry, which then provides the actual address.)
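A few more probes in the same spirit might look like the following. All the names are hypothetical, and the comments say what kind of fixup each definition is meant to provoke rather than what any particular compiler is guaranteed to emit; the only way to know is to dump the object file and look.

```c
/* Each definition is meant to provoke a different kind of fixup.
   Names are hypothetical; inspect the object file to see what is emitted. */

int x;                   /* data defined in this object file */
int *px = &x;            /* data holding an address: expect an absolute fixup */

int read_x(void)         /* code reading data: expect a PC-relative reference */
{
    return x;
}

int *addr_of_x(void)     /* code taking an address: may be routed through the GOT */
{
    return &x;
}

int call_read_x(void)    /* code calling code: expect a branch displacement */
{
    return read_x();
}
```

To probe the cross-module case, move the definition of x into a second source file, leave only an extern declaration here, and compare the two dumps.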
GCC is of limited value, though, as it only emits a rather small subset of the possible fixups. For example, it always uses symbolic offsets rather than section offsets, and does not have COMDAT sections.
The addressing modes can be tricky. The x86_64 can address code with signed offsets of various sizes from the program counter, and data with a 32-bit signed offset from the program counter; it also supports absolute 64-bit addresses.
Getting all this right simply requires trial and error, along with a few swags (Scientific Wild-Ass Guesses). The compiler is adjusted to put out a fixup for a certain case, and the program is run. If it crashes at that point, then the procedure is to try different fixup types and different offsets. (The offset portion may seem obvious, but it takes a bit of trial and error to determine if it is looking for the offset from the start of the section, or the vm offset, or the offset from the patch location to the referred location. And, is the offset counted from the beginning of the instruction, the end, or the location within the instruction?)
I finally figured out what those X86_64_RELOC_SIGNED_[124] fixups were for. They were for a 32-bit signed program counter offset from a code instruction, and the location of the offset was 1, 2, or 4 bytes back from the end of the instruction! (With ELF, one just subtracted 1, 2, or 4 from the offset. It took me a lot of hair pulling to determine why that didn't work on Mach-O.)
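The arithmetic behind this can be sketched in a few lines of C. The struct and function names are hypothetical, and the mapping from trailing bytes to X86_64_RELOC_SIGNED_N follows the description above; read it as an illustration of the offsets involved, not as a statement of what the linker does internally. The CPU measures the displacement from the end of the whole instruction, so when an immediate operand follows the 4-byte displacement field, the end of the field and the end of the instruction differ by 1, 2, or 4 bytes.

```c
#include <stdint.h>

/* Hypothetical description of one instruction using PC-relative addressing. */
struct rip_insn {
    uint64_t addr;      /* address of the first byte of the instruction */
    unsigned length;    /* total instruction length in bytes */
    unsigned disp_off;  /* offset of the 4-byte displacement field within it */
};

/* Bytes (an immediate operand) that follow the displacement field. */
unsigned trailing_bytes(const struct rip_insn *in)
{
    return in->length - (in->disp_off + 4);
}

/* What the CPU needs: target minus the address of the next instruction. */
int32_t rip_disp(const struct rip_insn *in, uint64_t target)
{
    int64_t next = (int64_t)(in->addr + in->length);
    return (int32_t)((int64_t)target - next);
}

/* The same value measured from the end of the displacement field,
   corrected by the number of trailing bytes. */
int32_t rip_disp_from_field(const struct rip_insn *in, uint64_t target)
{
    int64_t field_end = (int64_t)(in->addr + in->disp_off + 4);
    return (int32_t)((int64_t)target - field_end) - (int32_t)trailing_bytes(in);
}

/* Suffix N of X86_64_RELOC_SIGNED_N to use; 0 means plain
   X86_64_RELOC_SIGNED, -1 means no matching type. */
int signed_reloc_suffix(const struct rip_insn *in)
{
    switch (trailing_bytes(in)) {
    case 0:  return 0;
    case 1:  return 1;
    case 2:  return 2;
    case 4:  return 4;
    default: return -1;
    }
}
```

For instance, a 10-byte instruction with a 2-byte opcode, a 4-byte displacement, and a 4-byte immediate has four trailing bytes, so it would take the _4 variant.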
Once I figured this out, the rest was finding all the places in the compiler where fixups were dealt with. It isn't quite as simple as it sounds, as (for example) the GOT_LOAD fixup only worked with some instructions, like:
MOV reg,mem
If you tried it with:
CMP reg,mem
it failed horribly. So the latter had to be written as:
MOV r,mem
CMP reg,r
Even byte MOVs didn't work; they had to be redone as full-size MOVs.
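The rewrite amounts to a small legalization step, which could be sketched as below. Every type and name here is hypothetical, since the compiler's real data structures are not shown in this article, and the rule it encodes (only a full-size MOV may reference a GOT entry) is simply the trial-and-error finding described above.

```c
#include <stddef.h>

/* Hypothetical, much-simplified instruction record. */
enum opcode { OP_MOV, OP_CMP };

struct insn {
    enum opcode op;
    int reg;         /* register operand */
    int other_reg;   /* second register, or -1 if the operand is a GOT entry */
    unsigned size;   /* operand size in bytes */
};

/* Rewrite "op reg,mem" so that only a full-size MOV touches the GOT.
   Writes one or two instructions to out and returns how many. */
size_t legalize_got_operand(struct insn in, int scratch, struct insn out[2])
{
    if (in.other_reg != -1) {   /* no GOT operand: leave unchanged */
        out[0] = in;
        return 1;
    }
    if (in.op == OP_MOV) {      /* MOV reg,mem: just force full size */
        in.size = 8;
        out[0] = in;
        return 1;
    }
    /* Anything else: MOV scratch,mem followed by op reg,scratch */
    out[0].op = OP_MOV;
    out[0].reg = scratch;
    out[0].other_reg = -1;
    out[0].size = 8;
    out[1] = in;
    out[1].other_reg = scratch;
    return 2;
}
```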
The great thing, though, is that D has built up a rather large test suite over time, so if that goes through without error, it's pretty well nailed down.
Conclusion
There has been an explosion of new programming languages lately. But interestingly, very few generate native code; most target a bytecode virtual machine of one sort or another. The reason is straightforward: writing a native code generator is a lot of work. Not many people seem inclined to do that, but it's one thing I enjoy doing. Even using an existing back end (like GCC or LLVM) is a lot more work than targeting a virtual machine.
The OS X 64-bit target is a perfect example. Complex code, incomprehensible documentation, what's not to like? Who needs to buy puzzles at the store?
Thanks to Andrei Alexandrescu for his helpful comments on this.

