Walter Bright

Dr. Dobb's Bloggers

Targeting OS X 64 Bit

December 27, 2011

Let's check in on the Apple documentation in the document entitled "Mac OS X ABI Mach-O File Format":

For the x86-64 environment, the r_type field may contain any of these values:
X86_64_RELOC_BRANCH      CALL/JMP instruction with 32-bit displacement.
X86_64_RELOC_GOT_LOAD    MOVQ load of a GOT entry.
X86_64_RELOC_GOT         Other GOT references.
X86_64_RELOC_SIGNED      Signed 32-bit displacement.
X86_64_RELOC_UNSIGNED    Absolute address.
X86_64_RELOC_SUBTRACTOR  Must be followed by a X86_64_RELOC_UNSIGNED relocation.

Saying it's a "little sparse" is kind, and it still leaves off some other types found in the header files:

X86_64_RELOC_SIGNED_1
X86_64_RELOC_SIGNED_2
X86_64_RELOC_SIGNED_4
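
For reference while reading object dumps, here is a minimal sketch that puts the documented types and the header-only ones side by side. The numeric values are the ones in Apple's <mach-o/x86_64/reloc.h>. The enum tag and the reloc_name helper are hypothetical names invented for this sketch, and the types are redeclared so the code compiles on any platform.

```c
#include <stddef.h>

/* Numeric values as they appear in Apple's <mach-o/x86_64/reloc.h>.
   Redeclared under a sketch-only enum tag so this compiles anywhere. */
enum sketch_reloc_type {
    X86_64_RELOC_UNSIGNED   = 0,  /* absolute address */
    X86_64_RELOC_SIGNED     = 1,  /* signed 32-bit displacement */
    X86_64_RELOC_BRANCH     = 2,  /* CALL/JMP with 32-bit displacement */
    X86_64_RELOC_GOT_LOAD   = 3,  /* MOVQ load of a GOT entry */
    X86_64_RELOC_GOT        = 4,  /* other GOT references */
    X86_64_RELOC_SUBTRACTOR = 5,  /* must be followed by an UNSIGNED */
    X86_64_RELOC_SIGNED_1   = 6,  /* header-only: not in the document */
    X86_64_RELOC_SIGNED_2   = 7,  /* header-only: not in the document */
    X86_64_RELOC_SIGNED_4   = 8   /* header-only: not in the document */
};

/* Hypothetical helper: map an r_type value to a printable name. */
const char *reloc_name(unsigned r_type)
{
    static const char *const names[] = {
        "X86_64_RELOC_UNSIGNED",
        "X86_64_RELOC_SIGNED",
        "X86_64_RELOC_BRANCH",
        "X86_64_RELOC_GOT_LOAD",
        "X86_64_RELOC_GOT",
        "X86_64_RELOC_SUBTRACTOR",
        "X86_64_RELOC_SIGNED_1",
        "X86_64_RELOC_SIGNED_2",
        "X86_64_RELOC_SIGNED_4"
    };
    if (r_type < sizeof names / sizeof names[0])
        return names[r_type];
    return "unknown";
}
```

A dump tool could call reloc_name on each entry's r_type field to make the output readable.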

But figuring this stuff out is why I get paid the big bucks (!), so tally-ho. The first step is to write little bits of code in C like:

int x;
int *px = &x;

compile them with GCC, and then dump the output with dumpobj. Trying various combinations is a good starting point. For example, I learned that a fixup within the same object file is one thing, a fixup to an address in another module is entirely different, and if the location to be patched is in code, an extra level of indirection is needed. (This is what the GOT, the Global Offset Table, is for: the code gets patched with a reference to a GOT entry, which then provides the actual address.)
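
A slightly larger probe file along the same lines might look like the sketch below. It covers the combinations mentioned above: an address from the same file, a local symbol, an address from another module, and references made from code rather than data. All names are invented for illustration, and which fixup type a given compiler emits for each line is exactly what the dump is meant to reveal, so none is claimed here.

```c
#include <stdio.h>

int x;                               /* defined in this object file */
int *px = &x;                        /* data word holding an address from this file */

static int hidden = 7;               /* local symbol, invisible to other modules */
int *phidden = &hidden;              /* data word holding a local symbol's address */

int (*pputs)(const char *) = puts;   /* data word holding an address from another module */

/* The location to be patched is inside code and refers to data. */
int read_x(void) { return x; }

/* The location to be patched is inside code and refers to another module. */
int call_puts(const char *s) { return puts(s); }
```

Compiling this with the -c option and dumping the object file gives one relocation entry per reference to compare against the table.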

GCC is of limited value, though, as it only emits a rather small subset of the possible fixups. For example, it always uses symbolic offsets rather than section offsets, and does not have COMDAT sections.

The addressing modes can be tricky. The x86_64 can reach a target in three ways: signed offsets of various sizes from the program counter for code, a 32-bit signed offset from the program counter for data, and an absolute 64-bit address.
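
The 32-bit program counter offset only works when the target is within about 2GB of the instruction. A minimal sketch of that check follows, assuming the offset is measured from the address of the following instruction, which is how the processor interprets it. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: compute the signed 32-bit displacement from
   next_ip (the address of the following instruction) to target.
   Returns 1 and stores the displacement if it fits, 0 if the target
   is out of range and an absolute 64-bit address would be needed. */
int pc_relative_disp(uint64_t next_ip, uint64_t target, int32_t *disp)
{
    /* Unsigned subtraction wraps; reinterpret the result as signed. */
    int64_t delta = (int64_t)(target - next_ip);

    if (delta < INT32_MIN || delta > INT32_MAX)
        return 0;
    *disp = (int32_t)delta;
    return 1;
}
```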

Getting all this right simply requires trial and error, along with a few swags (Scientific Wild-Ass Guesses). The compiler is adjusted to put out a fixup for a certain case, and the program is run. If it crashes at that point, then the procedure is to try different fixup types and different offsets. (The offset portion may seem obvious, but it takes a bit of trial and error to determine if it is looking for the offset from the start of the section, or the vm offset, or the offset from the patch location to the referred location. And, is the offset counted from the beginning of the instruction, the end, or the location within the instruction?)
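
To make the offset question concrete, the sketch below computes the three candidate values named above for a single reference. It assumes the target and the patch location live in the same section. The struct and function names are hypothetical, and nothing here says which candidate a given fixup type expects, since that is what the trial and error determines.

```c
#include <stdint.h>

/* The three interpretations of "offset" to try, one per field. */
struct offset_candidates {
    uint64_t from_section_start;   /* offset from the start of the section */
    uint64_t vm_address;           /* section vm address plus that offset */
    int64_t  from_patch_location;  /* distance from the patch to the target */
};

/* section_vmaddr: vm address where the section begins.
   target_off:     offset of the referred location within the section.
   patch_off:      offset of the location to be patched within the section. */
struct offset_candidates candidates(uint64_t section_vmaddr,
                                    uint64_t target_off,
                                    uint64_t patch_off)
{
    struct offset_candidates c;
    c.from_section_start  = target_off;
    c.vm_address          = section_vmaddr + target_off;
    c.from_patch_location = (int64_t)(target_off - patch_off);
    return c;
}
```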

I finally figured out what those X86_64_RELOC_SIGNED_[124] fixups were for. They were for a 32-bit signed program counter offset from a code instruction, and the location of the offset was 1, 2, or 4 bytes back from the end of the instruction! (With ELF, one just subtracted 1, 2, or 4 from the offset. It took me a lot of hair pulling to determine why that didn't work on Mach-O.)
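
The arithmetic behind this can be sketched as follows. The processor measures the offset from the end of the instruction, so when 1, 2, or 4 bytes of immediate operand follow the 32-bit offset field, the end of the instruction is that much further past the field. The function names are hypothetical, and the mapping from trailing bytes to fixup type is my reading of the paragraph above and of the header values, not a statement of how any particular compiler implements it.

```c
#include <stdint.h>

/* Pick a fixup type from the number of instruction bytes that follow
   the 32-bit offset field. Return values are the r_type numbers from
   <mach-o/x86_64/reloc.h>; -1 means no matching type. */
int signed_reloc_type(unsigned trailing_bytes)
{
    switch (trailing_bytes) {
    case 0:  return 1;   /* X86_64_RELOC_SIGNED */
    case 1:  return 6;   /* X86_64_RELOC_SIGNED_1 */
    case 2:  return 7;   /* X86_64_RELOC_SIGNED_2 */
    case 4:  return 8;   /* X86_64_RELOC_SIGNED_4 */
    default: return -1;
    }
}

/* Final displacement the processor needs: target minus the address of
   the next instruction, which lies 4 bytes (the field itself) plus the
   trailing bytes past the start of the field. */
int64_t rip_displacement(uint64_t field_addr, unsigned trailing_bytes,
                         uint64_t target)
{
    uint64_t next_ip = field_addr + 4 + trailing_bytes;
    return (int64_t)(target - next_ip);
}
```

For a field at 0x1002 and a target at 0x2000, the displacement is 0xFFA with no trailing bytes and 0xFF6 with a 4-byte immediate after the field.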

Once I figured this out, the rest was finding all the places in the compiler where fixups were dealt with. It isn't quite as simple as it sounds, as (for example) the GOT_LOAD fixup only worked with some instructions, like:

MOV reg,mem

If you tried it with:

CMP reg,mem

it failed horribly. So the latter had to be written as:

MOV r,mem
CMP reg,r

Even byte MOVs didn't work; they had to be redone as full-size MOVs.
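
The rewrite rule can be sketched as a small lowering function. This is a toy that emits text in the same placeholder notation used above, not the compiler's actual code. The function name, the scratch register parameter, and the output format are all assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Emit a "op reg,mem" instruction whose memory operand needs a GOT_LOAD
   fixup. A MOV is emitted directly; any other operation first loads the
   operand into a scratch register with a full-size MOV and then works
   on the register. Returns the number of instructions written to out. */
int emit_got_operand(char *out, size_t n, const char *op,
                     const char *reg, const char *scratch)
{
    if (strcmp(op, "MOV") == 0) {
        snprintf(out, n, "MOV %s,mem\n", reg);
        return 1;
    }
    snprintf(out, n, "MOV %s,mem\n%s %s,%s\n", scratch, op, reg, scratch);
    return 2;
}
```

Calling emit_got_operand with "CMP", "RAX", and "R11" produces a MOV into R11 followed by a CMP of RAX against R11.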

The great thing, though, is that D has built up a rather large test suite over time, so if that goes through without error, it's pretty well nailed down.

Conclusion

There has been an explosion of new programming languages lately. But interestingly, very few generate native code; they target a bytecode virtual machine of one sort or another. The reason is straightforward: writing a native code generator is a lot of work. Not many people seem inclined to do that, but it's one thing I enjoy doing. Even using an existing back end (like GCC or LLVM) is a lot more work than targeting a virtual machine.

The OS X 64-bit target is a perfect example. Complex code, incomprehensible documentation, what's not to like? Who needs to buy puzzles at the store?

Thanks to Andrei Alexandrescu for his helpful comments on this.
