D is a systems programming language. Its focus is on combining the power and high performance of C and C++ with the programmer productivity of modern languages like Ruby and Python. Special attention is given to the needs of quality assurance, documentation, management, portability and reliability.
The D language is statically typed and compiles directly to machine code. It's multiparadigm, supporting many programming styles: imperative, object oriented, and metaprogramming. It's a member of the C syntax family, and its appearance is very similar to that of C++. For a quick comparison of the features, see this comparison of D with C, C++, C# and Java.
D is not governed by a corporate agenda or any overarching theory of programming. The needs and contributions of the D programming community form the direction it goes.
Multiple compilers exist for the D programming language, including the Digital Mars compiler (dmd), the GNU compiler (gdc), and the upcoming ldc compiler; a .NET D compiler is also in the works. They are all based on the same open source front end code.
dmd has an x86 code generator. It currently targets Windows and Linux. But now that the Mac runs on x86 machines, a straightforward implementation of dmd on the Mac becomes possible. The only way to find out how feasible it is is to get a Mac and get to work on it.
Over Christmas I ordered a MacMini from Amazon. Compiler development doesn't require a powerful machine or lots of memory and/or disk space, so a low-end machine will do nicely. I've never used a Mac before, so I was also curious how the machine would work. It didn't take long to get it up and running. I keep spare keyboards and mice around (because they break often), and plugged them in along with an old monitor. The monitor and mouse worked perfectly, but the Mac had some trouble with the keyboard, and needed to be configured for it. The machine comes with the GNU dev tools, though they needed to be installed separately. I then figured out how to remotely connect to the Mac over the LAN, and (the Mac people will hate me for this) put the Mac in the basement and operate it remotely with a text window.
From a text window, the Mac is just another Unix machine. I put all the source code to the compiler on it and set about trying to compile it. Most of that consisted of finding all the conditional compilation:
#if linux
and changing it to:
#if linux || __APPLE__
Remarkably, there were very few API differences that needed to be accounted for, a couple gcc differences, and the compiler was running. I thought I'd start by assuming that the C ABI on the Mac was the same as for Linux, and configured dmd appropriately.
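As a rough illustration of that kind of edit (not dmd's actual source), the sketch below shares one branch between Linux and OSX and splits only where the two differ. The function name `platform_name` is hypothetical. The sketch tests `__linux__` rather than the bare `linux` spelling, on the assumption that strictly conforming compiler modes may not define the latter.

```c
/* Hypothetical helper: reports which platform branch was compiled in. */
const char *platform_name(void)
{
#if defined(__linux__) || defined(__APPLE__)
    /* Code common to both Unix-like targets would live in this branch. */
#  if defined(__APPLE__)
    return "osx";      /* only the genuinely different parts split further */
#  else
    return "linux";
#  endif
#elif defined(_WIN32)
    return "windows";
#else
    return "unknown";
#endif
}
```

Keeping the shared branch large and the platform-specific splits small mirrors the observation that very few API differences needed separate handling.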
The first big problem is that dmd can generate both Intel OMF and ELF object file formats, but the Mac uses the Mach-O format. The best way to learn a file format is to write a dumper for it (called "dumpobj" for object files). Object file specs are usually wrong in some detail or other, and the Mach-O spec is no exception. The Mac comes with an object file dumper called otool. Unfortunately, otool doesn't give a clue as to the structure of the object file, nor will it disassemble any code that is not in a __text segment. So I also got the Digital Mars disassembler, obj2asm, converted to work with Mach-O files.
Fortunately, the Mac uses Dwarf for its symbolic debug information format, and dmd has a Dwarf generator, so that should be good to go. But when I first looked at the debug output of gcc, there was a "macinfo" section. Uh-oh, some undocumented Macintosh enhancement. Googling (how indispensable Google has become) "macinfo" revealed my mistake -- I had forgotten that Dwarf had a special section for information on C preprocessing macros called "macinfo". I forgot about it because D doesn't have a text macro preprocessor.
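To give a flavor of what the first step of such a dumper looks like, here is a minimal sketch that reads the fixed header at the start of a 32-bit Mach-O file. It is not the dumpobj source. The struct and function names are made up for the example, the field layout follows the published 32-bit `mach_header`, and the sketch assumes the file and the host have the same byte order.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MACHO_MAGIC_32   0xfeedfaceu  /* MH_MAGIC: 32-bit Mach-O */
#define MACHO_TYPE_OBJECT 0x1u        /* MH_OBJECT: relocatable object file */

/* Mirrors the 32-bit mach_header; the name is local to this sketch. */
struct macho_header32 {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;       /* number of load commands that follow */
    uint32_t sizeofcmds;  /* total size of those load commands */
    uint32_t flags;
};

/* Returns 0 and fills *out if buf starts with a same-endian 32-bit
   Mach-O header; returns -1 otherwise. */
int read_macho_header(const unsigned char *buf, size_t len,
                      struct macho_header32 *out)
{
    if (len < sizeof *out)
        return -1;
    memcpy(out, buf, sizeof *out);
    return out->magic == MACHO_MAGIC_32 ? 0 : -1;
}

int is_object_file(const struct macho_header32 *h)
{
    return h->filetype == MACHO_TYPE_OBJECT;
}

/* Prints the fields a dumper would show first. */
void dump_macho_header(FILE *fp, const struct macho_header32 *h)
{
    fprintf(fp, "magic      0x%08x\n", (unsigned)h->magic);
    fprintf(fp, "filetype   %u\n", (unsigned)h->filetype);
    fprintf(fp, "ncmds      %u\n", (unsigned)h->ncmds);
    fprintf(fp, "sizeofcmds %u\n", (unsigned)h->sizeofcmds);
}
```

A real dumper would go on to walk the `ncmds` load commands that follow the header, which is where the segment and section structure that otool hides becomes visible.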
Object files on the Mac are all generated as pic (Position Independent Code), necessary so that shared libraries will work. On Linux, pic is an option. Pic is done completely differently on the Mac, so this was where most of the work was so far.
Some other differences:
- Names get an '_' prepended to them, although when using gdb you have to leave off the '_'.
- There's no thread local storage mechanism in the object file format. This is a serious shortcoming, and I'll have to figure out a solution.
- There are special sections for C strings and read-only literals that the linker can compress redundancy out of.
I finally got to the point where dmd would compile "hello world". When I used dumpobj to compare its object file with the one produced by gcc for the C "hello world", it looked like it should work. The gcc one worked fine, and the dmd generated one crashed with a segmentation fault. I was really pulling out what few strands of hair I had left over that one, as I could not find anything wrong with the fixups or object layout.
Eventually, I noticed that the gcc version was putting some unneeded space on the stack, and I suspected something was up. I put out the same extra stack space, and it started working. Googling around some more, I discovered that code on the Mac that calls any library functions must align the stack to 16 bytes. After I tweaked the code generator to do this, I had "hello world" working in D on the Mac.
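The arithmetic behind that "unneeded" stack space is just rounding up to a multiple of 16. The sketch below shows only that calculation. The function names are hypothetical, and exactly which bytes a code generator counts (return address, saved registers, outgoing arguments) is an assumption left to the caller.

```c
#include <stdint.h>

/* Round n up to the next multiple of 16. */
uint32_t align16(uint32_t n)
{
    return (n + 15u) & ~15u;
}

/* Extra bytes to reserve so that a frame already using bytes_in_use
   bytes ends on a 16-byte boundary before a call is made. */
uint32_t call_padding(uint32_t bytes_in_use)
{
    return align16(bytes_in_use) - bytes_in_use;
}
```

For example, a frame using 12 bytes needs 4 bytes of padding, which looks like wasted space in a disassembly until you know about the alignment rule.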
It Works -- Sort Of
I had "hello world" working, but I was using D as a C compiler to do it. All that existed was a main() function. There was a long way to go -- functions, data, the runtime library, the test suite, etc. First I'd try and build the runtime library.
Doing so exposed a lot of conditional compilation issues that had no case for OSX. I found that Linux has a bunch of API functions that are missing in OSX, like getline and getdelim, so some of the library functionality had to revert to more generic code for OSX. I had to be careful, because although many system macros had the same functionality and spelling, they had different expansions. Getting these wrong would cause some mysterious behavior, indeed.
After compiling the library modules, they had to be combined into a library file. Normally this is just done with ar, the standard "archive" program, but dmd has a nice feature where it can build the library directly without having to write object files out to disk. Not only is this very fast, but it makes the build process simpler by obviating the need to manage the object files. The archive file format is supposed to be a standard, but of course everyone implements it very differently. The OSX archive format spec is only about half there, and what is there is wrong on a few details. This is where building a file dumper again comes in real handy. By varying the input to ar, dumping the resulting archive, and seeing how things change, one eventually figures out the format. If you get it wrong, the OSX linker ld likes to help you out with a convenient "Bus error" message. To figure those out, it's back to puzzling out the hex dumps of the archive dmd generated against the one ar generated. I did finally figure it out; the archive format has some definite oddities like entries are aligned on 4-byte boundaries but the object file within each entry must be aligned on 8-byte boundaries, offset by 4. Weird. Anyhow, once that was working and the linker accepted the archive, it was very pleasing to see how fast dmd would chew through the code and build the library.
Unfortunately, none of the library code ran, and I mean none of it. When the compiler was originally designed, I attempted to do the right thing and abstract away the memory model and the object file format. Because I had only one example, the abstraction lines turned out to be in all the wrong places. I coined the term "premature abstraction", which, like the related term premature optimization, is coming up with all the wrong abstractions because you don't know what you're doing. (Andrei Alexandrescu suggested I write a column about that, but a quick Google search showed I was not original and the term is already in use!)
For example, the data structures being written out were divided into mutable and immutable sections, the immutable stuff being destined for a read-only hardware protected memory section. But on OSX, read only data sections get marked as "TEXT" and put in with the code, where they are expected to be position independent. Position independence means no relocations are possible at program load time, meaning the read only data cannot have any pointers in it. If there was a relocation in the TEXT section, the loader didn't complain, you just wound up with garbage for a pointer. So, all the data structure generating code had to be carefully gone through and separated into "has pointers" and "no pointers" blocks, and the "has pointers" stuff gets routed to the regular mutable data segment.
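The routing rule described above can be written down as a small decision function. This is only a sketch of the rule as stated in the text, not dmd's code. The enum and function names are hypothetical.

```c
/* Hypothetical section kinds for the sketch. */
enum section {
    SEC_TEXT_CONST,  /* read-only data placed with the code; no relocations */
    SEC_DATA         /* regular mutable data segment; relocations allowed */
};

/* Only immutable data with no pointers can live in the read-only area,
   because a pointer needs a load-time relocation. Everything else goes
   to the regular data segment. */
enum section choose_section(int is_immutable, int has_pointers)
{
    if (is_immutable && !has_pointers)
        return SEC_TEXT_CONST;
    return SEC_DATA;
}
```

The cost of the rule is that immutable data containing pointers loses its hardware write protection, since it ends up in the mutable segment.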
Another tricky problem with position independent code on OSX is the way data variables are accessed. If they are defined within the module they are referenced in, one indirection is generated. If defined externally, two are necessary. The trouble is, the D programming language allows forward references to data, so there's a chicken-and-egg problem in deciding what to resolve first.
The last problem is that D needs to generate some tables that are coalesced with other tables, such as the exception handling data. To group stuff together, it is put in a specially named segment. The problem is locating the beginning and end of that segment at run time. The solution is a trick I learned long ago in the DOS world -- to always put out that special segment as a trio, always in the same order. The linker thankfully will then maintain the same order in the executable, so to find the beginning and end of the second segment, a global variable is put in each of the first and third segments, and the addresses of those globals neatly bracket the table in the second segment. This is the way I'm going to try and do thread local storage, too.
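Segment ordering cannot be expressed in portable C, so the sketch below simulates the bracketing trick with a struct, whose members the language guarantees to lay out in declaration order. The first and last members stand in for the marker globals in the first and third segments, and the array stands in for the coalesced table. All names and table contents are made up for the example.

```c
#include <stddef.h>

struct eh_entry { int start_offset; int handler_offset; };

/* Stand-in for three segments emitted in a fixed order. */
struct segments {
    struct eh_entry begin_marker;  /* "first segment": marker only */
    struct eh_entry table[3];      /* "second segment": the real data */
    struct eh_entry end_marker;    /* "third segment": marker only */
};

static const struct segments segs = {
    {0, 0},
    { {0, 100}, {16, 200}, {48, 300} },
    {0, 0}
};

/* The table starts right after the first marker... */
const struct eh_entry *table_begin(void)
{
    const unsigned char *base = (const unsigned char *)&segs;
    return (const struct eh_entry *)
        (base + offsetof(struct segments, begin_marker) + sizeof(struct eh_entry));
}

/* ...and ends where the second marker begins. */
const struct eh_entry *table_end(void)
{
    const unsigned char *base = (const unsigned char *)&segs;
    return (const struct eh_entry *)(base + offsetof(struct segments, end_marker));
}

/* Walks the table using only the two bracketing addresses. */
int sum_handlers(void)
{
    int sum = 0;
    for (const struct eh_entry *p = table_begin(); p != table_end(); ++p)
        sum += p->handler_offset;
    return sum;
}
```

Note that the walking code never refers to the table by name or by length. It only knows the two bracketing addresses, which is what lets the linker merge contributions from many object files into the middle segment.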
Now that dmd is generating three very different object file formats, where the abstraction lines should be is a lot clearer.
Finally, tonight, I got the runtime to start up, run, and shut down successfully. Next, I'll try to get the test suite to run.
The short version -- it's done and out!
I had thought retargeting the D programming language to the Mac was just an object file format change -- a week, maybe two. It turned out to be six weeks. I feel like Yosemite Sam walking through a mine field, managing to step on every single mine in it.
So let's pick up where I left off at the end of my last entry, where I had just gotten the library to compile and a couple sample pieces of code to link to it and run. Now it was time to run the test suite.
It's essentially impossible to develop a compiler without some sort of test suite. The one I use is an ad-hoc collection of every fixed bug and other assortments of things. The beauty of putting in every fixed bug is that the bugs stay fixed, and the quality of the compiler steadily ratchets forward. Over time, this makes for a formidable test suite. If it passes the suite, I know it's at least as good as the previous version, and if it turns out not to be, that failure gets added to the suite for next time. (I've discovered that just throwing volumes of code at the compiler is fairly useless as a test suite. The test code has to be crafted to test specific things and verify correct results.)
The first thing that failed was exception handling. But surprisingly, it only took me about an hour to get that to work. Exception handling is complicated and hard to understand, and I expected a tough slog. The EH design in the back end dates back to when it was a 16-bit compiler (Digital Mars C++ was the only C++ compiler to ever implement exception handling on DOS). The support code moved to 32 bit DOS extenders, then Linux, and now OSX with very little change. The OSX support needed to tweak the assembler bits to keep the stack 16-byte aligned. (Exception handling for Win32 is completely different, using Microsoft's Structured Exception Handling scheme.)
The dmd exception handling system is completely independent of g++'s. The two do not interact with each other. D is binary compatible with C++ name mangling and single inheritance, and g++ on OSX uses the same protocol for this as on Linux which dmd is already compatible with, so that was easy.
My next problem was the floating point failed miserably. Some investigation showed that, uniquely, OSX aligns the CPU's 10-byte reals on 16-byte boundaries. This is 6 bytes of pad for each. I don't know the reason for this; Linux uses 12 and Win32 uses 10. It's just that if you have large arrays of reals, it's going to chew up 60% more space. Oh well, it was easy to account for in the code generation.
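The space cost is easy to work out from the sizes quoted above. The sketch below does the arithmetic. The constant and function names are hypothetical, and the per-platform slot sizes are simply the figures from the text.

```c
#include <stddef.h>

enum { REAL_VALUE_BYTES = 10 };  /* the x87 80-bit real itself */

/* Storage per real on each platform, as quoted in the text. */
enum { REAL_SLOT_WIN32 = 10, REAL_SLOT_LINUX = 12, REAL_SLOT_OSX = 16 };

/* Padding bytes added to each real for a given slot size. */
size_t real_pad_bytes(size_t slot)
{
    return slot - REAL_VALUE_BYTES;
}

/* Total storage for an array of count reals. */
size_t real_array_bytes(size_t count, size_t slot)
{
    return count * slot;
}

/* Extra space, in percent, relative to the unpadded 10-byte value. */
unsigned overhead_percent(size_t slot)
{
    return (unsigned)((slot - REAL_VALUE_BYTES) * 100 / REAL_VALUE_BYTES);
}
```

The 60% figure is measured against the unpadded 10-byte layout: 16 bytes is 60% more than 10. By the same measure the 12-byte Linux layout costs 20%, so moving an array of reals from Linux to OSX grows it by a third.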
The worst problem I had was my own fault. The front end is only a few years old. But the back end code generator is about 25 years old (it may be the oldest code generator still in professional use!). Although it is well debugged (sporting a thorough test suite), fast, and generates great code, it uses a lot of global variables to communicate. Problem after problem was traced back to the use of global variables. Over time I'd eliminated a lot of them, but there's a lot left. It's hard to change how a function works if there's a back channel of globals passing state around. Globals break encapsulation, making code difficult to understand.
Of course I wouldn't write it that way today; hopefully I've learned something in the last 25 years. You might ask "why not just rewrite it", and certainly that thought comes to mind. The problem is a code generator is about a zillion special cases, most of which interact with each other. Getting that all adjusted, tweaked and working right across the broad spectrum of code that it must generate is years of work. It's not something you throw away and rewrite lightly. But there is some hope. The Mach-O generating part is a lot nicer than the ELF generating part, which itself is much nicer than the very old OMF generator. And if I'm doing open heart surgery on a particular section, I'll refactor it and rely on the test suite to make sure the patient recovers.
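A toy contrast makes the back-channel problem concrete. Nothing here is taken from the dmd back end. The names are invented, and the "stack depth" state is just a stand-in for whatever a code generator might track.

```c
/* Style 1: state lives in a global, so every caller shares it and
   nothing in the function signature reveals the dependency. */
static int g_stack_depth;

void push_bytes_global(int n)
{
    g_stack_depth += n;
}

int stack_depth_global(void)
{
    return g_stack_depth;
}

/* Style 2: the same state is passed explicitly, so two code
   generation contexts cannot interfere with each other. */
struct codegen {
    int stack_depth;
};

void push_bytes(struct codegen *cg, int n)
{
    cg->stack_depth += n;
}
```

With the explicit version, a change to how one object file format tracks its state stays local to the context it was given. With the global version, every function that touches `g_stack_depth` has to be audited.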
Thanks to Sean Kelly for his invaluable help with the more complex OSX system library work. Thanks to Jason House, Andrei Alexandrescu, Bartosz Milewski, Sean Kelly, and Cristian Vlasceanu for reviewing this.