Now that I mention it, error messages are a big factor in the quality of implementation of the language. It's what the user sees, after all. If you're tempted to put out error messages like "bad syntax," perhaps you should consider taking up a career as a chartered accountant instead of writing a language. Good error messages are surprisingly hard to write, and often, you won't discover how bad the error messages are until you work the tech support emails.
The philosophies of error message handling are:
- Print the first message and quit. This is, of course, the simplest approach, and it works surprisingly well. Most compilers' follow-on messages are so bad that the practical programmer ignores all but the first one anyway. The holy grail is to find all the actual errors in one compile pass, leading to:
- Guess what the programmer intended, repair the syntax trees, and continue. This is an ever-popular approach. I've tried it indefatigably for decades, and it's just been a miserable failure. The compiler seems to always guess wrong, and subsequent messages with the "fixed" syntax trees are just ludicrously wrong.
- The poisoning approach. This is much like how floating-point NaNs are handled. Any operation with a NaN operand silently results in a NaN. Applied to error recovery, any construct that has a leaf for which an error occurred is itself considered erroneous (but no additional error messages are emitted for it). Hence, the compiler is able to detect multiple errors as long as the errors are in sections of code with no dependency between them. This is the approach we've been using in the D compiler, and we are very pleased with the results.
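The poisoning idea is simple enough to fit in a few lines. Here is a minimal sketch (not the D compiler's actual code; the node shapes and helper names are hypothetical) showing how an error flag propagates up from a leaf and suppresses follow-on diagnostics, while an unrelated error still gets reported:

```python
class Node:
    """AST node; `poisoned` propagates up from children, NaN-style."""
    def __init__(self, kind, children=(), poisoned=False):
        self.kind = kind
        self.children = list(children)
        self.poisoned = poisoned or any(c.poisoned for c in children)

def report(diagnostics, node, msg):
    # Emit a message only for a fresh error; a poisoned subtree
    # stays silent so one mistake doesn't cascade into many messages.
    if not node.poisoned:
        diagnostics.append(msg)
    node.poisoned = True

diags = []
bad_leaf = Node("ident")
report(diags, bad_leaf, "undefined identifier 'x'")   # reported
expr = Node("add", [bad_leaf, Node("int")])           # silently poisoned
report(diags, expr, "type mismatch in '+'")           # suppressed
other = Node("ident")                                 # independent subtree
report(diags, other, "undefined identifier 'y'")      # reported
```

After this runs, `diags` holds exactly the two genuine errors; the bogus follow-on message about the already-broken addition never appears.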
What else does the user care about in the hidden part of the compiler? Speed. I hear it over and over: compiler speed matters a lot. In fact, compile speed is often the first thing I hear when I ask a company what tipped the balance for choosing D. The reality is, most compilers are pigs. To blow people away with your language, show them that it compiles as fast as hitting the return key on the compile command.
Wanna know the secret of making your compiler fast? Use a profiler.
Sounds too easy, right? Trite, even. But raise your hands if you routinely use a profiler. Be honest, everyone says they do, but that profiler manual remains in its pristine shrink wrap. I'm just astonished at the number of programmers who never use profilers. But it's great for me as a competitive advantage that never ceases to pay dividends.
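Getting a first profile really is a few lines of code in most environments. As an illustration, here is Python's standard-library profiler pointed at a stand-in workload (the `tokenize`/`compile_many` functions are hypothetical placeholders for a compiler pass):

```python
import cProfile
import io
import pstats

def tokenize(src):
    # Stand-in for a lexer pass.
    return src.split()

def compile_many(n):
    for _ in range(n):
        tokenize("let x = y + z ;" * 50)

pr = cProfile.Profile()
pr.enable()
compile_many(200)
pr.disable()

# Print the five most expensive functions by cumulative time.
buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)
```

The report immediately shows where the time goes; in a real compiler that is almost never where you guessed it was.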
Some other tools you simply must be using:
- valgrind. I suspect valgrind has almost single-handedly saved C and C++ from oblivion. I can't heap enough praise on this tool. It has saved my error-prone sorry self untold numbers of frustrating hours.
- Git and GitHub. Not many tools are transformative, but these are. Not only do they provide an automated backup, but they enable collaborative work on the project by people all over the world. They also provide a complete history of where every line of code came from, and from whom, in case there's a legal issue.
- Automated testing framework. Compilers are enormously complicated beasts. Without constant testing of revisions, the project will reach a point where it cannot advance, as more bugs than improvements will be added. Add to this a coverage analyzer, which will show if the test suite is exercising all the code or not.
- Automated documentation generator. The D project participants, of course, built our own (Ddoc), and it, too, was transformative. Before Ddoc, the documentation had only a random correlation with the code, and too often, they had nothing to do with each other. After Ddoc, the two were brought in sync.
- Bugzilla. This is an automated bug-tracking tool. Bugzilla represented a great leap forward from my pathetic older scheme of emails and folders, a system that simply cannot scale. Programmers are far less tolerant of buggy compilers than they used to be; this has to be addressed aggressively, head on.
One semantic technique that is obvious in hindsight (but took Andrei Alexandrescu to point out to me) is called "lowering." It consists of, internally, rewriting more complex semantic constructs in terms of simpler ones. For example, while loops and foreach loops can be rewritten in terms of for loops. Then, the rest of the code only has to deal with for loops. This turned out to uncover a couple of latent bugs in how while loops were implemented in D, and so was a nice win. It's also used to rewrite scope guard statements in terms of try-finally statements, etc. Every case where this can be found in the semantic processing will be a win for the implementation.
If it turns out that there are some special-case rules in the language that prevent this "lowering" rewriting, it might be a good idea to go back and revisit the language design.
Any time you can find commonality in the handling of semantic constructs, it's an opportunity to reduce implementation effort and bugs.
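The lowering technique described above can be sketched in a few lines. This is not how the D front end actually represents its AST; it is a minimal illustration using hypothetical tuple-shaped nodes, where both while and foreach loops are rewritten so later passes only ever see a plain for loop:

```python
def lower(node):
    """Rewrite complex loop forms in terms of a plain for loop."""
    kind = node[0]
    if kind == "while":                     # while (cond) body
        _, cond, body = node
        return ("for", None, cond, None, body)
    if kind == "foreach":                   # foreach (var; lo .. hi) body
        _, var, lo, hi, body = node
        return ("for",
                ("decl", var, lo),          # init: var = lo
                ("lt", ("var", var), hi),   # cond: var < hi
                ("inc", ("var", var)),      # increment: ++var
                body)
    return node                             # already a core construct

loop = lower(("while", ("var", "running"), ("call", "step")))
```

Every pass after this one (type checking, optimization, code generation) handles a single loop construct instead of three, which is exactly where the reduction in implementation effort and bugs comes from.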
Rarely mentioned, but critical, is the need to write a runtime library. This is a major project. It will serve as a demonstration of how the language features work, so it had better be good. Some critical things to get right include:
- I/O performance. Most programs spend a lot of time in I/O. Slow I/O will make the whole language look bad. The benchmark is C stdio. If the language has elegant, lovely I/O APIs, but runs at only half the speed of C I/O, then it just isn't going to be attractive.
- Memory allocation. A high percentage of time in most programs is spent doing mundane memory allocation. Get this wrong at your peril.
- Transcendental functions. OK, I lied. Nobody cares about the accuracy of transcendental functions; they only care about their speed. My proof comes from trying to port the D runtime library to different platforms, and discovering that the underlying C transcendental functions often fail the accuracy tests in the D library test suite. C library functions also often do a poor job handling the arcana of the IEEE floating-point bestiary: NaNs, infinities, subnormals, negative 0, etc. In D, we compensated by implementing the transcendental functions ourselves. Transcendental floating-point code is pretty tricky and arcane to write, so I'd recommend finding an existing library you can license and adapting that.
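The IEEE bestiary mentioned above is worth seeing concretely; any runtime library test suite has to get these cases right. A short Python demonstration of each creature:

```python
import math
import sys

# NaN: the one value that compares unequal to itself.
nan = float("nan")
assert nan != nan
assert math.isnan(nan)

# Infinity: larger than every finite double.
assert math.isinf(math.inf)
assert math.inf > sys.float_info.max

# Negative zero: equality can't tell the zeros apart,
# but the sign bit is still there and copysign sees it.
neg_zero = -0.0
assert neg_zero == 0.0
assert math.copysign(1.0, neg_zero) == -1.0

# Subnormals: nonzero values below the smallest normal double.
subnormal = sys.float_info.min / 2
assert 0.0 < subnormal < sys.float_info.min
```

A C library that mishandles any of these (returning the wrong sign of zero from a function, or flushing subnormals when it shouldn't) will fail a careful test suite, which is exactly what porting the D runtime turned up.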
A common trap people fall into with standard libraries is filling them up with trivia. Trivia is sand clogging the gears and just dead weight that has to be carried around forever. My general rule is if the explanation for what the function does is more lines than the implementation code, then the function is likely trivia and should be booted out.
After The Prototype
You've done it, you've got a great prototype of a new language. Now what? Next comes the hardest part. This is where most new languages fail. You'll be doing what every nascent rock band does: play shopping malls, high school dances, dive bars, and so on, slowly building up an audience. For languages, this means preparing presentations, articles, tutorials, and books on the language. Then, going to programmer meetings, conferences, companies, anywhere they'll have you, and showing it off. You'll get used to public speaking, and even find you enjoy it. (I enjoy it a lot.)
There's one huge thing working in your favor: With the global reach of the Internet, there's an instantly reachable global audience. Another favorable fact is that programmer meetings, conferences, etc., all are looking for great content. They love talks about new languages and new programming ideas. My experience with such audiences is that they are friendly and will give you lots of constructive feedback.
Of course, then you'll almost certainly be forced to reevaluate some cherished features of the language and reengineer them.
But hey, you went into this with your eyes open!
Thanks to Andrei Alexandrescu for his advice on this article.
Walter Bright is the designer of the D language. He regularly blogs for Dr. Dobb's.