Two Different Kinds of Optimization
My last three notes have been about using dirty tricks for optimization — tricks of a kind that most programmers today would never consider. Indeed, there is much to be said in favor of writing programs that are easy to understand, and therefore easy to prove correct. Rules of thumb such as "Make it work before you make it faster" (Kernighan & Plauger) and "Premature optimization is the root of all evil" (Knuth) are well worth taking seriously.
However, I think that experience with SPITBOL suggests that there are at least two fundamentally different kinds of optimization, and that the advice to defer optimization applies only to one of those kinds.
The kind of optimization to defer involves local choices, and carries disadvantages — typically extra complexity — along with it. As an example, I've seen programmers multiply variables by 0.1 instead of dividing them by 10, or shift numbers to the left instead of multiplying them by a power of two. There is no reason to believe that such optimizations have any measurable effect on performance; worse, because 0.1 is not exactly equal to 1/10 in binary floating point, the first trick may even cause the program to produce incorrect results.
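The floating-point hazard is easy to demonstrate. Here is a minimal Python check (illustrative only, not from the original note):

```python
# Multiplying by 0.1 is not always the same as dividing by 10, because
# 0.1 cannot be represented exactly in binary floating point.
mismatches = [x for x in range(1, 100) if x * 0.1 != x / 10]
print(mismatches[:5])      # several small integers already disagree
print(3 * 0.1, 3 / 10)     # the classic case: 0.30000000000000004 vs 0.3
```

Any value in `mismatches` is an input for which the "optimized" program silently computes a different answer than the straightforward one.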
However, there is another kind of optimization that must be done early, because it affects so many parts of the program that doing it later means a major rewrite. I call such optimization architectural optimization. Typically, it must be done as part of a system's design, although sometimes one can encapsulate it in order to address it later.
As an example of architectural optimization, consider the early versions of the UNIX filesystem. In the interest of clarity and reliability, it stored all the files in a directory as a linear array, in no particular order. Adding a file to a directory had two steps:
- Search the directory to see whether that name is already there, noting the first empty slot, if any, along the way.
- If the name is already there, we're done. Otherwise, put an entry for that name in the first empty slot, if any, or append it to the end.
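The two steps above can be sketched as follows. This is a hypothetical illustration of the scheme, with the directory modeled as a list of names and empty slots, not the actual UNIX implementation:

```python
# Sketch of the linear-directory scheme: each entry is a file name,
# or None for an empty slot, in no particular order.

def add_name(directory, name):
    """Add name to directory, reusing the first empty slot if one exists."""
    first_empty = None
    for i, entry in enumerate(directory):      # linear scan: O(n)
        if entry == name:
            return                             # already present; we're done
        if entry is None and first_empty is None:
            first_empty = i                    # remember first empty slot
    if first_empty is not None:
        directory[first_empty] = name          # reuse the empty slot
    else:
        directory.append(name)                 # no empty slot: append
```

Because every call scans the whole directory, the cost of each addition grows with the number of entries already present.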
Obviously, adding a new name to a directory took time proportional to the number of entries already in the directory — including empty ones. Adding the i-th name therefore scanned about i entries, so adding n names to an initially empty directory took about 1 + 2 + ... + n = n(n+1)/2 steps — that is, time proportional to n².
Now let's turn our attention to an email handler designed to run on such a system. Users generate email messages, which get queued for delivery over a network. The queuing is necessary because the network might not be available at the moment. How do we organize the queue?
The obvious answer is to put each message in a separate file in a directory allocated for that purpose. However, this approach has a problem: If there are too many messages in the directory, the time required to add new messages to the directory eventually becomes excessive. In particular, if the network is down for an extended period, many messages pile up in the directory, thus increasing the time needed for every message. When the network comes back up, those messages are handled more slowly than they would be otherwise, because the directory that contains them is now so large as to slow down all operations on it. If the directory becomes too large, it might become impossible to empty it as fast as new messages come in.
This behavior is an example of architectural optimization on two levels. First, the operating-system designers could have used a different data structure for directories, one that took order n log(n) time rather than order n² time to put n names in a directory. It is true that such a data structure would have been much more difficult to design and test than the one that was actually used — especially in light of the desire to make the data structure robust in the face of abrupt system shutdowns. Nevertheless, it is at least possible to imagine that early versions of UNIX might have had a directory structure that is intrinsically more efficient. A decision to implement such a structure would be an architectural optimization.
Suppose now that you're setting out to implement an email system, and you know that the directory implementation requires order n² time to handle n files. A necessary architectural optimization on such a system is to figure out a way to avoid taking order n² time for n messages. If you don't want to figure out immediately how to do it, at least you must realize that you will have to confront the problem eventually — assuming that your system becomes popular enough for n to become large — and therefore design the system in such a way as to encapsulate the code that relies on the directory structure.
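One way to do that encapsulation, sketched in Python with illustrative names (none of this comes from any actual mail system): hide the on-disk layout behind a single function, so that switching later from one flat spool directory to many hashed subdirectories touches only this code.

```python
# Hypothetical sketch: the queue's layout policy lives in one place.
import hashlib

def queue_path(spool_dir, message_id, fanout=0):
    """Return the path at which a queued message is stored.

    With fanout == 0, every message goes in one directory (the naive
    layout, which inherits the directory's O(n^2) behavior).  With
    fanout > 0, messages are spread across `fanout` subdirectories
    chosen by hashing the message id, so no single directory grows
    without bound.
    """
    if fanout == 0:
        return f"{spool_dir}/{message_id}"
    bucket = int(hashlib.md5(message_id.encode()).hexdigest(), 16) % fanout
    return f"{spool_dir}/{bucket:03d}/{message_id}"
```

Callers that enqueue or scan messages never build paths themselves; when n grows large, changing the fanout changes the layout without rewriting them.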
If you do not think about this performance problem at the time you design your system, you may find yourself having to rewrite large parts of it when the performance turns out to be unacceptable. Surely it is better to anticipate that performance problems might arise, and to design the system from the beginning in a way that makes them easy to solve if they do.
The dirty tricks in the SPITBOL compiler were another example of architectural optimization. It is hard to imagine that any of these tricky programming techniques could have been added to the compiler after it was already working — doing so would require a total rewrite. Moreover, they made a substantial difference in the compiler's overall performance. Therefore, despite their low-level, tricky nature, these optimizations had to be considered as part of the initial design. I claim that such optimizations are fundamentally different from those about which Kernighan, Knuth, and others warn us.