Abstraction and Performance Bugs
Last week, I showed an example of a Haskell program that solves a problem in a simple way. Perhaps surprisingly, this program would not terminate at all were it not that the compiler could be counted on to evaluate only those parts of an expression that are necessary to determine the expression's value.
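That program is not reproduced here, but the principle is easy to demonstrate with a small Haskell sketch of my own (the names are illustrative, not taken from last week's example): an infinite list is harmless as long as we demand only a finite prefix of it.

```haskell
-- Illustrative sketch, not the program from last week's article.
-- 'naturals' is infinite; evaluating it in full would never terminate.
naturals :: [Integer]
naturals = [0 ..]

-- Lazy evaluation computes only the five elements actually demanded.
firstFiveEvens :: [Integer]
firstFiveEvens = take 5 (filter even naturals)

main :: IO ()
main = print firstFiveEvens  -- prints [0,2,4,6,8]
```

Under strict evaluation, the argument to take would have to be computed completely before take could run, and the program would loop forever; under lazy evaluation it terminates immediately.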
This example is an extreme case of a more general principle: The more an optimization gains, the more important it is for the programmer to be able to verify that the optimization is actually working. In other words, if other aspects of the situation are equal, the more important an optimization is, the less transparent it should be. If an optimization is completely transparent, how can a programmer verify that it is working?
For example, I presented three different versions of a length function in that article. The first version was iterative, which meant that it would always compute the length of a list in O(1) space. Unfortunately, it used mutable variables, a feature that we were trying to avoid. The second version was straightforwardly recursive, which meant that it would make a recursive call for each element of the list passed as its argument. In other words, finding the length of an n-element list would take O(n) space. To get around the problems of both of these versions, the article presented a third version in which all of the recursive calls were tail calls. The compiler is assumed to be able to change such tail-recursive calls into iteration. In order to use this optimization with confidence, we need to be able to verify that the compiler actually does change tail recursion into iteration.
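The article's code is not reproduced here, but the second and third versions can be sketched in Haskell along the following lines (the function names are mine, not the article's). Note that in Haskell the accumulator must also be kept strict, or lazy thunks will reintroduce the very space problem the tail-recursive form is meant to avoid:

```haskell
{-# LANGUAGE BangPatterns #-}

-- Straightforwardly recursive: the addition happens after the recursive
-- call returns, so each element costs a stack frame -- O(n) space.
lengthRec :: [a] -> Int
lengthRec []       = 0
lengthRec (_ : xs) = 1 + lengthRec xs

-- Tail-recursive: the recursive call is the last thing the helper does,
-- so a compiler that turns tail calls into iteration runs it in O(1)
-- space. The bang pattern forces the accumulator at each step, keeping
-- it from growing into a chain of unevaluated thunks.
lengthTail :: [a] -> Int
lengthTail = go 0
  where
    go :: Int -> [a] -> Int
    go !acc []       = acc
    go !acc (_ : xs) = go (acc + 1) xs
```

Verifying that the optimization works might amount to running lengthTail on a list of, say, ten million elements and watching whether the program's memory use stays flat.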
In other words: We started with a straightforward, concrete program. We rewrote it in a more abstract style, then rewrote it further to allow the compiler to remove the inefficiency of the abstract version. The end product was a program that not only was in a needlessly complicated form, but also relied on the compiler's ability to optimize in order to recover the runtime speed of the first, iterative version. If the compiler did not correctly implement that optimization, the program's performance would suffer.
Of course, many useful abstractions do not need special pleading in order to avoid unexpected performance problems. Nevertheless, it is not uncommon for such abstractions to conceal performance bugs. Some of these bugs become apparent only in the case of unusually large programs.
One common example of such performance bugs is memory leaks. Many programming languages (Haskell being one) have garbage collection that is intended to prevent memory leaks. Nevertheless, I once encountered a compiler that had severe performance problems due to a memory leak even though it was written in a language that fully supported garbage collection.
The high-level part of that compiler was structured along the following lines:
    token_sequence = read_input();
    parse_tree = parser(token_sequence);
    internal_code = virtual_code_generator(parse_tree);
    optimized_code = optimizer(internal_code);
and so on. What the compiler authors did not realize was that the language implementation assumed that a variable's value might be required at any point up to the end of the block in which that variable was defined. As a result, in the compiler itself, the variable token_sequence occupied memory that was not marked as garbage until the end of the surrounding block. More generally, by the time the optimizer function got around to running, the compiler was already using three times as much memory as it needed. The solution was to change the implementation so that variables were considered garbage after the last time those variables were used, rather than at the end of the block. That simple change reduced the compiler's memory requirements dramatically.
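A program can also sidestep the problem itself by never binding long-lived names to its intermediate results. Here is a hedged Haskell sketch, with dummy stand-ins for the real compiler's stages (the types and function bodies are my inventions): each result is handed directly to the next stage, so it becomes unreachable as soon as that stage has consumed it.

```haskell
-- Hypothetical stage types and trivial bodies standing in for a real
-- compiler's; only the shape of the pipeline matters here.
type Tokens        = [String]
type ParseTree     = String
type InternalCode  = String
type OptimizedCode = String

readTokens :: String -> Tokens
readTokens = words

parser :: Tokens -> ParseTree
parser = unwords

virtualCodeGenerator :: ParseTree -> InternalCode
virtualCodeGenerator = id

optimizer :: InternalCode -> OptimizedCode
optimizer = id

-- Composing the stages means no intermediate result is bound to a name
-- that stays in scope, so nothing keeps it alive past its last use.
compile :: String -> OptimizedCode
compile input = optimizer (virtualCodeGenerator (parser (readTokens input)))
```

Whether this helps in practice depends, of course, on the very implementation behavior under discussion: an implementation that tracks liveness precisely gains nothing from it, while one that keys garbage collection to block scope gains a great deal.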
Here's another way to look at it: When we express a computation in an abstract way, that expression is far away from the underlying hardware; that is why we consider the expression abstract in the first place. Nevertheless, the computation must eventually run on hardware, so every abstraction comes along with some kind of connection to the hardware, and that connection has characteristics that include performance.
These performance characteristics come in at least three different flavors: what we expect, what is specified or documented, and what actually happens. If we expect better performance than we actually get, it might be because our expectations are unrealistic, or it might be that the abstraction is not implemented correctly. Programmers should take responsibility for the first of these possibilities; compiler and library implementors should be responsible for the second. All too often, both programmers and implementors shirk these responsibilities.
Next week, I'll discuss why performance bugs can be particularly hard both to avoid and to isolate.