An Extreme Example of Architectural Optimization
My previous post gave an example of a local optimization that turned out to be useful because of its context: A slightly tricky rewrite of an email system's initial processing caused the whole system to feel faster to its users, even though it saved only a couple of seconds.
Today I'd like to look at an example from the other end of the spectrum: Choosing the right data structure turned out to make more of a difference in execution time than removing a horrendous inefficiency in a runtime library.
The actual application doesn't matter. What matters is that part of it maintained a symbol table: It read strings from an input file and arranged that each occurrence of a particular string would lead eventually to the same object in memory.
The original version of this program was in C. It used a singly linked list for its symbol table: Each time it saw a new string, it would search the list for it. If the string was not found, the program would allocate a new node and put that node at the front of the list. As a result, the program's execution time typically grew as the square of its input size.
I chose to rewrite this program to demonstrate a new C++ library I had just written. To my knowledge, this was the very first version of what eventually became the map
template in the C++ standard library. Like today's map
, it used a balanced tree structure internally, which meant that the amount of time needed to process n
elements would typically be proportional to n
log(n
) rather than n
2.
The C++ library we had available at the time had terrible I/O performance. We learned the reason only later: The library would call the operating system for every character of the input file. If I remember right, it was able to read only a few thousand (not million!) characters per second.
Nevertheless, on large inputs, the C++ version of this program ran faster than the C version. The extra time that the C version required for its linked-list symbol-table lookups eventually exceeded the time required to read the input file, even with a system call per character.
This experience shows why architectural optimizations can make such a difference: If n
is large enough, replacing an n
2 algorithm with an n
log(n
algorithm will make more of a difference than making the entire program k
times faster (for constant k
). Moreover, that advantage is greatest when it is needed the most — on large inputs.