Performance Bugs Can Be Hard to Detect
Last week, I said that abstract programming styles could make complicated problems easier to solve, but they could also yield programs with performance bugs. Such bugs can be hard to detect because the more abstract a program is, the harder it is to see the details of what happens in each individual step of that program. For example, consider an assignment statement:
a = b;
How long does this statement take to execute? How much memory does it consume? In some programming languages, these questions are trivial to answer; in others, the answer depends on the types of a and b. In C, for example, a statement such as this one never consumes any memory, because the memory was already allocated before the statement began execution. Moreover, the assignment statement's execution time is bounded by the sizes of a and b, perhaps augmented by the extra time needed to convert b to the type of a. Unless a and b have struct types, it is generally possible to put an absolute upper bound on the statement's execution time on any particular machine.
In C++, estimating this statement's execution time requires looking at the types of a and b. If either of them has class type, then the assignment might really be a function call, and of course that function might do anything at all. As a result, the only sensible answer to "How long will this statement take to execute?" is "It depends." For example, if a and b are strings, then the assignment's execution time must at least be proportional to the number of characters in b, because the program must copy all of those characters. In addition to copying the characters of b, the program must free the memory previously occupied by a, and must then allocate as much memory as is needed to hold the copy of b. As a result, when a and b are strings, both the time and space requirements of the assignment depend on the variables' previous values. In short, the string type's useful abstraction carries the disadvantage of making it harder to predict the performance of programs that use it.
Performance becomes even harder to estimate when programs have several layers of abstraction piled on each other. For example, I noted in passing that if a and b are strings, our assignment statement allocates enough memory to hold a copy of
b. How much time does this allocation take? In an ideal world, it would be possible to put an upper bound on this time, but in practice, the execution time probably depends on how many blocks of memory had been allocated and then freed in the past. The typical memory allocator begins by looking at such blocks to determine whether one is available that can hold the required characters, and this search takes time. Moreover, on most computers, merely looking through memory may take additional time as well, because there is the possibility that some of that memory might not have been used in a while and might therefore have migrated into a paging file.
These two pieces of hidden overhead — the time required for memory allocation and the time it takes to read from the paging file — can be major contributors to the execution time of even a simple program. In addition, other sources of overhead stem from the hardware designer's optimizations. For example, most modern processors have an on-chip cache that speeds up references to random-access memory. This cache is important because it is common for the processor to be much faster than the memory. As a result, each individual operation in a program may become dramatically slower once the program consumes enough memory that it cannot all fit in the cache at once.
At the other end of the paging operation, the amount of time it takes to fetch data from the paging file may depend on how full the disk drive is that contains that file. The read/write heads that transfer data to and from the disk are designed to write magnetic regions with a particular physical size. Because the disk rotates at a constant speed, those magnetic regions flow to and from the disk at a rate that depends on where on the disk the data happen to be located. If the disk is mostly empty, the hardware will typically use only the outer edges of the disk. The fixed rotational speed translates into a greater linear speed, which, in turn, means that data will flow to and from the disk more quickly. As the disk fills up, the inner part of the disk must be used in order to accommodate the extra data, a requirement that means that the data must be transferred more slowly. Moreover, as the tracks get closer to the center of the disk, each track consumes less physical space, which means that there is room for less data on the track, which means that the disk arm must move more frequently to cover more tracks. As a result, the access time increases along with the transfer time. Disks really do slow down as they fill up.
In short, a simple assignment in a C program on a computer without virtual memory takes a bounded amount of time. Unless the variables being assigned have
struct types, that bound depends only on the computer architecture. Adding
struct types makes it possible for the time to depend on the size of the
struct; but even so, the bound is knowable during compilation.
The added abstraction of C++ allows hidden overhead to enter in at least four ways:
- The data being assigned may have variable size, so that variable time may be needed to do the copy.
- Memory allocation and deallocation may take an amount of time that depends on how much memory has already been allocated for other purposes.
- If there is not enough physical memory, the statement may require data to be transferred to or from a paging disk.
- The transfer time may depend on how full the disk is.
This potential overhead comes from the same source as the ability to use
string variables. As a result, the overhead is hard to avoid. Instead, we must think about how to manage it. I’ll have more to say about this subject next week.