When writing software for games and data visualization, it is tempting to optimize code on the fly. That is, as you code, you make small design and implementation tweaks — change a data type here, fix a data structure there, execute function B before function A, and so on. Resist the temptation to make these small modifications in the name of performance; that goal is at odds with the primary mission of most developers, which is writing and delivering reliable functionality. Later, when the code is working, true systemic optimization and parallelization can be profitably undertaken.
This article discusses how these two goals — delivering correct functionality and optimizing on the fly — are opposed, and how performance experts approach performance improvement. It explains how they systematically prepare their code for optimization and how the optimization process is carried out so that their efforts are effective and don't interfere with working code.
Functionality vs. Performance
A safe rule in software development is that if an eminent computer scientist recommends not doing something, it's worth thinking about whether you too should avoid it. When two eminent computer scientists warn against the same practice, then the entire burden of proof falls on the developer who disregards the advice. Perhaps the only adage enunciated in identical words by two prominent computer scientists is: "Premature optimization is the root of all evil." This maxim, first stated by Tony Hoare, the inventor of Quicksort, became widely known when it was later quoted by Donald Knuth. (For this reason, the quotation is often attributed to Knuth.)
In Hoare's and Knuth's heyday, the temptation to cut corners for the sake of speed was nearly overwhelming because all hardware was slow, often very slow. Using shortcuts that could save processor cycles or reduce I/O was sometimes the only way to guarantee that jobs could finish within the allotted time.
Coming forward 40 years, we find that today's systems have extraordinary horsepower and that the need for performance is no longer related to completing jobs on time. Rather, as we find in the gaming industry, performance is an imperative for competitive reasons. Games that have better performance and that can leverage the hardware well fare better in the market than games that don't. And given the intensely competitive games market, this advantage can spell the difference between success and failure. That's a lot of pressure to optimize!
Unfortunately, this pressure can lead developers and managers to cut corners in exactly the way Knuth and Hoare railed against: writing bad code just to go faster. Every time software is optimized or parallelized before a feature is correctly implemented, costs accumulate quickly. What are these costs?
- It impairs code readability. Optimization necessarily transforms algorithms and their implementations. Consequently, getting an implementation right — especially in a parallel context, where coding is already difficult — becomes more complex. Debugging suffers too, because it can be hard to determine what non-intuitive code is doing.
- It's harder to maintain the code. Programmers who come after the original developer will find optimized code much more difficult to read. Even simple operations can turn into mysterious code snippets that subsequent developers are loath to touch. Such code is often marooned forever: even if it is actually slowing performance, developers unsure of what it does won't modify it for fear of breaking a feature. Consequently, optimization or parallelization done unwisely makes code unmaintainable.
- It generally does not improve performance. When performance engineers set to work on tuning code, the first thing they do is profile the code. They don't use their long experience in optimization to guess at what code to parallelize or to tune. So, neither should developers. Careful measurement is the only sure way to identify what needs to be optimized and how. Parallelize the wrong code or do it inexpertly and performance can suffer.
So then, how should code be parallelized and optimized?
Make It Right. Then, Make It Parallel
The first and most important step is to get the code working correctly. This means getting the algorithm right and doing all the standard testing on it to make sure it works as requested by the user and in accordance with the specs. In most cases, this requirement means writing serial code and assuring its correctness as serial code.
Because your later optimization and parallelization must not change the code's results, you will want to write plenty of unit tests and even functional tests now. Running these against the parallelized code will give you confidence that your optimizations have not accidentally broken your previously validated implementation.
Once you're sure the code is working correctly and you have the unit tests ready to go, you're still not quite done. The next step is to make sure the code contains no latent defects, such as memory leaks. If leaks are hard to spot in regular code, they're even more difficult to find in threaded code. Debugging threaded code is much simpler than it was even a few years ago, but despite that progress, it remains considerably harder than cleaning up single-threaded code. So this is the point at which to check for memory leaks, excess memory consumption, and the like.
In summary, you want the code to be as clean as possible before you begin surgery. One last check that you'll find well worth your time is to use Intel's Parallel Inspector to make sure your code does not contain bugs or poor designs that can cause problems in a parallel implementation (Figure 1). The tool, which is a plugin to Microsoft Visual Studio and works on C/C++ code, is a handy way to make sure you're ready to parallelize your code. Note that while Parallel Inspector is available as a standalone tool, it also comes bundled in Parallel Studio, along with Amplifier (parallel performance analysis) and Composer (parallel compiler and debugger). I'll come back to this product shortly.