Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


The Serial On-Ramp to the Multicore Highway: Preparing to Parallelize Code


When writing software for games and data visualization, it is tempting to optimize code on the fly. That is, as you code, you make small design and implementation tweaks: change a data type here, fix a data structure there, execute function B before function A, and so on. The temptation to make these small modifications in the name of performance should be resisted, because it works against the primary mission of most developers: writing and delivering reliable functionality. Later, when the code is working, true systemic optimization and parallelization can be undertaken profitably.

This article discusses why these two goals, delivering working functionality and optimizing on the fly, are at odds, and how performance experts approach performance improvement instead. It explains how they systematically prepare their code for optimization and how they carry out the optimization itself, so that their efforts are effective and don't interfere with working code.

Functionality vs. Performance

A safe rule in software development is that if an eminent computer scientist recommends not doing something, it's worth thinking about whether you too should avoid it. When two eminent computer scientists warn against the same practice, then the entire burden of proof falls on the developer who disregards the advice. Perhaps the only adage enunciated in identical words by two prominent computer scientists is: "Premature optimization is the root of all evil." This maxim, first stated by Tony Hoare, the inventor of Quicksort, became widely known when it was later quoted by Donald Knuth. (For this reason, the quotation is often attributed to Knuth.)

In Hoare's and Knuth's heyday, the temptation to cut corners for the sake of speed was nearly overwhelming because all hardware was slow, often very slow. Using shortcuts that could save processor cycles or reduce I/O was sometimes the only way to guarantee that jobs could finish within the allotted time.

Coming forward 40 years, we find that today's systems have extraordinary horsepower and that the need for performance is no longer related to completing jobs on time. Rather, as we find in the gaming industry, performance is an imperative for competitive reasons. Games that have better performance and that can leverage the hardware well fare better in the market than games that don't. And given the intensely competitive games market, this advantage can spell the difference between success and failure. That's a lot of pressure to optimize!

Unfortunately, this pressure can lead developers and managers to cut corners in exactly the way Knuth and Hoare warned against: writing bad code just to go faster. Every time software is optimized or parallelized before a feature is correctly implemented, costs accumulate quickly. What are these costs?

  1. It impairs code readability. Optimization necessarily transforms algorithms and their implementation, so getting the implementation right becomes harder, especially in a parallel context, where coding is already difficult. Debugging suffers too, because it can be hard to determine what non-intuitive code is doing.
  2. It makes the code harder to maintain. Programmers who come after the original developer will find optimized code much more difficult to read. Even simple optimizations can turn into mysterious snippets that subsequent developers are loath to touch. Such code is often marooned forever: even if it is actually slowing performance, developers unsure of what it does won't touch it for fear of breaking a feature. Done unwisely, optimization or parallelization makes code unmaintainable.
  3. It generally does not improve performance. When performance engineers set to work tuning code, the first thing they do is profile it. They don't rely on their long experience in optimization to guess which code to parallelize or tune, and neither should developers. Careful measurement is the only sure way to identify what needs to be optimized and how. Parallelize the wrong code, or do it inexpertly, and performance can suffer.
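As a minimal sketch of "measure, don't guess" (C++17, standard library only), the helper below times a callable with `std::chrono::steady_clock`. The names `average_micros`, `sum_values`, and `prefix_sums` are hypothetical stand-ins for your own code. A harness like this can only confirm or refute suspicions about code you already picked; a real profiler remains the right tool for finding hot spots you didn't suspect.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <numeric>
#include <vector>

// Runs fn() `reps` times and returns the average wall-clock time per call in
// microseconds. steady_clock is monotonic, so the result is never negative.
template <typename Fn>
double average_micros(Fn&& fn, int reps) {
    assert(reps > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < reps; ++i) {
        fn();
    }
    const std::chrono::duration<double, std::micro> elapsed = clock::now() - start;
    return elapsed.count() / reps;
}

// Hypothetical candidates that you suspect of being hot spots.
std::int64_t sum_values(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), std::int64_t{0});
}

std::vector<int> prefix_sums(const std::vector<int>& v) {
    std::vector<int> out(v.size());
    std::partial_sum(v.begin(), v.end(), out.begin());
    return out;
}
```

For example, `average_micros([&] { sink += sum_values(data); }, 100)` reports the mean cost of one call. Accumulating into a variable that is used afterward helps keep the compiler from discarding the call as dead code.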

So then, how should code be parallelized and optimized?

Make It Right. Then, Make It Parallel

The first and most important step is to get the code working correctly. This means getting the algorithm right and doing all the standard testing on it to make sure it works as requested by the user and in accordance with the specs. In most cases, this requirement means writing serial code and assuring its correctness as serial code.
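As an illustration, a serial reference implementation can be deliberately plain. The sketch below assumes a histogram routine of the kind common in data visualization; `histogram_serial` and its parameters are hypothetical. It favors obvious correctness over speed, which makes it easy to test now and a trustworthy baseline to compare against once a parallel version exists.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial reference: counts how many samples fall into each of
// `bins` equal-width buckets covering [lo, hi). Written for clarity, not speed.
std::vector<std::size_t> histogram_serial(const std::vector<double>& samples,
                                          double lo, double hi, std::size_t bins) {
    assert(bins > 0 && hi > lo);
    std::vector<std::size_t> counts(bins, 0);
    const double width = (hi - lo) / static_cast<double>(bins);
    for (double x : samples) {
        if (x < lo || x >= hi) continue;  // ignore out-of-range samples
        std::size_t b = static_cast<std::size_t>((x - lo) / width);
        if (b >= bins) b = bins - 1;      // guard against rounding at the top edge
        ++counts[b];
    }
    return counts;
}
```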

Because you will later need to confirm that optimization and parallelization have not changed the code's results, write plenty of unit tests, and even functional tests, now. Running them on the parallelized code will give you confidence that your optimization has not accidentally unhinged your previously validated implementation.
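One way to apply this is a regression check that runs the parallel version against the validated serial one across several input sizes and thread counts. The sketch below assumes a sum-of-squares routine; `sum_squares_serial`, `sum_squares_parallel`, and `parallel_matches_serial` are hypothetical, and the parallel version uses plain `std::thread` rather than any particular threading library. Integer arithmetic is used on purpose: a parallel floating-point reduction adds values in a different order, so there you would compare against a tolerance rather than for exact equality.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Serial reference implementation, already validated by its own tests.
std::int64_t sum_squares_serial(const std::vector<int>& v) {
    std::int64_t total = 0;
    for (int x : v) total += static_cast<std::int64_t>(x) * x;
    return total;
}

// Parallel version: each thread sums one contiguous chunk into its own slot
// of `partial`, so no two threads ever write the same variable.
std::int64_t sum_squares_parallel(const std::vector<int>& v, unsigned nthreads) {
    assert(nthreads > 0);
    std::vector<std::int64_t> partial(nthreads, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (v.size() + nthreads - 1) / nthreads;
    for (unsigned t = 0; t < nthreads; ++t) {
        const std::size_t begin = std::min(v.size(), t * chunk);
        const std::size_t end = std::min(v.size(), begin + chunk);
        workers.emplace_back([&v, &partial, t, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                partial[t] += static_cast<std::int64_t>(v[i]) * v[i];
        });
    }
    for (auto& w : workers) w.join();
    std::int64_t total = 0;
    for (std::int64_t p : partial) total += p;
    return total;
}

// Regression check: the parallel version must reproduce the serial result
// for several input sizes (including empty) and thread counts.
bool parallel_matches_serial() {
    for (std::size_t n : {0u, 1u, 7u, 1000u}) {
        std::vector<int> v(n);
        for (std::size_t i = 0; i < n; ++i) v[i] = static_cast<int>(i % 97) - 48;
        const std::int64_t expected = sum_squares_serial(v);
        for (unsigned threads : {1u, 2u, 3u, 8u}) {
            if (sum_squares_parallel(v, threads) != expected) return false;
        }
    }
    return true;
}
```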

Once you're sure the code is working correctly and you've got the unit tests ready to go, you're still not quite ready. The next thing to do is make sure that the code contains no bugs, such as memory leaks. If leaks are hard to spot in regular code, they're even more difficult to find in threaded code. Debugging threaded code is much simpler than it was even a few years ago, but despite that progress, it remains considerably harder than cleaning up single-threaded code. So, this is the point at which to check for memory leaks, excess memory consumption, and the like.
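A cheap way to catch leaks in unit tests, before running a dedicated memory checker, is to count live instances of the resources you allocate and to make ownership automatic. In the sketch below, `Mesh`, `build_meshes`, and `check_build_meshes_does_not_leak` are hypothetical. Owning the objects through `std::unique_ptr` means an early return cannot leak them, and the atomic counter keeps the check valid once the code is later run from several threads. This complements, rather than replaces, a real leak-detection tool.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical resource type that counts its live instances so that a unit
// test can assert nothing leaked. The counter is atomic so the check stays
// valid when the code is later exercised from several threads.
class Mesh {
public:
    explicit Mesh(std::size_t vertices) : vertices_(vertices) { live_.fetch_add(1); }
    ~Mesh() { live_.fetch_sub(1); }
    Mesh(const Mesh&) = delete;             // one owner per mesh
    Mesh& operator=(const Mesh&) = delete;
    static int live() { return live_.load(); }

private:
    std::vector<float> vertices_;
    static std::atomic<int> live_;
};

std::atomic<int> Mesh::live_{0};

// Builds a batch of meshes and bails out if any requested size is zero.
// Because std::unique_ptr owns each mesh, the early return frees everything
// built so far; a version using raw `new` plus a `delete` loop at the end of
// the function would leak on that path.
bool build_meshes(const std::vector<std::size_t>& sizes) {
    std::vector<std::unique_ptr<Mesh>> meshes;
    for (std::size_t n : sizes) {
        if (n == 0) return false;
        meshes.push_back(std::make_unique<Mesh>(n));
    }
    return true;  // every Mesh is destroyed when `meshes` goes out of scope
}

// Unit-test style leak check covering both the success and early-exit paths.
void check_build_meshes_does_not_leak() {
    const bool ok = build_meshes({10, 20, 30});
    assert(ok);
    assert(Mesh::live() == 0);
    const bool failed = !build_meshes({10, 0, 30});
    assert(failed);
    assert(Mesh::live() == 0);
    (void)ok;
    (void)failed;
}
```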

In summary, you want the code to be as clean as possible before you begin surgery. One last check that you'll find well worth your time is to use Intel's Parallel Inspector to make sure your code does not contain bugs or poor designs that can cause problems in a parallel implementation (Figure 1). The tool, which is a plug-in for Microsoft Visual Studio and works on C/C++ code, is a handy way to make sure you're ready to parallelize your code. Note that while Parallel Inspector is available as a standalone tool, it is also bundled with Parallel Studio, along with Amplifier (parallel performance analysis) and Composer (parallel compiler and debugger). I'll come back to this product shortly.

Figure 1: Panels in Intel Parallel Inspector, showing a lock problem (top) with detail (bottom).
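To give a flavor of what such a check looks for, consider a shared counter updated from several threads. In the sketch below, `count_above` is a hypothetical function; the racy version is described in comments, and the fix uses `std::atomic`. An unsynchronized `++count` on a plain shared variable is the kind of data race a threading checker is designed to report, and it is far cheaper to design such shared state correctly up front than to hunt for lost updates later.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Counts elements greater than `threshold` using `nthreads` threads.
//
// A racy version would share a plain `long count` and run `++count` in every
// thread. That unsynchronized read-modify-write lets two threads read the
// same old value, so increments are silently lost. Making the counter
// std::atomic removes the race; joining the threads makes the final value
// visible to the caller.
long count_above(const std::vector<int>& v, int threshold, unsigned nthreads) {
    assert(nthreads > 0);
    std::atomic<long> count{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        workers.emplace_back([&v, &count, threshold, nthreads, t] {
            // Strided split: thread t visits indices t, t + nthreads, ...
            for (std::size_t i = t; i < v.size(); i += nthreads) {
                if (v[i] > threshold) count.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();
    return count.load();
}
```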

Don't touch that scalpel! The great ones speak.

Make it right before you make it fast. Make it clear before you make it faster. Keep it right when you make it faster. — Kernighan and Plauger, Elements of Programming Style.

Premature optimization is the root of all evil. — Donald Knuth, quoting C. A. R. Hoare

The key to performance is elegance, not battalions of special cases. The terrible temptation to tweak should be resisted. — Jon Bentley and Doug McIlroy

The rules boil down to: "1. Don't optimize early. 2. Don't optimize until you know that it's needed. 3. Even then, don't optimize until you know what's needed, and where." — Herb Sutter