
Andrew Koenig

Dr. Dobb's Bloggers

Optimizing a Program Means Making It Run Faster, Right?

April 10, 2013

Last week and the week before that, I wrote about optimization. Looking back on those writings, I realize that I never said just what I mean by optimization. Thinking about it more carefully, I realize that there isn't a single universal definition.

People often use optimization to mean changing a program in ways that they think will make it run faster, but that casual definition is far from complete. For example, is a change to a program an optimization if you don't know whether it helps the program's performance?

This question may seem silly: Why bother optimizing if you don't know whether your optimization makes any difference? But can you truly say that you've never written code the way you did because you thought it would probably be faster — but didn't bother to prove that it actually was faster? For example, at one time or another, most C++ programmers have thought about which of the following styles is most appropriate for a function:

               void do_work(Thing t) { /* … */ }
               void do_work(const Thing& t) { /* … */ }

The first of these examples copies its argument to its parameter each time you call it; the second uses a reference to avoid copying the argument. Can you truly say that you have never thought about which of these forms runs faster?

In practice, which form is faster depends on the context, and probably does so more than most programmers realize. For example, in the second of these definitions, t is a reference — which means that each time the program uses t, the compiler has to look at the object to which t refers. If t were copied once, as in the first example, that object would probably be on the computer's stack; in the second example, each use of t might be a memory reference. The performance difference, if any, between these two states of affairs is heavily implementation-dependent.
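The only way to settle the question for a given program and platform is to measure. Here is a minimal timing sketch; the Thing type with a string payload is my invention, and a serious benchmark would also have to keep the optimizer from discarding the empty calls entirely:

     #include <chrono>
     #include <cstdio>
     #include <string>

     struct Thing {
          std::string payload = std::string(64, 'x');  // big enough that copying costs something
     };

     void by_value(Thing t)            { (void)t; }    // copies its argument on every call
     void by_reference(const Thing& t) { (void)t; }    // typically passes a pointer instead

     int main()
     {
          Thing t;
          const int N = 1000000;
          auto t0 = std::chrono::steady_clock::now();
          for (int i = 0; i != N; ++i) by_value(t);
          auto t1 = std::chrono::steady_clock::now();
          for (int i = 0; i != N; ++i) by_reference(t);
          auto t2 = std::chrono::steady_clock::now();
          using std::chrono::duration_cast;
          using std::chrono::microseconds;
          std::printf("by value:     %lld us\n",
                      (long long)duration_cast<microseconds>(t1 - t0).count());
          std::printf("by reference: %lld us\n",
                      (long long)duration_cast<microseconds>(t2 - t1).count());
     }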

Although it might not be obvious how copying an argument affects a program's performance, copying an object or not can affect a program's behavior in ways that are much more important than mere performance. To see this effect, consider this function:

     void scale(vector<double>& v, const double& x)
     {
           for (auto p = v.begin(); p != v.end(); ++p)
                *p /= x;
     }
This function divides every element of a vector by a given scalar. The program's author defines the scalar parameter x as a reference in order to avoid copying it.

Suppose that you have a vector v, and you want to scale it so that element number n becomes 1 and all the other elements are changed proportionally. You might consider writing something such as

               scale(v, v[n]);

This call doesn't do what you probably think it does. See if you can figure out why before reading on.

The problem is that the parameter that corresponds to v[n] is a reference. Therefore, calling scale has the same effect as executing

     for (auto p = v.begin(); p != v.end(); ++p)
           *p /= v[n];

When this code reaches v[n], it will have the effect of executing

     v[n] /= v[n];

thereby setting v[n] to 1. All of v's elements after element n will then be divided by 1, leaving their values unchanged. If the scale function had defined x as double rather than as const double&, this problem would not have occurred.
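A complete program makes both the surprise and the fix visible. It runs scale twice, once taking x by reference as above and once taking it by value; the two function names are mine:

     #include <cstdio>
     #include <vector>
     using std::vector;

     void scale_by_ref(vector<double>& v, const double& x)
     {
          for (auto p = v.begin(); p != v.end(); ++p)
               *p /= x;              // rereads x through the reference on every pass
     }

     void scale_by_value(vector<double>& v, double x)
     {
          for (auto p = v.begin(); p != v.end(); ++p)
               *p /= x;              // x is a private copy; no element of v can alias it
     }

     int main()
     {
          vector<double> a = {2, 4, 8, 16}, b = a;
          scale_by_ref(a, a[1]);     // once a[1] becomes 1, later elements are divided by 1
          scale_by_value(b, b[1]);   // every element is divided by the original value 4
          for (double d : a) std::printf("%g ", d);   // prints 0.5 1 8 16
          std::printf("\n");
          for (double d : b) std::printf("%g ", d);   // prints 0.5 1 2 4
          std::printf("\n");
     }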

We have just seen two hazards of optimization. When you change a program in ways that you believe will improve an aspect of its performance:

  • you might change its behavior without realizing it; and
  • you might not know for sure whether your change was actually an improvement.

We will look at more surprising aspects of optimization next week.

Comments


Most computers these days use IEEE floating-point, which defines the basic operations down to the bit level. Regardless of the mathematical concepts that those operations are intended to represent, replacing one operation by another that sometimes yields a different result is problematic. For one thing, it turns regression testing from an objective to a subjective activity.


Ummmm... Mathematically, y/x and y*(1/x) are indeed equivalent. When executed using limited bit precision, you are correct that the contents of the 8 bytes from the calculation y/x are not guaranteed to be equal to the 8 bytes resulting from the calculation y * (1.0/x), due to rounding in the calculation of 1.0/x.

But no one said that the bytes from the first equation were by definition the correct answer, so neither answer is any more correct or incorrect than the other.

In a particular application, if the difference between those two calculations was actually important, then I could argue that you shouldn't be using doubles as they do not provide sufficient numerical precision for the specific calculations being performed.


This is a wonderful example of an "optimization" that breaks the code. The point is that y * (1.0 / x) is not guaranteed to be equal to y/x, so changing the program in the way you suggest will change its results.
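For readers who want to see this happen, here is a short demonstration; the values 3 and 10 are chosen so that the extra rounding step is visible:

     #include <cstdio>

     int main()
     {
          double x = 10.0, y = 3.0;
          double direct = y / x;          // one rounding: the quotient itself
          double recip  = y * (1.0 / x);  // two roundings: the reciprocal, then the product
          std::printf("direct = %.17g\nrecip  = %.17g\nequal: %d\n",
                      direct, recip, direct == recip);
          // Typical IEEE-754 output:
          //   direct = 0.29999999999999999
          //   recip  = 0.30000000000000004
          //   equal: 0
     }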


This is where I thought the article was heading as well. There are many different interpretations of "optimization": speed, object code size, lines of code to achieve an end result, algorithms that rely on data copies vs. in-place evaluation.

I think we live in an age where economy of storage has been pushed to the wayside, at least for application code (not necessarily the data the application works upon). The explosion of disk capacity, coupled with more powerful CPUs that can better handle the workloads of compression, have contributed to this. Once again, the hardware bails out the developers to a degree.


I know this is an odd comment, but I started programming when the holy grail was a 64K machine. One now largely ignored virtuous optimization is program size. Making a program small and avoiding unnecessary calls was once a very laudable goal.


With the latest compilers, some pretty hefty optimizations are built in.

So long as the optimized code produces the same results as the non-optimized code, the compiler is free to rewrite your instructions at will. This includes changing references to copies or vice versa...

In the case of the side effect of passing the double by reference, even the optimized code should produce that side effect, as the compiler can't second-guess you. After all, you wrote it that way. This side effect may have been your intent.


Michael, thanks, that's exactly what I'd been wondering about. I knew about pointer aliasing, and the "restrict" keyword, but I didn't know if the compiler would really be so conservative (which I guess is a good thing, and also the point of the article).


I believe that has to do with aliasing rules.

Because the compiler can't prove &*p and &x don't address the same memory, it has to be conservative and can't just store x in a register before the loop.
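Put differently, the transformation the compiler may not perform on its own is exactly the one a programmer can choose to write by hand; this sketch, with a name made up for illustration, shows it:

     #include <vector>

     void scale_hoisted(std::vector<double>& v, const double& x)
     {
          const double local = x;   // read x once, before the loop
          for (auto p = v.begin(); p != v.end(); ++p)
               *p /= local;         // correct only if no element of v aliases x,
     }                              // which is precisely what the compiler cannot prove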


Nice bug. Both doubles after the last brace were added automatically. ;o)


Divisions are quite slow, so they should be removed from loops where possible. The following code still passes the argument by reference and speeds up the loop on most platforms, provided the vector has more than one entry:

     void scale(vector<double>& v, const double& x)
     {
          double r = 1.0 / x;
          for (std::vector<double>::iterator p = v.begin(); p != v.end(); ++p)
               *p *= r;
     }


This is what I think, but I could be wrong:

I think the standard does not say anything about how references must be implemented, so it is up to the compiler writers to decide; usually references are implemented as pointers. The slightly worse access time could be due to the need to jump to the address the pointer points to when it is dereferenced.
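Roughly speaking, on a typical implementation the two functions below compile to the same machine code, which is why each use of a reference parameter can involve an extra load; the names are made up for illustration:

     double read_ref(const double& x) { return x; }    // reference parameter
     double read_ptr(const double* x) { return *x; }   // what most compilers make of it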


This must be a herp-derp moment for me. The 2nd parameter is not a reference to a volatile double, so why can't the compiler optimize the reference by just stuffing the value into a CPU register? Or is the compiler so smart that it sees that it's a reference to an element within the vector being manipulated?

Asked another way - let's say the function was called with the 2nd parameter just being a reference to a standalone double somewhere in memory - obviously the called function wouldn't know anything about where it lives - would things be any different then?

I'm the first to admit I'm not a compiler writer or standards wonk. It's just surprising to me how I read this - basically that a reference, for lack of a better term, needs to be de-referenced each time it's used? No stuffing away in a register?


"t is a reference — which means that each time the program uses t, the compiler has to look at the object to which t refers." -- I don´t understand in which situation this should have a negative impact on the runtime behavior? And I think it would not increase the compile time, because the static type check is done for call-by-references as well as for call-by-value access. Acces per reference is always faster than copying the whole object on current stack frame... but maybe I am wrong?? Could someone clarify