Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Parallel

Measuring Parallel Performance: Optimizing a Concurrent Queue


A Word About Trendlines

Beware automatic trendlines: They are often useful, but they can also lie.

Look closely again at Figure 6: The trendlines actually obscure what's really going on by creating a false visual smoothness and continuity that our human eyes are all too willing to believe. Try to imagine the graph without any trendlines, and ask yourself: Are these really the trendlines you would draw? Yes, they're close to the data up to about 21 threads. But they under-represent throughput and scalability at the machine-saturation point of 24 threads, and then don't account for the sudden drop and increased variability beyond saturation.

One of the weaknesses of curve-fitting trendlines is that they expect all of the data to fit one consistent pattern, but that is often only true within certain regions. Trendlines don't deal well with discontinuities, where we shift from one region to another one with fundamentally different characteristics and assumptions. Clearly, trying to use fewer cores or more cores than exist on the actual hardware are two different regions, and what the data actually correctly shows is that there's a jump between these regions).

If we replace the automatically generated trendlines with hand-fitted ones, a very different and much truer picture emerges, as shown in Figure 7.

[Click image to view at full size]

Figure 7: Another view of Figure 6, without misleading automatic trendlines.

Now the discontinuity is glaring and clear: On the left-hand graph, we move from a region of consistent and tight throughput increase through a wall, beyond which we experience a sudden drop and find ourselves in a new region where throughput is both dramatically lower and far less consistent and predictable—for example, multiple runs of the same Example 4 code at >24 threads shows dramatically variable results. On the right-hand graph, we move from a region of good and linearly decreasing scalability through a sudden drop, beyond which lies a new region where scalability too is both much lower and more variable.

What Have We Learned?

To improve scalability, we need to minimize contention:

  • Reduce the size of critical sections so that we can get less contention and more concurrency among client threads.
  • Reduce sharing of the same data by isolating threads to use different parts of a data structure. In our case, we moved the responsibility for cleanup from the producer to the consumer so that consumers touch only the head and producers touch only the tail.
  • Reduce false sharing of different data on the same cache line, by adding padding to ensure that two separate variables that should be able to be used concurrently by different threads are also far enough apart in memory.

To understand our code's scalability, we need to know what to measure and what to look for in the results:

  • Identify the key different kinds of work (here, producer threads and consumer threads), and then use stress tests to measure the impact of having different quantities and combinations of these in our workload.
  • Identify the different kinds of data (here, representative "small" and "large" queue items), and then vary those to measure their impact.
  • Measure total throughput, or items handled per unit time.
  • Look for scalability, or the change in throughput as we add more threads. Does using more threads do more total work? Why or why not? In what directions, and for what combinations of workloads?
  • Look for contention, or the interference between multiple threads trying to do work concurrently.
  • Watch for the cost of oversubscription, and eliminate it either algorithmically or by limiting the actual amount of concurrency to avoid it altogether.
  • Beware of overreliance on automated trendlines. Apply them only after first examining the raw data.

Be a scientist: Gather data. Analyze it. Especially when it comes to parallelism and scalability, there's just no substitute for the advice to measure, measure, measure, and understand what the results mean. Putting together test harnesses and generating and analyzing numbers is work, but the work will reward you with a priceless understanding of how your code actually runs, especially on parallel hardware—an understanding you will never gain from just reading the code or in any other way. And then, at the end, you will ship high-quality parallel code not because you think it's fast enough, but because you know under what circumstances it is and isn't (there will always be an "isn't"), and why.

Notes

[1] H. Sutter. "Writing Lock-Free Code: A Corrected Queue" (DDJ, October 2008).

[2] H. Sutter. "Maximize Locality, Minimize Contention" (DDJ, September 2008). www.ddj.com/architect/208200273.


Herb is a bestselling author and consultant on software development topics, and a software architect at Microsoft. He can be contacted at www.gotw.ca.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.