Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Channels ▼

Clay Breshears

Dr. Dobb's Bloggers

You've Got Serial Code in my Peanut Butter

September 07, 2012

You may be familiar with the saying "When you have a hammer, everything looks like a nail." Parallelism is a fine "hammer," but sometimes you are confronted with nuts and bolts or pins and needles. One of the most often cited examples of where parallelism does not apply is the serial process of pregnancy. You can't create a child in just one month using nine women.

A less clear case of when parallelism isn't a viable option is for data dependencies or race conditions that result from parallelization of code. Yes, we all understand the need for serializing code that accesses shared resources, but identifying those parts of the code can be less obvious. Software tools can be an indispensable assistant for locating race conditions, especially when dealing with variable accesses via pointers or data structures that have been implemented by a third party.

Due to the fixed word sizes within processors, round off error will only approximate floating-point computations. Everyone has learned to live with this restriction in a serial execution. However, when you change up the order of the same computations, the effects of round off can drastically alter the answer. (Quick example: Add up 10 million very, very small numbers and then one relatively large number. If all the small numbers are done first, they might actually sum up to something that isn't dwarfed by the remaining large number. On the other hand, starting with the large number and adding all the small values to it might eliminate the effect of all the small values in the final total.) Doing floating-point computations in parallel will change up the relative order of those operations and potentially change the final results.

If a bit-for-bit equivalent result to the serial code outcome is required from the parallel code, your only option may be to execute the critical computations in serial. If all of the computations are deemed "critical," your parallel scalability will be hobbled and the application could very easily perform worse than the serial version with all the overhead of the serial implementation in the middle of the parallel execution.

So far, so obvious, right?

I began reviewing these issues when I got a question about my implementation of Depth-First Search (DFS) on a graph that I used in The Art of Concurrency. The observation was that the results of the program were different when run in parallel than they were when run in serial. Knowing that the most critical requirement for a correct parallel code (modulo round off error) is to get the same answer as an equivalent serial code, I reviewed my DFS implementation.

The parallel algorithm starts with some node in the graph and places a reference to all other nodes that are connected to this node by an edge into a shared stack. Once the stack has been primed, all threads participating in the search will pop a node from the stack. If that node has not been visited by some previous thread, the current node is noted as having been visited and the required computation is performed on the contents of the node. Finally, any unvisited nodes connected to the current node are pushed onto the stack before the thread pops a new node from the stack.

Since my code was just for illustration purposes, there is actually no work being done at each node as it is visited. The application simply notes the order of visitation taken from a shared counter. In serial, because of the stack structure, the visitation order of nodes will be fixed based on the starting node in the graph and the order in which connected nodes are pushed onto the stack. Since I was assuming that the processing done at each node must be independent of processing any other node in the graph, the order of node visitations being different between a serial execution and a two- or four- or eight-thread execution (or even between two different runs of multiple threads) was a feature, not a problem.

It turns out that the order of visitation is actually an important aspect of some computations. I was introduced to the notion of a "topological order" of graph node processing. In a directed acyclic graph (DAG), a depth-first search defines a topological order of nodes. This order is used in solving sparse linear equations where the matrix of a sparse system can be modeled as a DAG and the topological order on the DAG determines the order of equations to be solved. Assuming that the computation at each node in the graph is still independent (or mostly independent) of all other nodes, the question was posed: How could the parallel DFS visit nodes in the same order each time the code was run and did not depend on the number of threads used?

After some thought, I came up with a "back of the envelope" solution, which I will describe in my next post. Think about this on your own and we'll compare notes.

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.