Channels ▼

Clay Breshears

Dr. Dobb's Bloggers

Finding My Summer Vacation's Shortest Path

August 21, 2012

With such simple loops, the actual concurrent implementation can easily be done in OpenMP or Intel Threading Building Blocks. Below I've given a TBB solution. The nested loops have been placed into the tryK class and that is called from the parallel_for() algorithm using the intrinsic blocked_range2d range class to be able to divide up the iterations of the two innermost loops.

class tryK {
  const int k;
  float **D;


  tryK (const int k_, float **D_) : 
    k(k_), D(D_) {}

  tryK (const int k_, split) : 
    k(k_), D(D_) {}

  void operator () ( const blocked_range2d<int>& r ) const {
    for (i = r.rows().begin(); i < r.rows().end(); i++)
      for (j = r.cols().begin(); j < r.cols().end(); j++)
        D[i][j] = min(D[i][j], D[i][k]+D[k][j]);

void cFloyds(float **D, int N)
  int i, j, k;

  for (k = 0; k < N; k++) 
    parallel_for(blocked_range2d<int> (0, N, rGrainSize, 0, N, cGrainSize), tryK(k, D));

The first argument to the parallel_for() call is the range of the loop that is to be subdivided and run in parallel. Since I'm using the blocked_range2d class, once I have specified the type of the two ranges, I need to list two triples as the parameters to the Range class. The initial triplet is for the first (rows) loop and the other for the second (cols) loop. The first two parts of a triplet are the start and end points of the corresponding loop. The third part is the granularity argument: the size of a subrange that should not be further subdivided.

In the above code, I've assumed that the two constants, rGrainSize and cGrainSize, are set by the programmer. These are used to fix the number of rows and columns, respectively, for the TBB scheduler to stop subdividing that dimension of the subrange. Finding the values to use would likely involve some form of trial and error testing to see how small the array blocks can get before performance suffers.

You may have noticed that the entire array is accessed within each iteration. I began this post talking about a lower triangular matrix needed to find the distance on a map between two cities. On a map, roads go both ways, at least between cities, so why do I need to compute over the whole array? Couldn't the whole operation be further optimized by using only half the matrix? If I have a directed graph, there can be a path from Peoria to Quebec, but there may be no path from Quebec to Peoria. (Since my scenario takes place in summer, there might be road construction that closes all roads into Peoria or just the one you would normally travel when driving down from Quebec.) To compute the shortest path for all pairs of nodes in the graph, it's best to work with the whole array. Besides, the extra coding to translate the order of the i, j, and k index variables to access a valid element in the lower triangular section can add complexity to the code and overhead to the computation.

The details of the TBB code should be comprehensible if you understand the serial algorithm, the serial code, and the operation of the TBB parallel_for() algorithm. However, if you've been concentrating as you examined the code rather than simply lounging under a tree, sipping a Mint Julep, you might be asking yourself, "What about a data race on the kth row?" If you are imbibing a refreshing drink in the shade of a spreading chestnut, clear your thoughts and go back to the last code block to see if you can locate a problem.

If you examine the serial algorithm, each iteration of the k-loop accesses (read and potential update) all entries of the D matrix. The structure of the loops does this one row at a time reading values from a fixed row (k). Without loss of generality, I can assume that entire row updates are assigned to individual threads by TBB. Consider two threads, T0 and T1, where T0 is assigned some row i != k and T1 is assigned to update row k. Are there interleavings of T0 and T1 such that it can be shown that T1 could update D[k][j] both before (one interleaving) and after (another interleaving) T0 updates D[i][j]? Yes, I can do that quite easily. But, does it matter or can you return to your lounging activities?

This last question is really the crux of the matter. Let me rewrite the body of the inner loop by substituting k for i to explicitly illustrate the computation that T1 would execute. This substitution gives me

D[k][j] = min(D[k][j], D[k][k]+D[k][j])

From the original weight matrix I know that D[k][k] will be zero. The only way to get a smaller value in a main diagonal element of the distance matrix would be to have a negative cycle from the node to itself. However, it was stipulated that the graph could have no negative cycles. The result of the above operation will be to find the minimum value between D[k][j] and itself, which will result in no updates to any element of the distance matrix within row k. Thus, regardless of how TBB divides up the distance matrix for updates, there will be no data race accessing elements from (the fixed) row k.

Sometimes interleaving analysis is not quite enough. Just because there appears to be a data race, you also have to show that there will be actual adverse consequences from the problematic interleaving schedule.

And that is my tip for your travel planning process. That and don't forget to pack plenty sunscreen.

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.