Channels ▼

Clay Breshears

Dr. Dobb's Bloggers

You've Got Parallel Code in My Chocolate

September 14, 2012

I use InterlockedCompareExchnge() to test and set the visited array elements whenever a valid visit is made to a previously unvisited node. Each node must be visited twice in the above code: once by the boss thread and once by a worker thread. I could easily have used two separate visited arrays, but there's not much fun in that. Instead, I simply used two different values to denote which type of thread has already visited the node. A "1" notes that the boss thread has seen the node (exchanged for the initial "0" value) and placed it into the workers queue. A "2" is the value indicating that a worker thread has processed the node, but this will only be set if the original value of the visited element was a "1".

I hope you are asking yourself, "Does Clay really need the InterlockedCompareExchnge() call in the worker thread code?" Since the boss thread will only enqueue each node once, there shouldn't be any reason to mark if a node has been visited by a worker thread. Thus, if only one copy of each node is put into the worker queue, why do I need to make this second check? It's a slick scheme (if I do say so myself), but the second use in the worker thread really is superfluous. Unfortunately, when I removed the function call and ran with more than one worker thread, some "phantom" node seemed to be slipping into the queue and, since it wasn't tested for previous visitation by a worker thread, the gCount counter was incremented past NUM_THREADS and at least one thread was put into a deadlock state.

It turns out that nothing extra was getting put back into the queue. Instead, (at least) one of the last executing threads was attempting to reprocess a node. Consider the last two nodes to be processed, say node11 and node16, on two different threads. The node16 thread increments gCount to NUM_THREADS just after the node11 thread finishes the computations on that node but just before the thread enters the inner spin-wait loop. Since there are no more nodes to process from the graph, the node11 thread doesn't pull anything from the queue (and doesn't update the local k value) and then checks the termination condition. Finding that the processing is complete, the node11 thread breaks out of the spin-wait loop. However, it proceeds into the code incrementing gCount and processing the node (again). As I noted before, the incrementing of the counter leads to the deadlock.

One quick fix to this problem, and not using the InterlockedCompareExchnge() call in the worker thread code, would be to change the break to a return on the termination test in the spin-wait loop. This will prevent the thread from simply getting out of the spin-wait and executing an additional node computation. Another possibility would be to restructure the whole worker thread function to be more like the following code.

unsigned __stdcall  pDFSearch(void *pArg)
  int k, i;

  while (!Q.try_pop(k)) { // get first item off queue 
    if (gCount == V) break;
  while(gCount != V) {
    InterlockedIncrement(&gCount);  // "visit" node k
           Do something to VISIT node k

    while (!Q.try_pop(k)) { 
      if (gCount == V) break;
	return 0;

This code assumes that there is at least one node per thread since the first thing a thread does is pull a node out of the queue. If there aren't that many nodes or a thread gets really unlucky and all nodes are processed before it gets the chance to access the queue, I use an initial spin-wait loop to prevent unwanted processing of "garbage" nodes. The main loop will be entered when the thread has a valid node to process. The code processes the node, increments the counter, and then attempts to pull another node from the queue in the spin-wait loop I've used before. If a node is available, the outer loop will process it; otherwise, after not finding anything on the queue due to the computation being finished, the spin-wait loop will break out and the outer loop test will conclude execution.

Seems pretty basic once you've seen it, right?

One more point to note: What if there is the possibility that processing of, say, node10 relies on results from previous nodes in the topological order? While I am able to enqueue each node in the right serial sequence, once a task is put into the queue the nondeterministic property of thread scheduling cannot promise that nodes will be processed in any guaranteed order. Even if the computation is exactly the same for each node in the DAG, some task may be given more compute cycles and finish before a task that was enqueued before it. Thinking of the worst case, I ask myself if the processing of the first node from the DAG could be completed after all other nodes were complete. If it can and the same criterion applies to any node in the graph, then the above scheme should work. If not, I have a "mostly independent" set of tasks and I need to add some synchronization.

The most obvious dependence that might arise in the simultaneous equations solving example is if current node processing uses results from previous nodes. For example, if the computed results of node05 and node08 are needed to process node10, I must ensure that processing of node10 only starts after the node5 task and node8 task are done. As part of the task description, I can attach a counter value that is taken from the order array or just the node ID number, effectively labeling nodes. This tag value can be used as an index into a results array or an array that simply marks whether or not the associated node has completed processing. If the dependence of nodes is computable from the current node's ID number (e.g., [id - 5] and [id/2 + 3]), then the task references the results/mark array and waits until the relevant results have been computed.

Alternatively, you might use an array of synchronization objects as the mark array. I can see where an array of condition variables, events, or semaphores could be employed, depending on the threading model used. Because I haven't actually implemented any of these ideas, the exercise is left to the reader.

Trying to keep the most parallelism in your program codes is a goal that will enhance the scalability of your applications on platforms with increasing numbers of cores. Whenever there is a need for serial execution or some other ordered dependence, look beyond the standard tricks of ordering synchronizations and seek a solution that can retain as much parallel leeway as possible. Too much serialization, of course, will build up processing overhead that can confound any parallel speedup benefit that you have created.

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.