Channels ▼
RSS

C/C++

Language of the Month: Intel's Cilk Plus


It might seem intuitive to think that the spawned function is the one that is queued for later execution, and that the worker executing the parent will continue executing. But, in fact, it is the other way around. The worker that hit the _Cilk_spawn keyword will detach and be queued up to continue executing its work.

Why does this asymmetry matter? The reason is that the order of evaluation corresponds to the order of evaluation of a function call in a serial C/C++ program. The spawning function will continue to execute the child, as is the case in a regular function call, and if the continuation is not stolen by another worker. Then, it will return, pop the continuation as a work item, and continue executing it. The important thing to note is that this order of evaluation is the same as that of a function call in a serial program.

It is also important to note that Cilk Plus keywords can be elided. For example, the programmer can use the C/C++ preprocessor and #define them away using the syntax in Listing Four.

Listing Four: Cilk Plus keywords can be elided to enable correct serial execution of apps built with compilers that do not support Cilk Plus keywords. This helps keep your source code portable.

#define _Cilk_spawn
#define _Cilk_sync

Compiling/building an application that uses the Cilk Plus keywords, which elides them as shown, will produce a valid serial application. As long as the parallel version of the program does not introduce data races, the serialization of the program will be semantically equivalent to the parallel execution of the program and will produce the same results.

The _Cilk_sync Keyword

The _Cilk_sync keyword in the example of the fib() function in Listing Two is necessary to guarantee that the values in the variables x and y are updated by the asynchronous execution of fib(n-1) and fib(n-2) before they are used. In addition to the explicit form of the keyword, there is an implicit _Cilk_sync at the end of every spawning function. The implicit keyword is, in fact, inserted by the compiler.

The effect of the implicit _Cilk_sync is that the spawning function executes as long as any of the child tasks it spawned are executing. One benefit of this property is that it allows the programmer to continue viewing functions as units of work, and to see when a function returns all its work and when it is done. A more practical benefit is that if the spawning function is passing an address of any of its stack variables to a child, and the child task may write onto that location, then it is guaranteed that the stack of the parent is in memory during the execution of the child.

The _Cilk_for Keyword

The second tasking construct provides the keyword _Cilk_for. It enables programmers to parallelize C/C++ for loops. Consider the statement in Listing Five.

Listing Five: The _Cilk_for keyword.

_Cilk_for (int i = 0; i < N; ++i) {
  body;
}

When this keyword is used, the compiler enforces a few restrictions on the for construct. These include:

  • The index variable (i in Listing Five) appearing in all three expressions of the loop is initialized in the first instance.
  • It is compared to a value that does not change within the loop.
  • It is incremented by a value that does not change inside the loop.
  • It cannot be changed within the body of the loop.

The effect of these restrictions is that when the loop is about to execute, its trip count is known to the runtime scheduler.

The execution of the _Cilk_for loop uses a divide-and-conquer approach. The worker whose execution reached the _Cilk_for construct computes the trip count. It takes the upper half of the iterations and puts it in its own work queue for later evaluation. It then continues to divide and conquer the lower trip count, until the trip count becomes sufficiently small (determined heuristically at run time). It then executes the iterations sequentially. Upon completion, the worker pops the next work item, which was posted last, from its own work queue. This will be the second set of iteration of the lower half. If another worker has an empty work queue, it might steal work from the core that created the list of work items corresponding to the loop iterations. This is how concurrent work is created.

If this happens, it will steal an item from the top of the queue that corresponds to a maximal number of iterations. This order of stealing work is the least likely to interfere with the cache locality of the worker from which the item was taken. It also provides the most amount of work to the thief using the least number of steals. Using a minimal number of steals provides work for all workers with minimal overhead. As long as work does not get stolen, the order of evaluation of the loop iterations executed by a single worker is exactly the same as the order in serial execution.

Vector-Level Parallelism in Cilk Plus

The Cilk Plus language extension provides several ways to take advantage of the hardware-based parallelism available in Intel multicore processors.

A serial for loop can be changed into a parallel loop by changing the keyword for to _Cilk_for. This change tells the compiler that there is no ordering among the iterations of the loop. As described above, the compiler arranges batches of iterations to execute in parallel. In addition, it will attempt to vectorize the code within batches of consecutive iterations.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video