Channels ▼


Language of the Month: Intel's Cilk Plus

We are now well into the era in which multicore processors are commonplace, not just in servers, but also in mobile devices, laptops, and PCs. Intel processors are, increasingly, multicore in nature, offering users benefits such as improved application performance, usability, power reduction, and more. To help developers write code for these systems, Intel has worked to develop standards such as OpenMP and MPI. At the same time, Intel is offering new products for parallel programming, based on the new programming model called Intel Parallel Building Blocks (Intel PBB). Intel PBB currently consists of the following three components:

  1. Cilk Plus: This is a language extension that provides both task and data parallelism constructs. It is currently implemented and supported in Intel C++ Compiler. This article provides an introduction to Cilk Plus.

  2. Intel Threading Building Blocks (Intel TBB): Intel TBB provides both low-level tasking primitives and high-level parallel algorithms. Intel TBB is implemented as a compiler-independent template library for C++. It is available as a supported product and as open source.

  3. Intel Array Building Blocks (Intel ArBB): Intel ArBB is a C++ library that provides a generalized vector parallel programming solution that frees application developers from dependencies on particular low-level parallelism mechanisms or hardware architectures. It is currently available in beta-test form.

Figure 1: Summary of Intel Parallel Building Blocks.

Programmer Focus

The processor architecture provides multiple parallel resources, including multiple cores, vectors, hyper-threading, and caches. Different programmers have different levels of interest in writing code to take advantage of these resources. Intel PBB, and in particular Cilk Plus, support two approaches. First, programmers may choose to explicitly describe the parallel work at the core level, using tasks, and within each task write explicit vector-level parallelism. Or programmers can let their compiler and system software figure out how processing should be done. This second approach assumes that the work to be done in parallel is clearly defined and that the data on which the work will be done is clearly identified.

In addition to utilization of HW resources, a central part of value added by Intel PBB in general, and by Cilk Plus in particular, has to do with "composability." Many applications are composed of independently written components, which can be homegrown or come from third parties (such as libraries). Typically, the components (and their designers) communicate with each other through higher-level interfaces rather than at a lower, implementation-oriented level. To allow that level of collaboration in a parallel program, the task scheduler has to allow composability (that is, when modules are combined to form the application, they continue to work and perform well).

How does Cilk Plus support composability? The Cilk Plus language, as well as the other components of Intel PBB, utilize work-stealing task schedulers. Work stealing is a task-scheduling technique that is characterized by a mechanism in which units of work can migrate from the worker that generates them to another worker that will execute them — if the second worker has no work of its own to process. A worker that runs out of work steals work from a victim that has a queue of work items. In contrast, there is no mechanism by which a worker that encounters opportunities for parallel work can delegate that work to other workers. The precise details of the scheduling mechanism are out of the scope of this article, which focuses on the language aspects. However, the work-stealing technique is at the heart of the composability property, which allows for modular implementation of parallel programs.

Task Parallelism with Cilk Plus

The Cilk Plus language provides two constructs for task level parallelism. The first includes two keywords: _Cilk_spawn and _Cilk_ sync. Listings One and Two compare an implementation of the recursive implementation of the well-known Fibonacci code in serial C against a parallel implementation, which was derived from the serial code by adding the new keywords:

Listing One: Serial implementation of Fibonacci code.

int fib (int n)
  if (n <= 2)
      return n;
  else {
    int x, y;
    x = fib(n-1);
    y = fib(n-2);
    return x+y;

Listing Two: The same code made parallel using Cilk Plus keywords. )Note that no code was changed. Only the Cilk Plus keywords were inserted.)

int fib (int n)
  if (N <= 2)
      return n;
  else {
    int x, y;
    x = _Cilk_spawn fib(n-1);
    y = fib(n-2);
    return x+y;

Notice that the addition of the keywords is nonintrusive. Changing the serial program to a parallel program only requires insertion of the keywords as appropriate. It does not require changes to the underlying program. The insertion also does not impact the readability and maintainability of the program. The original serial code is easy to see and the program logic is as evident as in the original. This observation highlights one of the main design principles of Cilk Plus: It is designed for parallelizing existing serial C/C++ code. Thus, a benefit of Cilk Plus is the ability it gives developers to easily add parallelism to code by making minimal changes, while providing strong guarantees for serial equivalence.

The _Cilk_spawn Keyword

Let us now take a closer look at the _Cilk_spawn keyword, which applies to a function call, such asfib(n-1) in Listing Twp. It means that the spawned function can execute concurrently with the remainder of the enclosing function (fib(n) in this example). The enclosing function is termed a "spawning function" or the parent, and the function being spawned is the child. The _Cilk_sync keyword means that the parent has to wait until all child tasks return control to the parent.

Note that this mechanism reuses the concept of a function, which is a well-understood concept in C/C++ programming, both as a parallel region (on the parent side) and as the unit that can be spawned (on the child side). Both of these serve the purpose of bringing parallelism to existing programs with minimal effort. The reuse as a parallel region alleviates the need to introduce a new syntactic construct that would serve as a parallel region. The reuse of a function as the unit defining what can be spawned may be viewed as a limitation. In parallelizing an existing program, the programmer may want to indicate that a smaller lexical scope, such as a statement or a loop, can execute concurrently with another statement.

However, by using functions as a spawn unit, the data model remains well understood to the programmer and the language extensions do not have to introduce new rules, which have the potential to introduce programming errors. In the current model, it is clear to the programmer which variables are in the scope of the spawning functions, which variables belong to the child function, and how arguments are being passed. Indeed, argument passing is very much conformant to the way it is done in the underlying language. The values of the arguments are being evaluated by the parent task before detaching the child and enabling concurrency. In the fib(n-1) example in Listing Two, the value of n-1 is computed by the parent before the child is spawned. A less trivial example can be observed in Listing Three, where the computation of the arguments is more involved.

Listing Three: A less trivial example of _Cilk_spawn.

Val = _Cilk_spawn f(g(x),h(y));

The arguments to the function f() in Listing Three are themselves function calls. If the operation of _Cilk_spawn applied to the whole expression, then g() and h() might have been allowed to execute concurrently with respect to each other, as well as with respect to the spawning function.

In a case of a data race between the code within the function g() and h(), there is no place to insert a synchronization directive.

Arguably, the most important attribute in the behavior of _Cilk_spawn, in support of equivalence to serial execution, relates to the asymmetry between the two parts of the parent at a _Cilk_spawn point. A _Cilk_spawn does not enforce parallel execution, but instead provides an opportunity for parallel execution. The worker whose execution reached the _Cilk_spawn keyword will continue processing one of the two branches identified by the keyword, while the remaining branch is detached and put in a queue for later execution. As described earlier, the work item in the queue may or may not be picked up for execution by another worker executing on another core.

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.