Channels ▼
RSS

Parallel

Language of the Month: Intel's Cilk Plus


Array Notations

A new syntax provided with Cilk Plus also allows simple operations on arrays. The C/C++ language standards do not provide ways to express operations on arrays. Instead, the programmer has to write a loop and express the operation in terms of elements of the arrays, creating serial order, which is sometimes unintended. The downside of writing loops with serial ordering is that in order to convert the serial loop to a vector loop, the compiler has to prove that the vector processing would be equivalent to the scalar processing implied by the loop and mandated by the language. In the majority of the cases, these proofs are bound to fail.

For example, when the program uses pointers to reference the array (a reasonable programming practice meant to allow for operations on any array, rather than write code to operate on a specific array), the compiler is unlikely to prove that two pointers point to areas of memory that are not intersecting.

Writing a[:] = b[:] + c[:]; indicates to the compiler that the elements of the arrays b and c need to be added and that there is no order required among the addition operations. These semantics allow the compiler to always generate vector code, instead of generating scalar code and attempting to prove that vector code would be equivalent.

Cilk Plus Array Section Operator

The Cilk Plus language extensions define an array section operator whose syntax is array_base[start:length:stride] where the following is true:

  • array_base is any array expression allowed in C/C++, including arrays, pointers, and C99 variable-length arrays.
  • "Start" is the first array element to which the section applies.
  • Length provides the number of array elements.
  • Stride is an increment between elements. The use of stride is optional. If t is not provided, the default value of 1 is used.

The section operator can be applied to multidimensional arrays and can be intermixed with array subscripts, such as in a[0:5][3][0:8:2]. In this example, a has to be a three-dimensional array or a pointer to an array. The rank of the expression is the number of dimensions in which the section operator is used, rather than a subscript. In the example provided, the rank is 2.

A few intrinsic functions are provided for commonly used operations such as summation of an elemental of an array. A dot-product operation might be expressed as in Lisitng Six.

Listing Six: A dot-product operation expression using _sec_reduce_add.

X = __sec_reduce_add(a[:] * b[:]);

The array notations might remind some programmers of arrays in Fortran 90. A significant difference is that the Cilk Plus array notations expect that the arrays on the right-hand side of an assignment statement do not partially overlap the array being assigned to on the left-hand side. A violation results in undefined behavior. Therefore, unlike in Fortran 90, the compiler does not generate temporary arrays to hold the intermediate values of the right-hand side of an array expression. This change results in higher performance.

Writing an array expression can be a natural way to write the program when the programmer expresses the algorithm and the design at the level of operations on an array. For an existing serial program, a change to array notations might be intrusive. A less intrusive change is also available in Cilk Plus in the form of #pragma simd. The pragma can be added to a loop to indicate that while the loop is written in scalar syntax, the implied serial ordering is not intended. Instead, the programmer's intent is for the compiler to generate vector code to implement the loop. As a language construct (rather than a performance hint), the pragma allows the compiler to generate vector code without having to prove that the vector code would be equivalent to scalar code, and scalar code is not the programmer's intent.

A third programming construct allows the programmer to write a scalar function in standard C/C++ and declare it as an "elemental function." When using an elemental function, the programmer's typical intent is to deploy it on many elements of arrays without prescribing an order of operations among the array elements. The simple example in Listing Seven shows the use of elemental functions to perform element-wise addition of arrays.

Listing Seven: Example of the use of elemental functions to perform element-wise addition of arrays.

__declspec(vector)
float v_add(float x1, float x2)
{
  return x1 + x2;
}
for (int j = 0; j < N; ++j) {
  res[j] = v_add(a[j],b[j]);
}

The use of __declspec(vector) indicates to the compiler that the function v_add is intended to be used as an elemental function. The Intel C++ Compiler will generate two versions of code for such a function. In addition to "standard" code, a vector version will be generated, which receives a vector of arguments for x1 and a vector of arguments for x2. It performs the operations of the function using the hardware vector registers across all the input arguments. This returns a vector of results instead of a single result. The function can be called in a scalar context. In such a circumstance, the compiler will translate the function call into a call to the standard, scalar function.

When it is called in a data parallel context, such as the example of the loop shown in Listing Seven, the compiler will call the vector version. If the target of the compilation is a CPU supporting the XMM vector register, for example with the SSE2 instruction set extension, then the compiler determines that four consecutive values of the input arrays can fit within a register. It will generate a version of the function that operates on four consecutive array elements, and the function will be called N/4 time (or 1, 2, or 3 additional time in scalar version, if N is not divisible by four). The benefit, of course, is enhanced performance.

Instead of calling the function using a for loop, the programmer may choose a _Cilk_for as the construct to call the function. In such a case, not only will the compiler call the vector version, it will also facilitate loop iteration execution by using multiple cores. With this use of the language construct, the programmer can get the combined benefit of both core- and vector-level parallelism.

Conclusion

Cilk Plus is a language extension to C/C++ whose main benefits are:

  1. It enables the application developer to use all parallel resources available in the hardware, including cores, vectors, and caches.
  2. It provides multiple levels of abstraction, enabling the developer to choose how parallel work can be done — by cores separately from work done by the vector — or they can choose not to program explicitly to the hardware and instead to indicate the intent for parallelism, allowing the compiler to map the operations to the hardware.
  3. It supports composability. By using tasking, with a work-stealing scheduler, the extensions allow integration of an application from independently developed components (possible third-party libraries, or modules developed independently by sub-groups) without the need to coordinate the parallelism architecture between the component and without risk of hardware resource oversubscriptions resulting from the integration.
  4. It simplifies adding parallelism to existing serial programs. Multiple attributes of the extensions support this, including the nonintrusive syntax, the ability to easily revert back to the serial program, the guarantee of serial semantics equivalence, and the low overhead of spawning a task.

Special note: Robert introduces the Cilk keywords with the names _Cilk_spawn, _Cilk_sync, and _Cilk_for. Most developers prefer to spell them cilk_spawn, cilk_sync, and cilk_for by having #include <cilk.h> in their program. The versions with leading underscores are the real keywords, because the compiler does not create these new keywords in the normal space where it can theoretically conflict with a valid name. Instead, it leaves the nicer names to the cilk.h include file.


Robert Geva is a Principal Engineer, Intel Software and Solutions Group. Portions of this article are copyright Intel Corp.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video