Developer interest has been healthy following the recent porting of Intel Cilk Plus into the GNU Compiler Collection (GCC). Intel's set of C and C++ constructs for task-parallel and data-parallel programming is designed to improve performance on multicore and vector processors.
The GCC community is now actively seeking developers to collaborate and share advice as this open source project proceeds. As well as providing user experience input, developers are encouraged to give feedback on the project's corresponding open language specification.
According to Intel, the three Intel Cilk Plus keywords provide a "simple yet surprisingly powerful" model for parallel programming, while runtime and template libraries offer a well-tuned environment for building parallel applications.
The three Intel Cilk Plus keywords are _Cilk_spawn, _Cilk_sync, and _Cilk_for.
Intel software engineer Balaji V. Iyer writes that Cilk Plus reducers provide a lock-free way to deal with shared data. Simple array notations including elemental functions will allow programmers to easily use data-parallelism — and pragma directives communicate SIMD information to the vectorizer to help ensure that loops are vectorized correctly.
The implementation of Intel Cilk Plus language extensions to GCC requires patches to the C and C++ front-ends, plus a copy of the Intel Cilk Plus runtime library (Cilk Plus RTL). Both of these elements have apparently now been checked into the new GCC branch. The Cilk Plus RTL is maintained using an upstream, BSD-licensed version available at http://www.cilkplus.org — changes to the Cilk Plus RTL are welcome and must be contributed to the upstream version.
Intel itself reminds us that because Intel Cilk Plus is an extension to C and C++, programmers typically do not need to restructure programs significantly in order to add parallelism. Parallelism expert and director of marketing and business for Intel's software development products James Reinders says that, "It is time to make parallelism a full First Class Citizen in C and C++. Hardware is once again ahead of software, and we need to close the gap so that application development is better able to utilize the hardware without low-level programming."
Reinders continues, "Like other popular programming languages, neither C nor C++ were designed as parallel programming languages. Parallelism is always hidden from a compiler and needs "discovery." Compilers are not good at "complex discovery" — they are much better at optimizing and packaging up things that are explicit. Explicit constructs for parallelism solve this and make compiler support more likely. The constructs do not need to be numerous, just enough for other constructs to build upon… fewer is better!"
Returning to the three Intel Cilk Plus keywords in detail:
_Cilk_spawn: Annotates a function call and indicates that execution may (but is not required to) continue without waiting for the function to return. The syntax is:
[ <type> <retval> = ] _Cilk_spawn <postfix_expression> (<expression-list> (optional))
_Cilk_sync: Indicates that all the statements in the current Cilk block must finish executing before any statement after the _Cilk_sync begins executing. The syntax is:
_Cilk_sync ;
_Cilk_for: A variant of a for statement in which any or all iterations may (but are not required to) execute in parallel. You can optionally precede _Cilk_for with a grainsize pragma to specify the number of serial iterations desired for each chunk of the parallel loop. If there is no grainsize pragma, or if the grainsize evaluates to 0, the runtime will pick a grainsize using its own internal heuristics. The syntax is:
[ #pragma cilk grainsize = <expression> ] _Cilk_for (<assignment_expression> ; <condition> ; <expression>) <statement>
Intel's Balaji V. Iyer details the following with regard to Cilk Plus: "The parser will accept these keywords and insert the appropriate functions to interact with the runtime library. Along with these keywords, you can use #pragma simd directives to communicate loop information to the vectorizer so it can generate better vectorized code." The five #pragma simd directives are vectorlength, private, linear, reduction, and assert. The list below summarizes the five directives; for a detailed explanation, please refer to the 'Intel Cilk Plus Language Specification' at http://www.cilkplus.org.
#pragma simd vectorlength (n1, n2, ...): Specify one or more vector widths from which the back end may choose when vectorizing the loop.
#pragma simd private (var1, var2, ...): Specify a set of variables that receive a private copy in each loop iteration, so that the iterations are independent of one another.
#pragma simd linear (var1:stride1, var2:stride2, ...): Specify a set of variables that increase monotonically by a fixed stride in each iteration of the loop.
#pragma simd reduction (operator: var1, var2, ...): Specify a set of variables whose values are computed by vector reduction using the specified operator.
#pragma simd assert: Directs the compiler to halt if the vectorizer is unable to vectorize the loop.
The current implementation of the runtime library has been tested on x86 (both 32- and 64-bit) architectures. In theory, the runtime library should not be difficult to port to other architectures.
"These language extensions provide a simple, well-structured, and powerful model for parallel programming. In this initial release, the array notations and elemental functions present in the full Intel Cilk Plus Language Specification are not yet implemented," said Intel's Iyer.
Intel does remind us, however, that we should be aware that access to shared variables currently assumes sequential consistency, so architectures that use a different memory model may require you to insert additional memory barriers.
Intel hopes that developers will find these extensions to be a useful and significant enhancement to the GCC C and C++ compiler.