Array Building Blocks: A Flexible Parallel Programming Model for Multicore and Many-Core Architectures


Function Definitions in Array Building Blocks

Functions written using Array Building Blocks are expressed in C++. A developer simply writes a function that uses Array Building Blocks types and operations. Array Building Blocks then captures that function and turns it into a closure, a representation of a “recorded” computation. Closures are compiled to vectorized machine code by Array Building Blocks and can then be executed repeatedly with different arguments. Once a closure has been captured, it is immutable, which allows Array Building Blocks to generate optimized code once for a given device and then reuse that code repeatedly. However, to support a powerful form of generic programming, closures can be captured from the same C++ function repeatedly, each time with different parameters based on non-local C++ references.

Developers do not need to be aware of closures at first. Array Building Blocks provides a call function that can be used to invoke Array Building Blocks functions directly from C++ or from within other Array Building Blocks function definitions. The first time a given function pointer is passed to the call function, a closure is captured implicitly, compiled, cached, and called immediately. Subsequent calls with the same function pointer reuse this implicitly captured closure without any compilation overhead. Here is an example:


void foo(const dense<T>& A, dense<T>& B);
   ...
dense<T> A, B;
bind(A, addrA, N);
bind(B, addrB, N);
   ...
call(foo)(A, B);
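
The caching behavior of call can be pictured with a small stand-in written in standard C++. This is only a sketch under stated assumptions, not the Array Building Blocks implementation: `Closure`, `capture`, `call_cached`, `doubler`, and the counter `g_captures` are hypothetical names, and "compilation" is reduced to wrapping the function pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <vector>

// Hypothetical stand-ins: a function over two float arrays and its "closure".
using Fn = void (*)(const std::vector<float>&, std::vector<float>&);
using Closure = std::function<void(const std::vector<float>&, std::vector<float>&)>;

int g_captures = 0;  // counts how often the (expensive) capture step runs

// Stand-in for capture + compile: here it merely wraps the pointer.
Closure capture(Fn f) {
    ++g_captures;
    return Closure(f);
}

// Looks the function pointer up in a cache and captures only on first use.
void call_cached(Fn f, const std::vector<float>& a, std::vector<float>& b) {
    static std::map<Fn, Closure> cache;
    auto it = cache.find(f);
    if (it == cache.end()) {
        it = cache.emplace(f, capture(f)).first;  // first call: capture and store
    }
    it->second(a, b);                             // every call: run the cached closure
}

// Example function used with call_cached: b[i] = 2 * a[i].
void doubler(const std::vector<float>& a, std::vector<float>& b) {
    assert(a.size() == b.size());
    for (std::size_t i = 0; i < a.size(); ++i) b[i] = 2.0f * a[i];
}
```

Calling `call_cached(doubler, a, b)` twice runs `capture` only once; the second call goes straight to the cached closure.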

The capture process amounts to little more than executing the provided C++ function. All Array Building Blocks types and operations are implemented so that they record their effects instead of executing immediately. Thus, when a function containing Array Building Blocks code is executed during a capture, all Array Building Blocks operations within it are collected and stored in the resulting closure as they are encountered. C++ operations that do not use Array Building Blocks types execute immediately during closure capture and are not collected as part of the resulting closure.
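
The recording idea can be illustrated with a toy in standard C++. The sketch below is an assumption-laden miniature, not how Array Building Blocks is implemented: `Val`, `Tape`, `Op`, `capture`, `run`, and `repeated` are all hypothetical names. Arithmetic on `Val` appends to a tape instead of computing, while the ordinary `for` loop executes during capture and therefore leaves only its unrolled effects in the closure.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One recorded instruction: r[dst] = r[lhs] (kind) r[rhs].
struct Op { char kind; std::size_t dst, lhs, rhs; };

struct Tape {
    std::vector<Op> ops;        // recorded operations, in the order encountered
    std::size_t registers = 0;  // number of values the tape refers to
    std::size_t fresh() { return registers++; }
};

Tape* g_tape = nullptr;  // tape currently being recorded into

// Recording scalar: holds a register index, not a number.
struct Val { std::size_t reg; };

Val record(char kind, Val a, Val b) {
    assert(g_tape != nullptr);  // arithmetic on Val is only valid during capture
    Val r{g_tape->fresh()};
    g_tape->ops.push_back({kind, r.reg, a.reg, b.reg});
    return r;
}
Val operator+(Val a, Val b) { return record('+', a, b); }
Val operator*(Val a, Val b) { return record('*', a, b); }

struct Closure { Tape tape; std::size_t result; };

// Capture: simply execute the function once with recording values.
Closure capture(Val (*f)(Val, Val)) {
    Closure c;
    g_tape = &c.tape;
    Val x{c.tape.fresh()};  // register 0: first parameter
    Val y{c.tape.fresh()};  // register 1: second parameter
    c.result = f(x, y).reg;
    g_tape = nullptr;
    return c;
}

// Replay the recorded operations with concrete arguments.
float run(const Closure& c, float x, float y) {
    std::vector<float> r(c.tape.registers);
    r[0] = x;
    r[1] = y;
    for (const Op& op : c.tape.ops)
        r[op.dst] = (op.kind == '+') ? r[op.lhs] + r[op.rhs] : r[op.lhs] * r[op.rhs];
    return r[c.result];
}

// The loop counter is plain C++: it runs at capture time and is not recorded.
Val repeated(Val x, Val y) {
    Val acc = y;
    for (int i = 0; i < 3; ++i) acc = acc * x + y;
    return acc;
}
```

Capturing `repeated` yields a tape of six operations (three multiplications and three additions) with no trace of the loop, and the same closure can then be run with different arguments.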

Table 1

This design gives advanced developers a powerful specialization mechanism, whose performance benefits multiply with those of parallelization. Array Building Blocks functions may depend on any non-local C++ state during capture, such as the value of a global variable. The values of such state are effectively "frozen" during the capture process. By modifying these values and calling capture again as needed, any number of specialized closures, each parameterized by the given state, can be created. In addition, native C++ control flow can shape the captured sequence of operations in natural ways, and overhead such as virtual function calls and callbacks through function pointers can be effectively compiled out.
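
A rough analogy in standard C++ may help make the "frozen state" idea concrete. In this sketch, `g_mode`, `g_gain`, `Closure`, and `capture_combine` are hypothetical names, and a lambda stands in for a captured closure; it is not Array Building Blocks code. The branch on `g_mode` runs once at capture time, so each returned closure contains only the selected operation and the gain value that was current when it was captured.

```cpp
#include <cassert>
#include <functional>

enum class Mode { Add, Mul };

// Non-local state that the capture step reads (hypothetical example).
Mode  g_mode = Mode::Add;
float g_gain = 1.0f;

using Closure = std::function<float(float, float)>;

// Stand-in for capture: reads the globals once and bakes them in.
Closure capture_combine() {
    const float gain = g_gain;  // value frozen at capture time
    if (g_mode == Mode::Add) {
        // The closure no longer tests g_mode; the choice was made above.
        return [gain](float a, float b) { return gain * (a + b); };
    }
    return [gain](float a, float b) { return gain * (a * b); };
}
```

Capturing once, changing `g_mode` and `g_gain`, and capturing again yields two independent specializations; the first closure is unaffected by the later change to the globals.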

Kernel Processing Style

An Array Building Blocks call operation is essentially a serial operation. Functions invoked through this mechanism may, however, contain internal parallelism. For example, a function invoked by a call may apply an element-wise operation in parallel to a container, or a collective operation such as a reduction, but the call itself simply runs these operations in sequence. Nevertheless, given the extensive set of parallel operations provided in Array Building Blocks, significant parallelism can be expressed in this fashion. In addition, parallel operations inside a call are fused for greater efficiency.
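
To show what fusion buys, here is a hand-written comparison in standard C++. It is a sketch of the general technique only; how Array Building Blocks actually fuses operations is not described here, and `unfused_sum` and `fused_sum` are hypothetical names. Both functions compute the sum of `a[i] * b[i] + c[i]`, first as three separate passes with temporaries and then as a single pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;

// Unfused: every container operation is a separate pass with its own temporary.
float unfused_sum(const Vec& a, const Vec& b, const Vec& c) {
    assert(a.size() == b.size() && b.size() == c.size());
    const std::size_t n = a.size();
    Vec t(n), u(n);
    for (std::size_t i = 0; i < n; ++i) t[i] = a[i] * b[i];  // element-wise multiply
    for (std::size_t i = 0; i < n; ++i) u[i] = t[i] + c[i];  // element-wise add
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += u[i];           // reduction
    return s;
}

// Fused: the same three operations in one pass, with no temporaries.
float fused_sum(const Vec& a, const Vec& b, const Vec& c) {
    assert(a.size() == b.size() && b.size() == c.size());
    float s = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i] + c[i];
    return s;
}
```

The fused version reads each input once and writes no intermediate containers, which is where the efficiency gain comes from.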

The Array Building Blocks map operation provides an alternative pattern for expressing parallelism. The map operation applies multiple instances of a function in parallel to the elements of a container, allowing even scalar functions to execute concurrently over many instances. The map construct is convenient for expressing parallel operations that contain control flow, or for simplifying the parallelization of an existing loop body. In addition, mapped functions can use the neighbor function to express stencil operations, a very powerful and useful pattern. Note that map operations can only be used inside Array Building Blocks functions invoked with call; if necessary, any map operation can be wrapped in a small Array Building Blocks function that is executed using call.
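
The stencil pattern can be sketched in plain C++ as follows. This is an illustration of the pattern, not the Array Building Blocks neighbor API: `Element`, `blur`, and `map_stencil` are hypothetical names, and returning 0 for out-of-range reads is an assumption of this sketch. Each kernel instance reads only the input and writes only its own output element, which is what makes the instances independent and safe to run in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical view of "the current element": its value plus relative access.
struct Element {
    const std::vector<float>* src;
    std::ptrdiff_t index;

    float value() const { return (*src)[static_cast<std::size_t>(index)]; }

    // Relative read; out-of-range reads return 0 (an assumption of this sketch).
    float neighbor(std::ptrdiff_t offset) const {
        const std::ptrdiff_t j = index + offset;
        if (j < 0 || j >= static_cast<std::ptrdiff_t>(src->size())) return 0.0f;
        return (*src)[static_cast<std::size_t>(j)];
    }
};

// Scalar kernel: three-point sum around the current element.
void blur(float& out, const Element& e) {
    out = e.neighbor(-1) + e.value() + e.neighbor(1);
}

// Serial stand-in for map: one kernel instance per element.
std::vector<float> map_stencil(void (*k)(float&, const Element&),
                               const std::vector<float>& in) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        k(out[i], Element{&in, static_cast<std::ptrdiff_t>(i)});
    assert(out.size() == in.size());
    return out;
}
```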

Operations on containers and the map operation provide two very different ways of expressing parallelism. The first abstraction, which operates directly on the container types in Array Building Blocks, is called "vector processing". The second abstraction, known as "kernel processing", conceptually operates on individual elements instead of containers. Regardless of how developers express the parallelism in their application, Array Building Blocks automates the transformation from these styles to efficient vectorized and parallelized code.

Here are some examples of these two styles expressing an equivalent computation. Note again that both of these examples need to be invoked from inside a call:

Vector Processing:


dense<f32> A, B, C, D;
A = A + B / C * D;

Kernel Processing:


void kernel(f32& result, f32 a, f32 b, f32 c, f32 d)
{
     result = a + (b / c) * d;
}
  ...
dense<f32> A, B, C, D;
map(kernel)(A, A, B, C, D);  // "aliasing" of A resolved automatically
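
For readers without the library at hand, the equivalence of the two styles can be checked with the C++ standard library alone. The sketch below uses `std::valarray` as a stand-in for a dense container and a serial loop as a stand-in for map; `vector_style` and `kernel_style` are hypothetical names, and nothing here is Array Building Blocks code.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

using VA = std::valarray<float>;

// Vector style: one expression over whole containers.
VA vector_style(const VA& A, const VA& B, const VA& C, const VA& D) {
    VA R = A + B / C * D;  // parsed as A + ((B / C) * D), element by element
    return R;
}

// Kernel style: a scalar function describing the work for one element.
void kernel(float& result, float a, float b, float c, float d) {
    result = a + (b / c) * d;
}

// Serial stand-in for map: apply the kernel to every element position.
VA kernel_style(const VA& A, const VA& B, const VA& C, const VA& D) {
    assert(A.size() == B.size() && B.size() == C.size() && C.size() == D.size());
    VA R(A.size());
    for (std::size_t i = 0; i < A.size(); ++i)
        kernel(R[i], A[i], B[i], C[i], D[i]);
    return R;
}
```

Both functions return the same values for the same inputs; the difference lies only in whether the computation is written against whole containers or against a single element.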

Array Building Blocks in Action

An Array Building Blocks program consists of two parts: the C++ invoker and Array Building Blocks functions. The C++ invoker runs in the application C/C++ space, while Array Building Blocks functions run in the Array Building Blocks space. The C++ invoker invokes Array Building Blocks functions using calls as discussed above. User Array Building Blocks code is expressed using Array Building Blocks data types and Array Building Blocks operators.

The Array Building Blocks and C/C++ declarations and bindings are first set up in the C/C++ space. For example:


float addrA[N], addrB[N];
dense<f32> A, B;
bind(A, addrA, N);
bind(B, addrB, N);

Next, the data, as well as an Array Building Blocks function name (implicitly taking the address of the function), will be passed to the call operation:


call(foo)(A, B);

The call operation is responsible for copying the data from the C/C++ space (addrA here) into the Array Building Blocks space, invoking the Array Building Blocks function, foo, and copying the computed results back to the C/C++ space (addrB).
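
The copy-in, invoke, copy-out sequence can be modeled in a few lines of standard C++. The following is a simplified model under stated assumptions, not the library's implementation: `Bound`, `bind_host`, `call_with_copies`, and `add_one` are hypothetical names, and the model copies eagerly on every call, which the real library need not do.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bound container: remembers the host buffer and owns a
// separate copy that lives in the "library space".
struct Bound {
    float* host = nullptr;
    std::size_t n = 0;
    std::vector<float> data;
};

void bind_host(Bound& c, float* host, std::size_t n) {
    c.host = host;
    c.n = n;
    c.data.assign(n, 0.0f);
}

using Fn = void (*)(const std::vector<float>&, std::vector<float>&);

// Model of call: copy input in, run the function, copy output back.
void call_with_copies(Fn f, Bound& in, Bound& out) {
    in.data.assign(in.host, in.host + in.n);                 // C/C++ space -> library space
    f(in.data, out.data);                                    // run on the library-side copies
    std::copy(out.data.begin(), out.data.end(), out.host);   // library space -> C/C++ space
}

// Example function: b[i] = a[i] + 1.
void add_one(const std::vector<float>& a, std::vector<float>& b) {
    assert(a.size() == b.size());
    for (std::size_t i = 0; i < a.size(); ++i) b[i] = a[i] + 1.0f;
}
```

After `call_with_copies(add_one, A, B)`, the results are visible in the host array bound to `B`, while the host array bound to `A` is left unchanged.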

In summary, the common steps in Array Building Blocks programming are:

  1. Identify the computation logic to be written in Array Building Blocks
  2. Figure out the signature of the function or the mapped function (kernel)
  3. Allocate containers, specifying their size, or prepare data buffers for input/output
  4. Use the call operator to invoke the Array Building Blocks function
  5. Implement the functionality identified in the first step

