Array Building Blocks: A Flexible Parallel Programming Model for Multicore and Many-Core Architectures


The authors are members of the Intel Software and Services Group.


Microprocessor design and manufacturing process innovations continue to improve software application performance through both implicit and explicit mechanisms. In addition to improving performance, there is now the added challenge of reducing power consumption in many applications of interest, such as mobile devices. Multicore and many-core processors are one design evolution that can address both performance and power efficiency. In addition to multiple cores, such processors typically also include per-core vector units providing an additional level of parallelism.

The benefits of these architectures can only be fully realized by writing parallelized and vectorized code. Existing approaches to parallelization include using Windows and POSIX thread APIs, using MPI, and using the OpenMP shared-memory threaded programming model. Vectorization can be accomplished by using vector intrinsics or by depending on auto-vectorization in the compiler. However, with most existing programming models, combining threads and vector instructions requires a great deal of expertise from the programmer and often results in code that underperforms or that is overly tied to a specific processor architecture.
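To make the trade-off concrete, here is the same element-wise addition written two ways: a portable loop that leans on the compiler's auto-vectorizer, and a hand-written SSE-intrinsic version that is tied to one generation of the vector ISA. (This is our own illustrative sketch, not code from the article.)

```cpp
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Portable form: a simple, alias-free loop the auto-vectorizer can handle.
void add_portable(const float* a, const float* b, float* c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// Intrinsic form: explicit 4-wide SSE lanes. Faster when the compiler
// fails to vectorize, but bound to one ISA generation -- exactly the
// architecture coupling this article argues against.
void add_sse(const float* a, const float* b, float* c, std::size_t n) {
#if defined(__SSE__)
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(c + i, _mm_add_ps(_mm_loadu_ps(a + i),
                                        _mm_loadu_ps(b + i)));
    for (; i < n; ++i) c[i] = a[i] + b[i];  // scalar tail
#else
    add_portable(a, b, c, n);               // fallback off x86
#endif
}
```

Moving from SSE to a wider vector ISA means rewriting the intrinsic version; the portable version merely needs recompilation.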

CPU threading APIs provide a generic programming model for multicore parallelization, but applications using this model still need fine tuning of activities such as task spawning, data distribution, and synchronization to extract the best performance. Even so, threading by itself does not provide access to per-core vector parallelism. On the other hand, GPU-derived programming models such as OpenCL provide a separate compiler and runtime to extract application parallelism, and can target vectorization as well as core parallelism. However, these programming models are still fairly low-level and expect applications written in them to be targeted directly at specific architectures.

To address these problems, Intel is introducing a suite of programming models, the Intel Parallel Building Blocks, that can target both vector and core parallelism in a general, scalable, and architecture-independent fashion. These models are intended to support future scaling so that code written today will be able to harness both today's and tomorrow's processors. Intel Array Building Blocks is one of these models, supporting data parallelism in a compiler-independent fashion. Array Building Blocks provides an abstract, scalable API based on the composition of structured data-parallel constructs. It is independent of machine architecture and allows users to focus on developing scalable parallel algorithms rather than becoming experts in particular machine-dependent parallel mechanisms. In this article, we present some of the features of the Array Building Blocks programming model and provide some code examples.

Array Building Blocks: An Overview

The goal of Array Building Blocks is to define a programming model that efficiently and portably targets software for multicore and many-core architectures. The design philosophy of Array Building Blocks is to get application developers to "think parallel" while hiding nuances of the underlying execution layer such as hardware threads, cores, and the vector ISA. The programmer can then express parallelism in an architecture-independent fashion.
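The heart of this style is writing operations over whole containers rather than over individual elements. As a plain C++ sketch of the idea (the function name and signature here are our own, not ArBB's API), the code states *what* to compute and leaves *how* -- iteration order, threading, vectorization -- to the implementation:

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Aggregate-style addition: the whole-container expression "a + b" is the
// program; a runtime like Array Building Blocks is then free to partition
// it across cores and map it onto vector lanes. This serial stand-in uses
// std::transform, but nothing in the caller's code pins down the schedule.
std::vector<float> vec_add(const std::vector<float>& a,
                           const std::vector<float>& b) {
    std::vector<float> c(a.size());
    std::transform(a.begin(), a.end(), b.begin(), c.begin(),
                   std::plus<float>());
    return c;
}
```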

Array Building Blocks provides a dynamic execution engine comprising three major services:

  • Threading Runtime dynamically adapts to the underlying architecture. The threading runtime (TRT) provides a fine-grained model for data and task parallel threading. TRT also handles complex fine-grained synchronization patterns.
  • Memory Manager segregates the Array Building Blocks memory/vector space. It has a set of lock-free memory interfaces as well as a garbage collector. The memory manager is responsible for allocation, data formatting, and, in conjunction with the TRT, partitioning data for parallel operations.
  • Just-in-time Compiler/Dynamic Engine constructs an intermediate representation (IR) of the computations, performs optimizations and generates the code that is to be executed. Compilation occurs only if required; otherwise code is pulled from the code cache. The Array Building Blocks compiler has three phases: high-level (HLO), low-level (LLO) and Converged Vector ISA code generation. Converged Vector Intrinsics (CVI) is an abstracted and generalized IA32/Intel 64 vector ISA.

The HLO phase performs architecture-independent code optimizations that reduce threading overhead, memory usage, and redundant computation. The LLO phase performs runtime-dependent optimizations, including 1) generation of parallel kernels using the threading runtime, 2) translation of optimized kernels into vector code, and 3) generation of architecture-independent CVI code. The CVI code is not bound to a particular generation of Intel vector instructions (such as SSE), ensuring the forward scaling and architecture independence of the overall stack. Only in the final step is the CVI code translated into the particular vector ISA version available on the target machine.
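The "compilation occurs only if required; otherwise code is pulled from the code cache" policy is a memoization pattern. A minimal sketch of that policy in plain C++ (the key and artifact types here are illustrative, not ArBB internals):

```cpp
#include <map>
#include <string>

// Sketch of a JIT code cache: the first request for a kernel pays the
// compilation cost; later requests for the same kernel reuse the cached
// artifact. Real systems key on the IR and target ISA, not a name string.
struct CodeCache {
    std::map<std::string, std::string> cache;  // kernel -> "compiled" code
    int compilations = 0;                      // counts actual compiles

    const std::string& get(const std::string& kernel) {
        auto it = cache.find(kernel);
        if (it == cache.end()) {               // cache miss: compile once
            ++compilations;
            it = cache.emplace(kernel, "CVI:" + kernel).first;
        }
        return it->second;                     // cache hit: no recompile
    }
};
```

In a loop that calls the same kernel thousands of times, the JIT cost is therefore paid once and amortized away.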

Array Building Blocks offers programmers the ability to selectively target portions of their C/C++ programs to rewrite in Array Building Blocks. It allows programmers to apply a rich set of operators to a very expressive set of types that includes 1D, 2D and 3D dense containers, nested containers, and (in the future) indexed and sparse containers. It also provides safety, both by isolating its data objects from the rest of the C/C++ program and also by designing away conflicting concurrent accesses to shared objects. The isolation eliminates the need for locks, precludes data races, and obviates the need to create complex parallel data structures.
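The isolation guarantee can be pictured as copy-in/copy-out semantics: the runtime's containers own a private copy of the user's data, so no lock can ever be needed against the rest of the C/C++ program. The class and method names below are our own illustrative stand-ins, not ArBB's actual binding interface:

```cpp
#include <cstddef>
#include <vector>

// Sketch of data isolation: the container copies user data in on
// construction and only writes it back on an explicit request, so
// parallel work on the private copy cannot race with -- or be observed
// mid-update by -- ordinary C/C++ code holding the original pointer.
class IsolatedDense {
    std::vector<float> owned_;  // runtime-private copy, no shared aliasing
public:
    IsolatedDense(const float* src, std::size_t n) : owned_(src, src + n) {}

    void scale(float s) {                 // touches only the isolated copy
        for (float& x : owned_) x *= s;
    }
    void read_back(float* dst) const {    // explicit copy-out to user memory
        for (std::size_t i = 0; i < owned_.size(); ++i) dst[i] = owned_[i];
    }
};
```

Because the user's buffer is untouched until `read_back`, there is no window in which a data race on it is even expressible.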

Array Building Blocks goes beyond replacing simple loops with array operations. There are several different facets of Array Building Blocks:

  • A Programming Model: To express data parallelism with sequential semantics, Array Building Blocks allows operations to be expressed at an aggregate collection level instead of at an individual element level.
  • A Language: Array Building Blocks adds new types/operations and mimics C/C++ control flow. Array Building Blocks adds to the C/C++ language through header files and a runtime library.
  • An Abstract Machine: The Array Building Blocks high-level interface abstracts the details of the underlying machine while the dynamic compiler and runtime reduce the need to code at the hardware level to extract good performance. The various degrees of parallelism supported with threads, vectors, the ISA, memory model, and cache sizes are hidden from the programmer, making Array Building Blocks code quite portable and easy to write.
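A common illustration of the first facet -- collection-level operations with sequential semantics -- is replacing per-element branching inside a loop with a whole-container select. Again as a plain C++ stand-in rather than ArBB syntax:

```cpp
#include <cstddef>
#include <vector>

// Aggregate-level conditional: rather than an if/else executed once per
// element inside a hand-written loop, the three containers are combined
// in a single collection-level operation. Its result is deterministic
// (sequential semantics), yet an ArBB-like runtime may evaluate it with
// branch-free masked vector instructions across many lanes at once.
std::vector<float> select_ge(const std::vector<float>& x, float threshold,
                             const std::vector<float>& hi,
                             const std::vector<float>& lo) {
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        out[i] = (x[i] >= threshold) ? hi[i] : lo[i];
    return out;
}
```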

