Channels ▼

Jonathan Erickson

Dr. Dobb's Bloggers

Larrabee's New Instructions In C++: A Prototype

April 02, 2009

So you've read the article and watched the video, and you're chomping at the bit to start playing around with Larrabee, Intel's multicore architecture that boasts many cores, many threads, and a new vector instruction set -- all in the name of pushing performance. You have everything you need except -- a Larrabee.

No problem. This C++ prototype library provides a C++ implementation of the Larrabee new instructions (LRBni), making it possible for you to experiment with developing Larrabee code without a Larrabee compiler and without Larrabee hardware.

Okay, you can't go full-bore Larrabee. The library doesn't try to match the LRBni in every case, particularly when it comes to exceptions, flags, bit-precision, or memory alignment restrictions. Nor should you assume that the exact syntax and semantics of the functions presented in the library will be supported in future Larrabee-based hardware and software. Still it's enough to get you going.

As Michael Abrash explains in detail in A First Look at the Larrabee New Instructions (LRBni), the Larrabee new instructions are extensions of the existing Intel-based vector graphics streaming SIMD instructions. They operate on two new sets of registers:

  • 32 512-bit vector registers (v0-v31) that hold either 16 32-bit values or 8 64-bit values
  • 8 16-bit vector mask registers (k0-k7) that hold 16 bit masks

The LRB prototype primitives support these data types with the following C objects:

  • typedef struct { float v[16]; } _M512
  • typedef struct { double v[8]; } _M512D
  • typedef struct { int v[16]; } _M512I
  • typedef unsigned short __mmask;

Additionally, enumerated types are defined for the instructions that use immediate value operands. These are listed in the last section of this document.

Most Larrabee vector instructions have the form:

vop v1 {k1}, v2, S(v3/m)

where v1 is the destination vector register, k1 is the vector mask register, v2 is the first source vector register, and S(v3/m) is the second source -- written that way to indicate that it is the result of a swizzle/broadcast/conversion process S on either a memory location m or vector register v3. k1 is a writemask, meaning that only those elements with the corresponding bit set in k1 are computed and stored into v1. Elements in v1 with the corresponding bit clear in k1 retain their previous values, so a merging of the new element values with the previous element values is implied.

The C++ LRBni prototype primitive functions take the sources as inputs and return the destination value. To simplify the usage and to enable compiler optimizations, pairs of functions are provided for each vector instruction --- the full version that takes a mask and the destination register as arguments, and a short version when operating on the entire vector. Each Larrabee vector instruction is implemented with two functions like this:

v1 = _mm512_mask_op(v1_old, k1, v2, v3);
v1 = _mm512_op(v2, v3);

If the destination is required as a source (for example the MADD operation), it is included in the inputs as v1 or v1_old. If the instruction writes to both a vector register and a vector mask register, the vector register is returned and the vector mask is written through a pointer (listed as either k1_resor k2_res). For example:

_M512I _mm512_adc_pi(_M512I v1, __mmask k2, _M512I v3, __mmask *k2_res)
_M512  _mm512_mask_add_ps(_M512 v1_old, __mmask k1, _M512 v2, _M512 v3)

Because this programming model cannot enforce the same variable be used twice when a single register is both a source and a destination, it allows for constructs that may map to more than a single Larrabee instruction.

Instead of adding the swizzle/broadcast/conversion arguments to the vector functions, the S(v/m) operation is implemented as a set of functions that operate on either a vector register or a memory location and produce a vector result. The output of these swizzle/broadcast/conversion functions can be used as any vector source input, however each Larrabee vector instruction only supports such an operation on the last source operand. While this decoupling creates an easier programming model, it also allows for constructs that may map to more than a single Larrabee instruction.

For more details, go here

Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 


Video