Gaston Hillar

Dr. Dobb's Bloggers

Intel AVX2 Will Bring Integer Instructions with 256-bit SIMD Numeric Processing Capabilities

June 24, 2011

A week ago, Intel released public details on its next generation of the x86 architecture. The forthcoming microarchitecture, codenamed "Haswell," will introduce Intel AVX2, a new SIMD instruction set that extends Intel AVX. The SIMD instruction set details are already available, but you will have to wait until 2013 to use them, when the first members of the Haswell microprocessor family become available.

The second-generation Intel Core processor family, codenamed "Sandy Bridge," introduced Intel Advanced Vector Extensions (AVX) in 2011. Intel AVX is a 256-bit instruction set extension to Intel SSE that requires explicit operating system support.

Linux kernel version 2.6.30 or higher, Windows 7 Service Pack 1, and Windows Server 2008 R2 Service Pack 1 added the necessary state management to support Intel AVX. Because Intel AVX2 is also a 256-bit instruction set extension, operating system support shouldn't be a problem. Windows 7 developers had to wait for Service Pack 1 to take full advantage of Intel AVX, but Intel AVX2 won't require additional state management changes. Thus, if an operating system already supports Intel AVX, it will provide complete access to Intel AVX2.
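The following is a minimal sketch of how a program could check at run time whether AVX and AVX2 code paths are usable. It assumes GCC or Clang on an x86 target and relies on their `__builtin_cpu_supports` built-in; to my knowledge, the runtime behind that built-in also takes into account whether the operating system has enabled the YMM register state, but you should verify this for your toolchain. The function names `avx_usable` and `avx2_usable` are hypothetical.

```c
#include <stdbool.h>

/* Assumption: GCC or Clang on x86. Any other compiler or target reports false. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define X86_FEATURE_CHECK 1
#endif

/* True when the 256-bit AVX instructions can be used by this process. */
bool avx_usable(void)
{
#ifdef X86_FEATURE_CHECK
    return __builtin_cpu_supports("avx") != 0;
#else
    return false;
#endif
}

/* True when the AVX2 instructions can be used by this process. */
bool avx2_usable(void)
{
#ifdef X86_FEATURE_CHECK
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;
#endif
}
```

A program would typically call `avx2_usable()` once at startup and select either an AVX2 code path or a fallback based on the result.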

Intel AVX2 instructions will follow the same programming model introduced by the Intel AVX instructions. One of the most interesting enhancements is the promotion of most Intel AVX 128-bit integer SIMD instructions to 256 bits. Intel AVX brought 256-bit floating-point SIMD instructions, but it didn't include 256-bit integer SIMD instructions. Intel AVX2 will allow you to use the 256-bit wide YMM registers introduced by AVX with integer data types.

For example, the PABSD instruction was part of the Supplemental Streaming SIMD Extensions 3 (SSSE3) introduced with the Intel Core 2 architecture. The PABSD mnemonic means packed absolute value for double-word. This assembly instruction receives a 128-bit input operand that contains four 32-bit signed integers. The instruction produces a 128-bit output that contains the absolute value of each of the four 32-bit signed integers, packed in the same order.

You can calculate the absolute values for four 32-bit signed integers with a single call to the PABSD instruction. If you have to calculate the absolute values for 1,000 32-bit signed integers, you can do it with 250 calls to this instruction (1,000 / 4) instead of using a single instruction for each 32-bit signed integer. Thus, you cut the number of absolute-value instructions to a quarter, which can translate into significant speedups. However, because it is necessary to pack the data before calling the SIMD instruction and then unpack the output, this extra code adds overhead that is important to measure.
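As an illustration of the 1,000-integer case, here is a minimal sketch in C that uses the `_mm_abs_epi32` intrinsic, which typically compiles to PABSD. It assumes GCC or Clang on x86 (for the `target` attribute and `__builtin_cpu_supports`) and falls back to scalar code elsewhere. The function names are hypothetical. The unaligned load and store play the role of packing and unpacking for data that is already contiguous in memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC or Clang on x86; other targets use only the scalar path. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define X86_SIMD 1
#endif

/* Scalar absolute value. INT32_MIN maps to itself on GCC and Clang,
   which is the same bit pattern PABSD produces for that input. */
static int32_t abs_one(int32_t x)
{
    uint32_t u = (uint32_t)x;
    return (int32_t)(x < 0 ? 0u - u : u);
}

#ifdef X86_SIMD
/* SSSE3 path: one _mm_abs_epi32 per group of four packed integers. */
__attribute__((target("ssse3")))
static void abs_i32_ssse3(const int32_t *in, int32_t *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(in + i));   /* load 4 values */
        _mm_storeu_si128((__m128i *)(out + i), _mm_abs_epi32(v)); /* store 4 results */
    }
    for (; i < n; ++i)  /* leftovers when n is not a multiple of 4 */
        out[i] = abs_one(in[i]);
}
#endif

/* Hypothetical entry point: SSSE3 when the CPU reports it, scalar otherwise. */
void abs_i32_128(const int32_t *in, int32_t *out, size_t n)
{
#ifdef X86_SIMD
    if (__builtin_cpu_supports("ssse3")) {
        abs_i32_ssse3(in, out, n);
        return;
    }
#endif
    for (size_t i = 0; i < n; ++i)
        out[i] = abs_one(in[i]);
}
```

With `n` equal to 1,000, the vector loop in the SSSE3 path runs 250 times and the leftover loop does not run at all.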

Intel AVX introduced the VPABSD instruction, which promoted PABSD to AVX, but it still works on 128 bits, so it didn't double the number of integers that can be processed at the same time. If you have to calculate the absolute values for 1,000 32-bit signed integers, you still need 250 calls to the AVX VPABSD instruction.

Intel AVX2 will promote the VPABSD instruction to 256 bits because it will be possible to make it work with the 256-bit YMM registers. Thus, with an AVX2 VPABSD instruction that uses a YMM register, you will be able to double the number of integers that can be processed at the same time. If you have to calculate the absolute values for 1,000 32-bit signed integers, you can do it with 125 calls (1,000 / 8) to the AVX2 VPABSD instruction that works with a YMM register. In addition, if you run SIMD instructions on multiple cores, you can reduce the number of calls that each core has to make. In fact, I've already explained the advantages of running as many SIMD instructions in parallel as available physical cores in my previous post, High-Level Programming Languages Should Improve Support for SIMD Instructions.
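A minimal sketch of the 256-bit version in C could look as follows. It uses the `_mm256_abs_epi32` intrinsic, which corresponds to VPABSD on a YMM register, and assumes GCC or Clang on x86 for the `target` attribute and the `__builtin_cpu_supports` check. The function names are hypothetical, and the code falls back to a scalar loop when AVX2 is not available.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC or Clang on x86; other targets use only the scalar path. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define X86_SIMD 1
#endif

/* Scalar absolute value; INT32_MIN maps to itself on GCC and Clang. */
static int32_t abs_scalar(int32_t x)
{
    uint32_t u = (uint32_t)x;
    return (int32_t)(x < 0 ? 0u - u : u);
}

#ifdef X86_SIMD
/* AVX2 path: one _mm256_abs_epi32 per group of eight packed integers. */
__attribute__((target("avx2")))
static void abs_i32_avx2(const int32_t *in, int32_t *out, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(in + i));      /* load 8 values */
        _mm256_storeu_si256((__m256i *)(out + i), _mm256_abs_epi32(v)); /* store 8 results */
    }
    for (; i < n; ++i)  /* leftovers when n is not a multiple of 8 */
        out[i] = abs_scalar(in[i]);
}
#endif

/* Hypothetical entry point: AVX2 when usable, scalar otherwise. */
void abs_i32_256(const int32_t *in, int32_t *out, size_t n)
{
#ifdef X86_SIMD
    if (__builtin_cpu_supports("avx2")) {
        abs_i32_avx2(in, out, n);
        return;
    }
#endif
    for (size_t i = 0; i < n; ++i)
        out[i] = abs_scalar(in[i]);
}
```

With `n` equal to 1,000, the vector loop in the AVX2 path runs 125 times. Apart from the register width and the step of eight instead of four, the structure is the same as it would be for 128-bit code, which is what keeping the same programming model means in practice.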

Because you won't have to change the programming model, you will be able to achieve impressive speedups by making minor changes to code that uses Intel AVX 128-bit integer SIMD instructions. However, remember that Intel AVX2 won't be available until 2013.

Intel AVX2 also provides enhancements in other areas, such as:

  • Specific instructions to fetch non-contiguous data elements from memory.
  • Instructions to simplify permute operations on data elements.
  • Vector shift instructions with variable-shift count per data element.
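The three kinds of instructions in the list above can be sketched with one intrinsic each: `_mm256_i32gather_epi32` for fetching non-contiguous elements, `_mm256_permutevar8x32_epi32` for permuting elements, and `_mm256_sllv_epi32` for shifting each element by its own count. This is only an illustration, assuming GCC or Clang on x86. The function names and the choice of a reversing permutation are hypothetical, and a scalar fallback produces the same results when AVX2 is not usable.

```c
#include <stdint.h>

/* Assumption: GCC or Clang on x86; other targets use only the scalar path. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define X86_SIMD 1
#endif

/* Scalar reference. Shift counts are assumed to be in the range 0..31. */
static void extras_scalar(const int32_t *table, const int32_t idx[8],
                          const int32_t counts[8], int32_t gathered[8],
                          int32_t reversed[8], int32_t shifted[8])
{
    for (int i = 0; i < 8; ++i)
        gathered[i] = table[idx[i]];
    for (int i = 0; i < 8; ++i) {
        reversed[i] = gathered[7 - i];
        shifted[i] = (int32_t)((uint32_t)gathered[i] << counts[i]);
    }
}

#ifdef X86_SIMD
__attribute__((target("avx2")))
static void extras_avx2(const int32_t *table, const int32_t idx[8],
                        const int32_t counts[8], int32_t gathered[8],
                        int32_t reversed[8], int32_t shifted[8])
{
    __m256i vidx = _mm256_loadu_si256((const __m256i *)idx);
    __m256i vcnt = _mm256_loadu_si256((const __m256i *)counts);

    /* Gather: fetch table[idx[i]] for eight positions; scale 4 = bytes per element. */
    __m256i g = _mm256_i32gather_epi32((const int *)table, vidx, 4);

    /* Permute: element i of the result is element (7 - i) of g. */
    __m256i r = _mm256_permutevar8x32_epi32(
        g, _mm256_setr_epi32(7, 6, 5, 4, 3, 2, 1, 0));

    /* Variable shift: element i is shifted left by counts[i] bits. */
    __m256i s = _mm256_sllv_epi32(g, vcnt);

    _mm256_storeu_si256((__m256i *)gathered, g);
    _mm256_storeu_si256((__m256i *)reversed, r);
    _mm256_storeu_si256((__m256i *)shifted, s);
}
#endif

/* Hypothetical entry point: AVX2 when usable, scalar otherwise. */
void avx2_extras(const int32_t *table, const int32_t idx[8],
                 const int32_t counts[8], int32_t gathered[8],
                 int32_t reversed[8], int32_t shifted[8])
{
#ifdef X86_SIMD
    if (__builtin_cpu_supports("avx2")) {
        extras_avx2(table, idx, counts, gathered, reversed, shifted);
        return;
    }
#endif
    extras_scalar(table, idx, counts, gathered, reversed, shifted);
}
```

Without these instructions, each of the three steps would need a sequence of scalar loads, shuffles, or shifts followed by repacking, which is the kind of overhead that can eat into SIMD speedups.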

You can download the full Intel AVX and AVX2 Programming Reference here. The PDF document is titled “Intel Advanced Vector Extensions,” but it has been updated with the forthcoming Intel AVX2 instruction set.
