Dr. Dobb's is part of the Informa Tech Division of Informa PLC



Implicit CPU Vectorization with Intel OpenCL SDK 1.5

Intel OpenCL SDK 1.5 includes important performance enhancements specifically designed for the latest Intel 2nd Generation Core Processors and outlines a path for future performance improvements.

The implicit CPU vectorization module now takes advantage of the Intel AVX instruction set, whose 256-bit registers let each physical core process 8 single-precision floating-point numbers per instruction. Future versions will tune the module so that developers can easily take advantage of the upcoming Intel AVX2 instruction set. You don't need to change your existing code: the Intel OpenCL SDK 1.5 compiler operates in Just-In-Time (JIT) mode, so it generates the kernel's assembly code for the instruction set available on the CPU where the kernel runs.

The vectorization module takes scalar code and generates the most appropriate SIMD instructions for the available instruction set. For example, if you run a kernel on a CPU that supports Intel Streaming SIMD Extensions 4.1 (SSE 4.1) but not AVX, the kernel's assembly code will use SSE 4.1 instructions. If you run the same kernel on a CPU with Intel AVX, the kernel's assembly code will use AVX instructions to take full advantage of the more powerful instruction set. This way, when you write kernels that target Intel CPUs, you know that the implicit CPU vectorization will generate assembly code optimized for the most capable instruction set available, including instruction sets that future releases add support for. You just have to make sure that you always install the latest version of either the Intel OpenCL SDK or the Intel OpenCL SDK runtime.
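To make the idea of implicit vectorization concrete, here is a minimal conceptual sketch in plain C. It is not the SDK's output or its algorithm. The kernel scale_add shown in the comment and the helper names run_scalar and run_packed are hypothetical, chosen only for illustration. The sketch models how a scalar kernel body, written for one work-item, can be executed for eight consecutive work-items at once, which is the width of one AVX register of single-precision floats.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical OpenCL C kernel being modelled (one work-item per element):
 *
 *   __kernel void scale_add(__global const float *x,
 *                           __global float *y, float a)
 *   {
 *       size_t i = get_global_id(0);
 *       y[i] = a * x[i] + y[i];
 *   }
 */

/* A 256-bit AVX register holds eight 32-bit floats. */
enum { AVX_FLOAT_LANES = 8 };

/* Scalar view: the kernel body runs once per work-item. */
void run_scalar(const float *x, float *y, float a, size_t n)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Packed view: eight consecutive work-items are handled together.
 * The inner loop stands in for a single 8-wide SIMD operation.
 * Work-items that do not fill a whole group fall back to scalar code. */
void run_packed(const float *x, float *y, float a, size_t n)
{
    assert(x != NULL && y != NULL);
    size_t i = 0;
    for (; i + AVX_FLOAT_LANES <= n; i += AVX_FLOAT_LANES)
        for (size_t lane = 0; lane < AVX_FLOAT_LANES; ++lane)
            y[i + lane] = a * x[i + lane] + y[i + lane];
    for (; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Both functions produce the same results for the same inputs. The difference is only in how the work is grouped, which is why the kernel source does not need to change when a wider instruction set becomes available.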

Intel OpenCL SDK 1.5 includes the Intel OpenCL offline compiler command-line client, version 1.0.2. There are two versions of the utility: a 64-bit version (ioc64.exe) and a 32-bit version (ioc32.exe). You can use the -simd option to specify the target instruction set architecture, with one of the following three values:

  • -simd=sse41: Intel Streaming SIMD Extensions 4.1
  • -simd=sse42: Intel Streaming SIMD Extensions 4.2
  • -simd=avx: Intel AVX

When you use any of these options, the offline compiler doesn't base the generated assembly code on the instruction set available in your workstation's CPU. For example, if your CPU supports Intel AVX instructions but you use -simd=sse41, the assembly code will include only SSE 4.1 instructions and won't use either SSE 4.2 or AVX instructions. You can compare the assembly code generated for each target instruction set to see what the implicit CPU vectorization module does.
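The relationship between the default behavior and the -simd override can be modelled with a short C sketch. This is a hypothetical model, not the offline compiler's implementation. The type simd_target and the functions float_lanes, target_from_cpu, parse_simd_option, and choose_target are invented for illustration, and the sketch assumes SSE 4.1 as the minimum supported instruction set. The only values taken from the article are the three option strings sse41, sse42, and avx.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of target selection; not the SDK's own code. */
typedef enum { TARGET_SSE41, TARGET_SSE42, TARGET_AVX } simd_target;

/* Single-precision floats per SIMD register:
 * SSE registers are 128 bits wide (4 floats), AVX registers 256 bits (8). */
int float_lanes(simd_target t)
{
    return t == TARGET_AVX ? 8 : 4;
}

/* Default behavior: pick the most capable set the CPU reports.
 * Assumes SSE 4.1 is always present. */
simd_target target_from_cpu(int has_sse42, int has_avx)
{
    if (has_avx)
        return TARGET_AVX;
    if (has_sse42)
        return TARGET_SSE42;
    return TARGET_SSE41;
}

/* Parse the value given to -simd. Returns 1 on success, 0 otherwise. */
int parse_simd_option(const char *value, simd_target *out)
{
    assert(value != NULL && out != NULL);
    if (strcmp(value, "sse41") == 0) { *out = TARGET_SSE41; return 1; }
    if (strcmp(value, "sse42") == 0) { *out = TARGET_SSE42; return 1; }
    if (strcmp(value, "avx") == 0)   { *out = TARGET_AVX;   return 1; }
    return 0;
}

/* A NULL override means "use the current platform architecture".
 * A valid override wins regardless of what the CPU supports. */
simd_target choose_target(const char *simd_override,
                          int has_sse42, int has_avx)
{
    simd_target t;
    if (simd_override != NULL && parse_simd_option(simd_override, &t))
        return t;
    return target_from_cpu(has_sse42, has_avx);
}
```

In this model, calling choose_target("sse41", 1, 1) returns TARGET_SSE41 even though the CPU flags report AVX, which mirrors the -simd=sse41 example above. Passing NULL instead returns TARGET_AVX.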

You can also use the GUI provided with the Intel OpenCL SDK Offline Compiler to select the target instruction set architecture. Select Tools | Options… and uncheck the "Use current platform architecture" checkbox. Then, select the desired instruction set architecture in the dropdown list below the checkbox. Figure 1 shows the available options. Finally, click OK and your next build will use the selected instruction set architecture.

Figure 1. Intel OpenCL SDK offline compiler (64-bit) displays the available instruction set architectures.

If you want to check whether a specific piece of kernel code will take advantage of AVX, the newest Intel OpenCL SDK makes it easy: build the kernel with the offline compiler and inspect the generated assembly code. This is an important feature when you want to write high-performance kernels that target the most powerful Intel CPUs. In addition, you can generate the program binaries for each of the different platforms.

If you're new to Intel OpenCL SDK, you can read my previous post about Intel OpenCL SDK 1.1. If you have Intel OpenCL SDK 1.1 installed on your computer, you can easily upgrade by downloading and running the installation for the newest version here.
