Implicit CPU Vectorization with Intel OpenCL SDK 1.5

Intel OpenCL SDK 1.5 includes important performance enhancements specifically designed for the latest Intel 2nd Generation Core Processors and outlines a path for future performance improvements.

By Gaston Hillar
November 22, 2011
URL : http://www.drdobbs.com/open-source/implicit-cpu-vectorization-with-intel-op/232200021

The implicit CPU vectorization module now takes advantage of the Intel AVX instruction set, which can process 8 single-precision floating-point numbers in parallel per physical core. In addition, future versions will tune the implicit CPU vectorization module to make it easy for developers to take advantage of the upcoming Intel AVX2 instruction set. You don't need to change your existing code, because the Intel OpenCL SDK 1.5 compiler operates in just-in-time (JIT) mode and therefore generates the kernel's assembly code based on the instruction set available at run time.
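The figure of 8 follows from the register width: an AVX register is 256 bits wide and a single-precision float occupies 32 bits, whereas a 128-bit SSE register holds 4 such floats. The following C sketch spells out that arithmetic; simd_lanes is a hypothetical helper written for this illustration and is not part of the Intel OpenCL SDK.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper for illustration only: how many elements of the
   given size fit in one SIMD register of the given width in bits. */
static size_t simd_lanes(size_t register_bits, size_t element_bytes)
{
    assert(element_bytes > 0);
    return register_bits / (element_bytes * 8);
}
```

Assuming a 4-byte float, simd_lanes(256, sizeof(float)) yields 8 for AVX and simd_lanes(128, sizeof(float)) yields 4 for SSE.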

The vectorization module takes scalar code and generates the most appropriate SIMD instructions according to the available instruction set. For example, if you run a kernel on a CPU with Intel Streaming SIMD Extensions 4.1 (SSE 4.1) but without AVX support, the kernel's assembly code will use SSE 4.1 instructions. However, if you run the same kernel on a CPU with Intel AVX, the kernel's assembly code will use AVX instructions to take full advantage of the more powerful instruction set. This way, when you write kernels that target Intel CPUs, the implicit CPU vectorization generates assembly code optimized for the most capable instruction set that the CPU provides, including instruction sets introduced in the future. You just have to make sure that you always install the latest version of either the Intel OpenCL SDK or the Intel OpenCL SDK runtime.
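As a rough mental model of what "takes scalar code and generates SIMD instructions" means, the C sketch below contrasts running a kernel body one work-item at a time with running it in packs of adjacent work-items. It is a conceptual illustration only, not the SDK's actual transformation or output: the kernel body is modeled as a plain C function, and the names saxpy_work_item, run_scalar, run_packed, and LANES are all hypothetical.

```c
#include <stddef.h>

#define LANES 8  /* assumption: 8 single-precision lanes, as with AVX */

/* What the developer writes: the scalar body for one work-item `gid`. */
static void saxpy_work_item(size_t gid, float a, const float *x, float *y)
{
    y[gid] = a * x[gid] + y[gid];
}

/* Scalar execution: one work-item per iteration. */
static void run_scalar(size_t n, float a, const float *x, float *y)
{
    for (size_t gid = 0; gid < n; ++gid)
        saxpy_work_item(gid, a, x, y);
}

/* Conceptual model of the vectorized form: LANES adjacent work-items are
   handled in one step. A real vectorizer would emit SIMD multiply and add
   instructions in place of the inner loop. */
static void run_packed(size_t n, float a, const float *x, float *y)
{
    size_t gid = 0;
    for (; gid + LANES <= n; gid += LANES)
        for (size_t lane = 0; lane < LANES; ++lane)
            y[gid + lane] = a * x[gid + lane] + y[gid + lane];
    /* Leftover work-items when n is not a multiple of LANES. */
    for (; gid < n; ++gid)
        saxpy_work_item(gid, a, x, y);
}
```

Both functions compute the same result for any n. The point of the sketch is that the source stays scalar while the packing width depends on the target, which is why the same kernel can map to SSE 4.1 on one CPU and to AVX on another.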

Intel OpenCL SDK 1.5 includes version 1.0.2 of the Intel OpenCL offline compiler command-line client. The utility comes in a 64-bit version (ioc64.exe) and a 32-bit version (ioc32.exe). You can use the -simd option to specify the desired target instruction set architecture; it accepts one of three values, such as -simd=sse41.

When you specify a target with -simd, the offline compiler doesn't generate the assembly code based on the instruction set available in your workstation's CPU. For example, if your CPU supports Intel AVX instructions but you use -simd=sse41, the assembly code will include only SSE 4.1 instructions and won't use SSE 4.2 or AVX instructions. You can compare the assembly code generated for each target instruction set to see what the implicit CPU vectorization module does with the same kernel.
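The difference between the default behavior and an explicit target can be summarized in a few lines of C. This is a hypothetical model of the selection rule described above, not the offline compiler's actual logic; the enum and both function names are invented for the illustration.

```c
/* Instruction sets named in the article, ordered from least to most capable. */
enum isa { ISA_SSE41, ISA_SSE42, ISA_AVX };

/* Hypothetical model: when a target is forced (as -simd does), it wins;
   otherwise the best instruction set of the current CPU is used. */
static enum isa select_target(enum isa cpu_best, int forced, enum isa requested)
{
    return forced ? requested : cpu_best;
}

/* Code built for `target` may only use instruction sets up to that level. */
static int may_use(enum isa target, enum isa candidate)
{
    return candidate <= target;
}
```

With an AVX-capable CPU and a forced SSE 4.1 target, select_target returns ISA_SSE41 and may_use reports that neither SSE 4.2 nor AVX instructions are allowed, which matches the -simd=sse41 example.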

You can also use the GUI provided with the Intel OpenCL SDK Offline Compiler to select the desired target instruction set architecture. Select Tools | Options… and uncheck the "Use current platform architecture" checkbox. Then, select the desired instruction set architecture in the dropdown list below the checkbox. Figure 1 shows the available options. Finally, click OK and your next build will use the selected instruction set architecture.

Figure 1. Intel OpenCL SDK offline compiler (64-bit) displays the available instruction set architectures.

If you want to check whether a specific piece of code in a kernel is going to take advantage of AVX, you can build it with the offline compiler included in the newest Intel OpenCL SDK and inspect the generated assembly code. This is especially useful when you write high-performance kernels that target the most powerful Intel CPUs. In addition, you can generate the program binaries for each of the target instruction set architectures.

If you're new to the Intel OpenCL SDK, you can read my previous post about Intel OpenCL SDK 1.1. If you have Intel OpenCL SDK 1.1 installed on your computer, you can easily upgrade by downloading and running the installer for the newest version.

Copyright © 2012 UBM Techweb