Intel OpenCL SDK 1.1
Multicore CPUs are available in almost every modern laptop, workstation, or server, which makes them an attractive target for OpenCL kernels that demand high throughput. An optimized OpenCL runtime that targets specific CPUs can take full advantage of both the latest enhancements found in modern microarchitectures and their extended instruction sets. Intel does exactly that with an OpenCL 1.1-compliant implementation that targets Core 2 Duo (Penryn) or later Intel microprocessors and provides unique features and tools.
Do you really want to use the GPU as the target device for your OpenCL kernels? What about testing your OpenCL kernels with a powerful Intel multicore CPU as the target? The Intel OpenCL SDK 1.1 runs on both 32-bit and 64-bit versions of Windows 7 and Windows Vista SP2. In addition, there is a 64-bit Linux version.
If you want to take full advantage of all the optimizations on the second generation Intel Core CPUs, you will need Windows 7 with SP1 installed. In a previous post, I explained that the second generation Intel Core processor family, codenamed "Sandy Bridge," introduced Intel Advanced Vector Extensions (AVX), and that using AVX requires the operating system support introduced in Windows 7 SP1. The Intel OpenCL 1.1 runtime takes advantage of AVX when both the CPU and the operating system support it.
The Intel OpenCL 1.1 runtime makes extensive use of SIMD instructions in order to increase throughput; therefore, it requires support for Intel Streaming SIMD Extensions 4.1 or higher. In addition, the runtime uses the Intel Math Kernel Library (MKL), which is also optimized to take full advantage of the latest SIMD instructions available in modern Intel multicore CPUs. When you use math functions in the OpenCL kernels, you can take advantage of Intel MKL.
Intel OpenCL SDK 1.1 includes an OpenCL offline compiler. The offline compiler allows you to see both the assembly code and the LLVM code for your OpenCL kernel. This way, you can verify that your kernels compile and check the intermediate representation without having to use additional APIs. Figure 1 shows a screenshot of the 64-bit Windows version of the Intel OpenCL SDK offline compiler with the source code of one of the sample OpenCL kernels provided with the SDK.
You can build the kernel by selecting Tools -> Build in the main menu. After the build completes, you can select View -> Show Assembly Code, and the window will add a new panel with the assembly view for the kernel (see Figure 2). In addition, you can select View -> Show LLVM Code, and the window will add another new panel with the LLVM view (see Figure 3).
If you want to write optimal OpenCL code that targets an Intel CPU, I suggest reading Tips and Tricks for Kernel Development and Writing Optimal OpenCL Code with Intel OpenCL SDK (the latter is a PDF). In these two articles, you will find valuable information about the most convenient way to code your kernels when you target an Intel CPU. It is especially important to learn how to write kernels to benefit from implicit CPU vectorization and to avoid needless synchronization. This way, you will be able to achieve the highest throughput on Intel CPUs.