Intel has launched its eponymously labeled Parallel Studio XE 2015 developer toolkit for high performance computing (HPC) and technical computing applications.
This update includes what Intel calls out as "first-to-market" explicit vector programming capabilities to optimize processing cycles through concurrent operations on Intel Xeon processors and Intel Xeon Phi coprocessors.
The Intel Developer Zone explains that vectorizing applications matters today because it can improve performance and, as a result, save power. The faster an application gets through its CPU-intensive regions, the sooner the CPU can be set to a lower power state.
"How does vectorizing compare to scalar operations with regard to performance and power? Vectorizing consumes less power than equivalent scalar operations because it performs better: Scalar operations process several times less data per cycle and require more instructions and more cycles to complete," says the company.
Intel Parallel Studio XE 2015 also supports new standards, including OpenMP 4.0, which gives developers explicit vectorization support. The updated release also provides comprehensive optimization reports that offer deeper insight into code performance.
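To make the OpenMP 4.0 vectorization support concrete, here is a minimal sketch in C of the `simd` construct that the 4.0 specification introduced. The function name `saxpy` and its parameters are hypothetical and chosen for illustration only; they do not come from Intel's announcement.

```c
#include <stddef.h>

/* Hypothetical helper (not from Intel's release): computes y[i] = a * x[i] + y[i]. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    /* OpenMP 4.0 simd construct: tells the compiler the iterations are
     * independent and may be executed in vector lanes.
     * Assumption: the caller guarantees that x and y do not overlap. */
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The pragma only takes effect when OpenMP support is switched on in the compiler (with GCC, for example, via `-fopenmp` or `-fopenmp-simd`). Without it the pragma is ignored and the loop still produces the same result as ordinary scalar code.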
According to openmp.org, the OpenMP API supports multi-platform shared-memory parallel programming in C/C++ and Fortran — it defines a portable, scalable model with a simple and flexible interface for developing parallel applications on platforms from the desktop to the supercomputer.
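As a rough illustration of the shared-memory model openmp.org describes, the sketch below splits a dot product across threads with a `parallel for` and a `reduction` clause. The function name `dot` is a hypothetical example, not part of any Intel or OpenMP library.

```c
#include <stddef.h>

/* Hypothetical helper: dot product of two arrays of length n. */
double dot(size_t n, const double *a, const double *b)
{
    double sum = 0.0;

    /* Each thread works on a share of the iterations with a private
     * copy of sum; the reduction clause combines the private copies
     * into the shared variable when the loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

The same source compiles as plain serial C when OpenMP is not enabled, which is part of what makes the model portable from the desktop upward.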
Also key in Intel's news is C++11 compliance and upgraded profiling and threading tools.
James Reinders, director of parallel programming evangelism at Intel Corporation, reminds us that programming a multithreaded, vectorized application that can take advantage of the resources available in modern CPUs is, unfortunately, a complex and error-prone process.
"Even the most experienced developers can't keep track of all of the things that may happen simultaneously. The result is buggy programs with problems that are difficult to reproduce and fix, and that don't scale well when the core count is increased," said Reinders.
Reinders' comments come in relation to Intel Cilk Plus, a technology designed to make it easier for developers to build and debug robust applications that take full advantage of modern processor architectures.
According to Intel Developer Zone's Robert Chesebrough, "The introduction of wider vector registers in x86 platforms and the increasing number of cores that support single instruction multiple data (SIMD) and threading parallelism now make vectorization an optimization consideration for developers. This is because vector performance gains are applied per core, so multiplicative application performance gains become possible for more applications. In the past, many developers relied heavily on the compiler to auto-vectorize some loops, but serial constraints of programming languages have hindered the compiler's ability to vectorize many different kinds of loops."
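One common instance of the serial constraints Chesebrough mentions is pointer aliasing in C. The sketch below is an illustration under that assumption, not an example taken from Intel. All three function names are hypothetical.

```c
#include <stddef.h>

/* The compiler must assume that out may overlap in, so it has to
 * prove independence or add a runtime overlap check before it can
 * vectorize this loop on its own. */
void scale_may_alias(size_t n, const float *in, float *out, float k)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = k * in[i];
    }
}

/* Here the programmer states the intent explicitly: restrict (C99)
 * promises that the arrays do not overlap, and the OpenMP 4.0 simd
 * construct asks for vector code. */
void scale_no_alias(size_t n, const float *restrict in,
                    float *restrict out, float k)
{
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        out[i] = k * in[i];
    }
}

/* A genuine loop-carried dependence: iteration i needs the value
 * written by iteration i - 1. Adding a simd pragma to this loop
 * would be incorrect, so it is left as serial code. */
void prefix_sum(size_t n, float *a)
{
    for (size_t i = 1; i < n; ++i) {
        a[i] += a[i - 1];
    }
}
```

The first two functions compute the same values when their arguments do not overlap; the difference lies only in how much the compiler is allowed to assume. The third shows that explicit vectorization is a promise made by the programmer, so it belongs only on loops whose iterations really are independent.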