
I've Fallen In Love With the Vectoriser


Stephen Blair-Chappell is a Technical Consulting Engineer at Intel, and has worked in the Intel Compiler Lab for the last 10 years.


The Intel compiler has a feature that can make some applications run much faster -- auto-vectorisation. With the flick of a switch, some code can be sped up significantly. A number of times I have seen programs run much faster just by using this option, without changing a line of code. With vectorisation, the compiler uses the advanced Single Instruction Multiple Data (SIMD) instructions that adorn most modern CPUs.

Love at First Sight

"I've got to have the latest Intel compiler; it just doubled the speed of my code". So wrote a developer recently, having enabled the auto-vectoriser in the compiler. Over the last 18 months I've enjoyed being at the "coal-face," working with developers and helping them to optimise their code. Even as recently as this last week I worked with a company whose application sped up by a factor of ten -- that is, it ran in a tenth of the time. No wonder I'm head-over-heels. These experiences make me feel good, make the developer look good in front of his manager, and also promote the product that helps pay my mortgage.

Now I know that some of my colleagues will be jumping up and down at this point saying "Stephen you've got to lower people's expectations -- tell them some code can never be vectorised." Glass half-empty comes to mind at this point.

So How Does It Work?

Silicon manufacturers have enhanced the CPUs they produce in each new generation, adding extra on-chip extensions to the core -- one area of interest being support for maths and floating-point instructions. In the days of the Intel 386, a maths co-processor, the Intel 387, was one such extension. The latest generation of Intel processors includes support for MMX and the SIMD extensions SSE, SSE2, SSE3, SSSE3, and SSE4.

SIMD instructions are capable of doing the same calculation on multiple data. So, for example, it is possible to perform four floating-point operations in one instruction; see Figure 1.

Figure 1: SIMD instructions perform multiple operations.

To take advantage of these new instructions, the C/C++ programmer could insert intrinsic instructions directly into their source code, but this is hard work and would make the source code non-portable.
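To give a feel for what hand-written intrinsics look like, here is a minimal sketch that adds two float arrays four elements at a time using the SSE intrinsics from `<xmmintrin.h>`. The function name `add_floats`, the `HAVE_SSE` macro, and the multiple-of-four restriction are my own assumptions for illustration. Note the preprocessor guard and the fallback loop: they are only there because the intrinsic version will not build for a processor without SSE, which is exactly the portability problem.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>   /* SSE intrinsics */
#define HAVE_SSE 1       /* hypothetical helper macro for this sketch */
#else
#define HAVE_SSE 0
#endif

/* Adds a[i] + b[i] into out[i]. For simplicity n must be a multiple of 4. */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
    assert(n % 4 == 0);
#if HAVE_SSE
    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);             /* load 4 floats, no alignment needed */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));  /* 4 additions in one instruction */
    }
#else
    /* Fallback for targets without SSE: one addition per iteration. */
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

Even for a one-line loop, the intrinsic version needs explicit loads, stores, and a platform check, which is why letting the compiler do this work is so attractive.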

Luckily, some clever people at the Intel Compiler Labs looked at ways of automatically using these SIMD instructions and came up with a technique known as "auto-vectorisation."

When auto-vectorisation is enabled, the compiler looks for opportunities where several traditional calculations could be replaced by a single SIMD instruction. A typical example of this would be a loop containing a floating-point operation. With single-precision values in a 128-bit SSE register, the compiler can effectively reduce the loop count by a factor of 4 by replacing the floating-point instruction with a SIMD instruction.
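The transformation can be pictured in plain C. The sketch below is only an illustration, not actual compiler output: `scale_scalar` is the loop as a programmer would write it, and `scale_by_fours` (a hypothetical name) shows roughly the shape the vectoriser aims for, with the four multiplies in the loop body standing in for one packed SIMD multiply.

```c
#include <stddef.h>

/* The loop as written: n trips, one multiply per trip. */
void scale_scalar(float *x, float k, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        x[i] *= k;
}

/* The shape of the vectorised loop, spelled out in scalar C. */
void scale_by_fours(float *x, float k, size_t n)
{
    size_t i = 0;

    /* Main loop: about n/4 trips. The four multiplies below are the ones
       a vectoriser would fold into a single SIMD instruction. */
    for (; i + 4 <= n; i += 4) {
        x[i]     *= k;
        x[i + 1] *= k;
        x[i + 2] *= k;
        x[i + 3] *= k;
    }

    /* Remainder loop: handles the last n % 4 elements one at a time. */
    for (; i < n; ++i)
        x[i] *= k;
}
```

Both functions produce the same results; the second simply makes visible where the factor of 4 comes from and why a short remainder loop is needed when the trip count is not a multiple of four.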

Putting It In Words

There are times when applications may not be suitable candidates for vectorisation. Sometimes code that could potentially be vectorised is left alone because of dependencies or other code anomalies. When a piece of code doesn't vectorise, the reporting feature of the vectoriser can help you understand why not. There are three levels of report, the last level being the most verbose, giving reasons for both failure and success; see Figure 2.

Figure 2: The vectorisation reporting levels.

I must admit that most times I've used vectorisation I've hardly modified any code, apart from perhaps inserting the odd #pragma ivdep to tell the compiler to ignore an assumed loop dependency. The compiler's vectorisation reports can be a great aid in helping the developer shoe-horn vectorisation into more awkward code.
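As an illustration of where `#pragma ivdep` earns its keep, consider a loop that reads and writes the same array through an offset known only at run time. The compiler has to assume the offset might be negative, in which case each iteration would depend on an earlier one, so it plays safe. The sketch below follows the usual textbook pattern; the function name `scale_from_offset` and the preprocessor guard (added so that other compilers do not complain about an unknown pragma) are my own assumptions.

```c
#include <assert.h>

/* Sets a[i] = a[i + k] * c for i in [0, m).
   The caller promises k >= 0, so no iteration reads a value
   written by an earlier iteration. */
void scale_from_offset(int *a, int k, int c, int m)
{
    assert(k >= 0);   /* the promise that makes ivdep safe here */

    /* Tell the Intel compiler to ignore the dependency it has to assume. */
#if defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER)
#pragma ivdep
#endif
    for (int i = 0; i < m; ++i)
        a[i] = a[i + k] * c;
}
```

The pragma is a promise, not a check: if `k` really were negative, the vectorised loop could give wrong answers, so it should only go in where you know the dependency cannot occur.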

The vectoriser will try to work out whether there is sufficient work to be done before it transforms some code. You can see in Figure 3 that the compiler didn't vectorise the first code snippet; however, when the loop counter limit is increased to 100, the vectoriser decides that there is sufficient work to warrant transformation.

Figure 3: Vectorisation is dependent on workload.

Keeping the Family Happy

So what happens when I run my latest vectorised code on an old PC that doesn't support the newer SIMD instructions? Simple -- it crashes, or at least it will crash unless I also use another clever option that the Intel compiler brings.

The Generate Alternative Code Paths compiler option tells the compiler to generate an alternate bypass alongside the code section that has been vectorised. More than one bypass can be added, so you could have support for several specific CPUs along with a generic version of the code that will run on all processors. At the start of the vectorised code the CPUID of the machine you're running on is determined, and based on that value the most appropriate code route is taken. The compiler options for the alternate paths are in Figure 4. There is a small increase in code size because of the duplicated code paths. Checking the CPUID also introduces a few extra instructions that have to be executed, but the impact on the runtime is negligible.
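The idea behind the alternate paths can be sketched by hand. The code below is not what the Intel compiler generates; it is a simplified illustration in which every name (`add_generic`, `add_tuned`, `cpu_has_sse2`, `select_add`) is hypothetical. It uses the GCC/Clang built-ins `__builtin_cpu_init` and `__builtin_cpu_supports` as a stand-in for the CPUID check, and falls back to the generic path on any other compiler or architecture.

```c
#include <stddef.h>

typedef void (*add_fn)(const float *, const float *, float *, size_t);

/* Generic path: plain C that runs on any processor. */
static void add_generic(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Stand-in for the processor-specific path. In a real build this body
   would be the vectorised version; here it is the same loop so that
   the sketch stays portable. */
static void add_tuned(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Capability check, playing the role of the CPUID test. */
static int cpu_has_sse2(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
#else
    return 0;   /* unknown platform: stay on the generic path */
#endif
}

/* Picks the most appropriate code route for the machine we are on. */
add_fn select_add(void)
{
    return cpu_has_sse2() ? add_tuned : add_generic;
}
```

A caller would fetch the function once, for example `add_fn add = select_add();`, and then call `add(a, b, out, n)`. The cost is the duplicated function body plus one check, which matches the small code-size and runtime overhead described above.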

Figure 4: The drop-down menu options in Visual Studio.

Why Not Give It a Go!

Let's be truthful, I work for Intel in the Compiler Labs -- so you'd expect me to plug the Intel products. Even so, why not let the compiler speak for itself?

You can download a free evaluation version from www.intel.com/software. It's a plug-and-play replacement for the Microsoft and GCC compilers and fairly easy to integrate into an existing project. Nice thing is that during the evaluation period there's free online support and access to a user forum.

While you're trying out vectorisation on your code, you could also experiment with other optimisation options such as interprocedural optimisation, profile-guided optimisation, and auto-parallelism, all of which are well documented in the online help.

