Stephen Blair-Chappell

Dr. Dobb's Bloggers

Amazing Performance Gains Using SSE Intrinsics

March 29, 2010

I'm in the middle of writing up some case studies based on interviews with users of Intel Parallel Studio. As part of the exercise I set myself the goal of duplicating every technique the project engineers used.

Developers in two of the first three case studies used SSE2 (Streaming SIMD Extensions 2) intrinsics to speed up their code. I must admit I had never really used intrinsics before (apart from inserting a few prefetch instructions into some code), and I thought they were a bit 'out-of-fashion'. I was amazed that, after spending a couple of hours on some code, I was able to get it running nearly 20 times faster.

The code that I wrote does a lot of array manipulation, iterating through an array millions of times to test for values. By rearranging the code to use an array of 128-bit SSE values (__m128i), each holding four 32-bit integers, rather than an array of plain integers, I was able to get this dramatic increase in speed.
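To make the rearrangement concrete, here is a minimal sketch of the idea, not the code from the case study. The function names contains_scalar and contains_sse2 are hypothetical, and _mm_set1_epi32 and _mm_movemask_epi8 are SSE2 intrinsics I have assumed for the sketch; they are not in the list of intrinsics I used.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Scalar baseline: one comparison per array element. */
int contains_scalar(const int *data, size_t n, int target)
{
    for (size_t i = 0; i < n; ++i) {
        if (data[i] == target)
            return 1;
    }
    return 0;
}

/* SSE2 sketch: each __m128i holds four 32-bit integers, so a single
   compare instruction tests four elements at once. */
int contains_sse2(const __m128i *data, size_t n_vec, int target)
{
    /* Reading data[i] directly requires 16-byte alignment, which an
       array declared as __m128i already has. */
    assert(((uintptr_t)data % 16) == 0);

    /* Broadcast the value we are looking for into all four lanes. */
    const __m128i needle = _mm_set1_epi32(target);

    for (size_t i = 0; i < n_vec; ++i) {
        /* Each lane becomes all ones where equal, all zeros otherwise. */
        __m128i eq = _mm_cmpeq_epi32(data[i], needle);

        /* Collapse the 128-bit result to an int: non-zero means a match. */
        if (_mm_movemask_epi8(eq) != 0)
            return 1;
    }
    return 0;
}
```

How much faster the second version runs depends on the data, the compiler and the processor, so it is worth measuring both before and after.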

Hard Work

Getting familiar with the different intrinsics is hard work. Most of the time I spent rewriting this code went on reading the description of each intrinsic in the compiler manual. Even now I'm not sure I've made the most effective use of the intrinsics.

The method I followed was to:

  • Write the original code
  • Use Intel Parallel Amplifier to check for Hotspots (in my case the hotspot was a function testing if the array held a certain value).
  • Rewrite the hotspot code using SSE intrinsics
  • Re-run Parallel Amplifier

In the code I used the following intrinsics:

      __m128i MyArray[MY_MAX_NUM];   // array of 128-bit values
      _mm_and_si128(...)             // AND
      _mm_setzero_si128()            // init to zero
      _mm_cmpeq_epi32(...)           // IS EQUAL?
      _mm_storeu_si128(...)          // COPY SSE2 results into memory

A more complete description of these intrinsics can be found in the Parallel Studio help. These intrinsics are also supported by the Microsoft compiler. You can find some introductory material on SSE intrinsics here.
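As a rough illustration of how the four intrinsics listed above can fit together, here is a small sketch. It is not the hotspot code itself: the function match_lanes and its parameters are hypothetical, and it also leans on _mm_set1_epi32 and _mm_or_si128, two further SSE2 intrinsics I have assumed for the example.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: for each of the four 32-bit lanes, record whether
   `target` appears in that lane of any vector in `vecs`. Lanes that are
   zero in `lane_mask` are ignored. Each entry of `out` ends up as -1
   (all bits set) for a hit, or 0 for no hit. */
void match_lanes(const __m128i *vecs, size_t n_vec, int target,
                 __m128i lane_mask, int32_t out[4])
{
    assert(vecs != NULL && out != NULL);

    const __m128i needle = _mm_set1_epi32(target);
    __m128i hits = _mm_setzero_si128();                /* init to zero */

    for (size_t i = 0; i < n_vec; ++i) {
        __m128i eq = _mm_cmpeq_epi32(vecs[i], needle); /* IS EQUAL? */
        eq = _mm_and_si128(eq, lane_mask);             /* AND: drop masked lanes */
        hits = _mm_or_si128(hits, eq);                 /* accumulate the hits */
    }

    /* The unaligned store lets `out` be an ordinary int32_t array. */
    _mm_storeu_si128((__m128i *)out, hits);            /* COPY results into memory */
}
```

Calling match_lanes with a lane_mask of _mm_set1_epi32(-1) checks all four lanes; clearing a lane in the mask makes the function skip it.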

SSE intrinsics now sit near the top of my list of things to consider when optimising code.
