

Is Larrabee For the Rest of Us?


I have shown how to port a simple, sequential piece of code to a fully parallel, SIMDized version. The data-parallel redesign admittedly takes nontrivial effort, but I don't believe it is beyond the reach of the many programmers who already write hand-optimized code.

This kind of human effort is crucial because compilers are unlikely to deliver results of similar quality. It matters even more on Larrabee than on previous Intel processors, because Larrabee's 512-bit SIMD units are four times wider than the usual 128-bit ones. With 32-bit elements, a vector holds 16 lanes instead of 4, so code that is not vectorized exploits only one lane out of 16 rather than one out of 4, and therefore leaves a much larger fraction of the available performance on the table.

On the bright side, coding with LRBni intrinsics seems more natural and less verbose than coding with SSE intrinsics. One iteration of the loop in the code I presented needs approximately 35 instructions to transition 16 finite-state machines (about 2.2 instructions per transition), while other 128-bit SIMD instruction sets require at least 100 instructions to transition 4 machines (at least 25 instructions per transition).
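To make the per-transition figures more concrete, here is a portable C sketch of one plausible organization of such a kernel. It is not the author's LRBni code, which is not reproduced here. Sixteen independent automata, one per 32-bit lane, are advanced by first computing a flat table index per lane and then loading each lane's next state from that index. The toy automaton (it detects the substring "ab"), the table layout, and the names `delta`, `build_delta`, `step16`, and `run16` are all hypothetical. Each inner loop over `LANES` stands for what a 16-wide SIMD unit would execute as roughly one vector operation (the second loop being a gather), so the scalar loops themselves say nothing about performance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 16      /* 512-bit vector / 32-bit elements */
#define ALPHABET 256  /* one table column per input byte value */
#define NUM_STATES 3  /* hypothetical toy automaton that detects "ab" */

/* Hypothetical flat transition table: next = delta[state * ALPHABET + symbol]. */
static uint32_t delta[NUM_STATES * ALPHABET];

/* Build the toy DFA: 0 = start, 1 = just saw 'a', 2 = saw "ab" (absorbing). */
static void build_delta(void) {
    for (int c = 0; c < ALPHABET; c++) {
        delta[0 * ALPHABET + c] = (c == 'a') ? 1 : 0;
        delta[1 * ALPHABET + c] = (c == 'a') ? 1 : (c == 'b') ? 2 : 0;
        delta[2 * ALPHABET + c] = 2;
    }
}

/* Advance 16 independent automata by one input byte each. */
static void step16(uint32_t state[LANES], const uint8_t input[LANES]) {
    uint32_t index[LANES];
    /* Lane-wise multiply-add: one vector operation on a 16-wide unit. */
    for (int i = 0; i < LANES; i++) {
        assert(state[i] < NUM_STATES);
        index[i] = state[i] * ALPHABET + input[i];
    }
    /* Per-lane table lookup: one gather on a 16-wide unit. */
    for (int i = 0; i < LANES; i++)
        state[i] = delta[index[i]];
}

/* Run 16 equal-length input streams; streams[i] feeds lane i.
   Returns how many lanes end in the accepting state (2). */
static int run16(const char *streams[LANES], size_t len) {
    uint32_t state[LANES] = {0};
    uint8_t input[LANES];
    build_delta();
    for (size_t t = 0; t < len; t++) {
        /* Transpose: collect byte t of every stream into one vector. */
        for (int i = 0; i < LANES; i++)
            input[i] = (uint8_t)streams[i][t];
        step16(state, input);
    }
    int accepted = 0;
    for (int i = 0; i < LANES; i++)
        accepted += (state[i] == 2);
    return accepted;
}
```

For example, feeding `"zabz"` to the even lanes and `"zbaz"` to the odd lanes leaves 8 lanes in the accepting state. Keeping one automaton per lane means that no lane ever depends on another, which is what lets a whole transition step collapse into a handful of vector instructions.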

Don't get me wrong: without knowing instruction latencies, you cannot translate these instruction counts into performance figures. At this time, only Intel can estimate how many clock cycles any LRBni instruction takes. Scatter/gather instructions, for example, will likely be decomposed into multiple address-computation and load/store micro-operations, which together might take many clock cycles to complete. The performance of this code depends on how successfully Intel engineers squeeze LRBni instructions into a handful of clock cycles. That is not an easy task, especially for complex instructions such as scaled, masked scatter/gather with type conversion.
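Why a gather may be expensive becomes clearer when it is written out lane by lane. The C function below is a hypothetical scalar model of a masked, scaled gather with type conversion. Its name, the byte-to-32-bit zero extension, and the 16-bit mask layout are illustrative assumptions, not the exact semantics of any LRBni instruction. Each active lane needs its own address computation, its own memory access, and a conversion. How Intel's hardware actually sequences these steps is precisely the unknown discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 16

/* Hypothetical scalar model of a masked, scaled gather with conversion.
   For each lane i whose mask bit is set: compute base + index[i] * scale,
   load one uint8_t from that address, zero-extend it to 32 bits, and store
   it in dst[i]. Lanes with a clear mask bit keep their previous value and
   touch no memory. Returns the number of scalar loads performed. */
static int masked_gather_u8_to_u32(uint32_t dst[LANES], const uint8_t *base,
                                   const int32_t index[LANES], int scale,
                                   uint16_t mask) {
    assert(base != NULL);
    int loads = 0;
    for (int i = 0; i < LANES; i++) {
        if (mask & (1u << i)) {
            /* Per-lane address arithmetic. */
            const uint8_t *addr = base + (ptrdiff_t)index[i] * scale;
            /* Per-lane load plus type conversion. */
            dst[i] = (uint32_t)*addr;
            loads++;
        }
    }
    return loads;
}
```

With all 16 mask bits set, this model issues 16 independent loads. If the hardware could service only a few of them per cycle, the gather alone would cost several cycles. This is why the instruction counts quoted earlier cannot be read as cycle counts.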


Thanks to Sally A. McKee, Jamin Naghmouchi, Michael Perrone and Greg Pfister for their useful comments.

Disclaimer: Any claim or information reported here on Larrabee or other Intel products may not be final or reliable. The author assumes no liability for the use or interpretation of information contained herein. This article reflects solely the views and opinions of the author, which are not necessarily endorsed or approved by IBM.


