Channels ▼


Performance Portable C++

Source Code Accompanies This Article. Download It Now.


Performance portability is likely to become a necessary part of programming in the near future. Already, some compilers will only create good SSE code for the x86 processor architecture if stride one vectors are used (also known as Array-like layouts). At the same time, older superscalar machines usually perform best when Struct-like layouts are used as often as possible. Without the flexibility to switch memory layouts, performance almost certainly suffers as code is ported, and this is more likely to be so as we head into the future.

The technique I describe here is used in a large multiphysics project that encompasses hundreds of thousands of lines of code. The technique was originally used as a refactoring tool more than a performance portability tool. At one point, the project was able to quickly refactor the hydrodynamics portion of their physics code for a 42-100 percent speedup depending on the problem being solved and the machine being used. The hydrodynamics can be the dominant portion of the runtime for many physics applications, so doubling the performance with just a change of data structures is impressive. Part of the performance gain probably came from the compiler recognizing extra optimizations that could be applied (same compiler flags), and the rest came from different cache latency characteristics.

Thanks to Brian McCandless for questioning me during a presentation on the sidebar material, and inspiring me to write this article. His group uses a variant of this technique in their software.


Triangle and Quadrilateral areas (

Grandy, Jeff. "Efficient Computation of Volume of Hexahedral Cells," Lawrence Livermore National Laboratory UCRL-ID-128886, Oct 30, 1997.

Next-Generation Performance Portability

To take full advantage of multicore, NUMA, and heterogeneous technologies, lots of code will likely need to be rewritten, especially in the High Performance Computing (HPC) arena.

To mitigate the impact of this shift to new architectures, I'm investigating the use of source-to-source translation as a possible salvation. I've used a sophisticated tool called ROSE ( to write a Tiny C language extension that implements the techniques in this article in a way that is transparent to users. All the drawbacks I mention in the article go away, and new features emerge, especially with respect to multidimensional arrays.

Users create a single schema file describing the struct/array grouping of array names used within a code. The compiler can then use the schema to generate structs or arrays as appropriate at compile time. The groupings can be hierarchical, thus subtle relationships among groups of arrays can easily be captured.

I plan to investigate generating RapidMind, Intel's Threaded Building Block, or other backend code to complement the performance portable language extension. My hope is that source translation combined with a vector-like language extension might let the power of next-generation architectures be exploited with a minimal amount of effort devoted to code rewrites.


Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.