Channels ▼
RSS

Parallel Pattern 4: Gather



Structured Patterns: An Overview

Parallel Pattern 1: Superscalar Sequences and Task Graphs

Parallel Pattern 2: Selection

Parallel Pattern 3: Map

Parallel Pattern 4: Gather

Parallel Pattern 5: Stencil

Parallel Pattern 6: Partition

Parallel Pattern 7: Reduce

Parallel Pattern 8: Scan

Parallel Pattern 9: Pack


A gather operation is a parallel random read of a set of elements from memory. It is often described as a "collective" operation, where an array and a set of indices are provided as input and the output is another array which is the result of all the reads. Gather can also be seen as the combination of a simple "serial" memory read combined with the Map pattern. Using a random read inside a function used in a map results in a gather. As with other maps, the order in which elements are read in a gather should not affect the outcome of the gather.

Gather is simple but there are a few variants which are worth thinking about, since some of the variants allow for specific optimizations.

First, are the addresses static or dynamic? If the addresses are (relatively) static, then it may be possible to do some analysis to reorder memory accesses to improve performance. For example, when performing many versions of the FFT (Fast Fourier Transform) a "scramble" operation is required that reorders the inputs or outputs. This reordering is given by taking the index of every element in the array and computing a new location by bit-reversing that index. For example, in an 8-point scramble with 3-bit indices, 0 maps to 0, 1 maps to 4, 2 maps to 2, 3 maps to 6, 4 maps to 1, 5 maps to 5, 6 maps to 3, and 7 maps to 7. When performing a map operation, we have some flexibility in the order in which elements of the map are processed. Modern memory systems are most efficient when data is accessed in large, coherent blocks. So we may want to reorder processing so that once a block is read into on-chip memory, we gather all the elements from it that we need before moving onto the next. Of course, a map operation may have multiple gather operations, and also has to write its outputs somewhere. Reordering the elements of a map needs to consider the effect on the efficiency of all inputs and the output. However, if the inputs are static AND the platform knows they are static, there may be significant opportunities for optimization.

There are also some patterns of Gather that are regular. For instance, a filtering operation may want to access certain neighbours of a pixel, but for every output pixel needs the same relative neighbourhood. This is a very common pattern in imaging and simulation, see the Stencil pattern. Another regular pattern, which I call the Partition pattern, results when input is explicitly broken into a set of coherent regions. Both of these can benefit from specific optimizations.

In general, though, addresses for a gather are computed dynamically, and are irregular. For instance, such a pattern may be needed when traversing a data structure for a ray-tracer. In this case, it is still possible to improve performance by using "extra" parallelism to hide latency. A memory system may have a high bandwidth, but may take a certain amount of time to satisfy each memory request. However, it is often possible to have many outstanding memory requests in flight at the same time. To hide the latency of memory access, we start one thread until it tries to access memory, then while that thread is waiting for memory we start another when, and when it reaches a point where it is accessing memory we start another one, and so forth until the memory request for the first thread can be satisfied. This approach may be supported in hardware: "hyperthreading", or simultaneous multithreading, is mostly useful for hiding memory access latency. However, the technique can also be implemented in software by unrolling loops and using software pipelining. These optimizations can and should be optimized by a code generator, however, since they are complex and make code hard to understand and maintain. Also, unrolling loops also increases the pressure on other resources, such as cache and registers, so a balance has to be struck.

In summary, the simple operation of gather has a couple of variants with associated optimizations. If a gather is static, we may be able to reorder it for higher performance. If it is dynamic, we may be able to use latency hiding, as long as we do not exceed other resource limits. And finally, if the gather satisfies certain regular patterns, we may be able to apply yet another set of optimizations.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video