Channels ▼
RSS

Parallel

Parallel Merge


Conclusion

Parallel hybrid merge algorithm was developed that outperformed sequential simple merge as well as STL merge by 0.9-5.8 times overall and by over 5 times for larger arrays, running on a quad-core processor. The pure parallel merge algorithm was shown to perform poorly — substantially slower than the simple sequential merge.

Microsoft PPL and Intel TBB technologies were used for parallel algorithm development, with comparable performance and identical implementations, simplifying development significantly. Lambda functions were used to simplify the parallel implementation.

A parallel algorithm may use a different strategy than a sequential algorithm. The simple sequential merge iterated through the two input arrays, while the parallel divide-and-conquer algorithm split the problem into two sub-problems recursively.

Pure divide-and-conquer sequential and parallel algorithms failed to deliver competitive performance due to large computational overhead per output element and large task-switching overhead. Simple merge has a small constant as it performs little work per output element. The hybrid algorithm approach yielded performance gains for sequential and parallel merge algorithms. For sequential merge, the hybrid divide-and-conquer algorithm outperformed the simple merge algorithm, which seems questionable.

For parallel algorithms, divide-and-conquer proved to be a good strategy, especially coupled with the hybrid approach, enabling parallel processing with high performance. A fast sequential merge was shown to be a critical component of the fast parallel hybrid algorithm.

The hybrid algorithm improved the memory access pattern of the divide-and-conquer algorithm. Pure divide-and-conquer has a nonsequential memory access pattern. The hybrid approach used the simple merge, which has incrementing memory addressing for both inputs and the output. The hybrid parallel algorithm approach provided over 20 times the performance gain over the pure parallel divide-and-conquer algorithm. Otherwise, the pure parallel algorithm was measured to be slower than the simple sequential merge.

As long as task-switching overhead continues to be fairly high, a high-performance parallel algorithm will require a high-performance sequential algorithm to use as a recursion termination case or within the minimal parallel work quanta.

The sequential simple merge algorithm measured at 1.2 GB/second of memory bandwidth. The parallel hybrid merge algorithm used about 7.1 GB/second, nearing the memory bandwidth limit for scalar code, and possibly limiting performance.

References

[1] T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein, Introduction to Algorithms, Third Edition, MIT Press, Sept. 2009, pp. 797-805


Parallel In-Place Radix Sort Simplified

Algorithm Performance

Parallel In-Place N-bit-Radix Sort

Quality of Random Numbers versus Performance

In-Place Hybrid Binary-Radix Sort

Optimizing Performance of Sorting Algorithms (Insertion and Selection Sort)

In-Place Hybrid N-bit-Radix Sort

Counting Sort Performance

Parallel Counting Sort

Parallel Counting Sort, Part 2

Stable Hybrid MSD N-bit-Radix Sort


Victor Duvanenko is a long-time contributor to Dr. Dobb's. He can be contacted at vduvanenko1@gmail.com



Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video