Channels ▼
RSS

Parallel

Parallel In-Place Merge


Introduction

In my article on parallel merge, I developed and optimized a generic parallel merge algorithm. It utilized multiple CPU cores and scaled well in performance. The algorithm was stable, leading directly to Parallel Merge Sort. However, it was not an in-place implementation. In this article, I develop an in-place parallel merge, which enables in-place parallel merge sort.

The STL implements sequential merge and in-place merge algorithms, as merge() and inplace_merge(), which run on a single CPU core. The merge algorithm is O(n), leading to O(nlgn) merge sort, while the in-place merge is O(nlgn), leading to O(n(lgn)2) in-place merge sort. Because merge is faster than in-place merge, but requires extra memory, STL favors using merge whenever memory is available, copying the result back to the input array (under the hood).

Revisit Not-In-Place

The divide-and-conquer merge algorithm described in Parallel Merge, which is not-in-place, is illustrated in Figure 1. At each step, this algorithm moves a single element X from the source array T to the destination array A, as follows.


Figure 1. A merge algorithm that is not in place.

The two input sub-arrays of T are from [p1 to r1] and from [p2 to r2]. The output is a single sub-array of A from [p3 to r3]. The divide step is done by choosing the middle element within the larger of the two input sub-arrays—at index q1. The value at this index is then used as the partition element to split the other input sub-array into two sections - less than X and greater than or equal to X. The partition value X (at index q1 of T) is copied to array A at index q3. The conquer step recursively merges the two portions that are smaller than or equal to X— indicated by light gray boxes and light arrows. It also recursively merges the two portions that are larger than or equal to X—indicated by darker gray boxes and dark arrows.

The algorithm proceeds recursively until the termination condition —when the shortest of the two input sub-arrays has no elements. At each step, the algorithm reduces the size of the array by one element. One merge is split into two smaller merges, with the output element placed in between. Each of the two smaller merges will contain at least N/4 elements, since the left input array is split in half.

This algorithm is O(n) and is stable. It was shown not to be performance competitive in its pure form. However, performance became equivalent to other merges in a hybrid form, which utilized a simple merge to terminate recursion early. The hybrid version enabled parallelism by using of divide-and-conquer method, scaled well across multiple cores, outperforming STL merge by over 5X on quad-core CPU, being limited by memory bandwidth. It was also used as the core of Parallel Merge Sort, which scaled well across multiple cores, utilizing a further hybrid approach to gain more speed, and providing a high degree of parallelism, Θ(n/lg2n).


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video