Channels ▼
RSS

C/C++

Parallel In-Place Merge


Now, that the split has been found, block swap can be performed, in-place. This moves all elements less than X to the left of X, and all element greater than or equal to X to the right of X. However, elements left of X are in two segments that now need to be merged. Also the two segments of elements right of X also need to be merged. The two recursive calls take care of these two merge operations. The else portion takes care of the opposite case—when the left segment is larger than the right segment.

Note that the in-place merge algorithm used a non-optimized non-hybrid block swap algorithm, for simplicity. An optimized hybrid algorithm is definitely a performance boosting opportunity.

Performance of several in-place and not-in-place merge algorithms is shown in Table 1 and in Graph 1.

Time per Run

Number of items

50

500

5000

50000

500000

5.0E+06

5.0E+07

5.0E+08

STL inplace_merge

1.9E-06

6.6E-06

5.8E-05

9.1E-04

6.8E-03

8.2E-02

8.1E-01

8.8E+00

STL merge

3.6E-07

1.5E-06

1.2E-05

1.8E-04

2.0E-03

2.0E-02

2.0E-01

3.1E+00

In-Place Merge pure

2.9E-06

2.1E-05

2.1E-04

2.4E-03

2.4E-02

2.5E-01

2.8E+00

3.1E+01

STL inplace_merge /
STL merge

5.1

4.3

4.8

5.0

3.3

4.2

4.0

2.8

In-Place Merge pure /
STL inplace_merge

1.5

3.2

3.6

2.6

3.5

3.1

3.4

3.6

Table 1

Graph 1

STL merge is more than 2.8x faster than STL inplace_merge for all array sizes. The pure divide-and-conquer merge is more than 3X slower than STL inplace_merge for large arrays. This performance disparity may be attributed to STL inplace_merge being an adaptive algorithm, switching to STL merge when additional memory is available to allocate a temporary intermediate buffer and copying the result back to the input array. Also, nearly every algorithm that has been explored so far has performed poorly in its pure form. So, let's develop a hybrid in-place merge.

Listing Two shows such a hybrid in-place merge algorithm. STL inplace_merge is used as the recursion termination case—processing small sub-arrays.

template< class _Type >
inline void merge_in_place_1( _Type* t, int l, int m, int r )
{
	int length1 = m - l + 1;
	int length2 = r - m;
	if ( length1 >= length2 )
	{
		if ( length2 <= 0 )	return;
		if (( length1 + length2 ) <= 8192 )	{ std::inplace_merge( t + l, t + m + 1, t + r + 1 );  return; }
		int q1 = ( l + m ) / 2;								// q1 is mid-point of the larger segment
		int q2 = my_binary_search( t[ q1 ], t, m + 1, r );	// q2 is q1 partitioning element within the smaller sub-array (and q2 itself is part of the sub-array that does not move)
		int q3 = q1 + ( q2 - m - 1 );
		block_exchange_mirror( t, q1, m, q2 - 1 );		// 2X speedup
		merge_in_place_1( t, l,      q1 - 1, q3 - 1 );	// note that q3 is now in its final place and no longer participates in further processing
		merge_in_place_1( t, q3 + 1, q2 - 1, r      );	
	}
	else {
		if ( length1 <= 0 )	return;
		if (( length1 + length2 ) <= 8192 )	{ std::inplace_merge( t + l, t + m + 1, t + r + 1 );  return; }
		int q1 = ( m + 1 + r ) / 2;							// q1 is mid-point of the larger segment
		int q2 = my_binary_search( t[ q1 ], t, l, m );		// q2 is q1 partitioning element within the smaller sub-array (and q2 itself is part of the sub-array that does not move)
		int q3 = q2 + ( q1 - m - 1 );
		block_exchange_mirror( t, q2, m, q1 );			// 2X speedup
		merge_in_place_1( t, l,      q2 - 1, q3 - 1 );	// note that q3 is now in its final place and no longer participates in further processing
		merge_in_place_1( t, q3 + 1, q1,     r      );	
	}
}

Listing Two

Performance of the hybrid algorithm is shown in Table 2.

Time per Run

Number of items

50

500

5000

50000

500000

5.0E+06

5.0E+07

5.0E+08

STL inplace_merge

1.9E-06

6.6E-06

5.8E-05

9.1E-04

6.8E-03

8.2E-02

8.1E-01

8.8E+00

In-Place Merge pure

2.9E-06

2.1E-05

2.1E-04

2.4E-03

2.4E-02

2.5E-01

2.8E+00

3.1E+01

In-Place Merge hybrid

2.8E-06

7.0E-06

8.1E-05

9.9E-04

1.3E-02

1.6E-01

1.8E+00

1.9E+01

In-Place Merge hybrid /
STL inplace_merge

1.5

1.1

1.4

1.1

1.9

1.9

2.2

2.2

In-Place pure /
In-Place hybrid

1.0

3.1

2.6

2.4

1.9

1.6

1.5

1.6


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video