Channels ▼
RSS

Parallel

Parallel In-Place Merge Sort


Optimization

One performance optimization strategy successfully used for other parallel sorting algorithms has been the hybrid approach, where the non-parallel algorithm is used to terminate recursion of the parallel algorithm early. Listing Three shows the non-parallel hybrid merge sort implementation.

Listing Three

template< class _Type >
inline void merge_sort_in_place_1( _Type* a, int l, int r )
{
	if ( r <= l ) return;
	if (( r - l ) <=  32 ) { insertionSortSimilarToSTLnoSelfAssignment( a+l, r-l+1 ); return; }
	int m = (( r + l ) / 2 );
	merge_sort_in_place_1( a, l,     m );  // recursive call left  half
	merge_sort_in_place_1( a, m + 1, r );  // recursive call right half
	merge_in_place_L1( a, l, m, r );     // merge the results
}

This implementation uses Insertion Sort to terminate recursion earlier. An in-place sorting algorithm must be used in this case, and insertion sort is a good choice due to its cache-friendly disposition. It wouldn't be fair to use STL sort in this case, because STL sort is overall faster than non-parallel merge sort, and thus could entirely replace it at any level of recursion.

Listing Four shows the parallel in-place hybrid implementation, using several potential in-place non-parallel sort algorithms as early recursion terminators. All of these are fair game since they are all slower than the parallel algorithm itself.

Listing Four

template< class _Type >
inline void p_merge_sort_in_place_2( _Type* a, int l, int r )
{
	if ( r <= l ) return;
//	if (( r - l ) <= 100 ) { merge_sort_in_place( a, l, r ); return; }
//	if (( r - l ) <=  32 ) { insertionSortSimilarToSTLnoSelfAssignment( a+l, r-l+1 ); return; }
	if (( r - l ) <= 1024 ) { sort( a+l, a+r+1 ); return; }
	int m = (( r + l ) / 2 );
    //tbb::parallel_invoke(
	Concurrency::parallel_invoke(
		[&] { p_merge_sort_in_place_2( a, l,     m ); },	// parallel recursive call left  half
		[&] { p_merge_sort_in_place_2( a, m + 1, r ); }		// parallel recursive call right half
	);
	p_merge_in_place_2( a, l, m, r );	// parallel in-place merge
}

Hybrid Performance

Table 2 shows performance measurements of pure and hybrid parallel and non-parallel in-place merge sort algorithm implementations.

Time Per Run

 

 

 

 

 

 

 

 

 

Number of items

10

100

1K

10K

100K

1M

10M

100M

1B

Parallel Merge Sort pure

2.6E-06

1.1E-05

6.6E-05

5.7E-04

5.52E-03

5.35E-02

6.42E-01

7.62E+00

8.77E+01

Parallel Merge Sort hybrid

1.10E-07

2.64E-06

4.26E-05

1.87E-04

2.17E-03

2.6E-02

3.6E-01

4.8E+00

6.4E+01

Merge Sort pure

3.55E-07

9.00E-06

1.52E-04

2.21E-03

2.96E-02

3.7E-01

4.5E+00

5.4E+01

6.4E+02

Merge Sort hybrid

8.35E-08

4.54E-06

1.04E-04

1.79E-03

2.46E-02

3.2E-01

4.1E+00

5.0E+01

5.9E+02

Parallel pure/hybrid

24.1

4.2

1.5

3.1

2.5

2.1

1.8

1.6

1.4

Non-Parallel pure/hybrid

4.3

2.0

1.5

1.2

1.2

1.2

1.1

1.1

1.1

Table 2.

Parallel implementation shows the highest performance improvement of 40% from a hybrid implementation. Non-parallel hybrid implementation benefits by about 10%. Interestingly, both show diminishing gain from the hybrid approach as the array size increases. This is due to the non-parallel portion making a smaller contribution as array size increases. One possibility is that within this limit, the speed-up from the hybrid approach would diminish to zero and would not be necessary, while still being beneficial for the array sizes presented. Another possibility is that within this limit, the speed-up approaches a constant due to recursion causing a binary tree where half of the nodes are leaf nodes — the non-parallel algorithm execution.

Parallel in-place merge sort is now 2.2X faster than the STL sort. The hybrid approach shows a reduced performance differential between in-place and not-in-place from 4X to 3X, with in-place being slower.

Conclusion

Parallel in-place merge led to the development of a simple, compact and generic parallel in-place merge sort that performed competitively. Both parallel merge sort algorithms, in-place and not-in-place, utilized all the cores within the multicore CPU, leading to 100% CPU utilization (versus only 12.5% CPU utilization by STL sort). Being in-place carries a penalty of 3X performance loss versus not-in-place. Reducing this performance penalty and improving performance further will be the focus of future development.

Related Articles

Algorithm Improvement through Performance Measurement: Part 1

Parallel In-Place Radix Sort Simplified

Parallel Counting Sort

Parallel Merge Sort

Parallel Merge

Parallel In-Place Merge


Victor Duvanenko is a frequent contributor to Dr. Dobb's on algorithms for parallel programming.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.