Channels ▼
RSS

Parallel

Parallel In-Place Merge Sort


Merge sort is a wonderful, widely used sorting algorithm, with consistent data-independent performance. When not in-place merge sorting, that is when the source and destination array are not the same, performance is O(nlgn). When in-place, so the source and destination array are the same, performance is slightly slower: O(nlg2n). Because not-in-place sorting algorithms are faster, implementations frequently allocate memory for the destination array, copy the sorted result back into the source array, and proceed to deallocate the destination array. STL uses this type of a strategy to construct a fast in-place sort from a faster not-in-place sort whenever memory is available. Of course, when memory is not available, in-place sort is necessary, and STL also provides such an implementation.

However, the STL sort is a non-parallel implementation and runs only on a single core of today's multicore processors, utilizing just a fraction of the available computational resources. Several parallel sorting algorithms have been developed: Parallel Radix Sort, Parallel Counting Sort, and Parallel Merge Sort.

At the core of merge sort is the merge algorithm. Fast not-in-place Parallel Merge Sort was developed using a Parallel Merge that is not-in-place. Also, a Parallel In-Place Merge has been developed. Now, it's time to use this merge to construct an in-place merge sort that is parallel, and then see how well it performs.

Implementation

Listing One shows a simple implementation of In-Place Merge Sort, which calls itself recursively twice — once to sort the left half and once to sort the right half, followed by an in-place merge. This in-place merge merges the two sorted left and right half results into a single sorted result. The implementation is generic, being able to sort any data type that can use comparisons.

Listing One

template< class _Type >
inline void merge_sort_in_place( _Type* a, int l, int r )
{
	if ( r <= l ) return;
	int m = (( r + l ) / 2 );
	merge_sort_in_place( a, l,     m );  // recursive call left  half
	merge_sort_in_place( a, m + 1, r );  // recursive call right half
	merge_in_place_L1( a, l, m, r );     // merge the results
}

Listing Two shows a parallel implementation, which makes the two recursive calls in parallel, followed by a parallel in-place merge. It uses Intel's open-source Threading Building Blocks (TBB) or Microsoft's Parallel Pattern Library (PPL). The use of C++ lambda functions simplifies the implementation significantly. This implementation is also generic.

Listing Two

template< class _Type >
inline void p_merge_sort_in_place( _Type* a, int l, int r )
{
	if ( r <= l ) return;
	int m = (( r + l ) / 2 );
    //tbb::parallel_invoke(
	Concurrency::parallel_invoke(
		[&] { p_merge_sort_in_place( a, l,     m ); },	// parallel recursive call left  half
		[&] { p_merge_sort_in_place( a, m + 1, r ); }	// parallel recursive call right half
	);
	p_merge_in_place_2( a, l, m, r );	// parallel in-place merge
}

The non-parallel and parallel versions are strikingly similar. Both are fully functional algorithms and are compact. In-place merge can be found in my Parallel In-Place Merge article, Listing One for non-parallel and Listing Three for parallel. Having these in-place merge implementations makes these compact merge sort implementations possible. If you want to contrast in-place merge with a not-in-place merge sort, take a look at Listing One in Parallel Merge Sort.

Let's take a look at performance of these algorithms.

Performance

Table 1 and Figure 1 compare performance of the non-parallel in-place merge sort with the parallel implementation, both in their initial pure forms, as well as STL sort and parallel not-in-place merge sort. Sorting of arrays of 32-bit unsigned integers were benchmarked on Intel i7 3630QM quad-core CPU with hyperthreading running at 3.2 GHz, using 16 GB of system memory.

Time Per Run

 

 

 

 

 

 

 

 

 

Number of items

10

100

1K

10K

100K

1M

10M

100M

1B

In-Place Merge Sort

3.55E-07

9.00E-06

1.52E-04

2.21E-03

2.96E-02

3.7E-01

4.5E+00

5.4E+01

6.4E+02

Parallel In-Place Merge Sort

2.6E-06

1.1E-05

6.6E-05

5.7E-04

5.52E-03

5.35E-02

6.42E-01

7.62E+00

8.77E+01

STL sort

1.07E-07

2.62E-06

4.17E-05

5.80E-04

7.33E-03

8.9E-02

1.0E+00

1.2E+01

1.4E+02

Parallel Merge Sort hybrid

2.5E-07

3.6E-06

2.0E-05

1.8E-04

1.38E-03

1.42E-02

1.64E-01

1.81E+00

2.13E+01

Merge Sort / Parallel Merge Sort

0.1

0.8

2.3

3.9

5.4

7.0

7.1

7.1

7.3

STL sort / Parallel In-Place

0.0

0.2

0.6

1.0

1.3

1.7

1.6

1.6

1.6

Parallel In-Place / Parallel Not-In-Place

10.4

3.0

3.3

3.2

4.0

3.8

3.9

4.2

4.1

Table 1.

MergeSort
Figure 1.

Performance measurements show that the non-parallel, in-place merge sort is the slowest algorithm. Parallel implementation accelerates performance by over 7X and is faster than STL sort by 1.6X. The not-in-place parallel merge sort is the fastest merge sorting algorithm, beating the parallel in-place version by more than 4X. It also beat STL by more than 6X.

However, the parallel not-in-place merge sort is a hybrid implementation, using a non-parallel algorithm to terminate parallel recursion early, whereas the parallel in-place merge sort is not a hybrid. Let's see if the parallel in-place merge sort can be optimized.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.