### In-Place

The main idea for in-place merge, illustrated in Figure 2, is based on a modification to the not-in-place merge. Instead of moving a single element X at location q_{1} to location q_{3} at each recursive step, all elements between q_{1} and q_{2} are moved using In-Place Block Swap (Block Exchange). Let's go through this in detail.

**Figure 2 An in-place merge.**

The top row of Figure 2 shows the input array of elements and three indexes: l, m, and r. The left segment, which is between indexes l and m, inclusive, is pre-sorted. The right segment, which is between indexes m+1 and r, inclusive, is also pre-sorted. Merge algorithm uses these two segments to produce a single sorted result. In-place merge places the result back in the input array between indexes l and r (the bottom row of Figure 1), while using a small constant amount of extra memory. Note that the two input segments touch, whereas the not-in-place merge algorithm had array elements in between the two input segments.

The first step of this divide-and-conquer algorithm is to take the larger of the two input segments, divide it in half to determine the index q_{1} and its value X. Then a binary search is used within the other segment to determine the index q_{2}, at which all elements to the left are smaller than X, and all elements to the right are larger or equal to X. Thus, each of the two segments have been split into two halves, one less than X and the other greater than or equal to X. These two steps are the same as in not-in-place merge algorithm. The algorithm preserves stability because only the elements that are strictly less than X, but were to the right of X are moved to the left of X.

The next step is to in-place swap the segment q_{1} to m, inclusive, and the segment m+1 to q_{2} inclusive, as shown by the diagonal arrows in Figure 2. This swap would be simple if the segments were of equal size, but this occurs rarely. In-Place Block Swap comes to the rescue, because it's capable of swapping two segments of unequal size, which are touching, in-place. Three algorithms are described in In-Place Block Swap, and the one with the highest performance along with the best parallel scaling needs to be chosen.

The algorithm proceeds recursively by processing (l, q_{1}-1, q_{3}-1) segment and (q_{3}+1, q_{2}-1, r) segments of the bottom row of Figure 1. Note that the array element X at q_{3} is in its final location and needs no further processing.

### Sequential

Listing One shows a sequential divide-and-conquer In-Place Merge algorithm implementation. Lengths of the left and the right input array segments are computed first, determining which segment is larger. When the left (length1) segment is larger, the index of the mid-element is computed (q_{1}). Binary search is used to split the smaller segment (m+1 to r, with length2) using the value of q_{1}, into elements that are strictly smaller than the element at q1 (X in Figure 1). Being strictly less than X at q_{1}, preserves stability. This binary search produces q_{2}, which is the index of the split of the smaller segment (left segment, length2).

template< class _Type > inline int my_binary_search( _Type value, const _Type* a, int left, int right ) { long low = left; long high = __max( left, right + 1 ); while( low < high ) { long mid = ( low + high ) / 2; if ( value <= a[ mid ] ) high = mid; else low = mid + 1; // because we compared to a[mid] and the value was larger than a[mid]. // Thus, the next array element to the right from mid is the next possible // candidate for low, and a[mid] can not possibly be that candidate. } return high; } // Swaps two sequential sub-arrays ranges a[ l .. m ] and a[ m + 1 .. r ] template< class _Type > inline void block_exchange_mirror( _Type* a, int l, int m, int r ) { mirror_ptr( a, l, m ); mirror_ptr( a, m + 1, r ); mirror_ptr( a, l, r ); } // Merge two ranges of source array T[ l .. m, m+1 .. r ] in-place. // Based on not-in-place algorithm in 3rd ed. of "Introduction to Algorithms" p. 798-802 template< class _Type > inline void merge_in_place_L1( _Type* t, int l, int m, int r ) { int length1 = m - l + 1; int length2 = r - m; if ( length1 >= length2 ) { if ( length2 <= 0 ) return; // if the smaller segment has zero elements, then nothing to merge. int q1 = ( l + m ) / 2; // q1 is mid-point of the larger segment. length1 >= length2 > 0 int q2 = my_binary_search( t[ q1 ], t, m + 1, r ); // q2 is q1 partitioning element within the smaller sub-array (and q2 itself is part of the sub-array that does not move) int q3 = q1 + ( q2 - m - 1 ); block_exchange_mirror( t, q1, m, q2 - 1 ); merge_in_place_L1( t, l, q1 - 1, q3 - 1 ); // note that q3 is now in its final place and no longer participates in further processing merge_in_place_L1( t, q3 + 1, q2 - 1, r ); } else { // length1 < length2 if ( length1 <= 0 ) return; // if the smaller segment has zero elements, then nothing to merge int q1 = ( m + 1 + r ) / 2; // q1 is mid-point of the larger segment. length2 > length1 > 0 int q2 = my_binary_search( t[ q1 ], t, l, m ); // q2 is q1 partitioning element within the smaller sub-array (and q2 itself is part of the sub-array that does not move) int q3 = q2 + ( q1 - m - 1 ); block_exchange_mirror( t, q2, m, q1 ); merge_in_place_L1( t, l, q2 - 1, q3 - 1 ); // note that q3 is now in its final place and no longer participates in further processing merge_in_place_L1( t, q3 + 1, q1, r ); } }

**Listing One**