Swapping two elements within an array quickly is simple, requiring a single temporary storage element.

template < class Item > inline void exchange( Item& A, Item& B ) { Item t = A; A = B; B = t; }

Swapping two non-overlapping subarrays of the same size is also simple.

**Figure 1.**

In Figure 1, the red subarray, containing two elements, is swapped with the blue subarray, also containing two elements. The order of elements within each subarray is preserved. The two subarrays can have zero or more elements separating them. The following code implements an equal-sized non-overlapping Block Swap:

template< class _Type > inline void swap( _Type* x, unsigned a, unsigned b, unsigned m ) { for( ; m > 0; m-- ) exchange( x[ a++ ], x[ b++ ] ); }

If the two subarrays are overlapping, an issue arises.

**Figure 2.**

In Figure 2, the first subarray consists of elements 0 and 1, and the second consists of elements 1 and 2. Both subarrays contain element 1. The result should be as if we made copies of the two subarrays, swapped them, and then merged the result, as illustrated in Figure 3:

**Figure 3.**

In Figure 3, the first row shows the original array with two overlapping subarrays; the second row separates the two subarrays; the third row swaps the two subarrays; and the fourth row attempts to combine the two results, leading to an ambiguity due to two possible results in the overlapping middle element.

We could define the overlapping swap operation to favor the first or the second subarray, but in either case, some array element is lost, which is be problematic because swapping is expected to preserve array elements.

Swapping two non-overlapping subarrays of any size, equal or not, is called Block Exchange, Block Swap, or Section Swap.

**Figure 4.**

In Figure 4, in the left column, the red subarray of one element is swapped with the blue subarray of two elements, with a one separating element in grey. In the right column, the red subarray of two elements is swapped with the blue element of three elements, with no elements of separation. Note that the order of elements within each subarray is preserved.

Several algorithms have been developed to solve this problem in linear time, in-place. Some of these algorithms are easily parallelized well, while others are not. In this article, I explain these sequential algorithms and measure their performance. In upcoming articles, the algorithms will be generalized and parallelized. Finally, a general, high-performance sequential and parallel Block Exchange (Block Swap) algorithm will emerge, which will serve as a core building block for other in-place parallel algorithms.

### Juggling Algorithm

Jon Bentley describes three algorithms for Block Exchange in his book, Programming Pearls, (pp. 13-15, 209-211) and in an oft-cited article [PDF]. Figure 5 shows how the first of these, the Juggling algorithm, works. The first row is the starting array, where the red subarray of two elements and the blue subarray of four elements are to be swapped. The last row is the result, and each row is a step of the algorithm:

**Figure 5.**

The column on the left shows the single element of the extra storage space used to support its in-place capability. Element 0 is moved to this extra storage space, leaving a hole within the array (the second row of the illustration). In the third row, element 2 is moved into the hole, leaving a new hole. In row four, element 4 is moved into the hole, leaving a new hole. In row five, wrap-around occurs forming a loop, landing on element 0, which is retrieved from the extra storage space. In row six, a new loop is started at element 1, following the same steps. Bentley's Juggling algorithm implementation is shown in Listing One.

#### Listing One: Bentley's Juggling algorithm.

// Juggling algorithm for block swapping. // Greatest Common Divisor. Assumes that neither input is zero inline int gcd( int i, int j ) { if ( i == 0 ) return j; if ( j == 0 ) return i; while ( i != j ) { if ( i > j ) i -= j; else j -= i; } return i; } template< class _Type > inline void block_exchange_juggling_Bentley( _Type* a, int l, int m, int r ) { int u_length = m - l + 1; int v_length = r - m; if ( u_length <= 0 || v_length <= 0 ) return; int rotdist = m - l + 1; int n = r - l + 1; int gcd_rotdist_n = gcd( rotdist, n ); for( int i = 0; i < gcd_rotdist_n; i++ ) { // move i-th values of blocks _Type t = a[ i ]; int j = i; while( true ) { int k = j + rotdist; if ( k >= n ) k -= n; if ( k == i ) break; a[ j ] = a[ k ]; j = k; } a[ j ] = t; } }

Rotation distance (Ρ) is the greatest common devisor (`gcd`

) of the length of the first subarray and the length of the entire array. The algorithm makes Ρ number of loops — the limit on the `for`

loop (`gcd_rotdist_n`

). Within each loop *n*/Ρ elements are moved by Ρ elements to the left. The algorithm has linear time performance because each element of the array is moved once by the rotation distance. The number of loops is between one and the length of the left subarray. Note that swapping is not used, but instead, the elements are moved in a similar fashion to the efficient implementation of Insertion Sort.