Channels ▼


Algorithm Improvement through Performance Measurement: Part 1

Improving Insertion Sort

It seems that Insertion Sort performs many swaps in the inner for loop. These swaps consist of three read and three write operations. These swaps have redundancy between loop iterations. By examining the algorithm carefully, all intermediate writes of the initial a[j] value can be seen redundant and can be removed, except for the last write to the final location. STL sort(), which uses Insertion Sort internally implements such an optimization, and so do [5] and [6]. This optimization is shown in Listing Five. This improvement results in significantly fewer reads and writes (3-to-1 reduction) by using a realization that array elements can be shifted to the right and then the current element can be moved to the final position.

Listing Five

// Listing5.cpp

// Optimization borrowed from STL, which does not do swap during every iteration, and reduces the number of reads and writes.
// However, the number of reads and writes is still O( n^2 ).  This implementation is substantially faster than using swaps!
void insertionSortSimilarToSTL( float* a, unsigned long a_size )
	for ( unsigned long i = 1; i < a_size; i++ )
		unsigned long j;
		float currentElement = a[ i ];
		for ( j = i; j > 0 && currentElement < a[ j - 1 ]; j-- )
			a[ j ] = a[ j - 1 ];
		a[ j ] = currentElement;	// Notice that here we can copy an element onto itself

Performance measurements of this improved version are shown in Tables 4, 5, and 6, along with speed-up obtained, shown in the last row of each table.

Table 4

Table 5

Table 6

However, this improvement introduces a side-effect that the original version in Listing Two, which used swaps, did not contain. If the first comparison in the inner for loop failed, then the original version performed no swaps and thus no writes to the array, but the improved version does. This side-effect can be seen for presorted input data set. Because the write is outside of the inner for loop, it is performed no matter what the inner for loop does, even if it initially fails. This case results in self-assignment; for example, a[i] = a]i].

Obviously, self-assignment is redundant work, and can be eliminated using an if statement comparing indexes. However, this adds a comparison overhead that is incurred on every assignment whether self-assigning or not. The majority of cases will not be doing self-assigning, and thus it would be inefficient to have all cases suffer in performance for a rare condition of self-assignment.

Listing Six shows a new implementation which avoids self-assignment without doing any extra work, by unrolling the first loop, which is the only case where self-assignment can occur.

Listing Six

// Listing6.cpp

void insertionSortSimilarToSTLnoSelfAssignment( float* a, unsigned long a_size )
	for ( unsigned long i = 1; i < a_size; i++ )
		if ( a[ i ] < a[ i - 1 ] )		// no need to do (j > 0) compare for the first iteration
			float currentElement = a[ i ];
			a[ i ] = a[ i - 1 ];
			unsigned long j;
			for ( j = i - 1; j > 0 && currentElement < a[ j - 1 ]; j-- )
				a[ j ] = a[ j - 1 ];
			a[ j ] = currentElement;	// always necessary work/write
		// Perform no work at all if the first comparison fails - i.e. never assign an element to itself!

Otherwise, once the for loop starts rolling past the initial iteration, self-assignment cannot occur. Performance measurements of this improved version are shown in Tables 4, 5 and 6.

The majority of the performance improvements come from reducing the number of reads and writes during swaps, which shows a significant speedup (2.4X to 4X) with slight improvement due to elimination of self-assignment. Insertion Sort ability to not perform any writes for presorted input data sets has also been restored, and the measurements captured in Table 5 for presorted input data confirms it.

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.