# Algorithm Improvement through Performance Measurement: Part 1

### Improving Insertion Sort

It seems that Insertion Sort performs many swaps in the inner for loop. These swaps consist of three read and three write operations. These swaps have redundancy between loop iterations. By examining the algorithm carefully, all intermediate writes of the initial a[j] value can be seen redundant and can be removed, except for the last write to the final location. STL sort(), which uses Insertion Sort internally implements such an optimization, and so do [5] and [6]. This optimization is shown in Listing Five. This improvement results in significantly fewer reads and writes (3-to-1 reduction) by using a realization that array elements can be shifted to the right and then the current element can be moved to the final position.

#### Listing Five

```
// Listing5.cpp

// Optimization borrowed from STL, which does not do swap during every iteration, and reduces the number of reads and writes.
// However, the number of reads and writes is still O( n^2 ).  This implementation is substantially faster than using swaps!
void insertionSortSimilarToSTL( float* a, unsigned long a_size )
{
for ( unsigned long i = 1; i < a_size; i++ )
{
unsigned long j;
float currentElement = a[ i ];
for ( j = i; j > 0 && currentElement < a[ j - 1 ]; j-- )
{
a[ j ] = a[ j - 1 ];
}
a[ j ] = currentElement;	// Notice that here we can copy an element onto itself
}
}

```

Performance measurements of this improved version are shown in Tables 4, 5, and 6, along with speed-up obtained, shown in the last row of each table.

Table 4

Table 5

Table 6

However, this improvement introduces a side-effect that the original version in Listing Two, which used swaps, did not contain. If the first comparison in the inner for loop failed, then the original version performed no swaps and thus no writes to the array, but the improved version does. This side-effect can be seen for presorted input data set. Because the write is outside of the inner for loop, it is performed no matter what the inner for loop does, even if it initially fails. This case results in self-assignment; for example, `a[i] = a]i]`.

Obviously, self-assignment is redundant work, and can be eliminated using an if statement comparing indexes. However, this adds a comparison overhead that is incurred on every assignment whether self-assigning or not. The majority of cases will not be doing self-assigning, and thus it would be inefficient to have all cases suffer in performance for a rare condition of self-assignment.

Listing Six shows a new implementation which avoids self-assignment without doing any extra work, by unrolling the first loop, which is the only case where self-assignment can occur.

#### Listing Six

```
// Listing6.cpp

void insertionSortSimilarToSTLnoSelfAssignment( float* a, unsigned long a_size )
{
for ( unsigned long i = 1; i < a_size; i++ )
{
if ( a[ i ] < a[ i - 1 ] )		// no need to do (j > 0) compare for the first iteration
{
float currentElement = a[ i ];
a[ i ] = a[ i - 1 ];
unsigned long j;
for ( j = i - 1; j > 0 && currentElement < a[ j - 1 ]; j-- )
{
a[ j ] = a[ j - 1 ];
}
a[ j ] = currentElement;	// always necessary work/write
}
// Perform no work at all if the first comparison fails - i.e. never assign an element to itself!
}
}

```

Otherwise, once the for loop starts rolling past the initial iteration, self-assignment cannot occur. Performance measurements of this improved version are shown in Tables 4, 5 and 6.

The majority of the performance improvements come from reducing the number of reads and writes during swaps, which shows a significant speedup (2.4X to 4X) with slight improvement due to elimination of self-assignment. Insertion Sort ability to not perform any writes for presorted input data sets has also been restored, and the measurements captured in Table 5 for presorted input data confirms it.

### More Insights

 To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.

# First C Compiler Now on Github

The earliest known C compiler by the legendary Dennis Ritchie has been published on the repository.

# HTML5 Mobile Development: Seven Good Ideas (and Three Bad Ones)

HTML5 Mobile Development: Seven Good Ideas (and Three Bad Ones)

# Building Bare Metal ARM Systems with GNU

All you need to know to get up and running... and programming on ARM

# Amazon's Vogels Challenges IT: Rethink App Dev

Amazon Web Services CTO says promised land of cloud computing requires a new generation of applications that follow different principles.

# How to Select a PaaS Partner

Eventually, the vast majority of Web applications will run on a platform-as-a-service, or PaaS, vendor's infrastructure. To help sort out the options, we sent out a matrix with more than 70 decision points to a variety of PaaS providers.

More "Best of the Web" >>