# Algorithm Improvement through Performance Measurement: Part 1

### Correctness

One way to verify correctness is to compare each implementation's output with that of STL `sort()`, but that assumes STL `sort()` itself is correct. Avoiding that dependence requires a stand-alone correctness test for sorting algorithms: `array[i] ≤ array[i+1]` must hold for every adjacent pair of elements in the array, which is simple to implement. Comparison with the results of STL `sort()` remains a useful second affirmation of correctness. Both tests (the stand-alone correctness test and the comparison to STL `sort()`) were used to test all implemented routines. Input arrays of size 0 and 1 were also tested.
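The two checks described above can be sketched roughly as follows. This is a minimal illustration, not the test code actually used for the article: the function names `isSortedAscending`, `matchesStdSort`, and `checkSortRoutine` are hypothetical, and the sketch assumes `float` inputs without NaN values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-alone check: true when a[i-1] <= a[i] for every adjacent pair.
// Arrays of size 0 and 1 pass trivially.
bool isSortedAscending(const std::vector<float>& a)
{
    for (std::size_t i = 1; i < a.size(); ++i)
        if (a[i - 1] > a[i])
            return false;
    return true;
}

// Cross-check: sort a copy of the input with std::sort and compare it
// element by element with the output of the routine under test.
bool matchesStdSort(std::vector<float> input, const std::vector<float>& output)
{
    std::sort(input.begin(), input.end());
    return input == output;
}

// Hypothetical test driver: aborts (via assert) if either check fails.
template <typename SortFn>
void checkSortRoutine(SortFn sortFn, std::vector<float> data)
{
    const std::vector<float> original = data;
    sortFn(data);
    assert(isSortedAscending(data));
    assert(matchesStdSort(original, data));
}
```

The ordering check alone would accept an output that is sorted but is not a rearrangement of the input, so the comparison against `std::sort` adds real coverage as long as `std::sort` is trusted.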

### Performance Comparison

The performance comparison setup was as follows: Visual Studio 2008 with the project optimization setting set to optimize for speed; Intel Core 2 Duo E8400 CPU at 3 GHz (64 KB L1 and 6 MB L2 cache, 14-stage pipeline, 1,333 MHz front-side bus); 2 GB of system memory (dual-channel, 64 bits per channel, 800 MHz DDR2); DQ35JOE motherboard.

Table 1 and Figure 1 compare the four sorting algorithms discussed above with varied sizes of arrays filled with random numbers. Each element in the arrays is of type `float` — 32-bit floating-point.

Table 1

Figure 1

Since the standard C runtime library random number generator function `rand()` only creates about 32K unique values, for each array element several random numbers were generated. Then they were multiplied together to produce a single value, as shown below:

```
a[ i ] = (float)(((double)rand() / (double)RAND_MAX ) *
…
((double)rand() / (double)RAND_MAX ));

```

The division in the code above produces a floating-point number between 0.0 and 1.0, and the product of several such numbers is still between 0.0 and 1.0. For each array size the number of unique values was determined, and it was always within 0.4% of the number of elements in the array. Without this procedure the number of unique values maxed out at `RAND_MAX`, which is about 32K, and benchmark results were severely tainted, because some of the algorithms run significantly faster on inputs containing few unique values. This is not an optimal method of generating random numbers, but it is sufficient; better methods will be explored in future articles.
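One way to fill an array this way and count its unique values is sketched below. The names `fillRandom` and `countUnique` are hypothetical, and the `factors` parameter is an assumption: the listing above elides how many `rand()` terms were actually multiplied together.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Fills 'a' with values in [0.0, 1.0], each the product of 'factors'
// scaled rand() results. 'factors' is a hypothetical parameter; the
// original listing does not show how many terms were multiplied.
void fillRandom(std::vector<float>& a, int factors)
{
    assert(factors >= 1);
    for (std::size_t i = 0; i < a.size(); ++i) {
        double value = 1.0;
        for (int f = 0; f < factors; ++f)
            value *= (double)rand() / (double)RAND_MAX;
        a[i] = (float)value;
    }
}

// Counts distinct values by sorting a copy and collapsing equal neighbors.
std::size_t countUnique(std::vector<float> a)
{
    std::sort(a.begin(), a.end());
    return (std::size_t)(std::unique(a.begin(), a.end()) - a.begin());
}
```

Dividing `countUnique(a)` by `a.size()` gives the fraction of unique values, which can be checked for each array size before benchmarking.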

Table 2 and Figure 2 compare the four sorting algorithms using pre-sorted data. For some algorithms this presents the best case for performance.

Table 2

Figure 2

Table 3 and Figure 3 use reverse-sorted data (sorted, but backwards within the array), as this presents the worst-case input for some algorithms.

Table 3

Figure 3

The sorting algorithms being tested are all in-place, which means they operate on the original array provided to them. Running the same array multiple times would therefore be a flawed measuring technique: the array is sorted after the first run, so all subsequent runs would operate on an already sorted array. To measure performance, multiple arrays must be used, each filled with a variety of data and then operated on by the sorting algorithm. The technique that was used started with 100K arrays of 10 elements each, followed by 10K arrays of 100 elements, then 1K arrays of 1K elements, and so on, down to 1 array of 1M elements.
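A measurement loop along these lines might look like the sketch below. It is an illustration under stated assumptions, not the harness used for the tables: the names `averageSortSeconds` and `sweep` are hypothetical, the timer is `std::chrono::steady_clock` (C++11, newer than the compiler listed above), and the fill is simplified to a single `rand()` call per element.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Times 'sortFn' over 'arrayCount' distinct arrays of 'arraySize' elements
// and returns the average seconds per array. Every array is sorted exactly
// once, so no run sees data left sorted by an earlier run.
template <typename SortFn>
double averageSortSeconds(SortFn sortFn, std::size_t arrayCount, std::size_t arraySize)
{
    assert(arrayCount > 0);
    // Build all inputs before starting the clock.
    std::vector<std::vector<float> > arrays(arrayCount, std::vector<float>(arraySize));
    for (auto& a : arrays)
        for (auto& x : a)
            x = (float)rand() / (float)RAND_MAX;  // simplified fill for this sketch

    const auto start = std::chrono::steady_clock::now();
    for (auto& a : arrays)
        sortFn(a);
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double> elapsed = stop - start;
    return elapsed.count() / (double)arrayCount;
}

// The sweep described in the text: 100K arrays of 10 elements, 10K arrays
// of 100 elements, and so on, down to 1 array of 1M elements.
template <typename SortFn>
std::vector<double> sweep(SortFn sortFn)
{
    std::vector<double> seconds;
    std::size_t count = 100000, size = 10;
    for (; count >= 1; count /= 10, size *= 10)
        seconds.push_back(averageSortSeconds(sortFn, count, size));
    return seconds;
}
```

Each step of the sweep sorts the same total number of elements (1M), so the timed work stays comparable as the array size grows.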

The plots may seem a little strange at first, since they are logarithmic. The beauty of logarithmic plots, however, is that very large ranges of data can be covered (this is one of the reasons human senses are logarithmic). Both X and Y axes are logarithmic (log10). The slope in the plots relates to exponents through the power rule of logarithms, `log(x^k) = k·log(x)`.

Thus, on log-log plots the exponent appears as the slope: a slope of 1 corresponds to an exponent of 1, a slope of 2 to an exponent of 2, a slope of 3 to an exponent of 3, and so on. Log plots are a great way to compare linear and higher-order (power-law) behavior across large ranges of data; in the data sets above the powers of 10 range from -8 to +3, which is 11 orders of magnitude (a factor of 100 billion).
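As a worked version of this relationship, assume the run time follows a power law in the array size. The symbols $T$, $n$, $c$, and $k$ are introduced here for illustration and do not appear in the original text:

```latex
\begin{aligned}
T(n) &= c\,n^{k} \\
\log_{10} T(n) &= \log_{10} c + k\,\log_{10} n
\end{aligned}
```

On log-log axes this is a straight line whose slope is the exponent $k$, while the constant factor $c$ only shifts the line up or down.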

From these plots it's clear that two of the four algorithms (STL `sort()` and `qsort()`) have nearly linear performance; for instance, when the array size goes up by 10X, the run time goes up by roughly 10X as well. This behavior is consistent across all three data sets. Selection Sort has double the slope: when the array size goes up by 10X, the run time goes up by 100X, clearly showing `O(n^2)` order for this algorithm.
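The slope between two measured points can be estimated directly, as in the small sketch below. The function name `logLogSlope` is hypothetical, and the inputs it expects are any two (array size, run time) pairs, for example two rows of one of the tables.

```cpp
#include <cassert>
#include <cmath>

// Slope between two points on a log-log plot: how many decades the run
// time rises per decade of array size.
double logLogSlope(double n1, double t1, double n2, double t2)
{
    assert(n1 > 0.0 && n2 > 0.0 && t1 > 0.0 && t2 > 0.0 && n1 != n2);
    return std::log10(t2 / t1) / std::log10(n2 / n1);
}
```

A 10X increase in size with a 10X increase in time gives a slope of 1, and a 10X increase in size with a 100X increase in time gives a slope of 2.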

Selection Sort is consistent in its performance across the three data sets, showing its data-independent behavior: `O(n^2)` no matter what. Selection Sort beats Insertion Sort for the random and reverse input data sets. Selection Sort beats all algorithms for arrays of 10 elements with the reverse input data set.
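The data-independent behavior can be seen by counting comparisons in a textbook Selection Sort. The sketch below is not necessarily the implementation that was measured, and the name `selectionSortCounting` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Textbook selection sort (not necessarily the implementation measured
// here). Returns the number of element comparisons performed.
std::size_t selectionSortCounting(std::vector<float>& a)
{
    std::size_t comparisons = 0;
    for (std::size_t i = 0; i + 1 < a.size(); ++i) {
        std::size_t smallest = i;
        // Scan the whole unsorted tail, whatever the data looks like.
        for (std::size_t j = i + 1; j < a.size(); ++j) {
            ++comparisons;
            if (a[j] < a[smallest])
                smallest = j;
        }
        if (smallest != i)
            std::swap(a[i], a[smallest]);
    }
    return comparisons;
}
```

For an array of `n` elements this version always performs `n(n-1)/2` comparisons, whether the input is random, presorted, or reversed.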

STL `sort()` consistently outperforms `qsort()` across all array sizes and all input data sets. The only cases in which STL `sort()` loses are to Insertion Sort on the presorted input data, for all array sizes, and to Selection Sort for arrays of 10 elements. STL `sort()` and `qsort()` are fairly insensitive to whether the input is presorted or reversed, but both run several times slower on the random data set. These measurements imply that presorted and reverse inputs are not the best-case and worst-case inputs for these two algorithms; instead, some particular random input pattern degrades their performance significantly.

Insertion Sort shows `O(n^2)` behavior for the random and reverse input data sets, but `O(n)` for the presorted input data set. When the array is presorted, Insertion Sort keeps its performance lead no matter the array size. The reason is that it never enters the inner for loop, and thus never moves any of the array elements. In this case it also performs only `(n-1)` element comparisons in total, making its linear `O(n)` behavior evident. The reverse input data set is worse than the random or presorted input data sets.
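The presorted best case can be illustrated by counting comparisons and element moves in a textbook Insertion Sort. This sketch is not necessarily the implementation that was measured: the name `insertionSortCounting` is hypothetical, and the inner loop is written as a `while` loop rather than a `for` loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Textbook insertion sort with counters. Returns the number of element
// comparisons; 'moves' receives the number of elements shifted.
std::size_t insertionSortCounting(std::vector<float>& a, std::size_t& moves)
{
    std::size_t comparisons = 0;
    moves = 0;
    for (std::size_t i = 1; i < a.size(); ++i) {
        const float current = a[i];
        std::size_t j = i;
        // Inner loop: shift larger elements right to open a slot.
        while (j > 0) {
            ++comparisons;
            if (!(a[j - 1] > current))
                break;              // already in place: stop after one comparison
            a[j] = a[j - 1];
            ++moves;
            --j;
        }
        a[j] = current;
    }
    return comparisons;
}
```

On presorted input every outer iteration stops after a single comparison, giving `n-1` comparisons and no shifts. On reversed input every element is shifted all the way to the front, giving `n(n-1)/2` comparisons and the same number of shifts.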
