Speeding Up Code Without Doing Anything
Of all the techniques I use to speed up code, the one I like the most comes with just the press of a button, or more precisely with the swap of a compiler. Every Intel compiler has this particular option, and I consider it a great friend. I'm making a point of keeping you in suspense for a little while longer. Let me first tell you a couple of stories that prove the point.
Some time back I was involved in supporting a company that was upgrading from version 9.0 to 10.1 of the Intel compiler (we are now on version 11.1). Actually there was no work involved, but I was 'on call' just in case they had any problems. Within a day of upgrading, the project manager wrote to me saying "we must have version 10.1 of the compiler. Our application speed has just doubled in performance." The application was to be their main 'breadwinner' for the following two years.
I had a pretty good idea what had happened. In version 9.1 of the compiler, my favourite option had to be turned on explicitly by the developer. Like many users, they hadn't got around to reading the user manual or experimenting with the compiler switches. In version 10.1 Intel changed the default behaviour of the compiler, so my favourite option was already enabled; hence the speedup in the customer's code.
I experienced an even more significant speedup with another customer, resulting in their application running 10 times faster. The application was a car engine simulator which was used in the testing of electronic car management systems.
The engine simulator was first designed in MATLAB, and the resulting C code was then compiled with the Microsoft compiler. I installed the Intel C++ Compiler, which is a plug-and-play replacement for the Microsoft C++ Compiler, and the simulation sped up by a factor of 10.
I suppose I have to spill the beans. The option is auto-vectorisation. All recent versions of the Intel compiler have this option available. If you are already using the latest Intel compilers, such as the one that comes with Intel Parallel Studio, then the auto-vectoriser is already turned on (unless you have explicitly turned it off).
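To give a feel for what an auto-vectoriser looks for, here is a minimal sketch in C. It is not taken from either customer's code: the function names `scale_and_add` and `running_sum` are made up for illustration, and whether a given compiler actually vectorises the first loop depends on the compiler version, the optimisation settings and the target processor.

```c
#include <stddef.h>

/* Hypothetical kernel, for illustration only.
 * Each iteration is independent of the others, and 'restrict' promises the
 * compiler that x and y do not overlap, so this loop is a typical candidate
 * for auto-vectorisation (several elements processed per instruction). */
void scale_and_add(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Hypothetical counter-example.
 * Each iteration reads the value written by the previous one, so the loop
 * carries a dependency and cannot simply be executed several elements at
 * a time; a compiler would normally leave it as scalar code. */
void running_sum(size_t n, float *v)
{
    for (size_t i = 1; i < n; ++i) {
        v[i] = v[i] + v[i - 1];
    }
}
```

The source code is the same whether or not the first loop is vectorised, which is why the speedup appears to come for free. Most compilers can produce a vectorisation report that says which loops were vectorised and why others were not; check your compiler's documentation for the exact switch.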
In the case of the engine simulator, by switching compilers and then upgrading to the latest multicore hardware we achieved:
- An initial speedup of 20% by using the Intel® C++ Compiler on the original hardware
- A final speedup of over 76 times (i.e. 7600%), which consisted of:
  - a 10 times speedup due to enabling auto-vectorisation
  - a 7 times speedup due to the hardware upgrade
A fuller description of the Engine Simulation project can be found here.
A free evaluation of Intel Parallel Studio can be downloaded from here.

