Speeding Up Code Without Doing Anything
Of all the techniques I use to speed up code, the one I like the most comes with just the press of a button, or more precisely, with the swap of a compiler. Every recent Intel compiler has this particular option, and I consider it a great friend. I'm making a point of keeping you in suspense for a little while longer. Let me first tell you a couple of stories that prove the point.
I was involved some time back in supporting a company that was upgrading from version 9.0 to 10.1 of the Intel compiler (we are now on version 11.1). Actually there was no work involved, but I was 'on call' just in case they had any problems. Within a day of upgrading, the project manager wrote to me saying "we must have version 10.1 of the compiler. Our application speed has just doubled in performance." The application was to be their main 'bread winner' for the following two years.
I had a pretty good idea what had happened. In the older version of the compiler they had been using, my favourite option had to be turned on explicitly by the developer. Like many users, they hadn't got around to reading the user manual or experimenting with the compiler switches. In version 10.1, Intel changed the default behaviour so that my favourite option was already enabled, hence the speedup in the customer's code.
I experienced an even more significant speedup with another customer, resulting in their application running 10 times faster. The application was a car engine simulator which was used in the testing of electronic car management systems.
The engine simulator was first designed in MATLAB, and the resulting C code was then compiled with the Microsoft compiler. I installed the Intel C++ Compiler, which is a plug-and-play replacement for the Microsoft C++ Compiler, and the simulation sped up by a factor of 10.
I suppose I have to spill the beans. The option is auto-vectorisation. All recent versions of the Intel compiler have this option available. If you are already using the latest Intel compilers, such as the one that comes with Intel Parallel Studio, then the auto-vectoriser is already turned on (unless you have explicitly turned it off).
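To give a feel for what the auto-vectoriser looks for, here is a minimal sketch in C. It is not taken from either customer's code; the function names `scale_and_add` and `running_sum` are made up for illustration. The first loop has independent iterations, which is the kind of loop a vectorising compiler can usually turn into SIMD instructions. The second carries a dependence from one iteration to the next, which is much harder to vectorise.

```c
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i] for n elements.
   Each iteration is independent of the others, and `restrict` (C99)
   promises the compiler that x and y do not overlap. Both properties
   make the loop a good candidate for auto-vectorisation. */
void scale_and_add(size_t n, float a,
                   const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Hypothetical counter-example: an in-place running sum.
   Iteration i reads the value written by iteration i - 1, so the
   iterations cannot simply be executed several at a time. */
void running_sum(size_t n, float *v)
{
    for (size_t i = 1; i < n; ++i) {
        v[i] += v[i - 1];
    }
}
```

Whether a given loop is actually vectorised depends on the compiler, its version, and the target processor, so it is worth checking the compiler's own diagnostics rather than assuming.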
In the case of the engine simulator, by upgrading the ecosystem to the latest multicore hardware, we achieved:
- An initial speedup of 20% by using the Intel® C++ Compiler on the original hardware
- A final speedup of over 76 (i.e. 7600%), which consisted of
  - 10 times speedup due to enabling auto-vectorisation
  - 7 times speedup due to the hardware upgrade
A fuller description of the Engine Simulation project can be found here.
A free evaluation of Intel Parallel Studio can be downloaded from here.



