Intel's Hyper-Threading Strikes Back
Intel's Hyper-Threading technology came to desktop PCs with the 3.06 GHz Pentium 4 microprocessor. Several years later, the new Intel Core i7 processors bring Hyper-Threading back.
I worked with Hyper-Threading technology in the Pentium 4 era and I really liked it, so I am happy to see it back in the new microprocessors. Why? Because it usually delivers a worthwhile performance improvement in applications that were parallelized with its possible presence in mind.
Without Hyper-Threading, a Core i7 microprocessor is a quad-core CPU offering four (4) physical processing cores. With Hyper-Threading enabled, it exposes eight (8) logical processing cores (4 x 2 = 8). This means that an operating system that supports SMP (Symmetric Multiprocessing) sees the CPU as having eight processing cores.
So, here is the question: should I use eight (8) threads, matching the number of logical processing cores, or four (4) threads, matching the number of physical processing cores? If the application was designed with multi-core microprocessors with Hyper-Threading technology in mind, it should use eight (8) threads, and it should see a significant performance improvement.
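In Java, one way to follow this advice is to size a fixed thread pool from `Runtime.getRuntime().availableProcessors()`, which reports logical processors, so a quad-core Core i7 with Hyper-Threading enabled typically reports eight. The sketch below is a minimal illustration under that assumption; the class name `ThreadCountSketch`, the `parallelSum` helper and the range-summing workload are hypothetical stand-ins for real per-thread work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadCountSketch {
    // Hypothetical stand-in for real per-thread work: sums the integers in [from, to).
    static long sumRange(long from, long to) {
        long s = 0;
        for (long i = from; i < to; i++) s += i;
        return s;
    }

    // Splits [0, n) into one chunk per thread and runs the chunks on a fixed pool.
    public static long parallelSum(long n, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            long chunk = (n + threads - 1) / threads; // ceiling division
            for (int t = 0; t < threads; t++) {
                final long from = Math.min(n, t * chunk);
                final long to = Math.min(n, from + chunk);
                Callable<Long> task = () -> sumRange(from, to);
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Logical processors as seen by the JVM, e.g. 8 on a quad-core Core i7 with Hyper-Threading on.
        int logical = Runtime.getRuntime().availableProcessors();
        System.out.println("Logical processors: " + logical);
        System.out.println(parallelSum(1_000_000L, logical)); // prints 499999500000
    }
}
```

Whether eight threads actually beat four depends on the workload; the point of the sketch is only that the thread count comes from the logical-core count reported by the runtime instead of being hard-coded.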
One technique for parallelizing an algorithm is to transform sequential code that runs many times into a pipelined producer-consumer waterfall. This technique is very useful when you want to use multiple cores simultaneously but the steps do not all take the same amount of time. You use concurrent collections or lists to build a chain of independent producer-consumers. Many modern programming languages offer high-level structures for working with pipelines, concurrent collections or lists. Others will offer them in upcoming versions (for example, Java with JDK 7, and C# with .NET 4.0 and its Parallel Extensions). Hence, it will be simpler than ever to create these complex but highly scalable pipelines.
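A minimal three-stage waterfall can be sketched in Java with `java.util.concurrent.BlockingQueue`, which has been available since Java 5, as the concurrent collection between stages. The stage functions (generate, square, sum), the queue capacity of 64 and the `-1` end-of-stream marker are illustrative assumptions, not part of any particular framework.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class PipelineSketch {
    // End-of-stream marker ("poison pill"); real inputs are positive.
    static final int END = -1;

    // Runs a three-stage pipeline: generate 1..items -> square -> sum.
    public static long run(int items) throws InterruptedException {
        // Bounded queues connect the stages; put() blocks while a queue is full.
        BlockingQueue<Integer> numbers = new ArrayBlockingQueue<>(64);
        BlockingQueue<Long> squares = new ArrayBlockingQueue<>(64);
        AtomicLong result = new AtomicLong();

        // Stage 1 (producer): emits 1..items, then the END marker.
        Thread generate = new Thread(() -> {
            try {
                for (int i = 1; i <= items; i++) numbers.put(i);
                numbers.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Stage 2 (consumer of stage 1, producer for stage 3): squares each value.
        Thread square = new Thread(() -> {
            try {
                for (int v = numbers.take(); v != END; v = numbers.take()) {
                    squares.put((long) v * v);
                }
                squares.put((long) END); // pass the marker downstream
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Stage 3 (final consumer): accumulates the squares.
        Thread sum = new Thread(() -> {
            try {
                for (long v = squares.take(); v != END; v = squares.take()) {
                    result.addAndGet(v);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        generate.start();
        square.start();
        sum.start();
        generate.join();
        square.join();
        sum.join();
        return result.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(100)); // prints 338350 (sum of squares 1..100)
    }
}
```

Because the stages share nothing but the queues, each one can run on a different logical core, and the bounded capacity keeps a fast producer from running too far ahead of a slower consumer.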
If you run this kind of application on a microprocessor with Hyper-Threading technology, you can see a substantial performance improvement. Why? Because doubling the number of logical cores lets you run twice as many threads, so more pipeline operations can run in parallel. It also helps absorb the imbalances between stages caused by the uneven complexity of the algorithm's steps.
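The point about imbalances can be sketched by giving the slowest stage more than one consumer thread, all draining the same queue. In this hypothetical Java example the "slow" step is a deliberately O(v) loop (`triangular`), and the worker count passed to `run` is an assumption you would tune, for instance toward the logical-core count; it is not a measured recommendation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BalancedPipelineSketch {
    // End-of-stream marker; real inputs are positive.
    static final int END = -1;

    // Hypothetical "slow" step: an O(v) loop standing in for an expensive computation.
    static long triangular(int v) {
        long s = 0;
        for (int i = 1; i <= v; i++) s += i;
        return s;
    }

    // Cheap producer -> slow stage with several workers -> single collector.
    public static long run(int items, int slowWorkers) throws InterruptedException {
        if (slowWorkers < 1) throw new IllegalArgumentException("need at least one worker");
        BlockingQueue<Integer> in = new ArrayBlockingQueue<>(64);
        BlockingQueue<Long> out = new ArrayBlockingQueue<>(64);
        long[] total = {0}; // written only by the collector, read after join()
        List<Thread> threads = new ArrayList<>();

        // Cheap stage: emit 1..items, then one END marker per slow worker.
        threads.add(new Thread(() -> {
            try {
                for (int i = 1; i <= items; i++) in.put(i);
                for (int k = 0; k < slowWorkers; k++) in.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));

        // Slow stage: all workers take from the same queue, so the bottleneck
        // step gets more threads (and logical cores) than the cheap stages.
        for (int w = 0; w < slowWorkers; w++) {
            threads.add(new Thread(() -> {
                try {
                    for (int v = in.take(); v != END; v = in.take()) out.put(triangular(v));
                    out.put((long) END); // each worker forwards exactly one END marker
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }

        // Collector: finishes after seeing one END marker from every slow worker.
        threads.add(new Thread(() -> {
            try {
                for (int finished = 0; finished < slowWorkers; ) {
                    long v = out.take();
                    if (v == END) finished++;
                    else total[0] += v;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));

        for (Thread t : threads) t.start();
        for (Thread t : threads) t.join();
        return total[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(100, 2)); // prints 171700
    }
}
```

Sending one END marker per worker, and having the collector wait for all of them, is what lets the slow stage be widened without losing results at shutdown.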
Hyper-Threading technology works best when the data needed by each pair of threads is already in the cache shared by the corresponding pair of logical cores, that is, by the same physical core. A pipelined producer-consumer waterfall usually has the data in cache when the next producer-consumer step begins working. However, the data flow decomposition used in this kind of design requires special care to minimize startup and shutdown latencies. The good news is that Hyper-Threading technology helps mitigate this problem and allows the code to run more concurrently. As a result, running one thread per logical core (eight threads on a quad-core Core i7 processor) can yield performance improvements of more than 50%. It is nice to see Hyper-Threading back. Good news for parallelization fans.
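The cost of startup and shutdown can be made concrete with an idealized model, assuming $k$ perfectly balanced stages that each take time $t$ per item and ignoring queue overhead (both simplifying assumptions). The later stages wait while the first item fills the pipeline, and the earlier stages sit idle while the last item drains, so $n$ items take:

```latex
T_{\text{seq}} = n\,k\,t, \qquad
T_{\text{pipe}} = (k-1)\,t + n\,t = (n + k - 1)\,t, \qquad
S = \frac{T_{\text{seq}}}{T_{\text{pipe}}} = \frac{n\,k}{n + k - 1} \;\xrightarrow[n \to \infty]{}\; k
```

With $k = 4$ stages, $n = 8$ items give $S = 32/11 \approx 2.91$, while $n = 100$ items give $S = 400/103 \approx 3.88$: the $(k-1)\,t$ fill and drain periods weigh most heavily on short runs.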
If you are interested in taking full advantage of Hyper-Threading, I recommend two articles on pipeline design:
* Fundamental Concepts of Parallel Programming, by Shameem Akhter and Jason Roberts
* The Challenges of Developing Multithreaded Processing Pipelines, by Ryan Bloom