Intel's Hyper-Threading Strikes Back
Intel's Hyper-Threading technology first reached desktop PCs with the 3.06 GHz Pentium 4 microprocessor. A few years later, the new Intel Core i7 processors offer Hyper-Threading again. I worked with Hyper-Threading technology in the Pentium 4 era and I really liked it, so I am happy that it is back in the new microprocessors. Why? Because it usually offers a worthwhile performance improvement in applications that were parallelized with its possible presence in mind.
Without Hyper-Threading, a Core i7 microprocessor is a quad-core CPU offering four (4) physical processing cores. With Hyper-Threading enabled, each physical core presents two logical cores, so the CPU offers eight (8) logical processing cores (4 x 2 = 8). This means that an operating system prepared for SMP (symmetric multiprocessing) will see eight processing cores and schedule threads on all of them.
So, here is the question: should I use eight (8) threads, matching the total number of logical processing cores, or four (4) threads, matching the total number of physical processing cores? If the application was designed with the possibility of running on a multi-core microprocessor with Hyper-Threading technology in mind, it should use eight (8) threads, and there should be a significant performance improvement.
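The choice between four and eight threads can be sketched in Java. This is only an illustration under stated assumptions: `workerCount`, its `designedForHyperThreading` flag and the `threadsPerCore` parameter are hypothetical names, not part of any library. The one real API call is `Runtime.getRuntime().availableProcessors()`, which reports the processors visible to the JVM; with Hyper-Threading enabled that is normally the logical count. Standard Java has no call that returns the physical core count, so the sketch assumes two logical cores per physical core.

```java
public class ThreadCountSketch {

    // Hypothetical helper: decides how many worker threads to start.
    // threadsPerCore is an assumption (2 for Hyper-Threading); Java itself
    // cannot tell logical cores from physical ones.
    static int workerCount(int logicalCores,
                           boolean designedForHyperThreading,
                           int threadsPerCore) {
        if (designedForHyperThreading) {
            // One thread per logical core, e.g. 8 on a quad-core i7.
            return logicalCores;
        }
        // Otherwise fall back to one thread per (assumed) physical core.
        return Math.max(1, logicalCores / threadsPerCore);
    }

    public static void main(String[] args) {
        // Processors visible to the JVM; usually the logical core count.
        int logicalCores = Runtime.getRuntime().availableProcessors();

        System.out.println("Logical cores: " + logicalCores);
        System.out.println("Threads if designed for Hyper-Threading: "
                + workerCount(logicalCores, true, 2));
        System.out.println("Threads otherwise: "
                + workerCount(logicalCores, false, 2));
    }
}
```

On a quad-core i7 with Hyper-Threading enabled, the helper would return eight in the first case and four in the second. Which value actually runs faster depends on the workload, so it is worth measuring both.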
One of the techniques to parallelize an algorithm is to transform sequential code that is going to be run many times into a pipelined producer-consumer waterfall. This technique is very useful when you want to use multiple cores simultaneously but the steps do not all take the same amount of time. You use concurrent collections or lists to create a chain of independent producer-consumers, where each step consumes the output of the previous one and produces input for the next. Many modern programming languages offer high-level structures to work with pipelines, concurrent collections or lists. Some will offer them in future versions (like Java with JDK 7 and C# with .NET 4.0 and its Parallel Extensions). Hence, it will be simpler than ever to create these complex but highly scalable pipelines.
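A minimal sketch of such a producer-consumer waterfall, written in Java with the standard `java.util.concurrent.BlockingQueue`, could look like this. The class name `PipelineSketch`, the method `runPipeline`, the three stages (generate, square, collect) and the `DONE` sentinel are all hypothetical and chosen only for illustration; a real pipeline would do heavier work in each stage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PipelineSketch {

    // Sentinel value telling the next stage that no more items will arrive.
    // Assumes real items never equal Integer.MIN_VALUE.
    private static final Integer DONE = Integer.MIN_VALUE;

    static List<Integer> runPipeline(int itemCount) {
        // Bounded queues connect the stages and absorb timing asymmetries.
        BlockingQueue<Integer> firstToSecond = new ArrayBlockingQueue<>(16);
        BlockingQueue<Integer> secondToThird = new ArrayBlockingQueue<>(16);
        List<Integer> results = new ArrayList<>();

        // Stage 1: produces the numbers 1..itemCount.
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= itemCount; i++) {
                    firstToSecond.put(i);
                }
                firstToSecond.put(DONE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Stage 2: consumes from stage 1 and produces squares for stage 3.
        Thread squarer = new Thread(() -> {
            try {
                while (true) {
                    Integer value = firstToSecond.take();
                    if (value.equals(DONE)) {
                        secondToThird.put(DONE); // pass shutdown downstream
                        break;
                    }
                    secondToThird.put(value * value);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Stage 3: consumes the squares and collects them.
        Thread collector = new Thread(() -> {
            try {
                while (true) {
                    Integer value = secondToThird.take();
                    if (value.equals(DONE)) {
                        break;
                    }
                    results.add(value);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        squarer.start();
        collector.start();
        try {
            // join() makes the collected results visible to this thread.
            producer.join();
            squarer.join();
            collector.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Pipeline interrupted", e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(runPipeline(5)); // prints [1, 4, 9, 16, 25]
    }
}
```

Each stage runs on its own thread, so a slow stage only delays its neighbours until the bounded queue between them fills or drains. With one thread per stage and FIFO queues, the items keep their original order.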
If you run this kind of application on a microprocessor with Hyper-Threading technology, you can see a great performance improvement. Why? Because each physical core presents two logical cores, you can run twice the number of hardware threads, and this allows the pipeline to have more operations in flight at the same time. Besides, it helps to smooth out the asymmetries generated by the complexity of the algorithm, because a stage that stalls leaves execution resources free for the other thread on the same core.
Hyper-Threading technology works great when the data needed by each pair of threads is available in the cache shared by each pair of logical cores (that is, the cache of the same physical core). A pipelined producer-consumer waterfall usually has the data in cache when the next producer-consumer step begins working. However, the data flow decomposition used in this kind of design requires special care to reduce startup and shutdown latencies, when some stages sit idle waiting for the pipeline to fill or drain. The good news is that Hyper-Threading technology helps with this problem and allows the code to run more concurrently. Hence, it can show performance improvements of more than 50% using one thread per logical core (eight threads on a quad-core i7 processor). Thus, it is nice to see Hyper-Threading is back. Good news for parallelization fans.
If you are interested in taking full advantage of Hyper-Threading, I recommend two articles about pipeline design:
* Fundamental Concepts of Parallel Programming by Shameem Akhter and Jason Roberts.
* The Challenges of Developing Multithreaded Processing Pipelines by Ryan Bloom.

