Parallel Loops Require In-Depth Concurrency Knowledge
The new Parallel Extensions in .NET Framework 4 Beta 1 give developers the opportunity to use new parallel loops, which make it easier to distribute work across many cores. However, you must not forget about concurrency issues.
"Everything should be made as simple as possible, but not simpler", Albert Einstein.
In the last three months, I've been asked these questions more than fifty times:
• Why doesn't this loop converted to a Parallel.For run faster than its sequential version?
• Why does this loop converted to a Parallel.For produce errors? The sequential code runs OK.
• I converted all my loops to Parallel.For. My application doesn't work anymore. Parallel.For doesn't work. Why?
• Multicore programming is very easy. I've replaced all for loops with Parallel.For. However, my "optimized" applications run slower than the sequential version. Why?
• I had a three-level for loop and I replaced all three levels with Parallel.For. The algorithm is CPU intensive. However, on my quad-core CPU, it takes more time to complete than the previous version. Why?
I could go on adding more similar questions.
There are other popular versions of these questions with a Parallel.ForEach instead of a Parallel.For.
Most of the time, a sequential algorithm running in loops isn't prepared for concurrency because it was designed to run alone. It isn't prepared to share variables and state with concurrent tasks or threads, and it isn't organized to avoid certain concurrency nightmares. If you just replace a for with a Parallel.For, you are transforming a sequential algorithm into a parallelized one. However, to make sure the algorithm is efficient and still produces accurate results, you have to redesign your code to take into account the new concurrency introduced by the parallelized loop.
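The following minimal sketch illustrates the kind of shared-variable problem described above. The summation workload and the names SharedStateDemo, SumSequential, SumNaiveParallel and SumInterlocked are hypothetical, chosen only for illustration. It assumes the Parallel.For(int, int, Action<int>) overload and Interlocked.Add as they exist in the released .NET Framework 4.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical example: the same summation written three ways.
public static class SharedStateDemo
{
    // Sequential baseline: sum of 0..n-1.
    public static long SumSequential(int n)
    {
        long total = 0;
        for (int i = 0; i < n; i++)
        {
            total += i;
        }
        return total;
    }

    // Naive conversion: "total += i" is a read-modify-write on a variable
    // shared by all iterations, so concurrent updates can be lost.
    public static long SumNaiveParallel(int n)
    {
        long total = 0;
        Parallel.For(0, n, i =>
        {
            total += i; // data race: the result may differ from run to run
        });
        return total;
    }

    // One possible fix: make every update to the shared variable atomic.
    public static long SumInterlocked(int n)
    {
        long total = 0;
        Parallel.For(0, n, i =>
        {
            Interlocked.Add(ref total, i);
        });
        return total;
    }

    public static void Main()
    {
        const int n = 1000000;
        Console.WriteLine("Sequential:  " + SumSequential(n));
        Console.WriteLine("Naive:       " + SumNaiveParallel(n));
        Console.WriteLine("Interlocked: " + SumInterlocked(n));
    }
}
```

On a multicore machine the naive version may print a different total on each run, while on a single logical core it is likely to look correct. The atomic version fixes the result, but it is not necessarily faster than the sequential loop, because every iteration now contends for the same variable.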
The most common behavior for a parallelized loop is to create as many tasks as there are available logical cores and automatically distribute the work among them. These tasks run on worker threads that steal work from one another to balance the load. They run concurrently, according to the decisions taken by the operating system scheduler. You have to prepare the code in the loop to run concurrently without generating side effects. Besides, the loop has to produce an accurate result under different concurrency situations. As you can see, it isn't as simple as just replacing a loop definition structure.
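One way to prepare a loop body for this per-task execution is to give each task its own private state and combine the partial results only once, when the task finishes. The sketch below is a hypothetical illustration (the class LocalStateDemo and the summation workload are assumptions). It relies on the Parallel.For<TLocal> overload with localInit, body and localFinally delegates as shipped in the released .NET Framework 4.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical example: per-task subtotals instead of one shared counter.
public static class LocalStateDemo
{
    public static long SumWithTaskLocals(int n)
    {
        long total = 0;
        Parallel.For<long>(
            0, n,
            // localInit: runs once per task and creates its private subtotal.
            () => 0L,
            // body: touches only the task's private subtotal, no shared state.
            (i, loopState, subtotal) => subtotal + i,
            // localFinally: runs once per task and merges its subtotal
            // into the shared total with a single atomic addition.
            subtotal => Interlocked.Add(ref total, subtotal));
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SumWithTaskLocals(1000000));
    }
}
```

The shared variable is updated once per task rather than once per iteration, so the result stays accurate regardless of how the work is distributed. Whether this version actually beats the sequential loop still depends on how much work each iteration does, so it is worth measuring.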
One of the worst things that could happen in a development team is to try to parallelize an algorithm using a single-core microprocessor (just one hardware thread). If you replace all the for loops with Parallel.For and all the foreach loops with Parallel.ForEach, the application will very probably run as expected on a single-core microprocessor, because no concurrency happens. Parallel.For and Parallel.ForEach won't run code concurrently when the .NET Framework doesn't find more than one logical core (hardware thread). Therefore, you must be extremely careful when using these parallelized loops and testing your algorithms.
Please don't try to create parallel code using a single-core microprocessor as your main development environment. It's really error-prone. There are tools designed to detect concurrency errors without needing the exact number of cores on which the application is going to run. However, it is very important to make some concurrency happen.
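As a rough sketch of how a test could guard against this situation, the code below checks the number of logical cores before trusting a parallel test run, and compares a run capped at one concurrent task with an uncapped run. The class ConcurrencyCheck and the squares workload are hypothetical. Environment.ProcessorCount and ParallelOptions.MaxDegreeOfParallelism are the only framework features assumed, as found in the released .NET Framework 4.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical example: make sure a test run can exercise real concurrency.
public static class ConcurrencyCheck
{
    // True when the machine exposes more than one logical core.
    public static bool HasMultipleLogicalCores()
    {
        return Environment.ProcessorCount > 1;
    }

    // Fills an array with squares. Each iteration writes only its own slot,
    // so the body has no shared mutable state.
    // maxDegreeOfParallelism: 1 forces one task at a time, -1 means no cap.
    public static int[] Squares(int n, int maxDegreeOfParallelism)
    {
        var result = new int[n];
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxDegreeOfParallelism
        };
        Parallel.For(0, n, options, i => { result[i] = i * i; });
        return result;
    }

    // Compares the capped (effectively sequential) run with the uncapped run.
    public static bool SameResults(int n)
    {
        return Squares(n, 1).SequenceEqual(Squares(n, -1));
    }

    public static void Main()
    {
        if (!HasMultipleLogicalCores())
        {
            Console.WriteLine("Warning: one logical core, no concurrency is being tested.");
        }
        Console.WriteLine("Results match: " + SameResults(1000));
    }
}
```

A matching result on one machine does not prove the loop is free of concurrency errors. The check only makes it harder to mistake a single-core run for a meaningful test.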
Remember, you have to understand many important concurrency issues before using parallelized loops and other structures that simplify the parallelization of code snippets.
I've focused on the new parallel loops offered by .NET Framework 4 Beta 1. However, the same happens with many other loop parallelization techniques found in other programming languages, frameworks and parallel libraries.
As always happens in software development, there is no silver bullet…



