4 Conclusion
Intel Parallel Composer provides several new methods for expressing parallelism in C++ applications, each with benefits for different use cases. As this article has shown, simple modifications to implement OpenMP, parallel spawn/finish, or Intel TBB can lead to impressive performance and scaling gains through multithreading.
Many algorithms contain optimizations that benefit serial execution but introduce dependencies that inhibit parallelism. It is often possible to remove such dependencies through simple transformations and then apply any of the approaches described above. It is also important to choose an appropriate number of threads. Creating too many threads hurts performance for several reasons, including thread-creation cost, increased system overhead, decreased granularity, and increased lock contention.
To avoid race conditions in a threaded application, access to shared resources must be mutually exclusive, so that only one thread at a time can read and change the state of a shared resource. The shared resource can be a data structure or any memory in the address space. Because every such synchronization point serializes the threads that reach it, minimizing synchronization overhead is critical to application performance.
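The points above can be illustrated with a minimal sketch. It uses only standard C++ threads (std::thread and std::mutex) rather than OpenMP, Intel TBB, or the Parallel Composer extensions, and the function names choose_thread_count and parallel_sum are hypothetical, introduced here for illustration. The serial dependency on a single running total is removed by giving each thread a private partial sum. The thread count is capped by the available hardware and by the amount of work. The shared total is protected by a mutex that each thread acquires only once.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: choose a worker count no larger than the hardware
// supports and no larger than the number of work items.
unsigned choose_thread_count(std::size_t work_items, unsigned requested = 0) {
    unsigned hw = std::thread::hardware_concurrency();  // may be 0 if unknown
    if (hw == 0) hw = 1;
    unsigned n = (requested == 0) ? hw : std::min(requested, hw);
    if (work_items < n) {
        n = static_cast<unsigned>(std::max<std::size_t>(work_items, 1));
    }
    return n;
}

// Hypothetical example: sum a vector in parallel.
long long parallel_sum(const std::vector<int>& data, unsigned requested_threads = 0) {
    const unsigned n = choose_thread_count(data.size(), requested_threads);
    const std::size_t chunk = (data.size() + n - 1) / n;  // ceiling division

    long long total = 0;     // shared resource
    std::mutex total_mutex;  // guards 'total'
    std::vector<std::thread> workers;
    workers.reserve(n);

    for (unsigned t = 0; t < n; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &total, &total_mutex, begin, end] {
            // Private partial sum: no dependency on other threads, no lock.
            const long long local = std::accumulate(
                data.begin() + static_cast<std::ptrdiff_t>(begin),
                data.begin() + static_cast<std::ptrdiff_t>(end), 0LL);
            // One short critical section per thread, not one per element.
            std::lock_guard<std::mutex> lock(total_mutex);
            total += local;
        });
    }
    for (auto& w : workers) w.join();
    return total;
}
```

Calling parallel_sum(values) lets the helper pick the thread count, while parallel_sum(values, 4) requests at most four threads. Locking inside the inner loop instead would give the same result but would serialize the threads on every element, which is the synchronization overhead the text warns about.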
Experiment with the sample code provided with the Intel Parallel Composer. Refer to the product documentation or product website for more information: http://www.intel.com/software/products/.
4.1 References
[1] Hoffman, E.J., Loessi, J.C. and Moore, R.C. (1969): Constructions for the Solution of the m Queens Problem, Mathematics Magazine, pp. 66-72.