In the preceding section, two main optimizations to SMOKE were described. First, the underlying data structures for object change notifications, and the way worker threads accessed and processed these notifications, were restructured. This gave roughly a 12-15% overall performance improvement. The second optimization was the adoption of the Intel TBB task scheduler to improve the load balance of the parallel execution. Before the rework, the CPU load on a 2x4-core machine was in the range of 55% to 65%; afterward, the load on the cores was in the 90% to 95% range. The rework also improved the frame rate by roughly 45% to 60%.
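To make the load-balancing idea concrete, the following is a minimal sketch in standard C++ only. It does not use Intel TBB and is not the SMOKE code: it stands in for a task scheduler with a much simpler technique, in which worker threads claim chunks of work from a shared atomic counter so that a thread finishing early simply takes the next chunk. The names `simulate_object_update` and `run_dynamic`, the chunk size, and the cost model are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-object cost: the work grows with the index, so a static
// split into equal index ranges would leave the last thread with most of it.
inline long simulate_object_update(std::size_t index) {
    long acc = 0;
    for (std::size_t k = 0; k <= index; ++k) acc += static_cast<long>(k % 7);
    return acc;
}

// Each worker repeatedly claims the next chunk of indices from a shared
// atomic counter until none remain. Returns how many items each thread
// ended up processing.
std::vector<std::size_t> run_dynamic(std::size_t item_count,
                                     std::size_t chunk_size,
                                     unsigned thread_count,
                                     std::vector<long>& results) {
    assert(chunk_size > 0 && thread_count > 0);
    results.assign(item_count, 0);
    std::atomic<std::size_t> next{0};
    std::vector<std::size_t> processed(thread_count, 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < thread_count; ++t) {
        workers.emplace_back([&, t] {
            std::size_t local_count = 0;
            for (;;) {
                // Claim a chunk; fetch_add returns the previous value.
                const std::size_t begin = next.fetch_add(chunk_size);
                if (begin >= item_count) break;
                const std::size_t end = std::min(begin + chunk_size, item_count);
                for (std::size_t i = begin; i < end; ++i)
                    results[i] = simulate_object_update(i);
                local_count += end - begin;
            }
            processed[t] = local_count;  // each thread writes only its own slot
        });
    }
    for (auto& w : workers) w.join();
    return processed;
}
```

Calling `run_dynamic(1000, 16, 4, results)` fills `results` and returns four per-thread counts that sum to 1000; how the items divide among the threads varies from run to run. A real task scheduler such as the one in Intel TBB goes further than this sketch, for example by using work stealing rather than a single shared counter.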
Summary and Conclusions
In summary, straightforward use of Intel Thread Profiler identified that:
- The code spent a noticeable amount of time undersubscribed
- A significant amount of serialization existed in the main computational loop
- The concurrency levels did not change from iteration to iteration of the main computational loop
- Undersubscription occurred as a result of synchronization between iterations
- The undersubscription was traced to two functions responsible for object change notifications
Examination of the source code showed how these functions could be restructured and parallelized with limited points of synchronization. However, the resulting code still suffered from load imbalance. The Intel TBB task scheduler was then used to improve both the overall CPU load and the load balance of the code. This work has demonstrated the effectiveness of the Intel Thread Profiler, used in conjunction with the Intel TBB library, in achieving performance improvements in the SMOKE gaming demo code. One future opportunity for performance gain would be to examine the use of Intel TBB's affinity partitioner. Other avenues have been suggested and discussed in . All of these considerations are expected to apply to gaming codes in general, with little, if any, specificity to SMOKE itself.
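As an illustration of what "parallelized with limited points of synchronization" can look like for change notifications, here is a minimal sketch in standard C++. It is not the SMOKE implementation, and every name in it (`ChangeNotification`, `NotificationBuffers`, `run_frame`) is hypothetical. The assumed design gives each worker its own notification buffer, so posting needs no lock, and merges the buffers at a single point once all workers have finished.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical change record: which object changed, and a bitmask of what changed.
struct ChangeNotification {
    std::size_t object_id;
    unsigned change_mask;
};

// One buffer per worker. A worker appends only to its own buffer, so posting
// a notification requires no lock.
class NotificationBuffers {
public:
    explicit NotificationBuffers(unsigned worker_count) : buffers_(worker_count) {}

    void post(unsigned worker, ChangeNotification n) {
        assert(worker < buffers_.size());
        buffers_[worker].push_back(n);
    }

    // The single synchronization point: call only after all workers have
    // joined (or reached a barrier). Merges and clears every buffer.
    std::vector<ChangeNotification> drain() {
        std::vector<ChangeNotification> merged;
        for (auto& b : buffers_) {
            merged.insert(merged.end(), b.begin(), b.end());
            b.clear();
        }
        return merged;
    }

private:
    std::vector<std::vector<ChangeNotification>> buffers_;
};

// Runs one simulated frame: each worker handles a strided share of the
// objects and posts one notification per object it touched.
std::vector<ChangeNotification> run_frame(std::size_t object_count,
                                          unsigned worker_count) {
    NotificationBuffers buffers(worker_count);
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < worker_count; ++w) {
        workers.emplace_back([&buffers, w, object_count, worker_count] {
            for (std::size_t id = w; id < object_count; id += worker_count)
                buffers.post(w, ChangeNotification{id, 1u});
        });
    }
    for (auto& t : workers) t.join();  // all posting ends here
    return buffers.drain();            // merge happens once, single-threaded
}
```

With this layout, `run_frame(100, 4)` returns 100 notifications, one per object. The trade-off is that the merge itself is serial, so it only pays off when posting dominates the cost and the merge is cheap by comparison.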
We acknowledge the generous support received from Intel's Developer Products Division, Visual Computing Software Division, and Software Solutions Group during the course of this work and the writing of this article.
 SMOKE Game-Technology Demo, Intel Software Network, http://software.intel.com/en-us/articles/smoke-game-technology-demo/.
 James Reinders, Intel Threading Building Blocks, O'Reilly Media.
 Robert D. Blumofe and Charles E. Leiserson, "Scheduling Multithreaded Computations by Work-Stealing," in Proceedings of the 35th Annual IEEE Conference on Foundations of Computer Science.
 Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall and Yuli Zhou, "Cilk: An Efficient Multithreaded Runtime System," in Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '95).
 Michael Voss, "Demystify Scalable Parallelism with Intel Threading Building Block's Generic Parallel Algorithms," Jupiter Media.
 Michael Voss, "Enable Safe, Scalable Parallelism with Intel Threading Building Block's Concurrent Containers," Jupiter Media.
 Bradley Werth, "Optimizing Game Architectures with TBB."
 The system used for testing was an Intel Xeon X5355 (2 x 4 cores, 2.66 GHz) with 8 GB RAM, Windows XP x64 Pro SP2, and a GeForce 8800 GTX. http://en.wikipedia.org/wiki/Xeon#5300-series_.22Clovertown.22.