Applying TOC: Cause and Effect Relation and Focusing Steps
By doing this decomposition exercise, we are simply identifying the contributors to the stall cycles in the application. In doing so, we are building the cause-and-effect relationship that TOC calls a Current Reality Tree. The components are called 'Undesirable Effects' (UDEs); they are symptoms resulting from a to-be-determined cause.
The analysis and decomposition exercise can easily be performed with the help of the VTune analyzer, as mentioned earlier. The VTune analyzer helps developers identify the functions and source code segments causing these events. The outcome of the analysis should give an idea of what causes the stall cycles and which functions and code segments (with source line information) constrain execution. Once the major constraint is identified (i.e., the biggest contributor to the stall cycles), the developer needs to look into removing or reducing the conditions causing the problem.
To identify the few things that need to be optimized, you should rely on cause-and-effect relationships. As you can imagine, one cause can produce multiple symptoms, and one symptom can have multiple causes. The fewer the contributors that cause the problem(s), the more powerful and focused our optimization effort will be.
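As an illustration of finding the biggest contributor, the following minimal sketch ranks functions by their share of sampled stall cycles. The function names and sample counts are hypothetical, and the input dictionary is an assumed hand-made structure, not a VTune analyzer export format.

```python
from typing import Dict, List, Tuple


def rank_stall_contributors(
    stall_samples: Dict[str, int],
) -> List[Tuple[str, int, float, float]]:
    """Return (name, samples, share, cumulative_share) rows, largest first."""
    total = sum(stall_samples.values())
    if total <= 0:
        raise ValueError("need at least one stall sample")
    rows = []
    cumulative = 0.0
    # Sort by sample count so the major constraint appears first.
    for name, count in sorted(
        stall_samples.items(), key=lambda kv: kv[1], reverse=True
    ):
        share = count / total
        cumulative += share
        rows.append((name, count, share, cumulative))
    return rows


# Hypothetical per-function stall-cycle sample counts (illustrative only).
samples = {
    "parse_input": 120,
    "solve_matrix": 640,
    "write_output": 40,
}

ranking = rank_stall_contributors(samples)
for name, count, share, cumulative in ranking:
    print(f"{name:<14} {count:>5} {share:6.1%} {cumulative:6.1%}")
```

The first row is the candidate constraint. The cumulative column shows how few contributors account for most of the stall cycles, which is where the optimization effort is best focused.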
Having discussed how cycle decomposition can be done, it is worthwhile summarizing how the five focusing steps of TOC can be used in the methodology mentioned above; see Table 1.
It is critical to note that after constraints are identified and removed, the next step should be to ensure that other parts of the system (i.e., the front end) keep up with the execution unit (Subordinate & Synchronize) so that the execution unit doesn't sit idle. If this is not ensured, the solution is only sub-optimal and prevents us from reaching our goal.
Think Parallel to Elevate
The "Elevate" step needs extra attention nowadays, given that multicore processors are replacing single-core processors across the range from high-end servers to desktops and laptops. The expectation is that multicore processors will sooner or later become the norm. Designing parallelism into software therefore helps it make better use of the hardware resources and realize the benefits of the Core architecture.
Parallelism will help developers take advantage of the system and processor resources. Multithreading is an effective way to leverage multicore processors and a great way to elevate the performance of the constraints. For example, the impact of long-latency floating-point operations such as divisions can be reduced by multithreading. The VTune analyzer can help to evaluate the impact of such code regions. The number of division instructions (DIV) and the impact of the divisions on the execution unit (IDLE_DURING_DIV) can be used for this purpose.
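One way to use these two events is sketched below. It assumes that IDLE_DURING_DIV is reported in cycles and that a total cycle count per code region is also available. The region names, the counts, and the 10% threshold are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class RegionCounts:
    name: str
    cycles: int           # total cycles in the region (assumed to be available)
    div: int              # DIV event count
    idle_during_div: int  # IDLE_DURING_DIV event count (assumed to be cycles)


def div_idle_fraction(region: RegionCounts) -> float:
    """Fraction of the region's cycles spent idle while a divide is in flight."""
    if region.cycles <= 0:
        raise ValueError("cycles must be positive")
    return region.idle_during_div / region.cycles


def idle_cycles_per_div(region: RegionCounts) -> float:
    """Average idle cycles attributed to each division (0.0 if there are none)."""
    return region.idle_during_div / region.div if region.div else 0.0


# Hypothetical measurements for two code regions (illustrative only).
regions = [
    RegionCounts("normalize", cycles=1_000_000, div=10_000, idle_during_div=250_000),
    RegionCounts("accumulate", cycles=2_000_000, div=1_000, idle_during_div=20_000),
]

THRESHOLD = 0.10  # hypothetical cut-off for "worth elevating"
candidates = [r.name for r in regions if div_idle_fraction(r) > THRESHOLD]

for r in regions:
    print(f"{r.name:<11} idle fraction {div_idle_fraction(r):.2%}, "
          f"{idle_cycles_per_div(r):.1f} idle cycles per DIV")
print("candidates:", candidates)
```

Regions that exceed the threshold are the ones where division latency leaves the execution unit idle for a noticeable share of the time, which makes them natural candidates for the "Elevate" step.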
Software optimization is an art and requires attention right from the beginning of any software development cycle. As Knuth said: "… premature optimization is the root of all evil…". One needs to identify the area of interest for optimization carefully by leveraging the methods mentioned above. The Core architecture, with its improved PMU support, and the VTune analyzer's event-based sampling together make systematic performance analysis easier. TOC can provide a new way of looking at software optimization and identifying the constraints on any system. Multicore processors provide great opportunities to increase the performance of applications but require new ways of thinking. Parallelism can be designed and implemented to avoid many potential constraints.
Think parallel when thinking of constraints.