Channels ▼
RSS

Tools

How The Theory of Constraints Can Help Software Optimization



Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
--Donald Knuth (adapted from C. A. R. Hoare)

The Theory of Constraints (TOC) is a philosophy of continuous improvement which is defined as a framework that can be applicable to many disciplines. The theory is based on the idea that in any complex system, there is usually one aspect of that system that limits its ability to achieve its goal or optimal functioning. To achieve any significant improvement of the system, the constraint must be identified and resolved.

Software optimization, in the generic sense, is the process of improving the software by eliminating bottlenecks so that it operates more efficiently on a given hardware and uses resources optimally. In keeping with Knuth's advice, TOC can help developers to identify the bottlenecks of their code for optimization that have the most return on investment (ROI). Identifying the bottlenecks in the target application and eliminating them appropriately is the key to an efficient optimization.

There are many optimization methodologies, which help developers to answer the questions of why to optimize, what to optimize and how to optimize, and these methods aid developers to reach their performance requirements. These questions are not any different than the questions asked to improve an organization, supply chain management, or manufacturing.

In this article, I propose that TOC, which is a philosophy of management and improvement, can be also used and applied for software optimization. The focus of this article is how to identify the bottlenecks and eliminate them with the help of Intel VTune Performance Analyzer while following the framework that is described by Theory of Constraints.

Introduction to Software Optimization: Hotspots and More

Performance tuning usually focuses on reducing the time it takes to complete a well-defined workload. Performance events can be used to measure the elapsed time and therefore reducing the elapsed time of completing a workload is equivalent to reducing measured processor cycles (clockticks).

Intel VTune Performance Analyzer is a powerful tool for software developers who are interested in performance analysis. The easiest way to identify the hotspots in a given application is to sample the application with processor cycles and VTune analyzer helps the developer identify where most of the CPU cycles are spent, in addition to many other processor events1, by utilizing two profiling techniques; sampling and call graph. The sampling can be in two forms: processor event based and time based sampling. Event-based sampling (EBS) relies on the performance monitoring unit (PMU) supported by the processors. From this point forward, I refer to event-based sampling (EBS) as "sampling."

When a sampling activity used, VTune analyzer by default uses processor cycles and instructions retired to analyze the application. (Recent generations of Intel 64 and IA-32 processors feature microarchitectures using an out-of-order execution engine. They are also accompanied by an in-order front " instructions retired.") The count of cycles, also known as "clockticks," forms the fundamental basis for measuring how long a program takes to execute. The total cycle measurement is the start to finish view of total number of cycles to complete the application of interest. In typical performance tuning situations, the metric Total cycles can be measured by the event CPU_CLK_UNHALTED.CORE.

The other event used by default sampling activity is instructions retired. This event indicates the number of instructions that retired or executed completely. This does not include partially processed instructions executed due to branch mis-predictions. The ratio of clockticks (non-halted) and instructions retired is called clocks per instruction retired (CPI) and it can be good indicator of performance problems (indicator of the efficiency of instructions generated by the compiler and/or low CPU utilization). (The Intel Core microarchitecture is capable of reaching CPI as low as 0.25 in ideal situations. The greater value of CPI for a given workload indicates that there are more opportunities for code tuning to improve performance.)

The approach I use here is explained in greater details in Intel 64 and IA-32 Intel Architecture Optimization Reference Manual. In this approach, it is accurate to say that the total number of cycles an application takes is the sum of the cycles issuing μops4and the cycles not issuing μops (stalls). This formula is given with the processor event names in Formula 2.


Total Cycles = Cycles issuing μops + Cycles not issuing μops 
Formula 1

It is worth noting that this decomposition is approximate in its nature so it should be taken as estimation.

Therefore, my focus is on three basic concepts:

  • Minimizing "stalls" by reducing or optimizing long latency operations such as memory accesses, floating-point operations, etc.,
  • Minimizing "non-retired" instructions by reducing the branch mis-predictions,
  • Minimizing "retired" instructions.

In Intel Core microarchitecture (Figure 1), up to four μops may be retired per cycle and 6 μops6 can be executed. The results of Intel 64 and IA-32 instructions must be committed in original program order before they are retired. Reducing the number of retired instructions will improve the execution time as will reducing the stalls.

Cycles wasted due to stalls indicate sub-optimal execution; therefore identifying them will be the main focus. Stalls can be categorized as"

  • Execution stalls
  • Front End stalls
  • Mispredicted branch pipeline flushing
  • Cycles wasted executing instructions that are not retired

With the introduction of Core architecture, the approach to identify and root-cause stall cycles became easier and more systematic. Before going into deeper analysis and explanation of the events, let's introduce the Theory of Constraints.

Figure 1: Intel Core Microarchitecture Pipeline Functionality

Applications Will Perform as Fast as Their Bottlenecks: Gentle Introduction to the Theory of Constraints

TOC is a philosophy of management and improvement developed by Eliyahu M. Goldratt. As mentioned earlier, it is a process of continuous improvement which focuses on the idea that in any complex system, there is usually one aspect of that system that limits its ability to achieve its goal.

The TOC processes defined by Goldratt are used to improve the health of an organization and these processes are very similar to the ones used by software optimization methodologies.

In this theory, five steps are defined. Before these steps are applied, a goal must be articulated. In our case, this is improving the performance of our application or meeting a performance requirement. Having articulated the goal, the steps can be given as:

  1. Identify the constraint (the root-cause that prevents achievement of the goal)
  2. Decide how to exploit the constraint (make sure the constraint identified is doing what it is designed to do. i.e: execution unit is executing but not stalling)
  3. Subordinate and synchronize everything else to the above decisions
  4. Elevate the performance of the constraint (increase capacity of the constraint. i.e: taking advantage of multiple cores, parallelism)
  5. If in any of the above steps the constraint has shifted, go back to Step 1.

The most critical piece in these steps is identifying the constraints. The questions to ask are:

  • "What to change?",
  • "To What to Change?" and
  • "How to Change?"

The TOC defines processes called thinking processes to answer these questions and provides a problem-solving framework based on cause and effect. The basic idea while trying to reach our goal (desired performance) is to identify the hotspots, understand the underlying cause and effect relation and eliminate the causes. If we take the four major stall categories given earlier as an example, we can say that most of the time execution stalls are the cause of performance problems (effect) and resource related stalls. While sub-optimal algorithms are the cause of execution stalls (effect).

In the rest of the article I show how you can identify the constraints preventing optimal application performance by using the TOC methodology as a guideline for software optimization on Core architecture.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video