
Gaston Hillar

Dr. Dobb's Bloggers

Dive Deeper than the CPU Utilization Graph to Check Efficiency

November 30, 2010

The CPU utilization graph provides important information that allows you to detect a load imbalance problem when parallelized code runs on a multicore CPU. However, a sustained high CPU load for all the cores doesn't mean that your parallelized code is efficient.

The main goal of a parallelized algorithm is to translate multicore power into application performance. You parallelize your algorithms because you want to run code on all the available cores and take advantage of the horsepower offered by modern multicore CPUs. When you expect an algorithm to use all the available cores on Windows, it is very common to check the CPU Usage History graph shown by Windows Task Manager.

If you use the option that allows you to see one graph per CPU, Windows Task Manager displays one graph per logical core or hardware thread. If each graph displays a sustained high CPU utilization value, it means that the algorithm is running code on all the available cores. However, this high CPU utilization might include unnecessary overhead added by the parallelization process.

In "Boosting Performance with Atomic Operations in .NET 4," I showed a simple example that demonstrated the importance of considering atomic operations when you want to achieve the best performance for a parallelized algorithm. The same example is also useful for understanding why you should dive deeper than the CPU utilization graph to check the efficiency of the parallelization process.

The Premium and Ultimate editions of Visual Studio 2010 allow you to visualize the behavior of a multithreaded application. If you launch the concurrency profiling method for the lock version, Visual Studio lets you visualize the degree of parallelism in your application on the CPU utilization graph. If you click on CPU Utilization, the average CPU utilization value for the process could lead you to the wrong conclusion. The next screenshot shows an average CPU utilization of 85% when the code runs on a computer with a quad-core CPU.

Figure 1. The CPU utilization graph for the code that uses unnecessary locks.

The degree of parallelism in the application seems to be excellent: the application is running code on all the available cores. However, the application is running unnecessary synchronization code, so the parallelized algorithm is inefficient. The application required 38,972 milliseconds to run, with the profiler running in the background.

If you switch to the Threads view and check the Synchronization blocking profile, you will see that System.Threading.Monitor.Enter is responsible for 22,662.51 milliseconds of exclusive blocking time. The next screenshot shows the information provided by the Synchronization blocking profile report within the Threads view:

Figure 2. The Synchronization blocking profile for the lock version provides valuable information about the exclusive blocking times.

The lock keyword calls System.Threading.Monitor.Enter to acquire the mutual-exclusion lock. Each time the code calls System.Threading.Monitor.Enter, the application consumes CPU cycles. Because the lock isn't necessary, these cycles are wasted CPU horsepower.
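To make the difference concrete, here is a minimal sketch of the two approaches. It is not the code from the earlier post; the class name, the shared counter, and the summation workload are hypothetical stand-ins chosen only to show where the monitor is acquired and where an atomic operation replaces it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class CounterSketch
{
    // Hypothetical shared state, not taken from the original example.
    private static readonly object Sync = new object();
    private static long total;

    // Lock version: every iteration acquires and releases a monitor.
    public static long SumWithLock(int n)
    {
        total = 0;
        Parallel.For(0, n, i =>
        {
            lock (Sync) // the compiler emits Monitor.Enter / Monitor.Exit here
            {
                total += i;
            }
        });
        return total;
    }

    // Atomic version: one interlocked add per iteration, no monitor involved.
    public static long SumWithInterlocked(int n)
    {
        total = 0;
        Parallel.For(0, n, i => Interlocked.Add(ref total, i));
        return total;
    }

    static void Main()
    {
        const int n = 1000000;
        Console.WriteLine(SumWithLock(n));        // 499999500000
        Console.WriteLine(SumWithInterlocked(n)); // 499999500000
    }
}
```

Both methods return the same result. The only difference is the synchronization mechanism each iteration pays for, which is exactly what the Synchronization blocking profile surfaces.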

If you launch the concurrency profiling method for the atomic operations version, the new average CPU utilization value is usually lower than the value shown for the lock version. The next screenshot shows an average CPU utilization of 69% when the code runs on a computer with a quad-core CPU.

Figure 3. The CPU utilization graph for the code that uses atomic operations instead of unnecessary locks.

The degree of parallelism in the application seems to be lower than in the previous version. However, the algorithm is more efficient: the application required 16,056 milliseconds to run, with the profiler running in the background, compared with 38,972 milliseconds for the lock version. Lower average CPU utilization came with a shorter run time. You don't want to waste CPU cycles. You want your application to run faster while remaining correct, and to scale as the number of cores increases.
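If you don't have the profiler at hand, you can still put elapsed time next to average CPU utilization yourself. The following is a rough sketch under stated assumptions: the class name, the Measure and AverageUtilization helpers, and the summation workload are all hypothetical, and the utilization figure is simply the process CPU time divided by elapsed time multiplied by the number of logical cores. It is not a substitute for the Concurrency Visualizer.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

static class UtilizationSketch
{
    // Hypothetical shared state for the illustrative workload.
    private static readonly object Sync = new object();
    private static long total;

    // Average utilization = CPU time consumed / (elapsed time x logical cores).
    public static double AverageUtilization(TimeSpan cpu, TimeSpan elapsed, int cores)
    {
        if (elapsed.Ticks <= 0 || cores <= 0) return 0.0;
        return cpu.TotalMilliseconds / (elapsed.TotalMilliseconds * cores);
    }

    // Runs the work once and prints elapsed time next to average utilization.
    public static void Measure(string label, Action work)
    {
        Process process = Process.GetCurrentProcess();
        TimeSpan cpuBefore = process.TotalProcessorTime;
        Stopwatch watch = Stopwatch.StartNew();

        work();

        watch.Stop();
        process.Refresh(); // discard cached process data before reading again
        TimeSpan cpuUsed = process.TotalProcessorTime - cpuBefore;
        double utilization = AverageUtilization(
            cpuUsed, watch.Elapsed, Environment.ProcessorCount);
        Console.WriteLine("{0}: {1} ms elapsed, {2:P0} average CPU utilization",
            label, watch.ElapsedMilliseconds, utilization);
    }

    static void Main()
    {
        const int n = 20000000;

        Measure("lock", () =>
        {
            total = 0;
            Parallel.For(0, n, i => { lock (Sync) { total += i; } });
        });

        Measure("Interlocked", () =>
        {
            total = 0;
            Parallel.For(0, n, i => Interlocked.Add(ref total, i));
        });
    }
}
```

The numbers will vary with the machine and the workload, so read the two values on each line together: utilization tells you how busy the cores were, and elapsed time tells you whether that work was useful. For the figures reported in this post, 38,972 milliseconds against 16,056 milliseconds is a speedup of roughly 2.4x, achieved at a lower average CPU utilization.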

If you switch to the Threads view and check the Synchronization blocking profile, you will see that there are just 1.09 milliseconds of exclusive blocking time, caused by System.Threading.Tasks.Parallel.For. The next screenshot shows the information provided by the Synchronization blocking profile report within the Threads view:

Figure 4. The Synchronization blocking profile for the atomic operations version provides valuable information about the exclusive blocking times.

When you worked with serial code running on single-core CPUs, a sustained high CPU load didn't mean that your code was efficient. The same is true in the multicore world. Profiling tools help you detect inefficient code. The Concurrency Visualizer introduced in the Premium and Ultimate editions of Visual Studio 2010 provides valuable information about the behavior of a multithreaded application. However, remember to dive deeper than the CPU utilization graph.
