Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Tools

Introduction to Power Debugging


Poor Man's Performance Timing

There have been times that I've written some code that I would have liked to get performance timing on. There have even been times when I wanted to get performance of an application that I didn't have the source for. The application may have been crunching numbers or talking with hardware and through debugging I had determined the locations I needed to measure. I just needed a way that I could do the measuring.

It is easy when you have the source to just modify the code with QueryPerformanceCounter and OutputDebugString and then recompile. After all, I just want an estimate and the code I'm attempting to measure isn't a few instructions but rather a more complex operation. The performance will always vary depending on the machine and the system load so to the nanosecond accuracy isn't necessary.

The flaw with modifying the code is that it would only work for that piece of code. It wouldn't work with applications I don't have the source to and I would be required to modify all locations that I want to time. I could buy a tool, but perhaps the tool is overkill for what I need and I don't want to spend the money. What about writing a debug extension?

The real question is how much accuracy is lost when using the debug extension as opposed to modifying the source code. In order to find out, we need to measure the timing of some simple code, shown in Example 1.

for(uiIndex = 0;uiIndex < g_Loops; uiIndex++)
{
	 dSampleX *= dSampleY;
}
Example 1: Double precision multiplication in a loop.

The data shown in Table 1 is a comparison of two different methods to retrieve performance data. The first listed is "QPC" which is short for "QueryPerformanceCounter" API. This is an abstraction over an operating system-implemented method for querying a performance counter. The second is RDTSC and while the first method could possibly use RDTSC as well, there would be some more overhead involved over using the instruction yourself. The first method, however, could be implemented in a variety of methods including ACPI hardware ports.

Modified Source Debug Extensions
# of Loops QPC RDTSC QPC RDTSC
10 0.006705 ms 0.005133 ms 0.569346 ms 0.564379 ms
100 0.051403 ms 0.049727 ms 0.633879 ms 0.604160 ms
1000 0.498667 ms 0.495724 ms 1.074438 ms 1.042093 ms
10000 5.010413 ms 4.993571 ms 5.577245 ms 5.498423 ms
100000 50.437975 ms 50.366829 ms 51.285289 ms 49.725790 ms
1000000 508.223811ms 507.440129 ms 508.854617 ms 495.261641 ms
Table 1: Timing comparisons

Accuracy and precision are the two factors that make up performance timing. Accuracy is how close your numbers are to the actual time and precision is the interval in which you are measuring. In the above, our precision is milliseconds and when doing performance monitoring via a debug extension I would only suggest to measure items that would require no less than a millisecond resolution.

Modified Source Debug Extensions
QPC RDTSC QPC RDTSC
Overhead 0.006705 ms 0.000051 ms 0.435530 ms 0.421219 ms
Table 2: Performance-monitoring overhead

The data listed in Table 2 is the performance timing of doing nothing and essentially illustrates how much overhead each method typically has. Example output is shown in Figure 3 when using the RDTSC debug extension.

[Click image to view at full size]

Figure 3: Using the RDTSC debug extension.

The following are some of the key factors when interpreting the performance numbers:

  1. The Hardware — The hardware is one of the factors in determining the speed of a segment of code. The performance monitoring I performed was using a Pentium 4M 1.8 GHz laptop.
  2. The Machine Load — The system load will affect the outcome of performance numbers. You can show this by moving your mouse or clicking on an application while performance numbers are being displayed. You can attempt to be more accurate by increasing the priority of the timing thread up to "real time," however depending on what that thread is doing, you may end up skewing the results or locking up the system. The thread being timed may be doing other work for I/O on another thread for example.
  3. What is being timed — The other factor is essentially what is being timed. If you are timing a hardware operation, for example, context switching may skew your results less, as your thread will essentially go into a "WAIT" period. The hardware may then crunch or move the data independent of the CPU and when it's finished the thread will be scheduled again. In an example of straight number crunching however as demonstrated here, a context switch will interrupt the processing and thus is more likely to skew the final time with these interruptions.

In my timing you will note that I generally did not get one straight time for any of these samples. In fact they can vary by several milliseconds due to context switching, for example. You will notice that as the operation you are timing becomes longer the less effect the overhead of using a debug extension skews the results. In fact you may notice that in some cases the result was actually lower with the debug extension than the modified code! This may be a result of the debug extension running in the context of another thread and as such not sharing its time slice with the thread being timed.

This example demonstrates that if you are looking simply for estimates within a few milliseconds then a debug extension may just do the trick. However, if you need more accurate timing then a more isolated environment using increased thread priorities and possibly specialized tools may be a more appropriate solution.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.