Dr. Dobb's is part of the Informa Tech Division of Informa PLC


Win32 Performance Measurement Options


Initialized Counters

If you need your counter to hold meaningful values from the point of its construction, you can derive from the counter class you are interested in and call its start() and stop() members in the constructor of your derived class. Alternatively, you can use the WinSTL performance_counter_init class template:

template <class C>
class performance_counter_init
  : public C
{
public:
  typedef C     counter_type;

// Construction
public:
  performance_counter_init()
  {
    counter_type  &counter  = *this;

    counter.start();
    counter.stop();
  }
};

which does this for you for any counter class with which you parameterize it, as in:

  performance_counter_init<tick_counter>  counter;

  some_timed_operation();

  counter.stop();

  dump(counter.get_milliseconds());
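
For comparison, the derivation approach mentioned earlier might look like the following minimal sketch. The names chrono_counter and initialized_chrono_counter are hypothetical and are not WinSTL classes; chrono_counter is a portable stand-in built on std::chrono::steady_clock, assumed here only so that the example is self-contained.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for a WinSTL-style counter. It uses
// std::chrono::steady_clock, not any Win32 timing function.
class chrono_counter
{
public:
  typedef long long interval_type;

  void start() { m_start = clock_type::now(); }
  void stop()  { m_stop  = clock_type::now(); }

  interval_type get_microseconds() const
  {
    // stop() must not precede the most recent start()
    assert(m_stop >= m_start);

    return std::chrono::duration_cast<std::chrono::microseconds>(
              m_stop - m_start).count();
  }

  interval_type get_milliseconds() const
  {
    return get_microseconds() / 1000;
  }

private:
  typedef std::chrono::steady_clock clock_type;

  clock_type::time_point  m_start;
  clock_type::time_point  m_stop;
};

// Derivation approach: the derived constructor calls start() and stop(),
// so the measured interval is well defined from construction onwards.
class initialized_chrono_counter
  : public chrono_counter
{
public:
  initialized_chrono_counter()
  {
    start();
    stop();
  }
};
```

The template shown above saves writing one such derived class per counter type; the hand-written form is only worthwhile if the derived class needs to do something more.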

Call Costs

Any measurements on a system affect the behavior being measured. Therefore, an important characteristic of the performance classes (and their underlying timing functions) is the cost of the timing function calls. The first part of the analysis is to quantify the call costs of the functions.

Listing Eight shows the essentials of the counter_cost application. For each of the counter classes, the template test_cost() function is called, and the returned timing results, representing the total call costs, are printed to stdout.

The test_cost() function consists of an outer loop and an inner loop. The outer loop executes twice in order to eliminate any caching effects, and the value from the second iteration is used. Within the inner loop, start() and stop() are called 1,000,000 times on an instance of the counter type being examined. The main application counter (an instance of highperformance_counter) measures the cost of the inner loop using the performance_counter_scope template.
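
The structure just described can be sketched roughly as follows. This is not the code from Listing Eight: chrono_counter and counter_scope are hypothetical, portable stand-ins (built on std::chrono::steady_clock) for highperformance_counter and performance_counter_scope, and the constant COST_ITERATIONS is a name assumed for this example.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the main application counter
class chrono_counter
{
public:
  void start() { m_start = clock_type::now(); }
  void stop()  { m_stop  = clock_type::now(); }

  long long get_microseconds() const
  {
    assert(m_stop >= m_start);  // stop() must follow start()

    return std::chrono::duration_cast<std::chrono::microseconds>(
              m_stop - m_start).count();
  }

private:
  typedef std::chrono::steady_clock clock_type;

  clock_type::time_point  m_start;
  clock_type::time_point  m_stop;
};

// Hypothetical scoping class: starts the counter on construction and
// stops it on destruction
template <class C>
class counter_scope
{
public:
  explicit counter_scope(C &counter)
    : m_counter(counter)
  {
    m_counter.start();
  }
  ~counter_scope()
  {
    m_counter.stop();
  }

private:
  counter_scope(counter_scope const &);             // not copyable
  counter_scope &operator =(counter_scope const &);

  C &m_counter;
};

const int COST_ITERATIONS = 1000000;

// Returns the time, in microseconds, taken by COST_ITERATIONS
// start()/stop() pairs on the counter under test
template <class C>
long long test_cost(C &subject)
{
  chrono_counter  outer;
  long long       cost = 0;

  // Two passes; the second pass overwrites the result of the first
  for(int pass = 0; pass < 2; ++pass)
  {
    {
      counter_scope<chrono_counter> scope(outer);

      for(int i = 0; i < COST_ITERATIONS; ++i)
      {
        subject.start();
        subject.stop();
      }
    } // scope ends here, which stops the outer counter

    cost = outer.get_microseconds();
  }

  return cost;
}
```

Any type with start() and stop() members can be passed as the subject, so the same function template serves every counter class.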

Because the operating systems are on machines with widely different hardware, comparisons of the actual time costs over different systems are not meaningful. Since the call costs of GetTickCount() were lower than those of any other timing function (except GetSystemTimeAsFileTime() on XP), the results are expressed as a percentage of the GetTickCount() time on each platform to provide meaningful comparisons. The results are shown in Table 4.

The results clearly demonstrate that GetTickCount() has the lowest performance cost on all operating systems, except the single case of GetSystemTimeAsFileTime() on XP. It is also clear that timeGetTime() costs between 4 and 69 times that of GetTickCount().
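
The normalization used for these figures is simple arithmetic, sketched below. The function name relative_cost_percent is an assumption made for this example and does not appear in the counter_cost application.

```cpp
#include <cassert>

// Expresses a measured call cost as a percentage of a baseline cost
// (here, the cost measured for GetTickCount() on the same platform).
// Both arguments must be in the same unit.
inline double relative_cost_percent(double cost, double baseline_cost)
{
  assert(baseline_cost > 0.0);

  return (cost / baseline_cost) * 100.0;
}
```

By this measure the baseline itself is always 100 percent, and a function that costs four times as much as the baseline is reported as 400 percent.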


Table 4: Call cost of timing functions (as percentage of GetTickCount()).


On NT operating systems, GetSystemTimeAsFileTime() has barely any additional cost over GetTickCount(). It is also notable that GetSystemTimeAsFileTime() performs relatively better on later operating-system variants. However, on Windows 98, this call has an exceedingly high cost, nearly 8,000 times that of GetTickCount(). QueryPerformanceCounter() has a high call cost on all operating systems, ranging from 49 to 2,080 times that of GetTickCount().

The cost of GetThreadTimes() and GetProcessTimes() is very consistent over all flavors of NT operating systems (between 296 and 924 times that of GetTickCount()). Note that the figures are not shown for Windows 98, since these two functions are not implemented on 9x.

One final point is that QueryPerformanceCounter() has a higher cost than GetThreadTimes()/GetProcessTimes() on single-processor machines, but a lower cost on multiprocessor machines. Presumably this is because access to the thread/system time infrastructure on multiprocessor machines requires synchronization, whereas access to the performance counter hardware does not.

Call Resolution

The other characteristic examined is that of the resolution of the various timing functions. Their documented resolutions are listed in Table 5. The second part of the analysis quantifies the actual resolutions of the functions.


Table 5: Resolution of timing functions.


Listing Nine shows the implementation of the counter_resolution application. For each of the counter classes, the test_resolution() template function is called, and the returned results, representing the minimum measured resolution for the counter class, are printed to stdout.

Listing Nine: Extract from counter_resolution.cpp


/* /////////////////////////////////////////////////////////////
 * ...
 *
 * Extract from counter_resolution.cpp
 *
 * Copyright (C) 2002, Synesis Software Pty Ltd.
 * (Licensed under the Synesis Software Standard Source License:
 *  http://www.synesis.com.au/licenses/ssssl.html)
 *
 * ...
 * ////////////////////////////////////////////////////////// */

#include <stdio.h>

#define _WINSTL_NO_NAMESPACES

#include <winstl.h>
#include <winstl_tick_counter.h>
#include <winstl_multimedia_counter.h>
#include <winstl_systemtime_counter.h>
#include <winstl_highperformance_counter.h>
#include <winstl_threadtimes_counter.h>
#include <winstl_processtimes_counter.h>
#include <winstl_performance_counter.h>

#include <stlsoft_limit_traits.h>

/* ////////////////////////////////////////////////////////////////////// */

const int   C_ITERATIONS    =   1000000;

/* ////////////////////////////////////////////////////////////////////// */

template <ws_typename_param_k C>
inline ws_typename_type_k C::interval_type test_resolution(C &counter)
{
  typedef ws_typename_type_k C::interval_type interval_type;

  interval_type   min_inc = stlsoft::limit_traits<interval_type>::maximum();

  for(volatile int i = 0; i < C_ITERATIONS; ++i)
  {
    counter.start();

    // Execute a short inner loop of at most 2047 repeats (i & 0x7ff)
    for(volatile int j = 0; j < (i & 0x7ff); ++j)
    {}

    counter.stop();

    interval_type   interval = counter.get_microseconds();

    if( interval != 0 &&
        interval < min_inc)
    {
      min_inc = interval;
    }
  }

  return min_inc;
}

int main(int /* argc */, char* /* argv */[])
{

#if defined(_STLSOFT_COMPILER_IS_BORLAND) || \
    defined(_STLSOFT_COMPILER_IS_INTEL) || \
    defined(_STLSOFT_COMPILER_IS_MSVC)    
 #define _counter_test_fmt	"%I64d"
#else
 #define _counter_test_fmt	"%lld"
#endif /* compiler */

#define _test_counter(_x)   \
  do \
  { \
    _x x; \
   \
    printf( #_x ": " _counter_test_fmt "us\n", \
            test_resolution(x)); \
  } \
  while(0)

  _test_counter(tick_counter);
  _test_counter(multimedia_counter);
  _test_counter(systemtime_counter);
  _test_counter(highperformance_counter);
  _test_counter(threadtimes_counter);
  _test_counter(processtimes_counter);
  _test_counter(performance_counter);

  return 0;
}

The test_resolution() function takes the form of an outer loop, which executes C_ITERATIONS (1,000,000) times. Within that loop, an inner loop of at most 2,047 iterations (the outer loop index masked with 0x7ff) is executed, and its execution time is measured. The minimum nonzero interval is recorded (nonzero, since it is likely that some intervals will be reported as 0) and returned as the result of the function. The results are shown in Table 5.

The results mainly illustrate that every timing function save QueryPerformanceCounter() (between 1µs and 5µs) has a significantly lower actual resolution than stated. The three exceptions are GetTickCount() and timeGetTime() on Windows 98, and timeGetTime() on one particular dual-processor Windows 2000 machine (though the other SMP 2000 machine does not show this). In all other cases, the best resolution ranges from 10ms to 20ms.

It is also interesting to note that for most machines, the resolutions obtainable from GetThreadTimes(), GetProcessTimes(), GetSystemTimeAsFileTime(), and timeGetTime() are (roughly) equivalent to that of GetTickCount(), suggesting that all these functions derive their timing information from a common low-resolution source.
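
To see why a coarse underlying tick shows up directly as the minimum nonzero interval, consider the following simulation. It is a sketch under stated assumptions, not a measurement: quantised_counter is a hypothetical class whose notion of time advances only when advance() is called and is then rounded down to a fixed tick, and std::numeric_limits stands in for stlsoft::limit_traits.

```cpp
#include <cassert>
#include <limits>

// Hypothetical simulated counter. Simulated time moves in 1us steps via
// advance(), but start() and stop() only see it rounded down to a
// multiple of the tick, as a tick-driven time source would report it.
class quantised_counter
{
public:
  typedef long long interval_type;

  explicit quantised_counter(interval_type tick_us)
    : m_tick(tick_us), m_now(0), m_start(0), m_stop(0)
  {
    assert(tick_us > 0);
  }

  void advance(interval_type us) { m_now += us; } // simulated work

  void start() { m_start = quantise(m_now); }
  void stop()  { m_stop  = quantise(m_now); }

  interval_type get_microseconds() const { return m_stop - m_start; }

private:
  interval_type quantise(interval_type t) const
  {
    return (t / m_tick) * m_tick;
  }

  interval_type m_tick;
  interval_type m_now;
  interval_type m_start;
  interval_type m_stop;
};

// Same minimum-nonzero-interval logic as test_resolution(), with the
// busy inner loop replaced by a simulated delay of 0 to 2047 microseconds
template <class C>
typename C::interval_type min_nonzero_interval(C &counter, int iterations)
{
  typedef typename C::interval_type interval_type;

  interval_type min_inc = std::numeric_limits<interval_type>::max();

  for(int i = 0; i < iterations; ++i)
  {
    counter.start();
    counter.advance(i & 0x7ff);
    counter.stop();

    interval_type interval = counter.get_microseconds();

    if( interval != 0 &&
        interval < min_inc)
    {
      min_inc = interval;
    }
  }

  return min_inc;
}
```

With a 10,000us tick, every simulated interval is reported as either 0 or 10,000us, so the function returns the tick size regardless of how short the timed work is. With a 1us tick it returns 1us.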

