Dr. Dobb's is part of the Informa Tech Division of Informa PLC


Win32 Performance Measurement Options


Win32 64-Bit Integers

The structure layouts of LARGE_INTEGER and FILETIME are such that it is safe to cast them to and from a 64-bit integer, ws_sint64_t (defined to be signed __int64 for Borland, Digital Mars, Visual C++ and Watcom compilers, and signed long long for Comeau, GCC, and Metrowerks compilers), so long as the platform is little-endian (e.g., Intel), since the LowPart/dwLowDateTime member precedes the HighPart/dwHighDateTime member. I do not have access to the Win32 headers for any big-endian systems (e.g., PowerPC, Alpha), so I cannot presume that the layout of the structure members would be reversed (although I hope they would be), which would maintain the compatibility with the 64-bit integers. If they are not, then the systemtime_counter, highperformance_counter, threadtimes_counter, and processtimes_counter classes are not valid for systems that are not little-endian.
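The layout dependence described above can be illustrated without the Win32 headers. The following is a minimal sketch, not the library's code: filetime_like is a hypothetical stand-in for FILETIME (low part first, then high part), and the function names are invented for illustration. It contrasts a conversion that relies on the member layout with one that builds the value arithmetically and so does not depend on byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for FILETIME: the low part precedes the high part.
struct filetime_like
{
    std::uint32_t low;   // plays the role of dwLowDateTime
    std::uint32_t high;  // plays the role of dwHighDateTime
};

// Layout-independent: combines the two halves arithmetically.
inline std::int64_t to_int64_portable(filetime_like const &ft)
{
    std::uint64_t const combined =
        (static_cast<std::uint64_t>(ft.high) << 32) | ft.low;
    return static_cast<std::int64_t>(combined);
}

// Layout-dependent: reinterprets the bytes of the structure.
// memcpy is used instead of a pointer cast to avoid aliasing problems.
inline std::int64_t to_int64_by_layout(filetime_like const &ft)
{
    static_assert(sizeof(filetime_like) == sizeof(std::int64_t),
                  "stand-in structure must be exactly 64 bits");
    std::int64_t value;
    std::memcpy(&value, &ft, sizeof value);
    return value;
}

// Reports whether the least significant byte is stored first.
inline bool is_little_endian()
{
    std::uint32_t const probe = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}
```

On a little-endian machine the two conversions agree. On a big-endian machine with the same member order, only the arithmetic version would give the intended value.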

highperformance_counter: This class records the LARGE_INTEGER values obtained from QueryPerformanceCounter() in its start() and stop() methods, converting them to ws_sint64_t (see the previous section "Win32 64-Bit Integers"). get_seconds() is implemented by dividing the value returned from get_period_count() by the frequency (obtained from QueryPerformanceFrequency()). This frequency is hardware dependent, but is commonly the processor frequency or a small factor thereof. The _frequency() method obtains the frequency via a once-only (since the s_frequency variable is static) call to _query_frequency(). _query_frequency() is implemented such that if QueryPerformanceFrequency() returns FALSE, indicating the absence of high-performance counter support, the value returned is the maximum value for its type, so that future divisions will evaluate to 0, rather than crashing on a divide-by-zero error.
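The once-only query and the "maximum value" fallback can be sketched portably as follows. This is an illustration under stated assumptions rather than the class's actual implementation: query_frequency() is a hypothetical stand-in for the call to QueryPerformanceFrequency(), its two parameters simulate the API's result, and g_query_calls exists only to show that the static is initialized once.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

typedef std::int64_t interval_type;

// Counts how often the (notionally expensive) query is performed.
int g_query_calls = 0;

// Hypothetical stand-in for the frequency query. 'supported' and 'reported'
// simulate the success flag and the value the real API would hand back.
interval_type query_frequency(bool supported, interval_type reported)
{
    ++g_query_calls;
    // On failure, return the largest value so later divisions yield 0
    // instead of faulting on a division by zero.
    return (supported && reported > 0)
               ? reported
               : std::numeric_limits<interval_type>::max();
}

// Function-local static: the query runs on the first call only.
// The 10 MHz figure is an arbitrary value chosen for the example.
interval_type frequency()
{
    static interval_type const s_frequency = query_frequency(true, 10000000);
    return s_frequency;
}

// Converts a raw period count to whole seconds.
interval_type to_seconds(interval_type period_count, interval_type freq)
{
    return period_count / freq;
}
```

Because the static lives inside the accessor, the query is deferred until a time-unit conversion is first requested, and code that only reads raw period counts never triggers it.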

get_milliseconds() and get_microseconds() are implemented by multiplying get_period_count() by 1000 and 1,000,000, respectively, and dividing by the frequency. In order to avoid truncation of the result when the period count is low, or overflow when it is high, the multiplication is carried out before the division if it will not overflow, and after the division if it would.
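The ordering decision can be written as a single helper. This is a sketch of the technique rather than the library's code; scale_count and its parameter names are hypothetical, and the check assumes nonnegative counts.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

typedef std::int64_t interval_type;

// Computes count * factor / freq, choosing the order of operations so that
// small counts are not truncated to 0 and large counts do not overflow.
inline interval_type scale_count(interval_type count,
                                 interval_type factor,
                                 interval_type freq)
{
    assert(count >= 0 && factor > 0 && freq > 0);

    if (count <= std::numeric_limits<interval_type>::max() / factor)
    {
        // The product fits: multiply first to keep the precision.
        return (count * factor) / freq;
    }
    else
    {
        // The product would overflow: divide first and accept the small
        // loss of precision, which is negligible for counts this large.
        return (count / freq) * factor;
    }
}
```

For example, 1500 ticks at a frequency of 1,000,000 ticks per second is 1.5 ms. Multiplying first yields 1, whereas dividing first would yield 0.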

Since it is required only for calculating the period, rather than measuring it, I employed a combination of late evaluation and statics to defer the expensive call to QueryPerformanceFrequency() until after the measurement is complete, as well as, of course, only doing it once per process. Indeed, if you only use get_period_count() — and not get_seconds(), get_milliseconds(), or get_microseconds() — then this cost is not incurred at all.

processtimes_counter and threadtimes_counter: As well as providing the four period methods that all the other counter classes provide, these two classes also provide four corresponding methods each for kernel time and user time. processtimes_counter and threadtimes_counter record the values for kernel and user time obtained from GetProcessTimes() and GetThreadTimes(), respectively, in their start() and stop() methods into the m_kernelStart, m_kernelEnd, m_userStart, and m_userEnd members (casting in the same way as in systemtime_counter). In addition to this being a finer-grained level of measurement, the figures obtained from these classes are not affected by the activity of other processes, unlike those of the other counter classes. (I have included a program, counter_isolation, in the archive that demonstrates this behavior for the threadtimes_counter class.)

The thread/process handles are specified as the current ones, via GetCurrentThread()/GetCurrentProcess(). threadtimes_counter records the current thread handle in a member variable to avoid unnecessarily repeating this small but nonzero cost. processtimes_counter uses a static technique such that GetCurrentProcess() is called only once per process. The creation time and exit time values obtained from GetThreadTimes()/GetProcessTimes() are ignored. (They are, in fact, fixed values, and the exit time is not actually valid until the given thread has exited.)

These two classes have the following attribute methods in addition to those they share with the three other classes:

...

  // Attributes
  public:
    ...

    interval_type get_kernel_period_count() const;
    interval_type get_kernel_seconds() const;
    interval_type get_kernel_milliseconds() const;
    interval_type get_kernel_microseconds() const;
    interval_type get_user_period_count() const;
    interval_type get_user_seconds() const;
    interval_type get_user_milliseconds() const;
    interval_type get_user_microseconds() const;

    ...
};

get_kernel_period_count() and get_user_period_count() return the difference between the kernel members and between the user members, respectively. get_period_count() is implemented as the sum of get_kernel_period_count() and get_user_period_count(). The calculations of all the seconds, milliseconds, and microseconds values are performed in the same way as those of systemtime_counter.
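The arithmetic described here is simple enough to show with plain data. The sketch below is illustrative only: times_sample is a hypothetical structure that mirrors the four members named earlier, and the millisecond conversion assumes the values are in the 100-nanosecond units used by FILETIME.

```cpp
#include <cassert>
#include <cstdint>

typedef std::int64_t interval_type;

// Hypothetical plain-data stand-in for the four recorded members.
// Values are assumed to be in 100-nanosecond (FILETIME) units.
struct times_sample
{
    interval_type kernelStart;
    interval_type kernelEnd;
    interval_type userStart;
    interval_type userEnd;
};

// Kernel-mode time consumed between start and stop.
inline interval_type kernel_period_count(times_sample const &s)
{
    return s.kernelEnd - s.kernelStart;
}

// User-mode time consumed between start and stop.
inline interval_type user_period_count(times_sample const &s)
{
    return s.userEnd - s.userStart;
}

// Total period: the sum of the kernel and user periods.
inline interval_type period_count(times_sample const &s)
{
    return kernel_period_count(s) + user_period_count(s);
}

// 10,000 units of 100 ns make one millisecond.
inline interval_type to_milliseconds(interval_type count_100ns)
{
    return count_100ns / 10000;
}
```

With a kernel period of 250,000 units and a user period of 750,000 units, the total is 1,000,000 units, or 100 ms.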

counter_scope: The similar public interfaces of the counter classes facilitate the use of a scoping template class, performance_counter_scope (shown in Listing Seven), which implements the "Resource Acquisition Is Initialization" idiom and may be parameterized on a particular counter class. The constructor takes a reference to a counter class instance, and then calls start(). stop() is called in the destructor, providing a scoped timing operation. The class also provides access to the stop() method in order to support intermediate staged timings, and a reference to const of the managed counter class such that intermediate timing values can be obtained. An example of its use is shown in Listing Eight.

Listing Seven: Extract from winstl_performance_counter_scope.h


/* /////////////////////////////////////////////////////////////
 * ...
 *
 * Extract from winstl_performance_counter_scope.h
 *
 * Copyright (C) 2002, Synesis Software Pty Ltd.
 * (Licensed under the Synesis Software Standard Source License:
 *  http://www.synesis.com.au/licenses/ssssl.html)
 *
 * ...
 * ////////////////////////////////////////////////////////// */

// class performance_counter_scope
template <ws_typename_param_k T>
class performance_counter_scope
{
public:
    typedef T                               counter_type;
    typedef performance_counter_scope<T>    class_type;

public:
    ws_explicit_k performance_counter_scope(counter_type &counter)
        : m_counter(counter)
    {
        m_counter.start();
    }
    ~performance_counter_scope()
    {
        m_counter.stop();
    }

    void stop()
    {
        m_counter.stop();
    }

    // This method is const, to ensure that only the stop operation 
    // (via performance_counter_scope::stop()) is accessible 
    // on the managed counter.
    const counter_type &get_counter() const
    {
        return m_counter;
    }

// Members
protected:
    T   &m_counter;

// Not to be implemented
private:
    performance_counter_scope(class_type const &rhs);
    class_type const &operator =(class_type const &rhs);
};
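
The staged-timing use of stop() mentioned in the text can be tried out without the library. The following is a minimal sketch that restates the scope idea in plain C++ without the ws_* macros. stub_counter, counter_scope, and staged_stops are hypothetical names, and the stub counts calls instead of measuring time.

```cpp
#include <cassert>

// Hypothetical stub counter: records how often start() and stop() are called.
struct stub_counter
{
    int starts;
    int stops;

    stub_counter() : starts(0), stops(0) {}

    void start() { ++starts; }
    void stop()  { ++stops; }
};

// Plain C++ restatement of the scoping idea: start on construction,
// stop on destruction, with stop() also available for staged timings.
template <typename T>
class counter_scope
{
public:
    explicit counter_scope(T &counter) : m_counter(counter)
    {
        m_counter.start();
    }
    ~counter_scope()
    {
        m_counter.stop();
    }

    void stop() { m_counter.stop(); }

    // Read-only access, so callers can query but not restart the counter.
    T const &get_counter() const { return m_counter; }

private:
    // Not to be implemented: copying is disabled.
    counter_scope(counter_scope const &);
    counter_scope &operator =(counter_scope const &);

    T &m_counter;
};

// Runs one scoped region with an intermediate stop() and returns the
// number of stop() calls observed once the scope has ended.
inline int staged_stops(stub_counter &counter)
{
    {
        counter_scope<stub_counter> scope(counter);
        scope.stop();   // intermediate reading could be taken here
    }                   // destructor calls stop() again
    return counter.stops;
}
```

With a real counter, each stop() would update the end value, so a reading taken through get_counter() after the intermediate stop() reflects the time elapsed since the single start().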

Listing Eight: Extract from counter_cost.cpp


/* /////////////////////////////////////////////////////////////
 * ...
 *
 * Extract from counter_cost.cpp
 *
 * Copyright (C) 2002, Synesis Software Pty Ltd.
 * (Licensed under the Synesis Software Standard Source License:
 *  http://www.synesis.com.au/licenses/ssssl.html)
 *
 * ...
 * ////////////////////////////////////////////////////////// */

#include <stdio.h>

#define _WINSTL_NO_NAMESPACES

#include <winstl.h>
#include <winstl_tick_counter.h>
#include <winstl_multimedia_counter.h>
#include <winstl_highperformance_counter.h>
#include <winstl_systemtime_counter.h>
#include <winstl_threadtimes_counter.h>
#include <winstl_processtimes_counter.h>
#include <winstl_performance_counter.h>
#include <winstl_performance_counter_scope.h>
#include <winstl_performance_counter_init.h>

/* //////////////////////////////////////////////////////////// */

const int   C_ITERATIONS    =   1000000;

/* //////////////////////////////////////////////////////////// */

typedef highperformance_counter application_counter_type;

template<
            ws_typename_param_k C1
        ,   ws_typename_param_k C2
        >
inline ws_typename_type_k C1::interval_type
test_cost(C1 &app_counter, C2 &counter)
{
  for(int i = 0; i < 2; ++i)
  {
    performance_counter_scope<C1>   scope(app_counter);

    for(int j = 0; j < C_ITERATIONS; ++j)
    {
      counter.start();
      counter.stop();
    }
  }

  return app_counter.get_milliseconds();
}

int main(int /* argc */, char* /* argv */[])
{
  performance_counter_init<application_counter_type>  app_counter;

#if defined(_STLSOFT_COMPILER_IS_BORLAND) || \
    defined(_STLSOFT_COMPILER_IS_INTEL) || \
    defined(_STLSOFT_COMPILER_IS_MSVC)    
 #define _counter_test_fmt	"%I64d"
#else
 #define _counter_test_fmt	"%lld"
#endif /* compiler */

#define _test_counter(_x)   \
  do \
  { \
    _x x; \
   \
    printf( #_x ": " _counter_test_fmt "ms\n", \
            test_cost(app_counter, x)); \
  } \
  while(0)

  _test_counter(tick_counter);
  _test_counter(multimedia_counter);
  _test_counter(systemtime_counter);
  _test_counter(highperformance_counter);
  _test_counter(threadtimes_counter);
  _test_counter(processtimes_counter);
  _test_counter(performance_counter);

  return 0;
}

The original proprietary implementations of the performance classes called their start() methods in their constructors, as well as initializing their member variables, as a syntactic convenience, such that the following would produce meaningful results:

  performance_counter counter;

  some_operation();
  counter.stop();

  printf("...", counter.get_xxx());

However, the observed use of these classes — in almost all cases — along with the strong requirement for them to be as efficient as possible, has shown this to be a mistake. Because instances are often used in a number of start()-stop() cycles, as can be seen in the test program, having start() called in the constructor complicates the semantics for no net benefit. Nor does it ensure that the instance has a coherent state, since only when a subsequent stop() call is made do the attribute calls have well-defined behavior (see the section "Initialized Counters").

Performance Analysis

The test scenarios described here were executed on the following platforms: Windows 98 (233 MHz), NT 4 (400 MHz), 2000 (650-MHz laptop), 2000 (dual 550 MHz), NT 4 (dual 933 MHz), 2000 (dual 933 MHz), and XP (2 GHz). (All program code and supporting files are included in the archive, along with Visual C++ 6 and Metrowerks CodeWarrior 8 projects.)

